KR100205956B1

KR100205956B1 - Language code translation device and method

Info

Publication number: KR100205956B1
Application number: KR1019970002488A
Authority: KR
Inventors: 권혁철
Original assignee: 이계철; 한국전기통신공사
Priority date: 1997-01-28
Filing date: 1997-01-28
Publication date: 1999-07-01
Also published as: KR19980066773A

Abstract

1. Technical field of the claimed invention

An apparatus and method for interconverting different language codes using a hashing function.

2. Technical problem to be solved by the invention

Conventional converters store a Hangul-Hanja dictionary and a Hanja-Hangul dictionary in duplicate, which requires a large amount of storage space and slows conversion. The invention aims to solve these problems.

3. Summary of the solution

A single hashing value is computed by combining each pair of corresponding syllables from the two language codes, and each convertible word is stored using these values. When the user inputs a word in the first language code, the word is split into syllables and candidate second-language codes are generated for each syllable. A hashing value is then computed for each syllable. The hashing value of the first syllable is used as an index key to search the stored hashing values for a match, the entries linked to that match are then searched for the next syllable's hashing value, and the matching values are restored to the original language code.

4. Important uses of the invention

Used in Hangul-Hanja interconverters in word processors, personal terminals, and similar devices.

Description

Apparatus and method for interconverting different language codes using a hashing function

The present invention relates to an apparatus and method for interconverting different language codes (e.g., Hangul and Hanja) using a hashing function. In particular, for conversion between Hangul and Hanja, a hashing function is applied to the corresponding language codes one syllable at a time, the hashing values of the two languages are combined into a single hashing value, and each word is stored in memory in a trie structure using these values, which greatly reduces the memory required.

Conventional Hangul-Hanja interconverters used in portable terminals, text editors, desktop publishing systems, electronic publishing systems, and the like keep both a Hangul-Hanja dictionary and a Hanja-Hangul dictionary in order to convert in both directions, and use no particular compression technique when storing Hangul and Hanja.

As a result, storing the Hangul-Hanja dictionary and its reverse, the Hanja-Hangul dictionary, requires a large amount of storage space. For example, storing 패배 (敗北, "defeat") and 패인 (敗因, "cause of defeat") requires at least 32 bytes.

In addition, many string comparisons between Hangul and Hanja words are needed and both of the duplicated dictionaries must be searched, so conversion takes a long time. Both dictionaries must also be maintained together whenever a word is added or deleted, which complicates the dictionary management algorithm.

Accordingly, the present invention was devised to solve these problems of the prior art. For conversion between Hangul and Hanja, a hashing function is applied to the corresponding language codes one syllable at a time, the hashing values of the corresponding Hangul syllable and Hanja character are combined into a final hashing value for that syllable, and each word is stored in memory in a trie structure using these values. The object is to provide an apparatus and method for interconverting different character codes that greatly reduces memory use, shortens conversion time, and simplifies dictionary management.

FIG. 1 illustrates the process of generating a hashing value by combining the character codes of one Hangul syllable and one Hanja character according to the present invention.

FIG. 2 illustrates the process of restoring Hangul from a hashing value according to the present invention.

FIG. 3 illustrates the process of restoring Hanja from a hashing value according to the present invention.

FIG. 4 is a schematic block diagram of the Hangul-Hanja interconversion apparatus according to the present invention.

FIG. 5 is a flowchart of the Hangul-Hanja interconversion method according to the present invention.

* Explanation of reference numerals for the main parts of the drawings *

11: Hangul-Hanja mapping table

12: Hanja-Hangul conversion unit

13: Hangul/Hanja hashing value calculation unit

14: Hangul-Hanja hashing value storage unit

15: Hangul/Hanja restoration unit

To achieve the above object, the apparatus for interconverting different language codes according to the present invention comprises:

first storage means for storing words formed from hashing values, each calculated by combining a syllable of a first language code with the corresponding syllable of a second language code;

second storage means for storing a mapping from each syllable of the first language code to the corresponding syllables of the second language code;

conversion means for converting a syllable of the second language code input by the user into the corresponding syllable of the first language code;

hashing value calculation means for calculating syllable-by-syllable hashing values for an input word, which either receives a user-input word of the first language code syllable by syllable together with the candidate second-language syllables for each syllable from the second storage means, or receives a user-input word of the second language code syllable by syllable together with the candidate first-language syllables from the conversion means;

restoration means which receives the user-input word syllable by syllable, receives the hashing values extracted from the first storage means using the hashing values generated by the hashing value calculation means as index keys, and restores those hashing values to the original language code for output; and

control means for splitting the user-input word into syllables and for performing access control and memory management of the first and second storage means.

The method for interconverting different language codes according to the present invention, applied to an apparatus that interconverts different language codes, comprises:

a first step of calculating a single hashing value by combining each pair of corresponding syllables of the different language codes, and storing each convertible word using these values;

a second step of, when the user inputs a word of the first language code to be converted, splitting it into syllables and generating candidate second language codes for each syllable;

a third step of calculating a hashing value for each syllable from the first language code of the syllable and its candidate second language codes;

a fourth step of searching the stored hashing values for a match, using the hashing value of the first syllable as an index key;

a fifth step of, if a match is found, searching the matching records for one whose next hashing value matches the hashing value of the second syllable; and

a sixth step of extracting the entries whose hashing values match for every syllable of the input word, restoring them to the original language code, and outputting the result.

Hereinafter, the present invention is described with reference to the accompanying drawings, taking only the interconversion of Hangul and Hanja as an example; the scope of the invention is not limited to this example.

FIG. 1 illustrates the process of generating a hashing value by combining the character codes of one Hangul syllable and one Hanja character according to the present invention.

When a hashing function is used to interconvert words of two languages with different code regions, the hashing function proposed in the present invention can be expressed as Equation 1.

[Equation 1]

\[ H : A \times B \rightarrow C, \qquad H^{-1} : C \rightarrow A \ \text{or} \ B \]

Here, H is a hashing function that combines language A and language B, which occupy different code regions, and maps them into code region C; it is a perfect hashing function with no collisions. H^{-1} is a restoration function that recovers the source language code A or B from the hashing value C. It is the inverse of H, and because H is a perfect hashing function, a one-to-one corresponding output always exists.

Therefore, with a hashing function and a restoration function that satisfy the above conditions, an interconverter can be implemented not only between two languages within the same code system, such as KSC-5601 or Unicode, but also between languages in different code systems.

Various hashing and restoration functions can be constructed, depending on the code layout of the languages to be converted. Because mixing code region A with code region B and mapping the result to code region C is equivalent to the already proven pigeonhole problem, a hashing function and a restoration function are guaranteed to exist. [See Udi Manber, Introduction to Algorithms: A Creative Approach, Addison-Wesley, 1989.]

The conversion between Hangul and Hanja presupposes the following conditions.

First, one of the two codes is always known. In Hangul-to-Hanja conversion, the Hangul code is known and the corresponding Hanja is sought; in Hanja-to-Hangul conversion, the reverse holds.

Second, the words in the Hangul-Hanja interconversion dictionary are based on the KSC-5601 code. Only 2,350 Hangul syllables and 4,888 Hanja characters are covered, so the code regions of Hangul and Hanja are quite restricted.

Third, homophones share the same Hangul but have different Hanja codes. Combining and hashing them therefore yields different hashing values, so homophones pose no problem.
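The third condition can be illustrated with a minimal Python sketch. It assumes Python's built-in `euc_kr` codec as a stand-in for KSC-5601 code values. The helper names `ks_code` and `hash_pair` and the short list of Hanja read as 사 are illustrative choices, not part of the patent. The hash uses the subtract-and-add scheme with the constant 0xA1A1 that the description gives for FIG. 1.

```python
HASH_CONST = 0xA1A1  # hashing constant named in the description


def ks_code(ch: str) -> int:
    """Return the 2-byte code of one character (euc_kr codec assumed
    to match the KSC-5601 values used in the text)."""
    raw = ch.encode("euc_kr")
    if len(raw) != 2:
        raise ValueError("not a 2-byte KSC-5601 character")
    return int.from_bytes(raw, "big")


def hash_pair(hangul_code: int, hanja_code: int) -> int:
    """Subtract the constant from each code, then add the two results."""
    return (hangul_code - HASH_CONST) + (hanja_code - HASH_CONST)


hangul = "사"
candidates = ["社", "四", "事", "史"]  # illustrative homophones of 사
hashes = [hash_pair(ks_code(hangul), ks_code(c)) for c in candidates]

# Same Hangul code, different Hanja codes: every hash is different.
print(len(set(hashes)) == len(candidates))  # prints: True
```

Because the Hangul term is identical in every sum, two hashes can only coincide if the Hanja codes coincide, which is the argument the third condition makes.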

The code layout of the 2-byte Hangul, Hanja, and special characters specified in the KSC-5601 code standard is shown in Table 1.

[Table 1]

KSC-5601 Hangul, Hanja, and special character code regions

KSC-5601             High byte      Low byte       Character count
Special characters   0xA1 - 0xAC    0xA1 - 0xFE    1,128
Hangul               0xB0 - 0xC8    0xA1 - 0xFE    2,350
Hanja                0xCA - 0xFD    0xA1 - 0xFE    4,888
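As a rough illustration of how the Table 1 ranges could be used, the following Python sketch classifies a 2-byte code by its high and low bytes. The function name `classify` and the region labels are assumptions made for this example; only the byte ranges come from the table.

```python
from typing import Optional

# (label, first high byte, last high byte), as listed in Table 1
KSC5601_REGIONS = [
    ("special", 0xA1, 0xAC),
    ("hangul", 0xB0, 0xC8),
    ("hanja", 0xCA, 0xFD),
]


def classify(code: int) -> Optional[str]:
    """Return the Table 1 region of a 2-byte code, or None if outside all."""
    high, low = code >> 8, code & 0xFF
    if not 0xA1 <= low <= 0xFE:
        return None
    for label, first, last in KSC5601_REGIONS:
        if first <= high <= last:
            return label
    return None


print(classify(0xBBF7), classify(0xDEE4), classify(0xA1A1), classify(0xC9A1))
# prints: hangul hanja special None
```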

Referring to FIG. 1, the process of combining the Hangul syllable '사' (0xBBF7) and the corresponding Hanja '社' (0xDEE4) into a single hashing value is as follows. First, the hashing constant (0xA1A1) is subtracted from the code of '사' and from the code of '社' to obtain a Hangul hashing value (0x1A56) and a Hanja hashing value (0x3D43). These two values are then added to produce the single hashing value 0x5799.
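The arithmetic of this example can be reproduced with a short Python sketch. The function name `hash_pair` is an illustrative choice; the code values and the constant are the ones quoted in the text.

```python
HASH_CONST = 0xA1A1  # hashing constant from the description


def hash_pair(hangul_code: int, hanja_code: int) -> int:
    """Combine one Hangul syllable code and one Hanja code into one hash."""
    hangul_part = hangul_code - HASH_CONST
    hanja_part = hanja_code - HASH_CONST
    return hangul_part + hanja_part


value = hash_pair(0xBBF7, 0xDEE4)
print(hex(0xBBF7 - HASH_CONST), hex(0xDEE4 - HASH_CONST), hex(value))
# prints: 0x1a56 0x3d43 0x5799
```

With the ranges listed in Table 1, the largest possible sum is (0xC8FE - 0xA1A1) + (0xFDFE - 0xA1A1) = 0x83BA, so a hashing value computed this way fits in two bytes.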

This hashing value is what the electronic dictionary for Hangul-Hanja interconversion stores. That is, the hashing values for each pair of corresponding Hangul and Hanja syllables are stored, word by word, in a trie, a data structure that stores data one character at a time. For example, to store the Hangul word '학교' and the Hanja word '學校' ("school"), a hashing value is calculated for the pair '학'/'學' and another for the pair '교'/'校'; the word is then stored by linking the hashing value of '교(校)' after the hashing value of '학(學)'.
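One way to picture this storage scheme is a trie built from nested dictionaries keyed by hashing value. This is a sketch under assumptions, not the patent's memory layout: the `euc_kr` codec stands in for KSC-5601, and `ks_code`, `hash_pair`, `insert_word`, the `END` marker, and the second word 학생/學生 are introduced only for illustration.

```python
HASH_CONST = 0xA1A1
END = "end"  # illustrative marker: a stored word ends at this node


def ks_code(ch: str) -> int:
    """2-byte code of one character (euc_kr assumed to match KSC-5601)."""
    return int.from_bytes(ch.encode("euc_kr"), "big")


def hash_pair(hangul_code: int, hanja_code: int) -> int:
    return (hangul_code - HASH_CONST) + (hanja_code - HASH_CONST)


def insert_word(trie: dict, hangul: str, hanja: str) -> None:
    """Store a word as a chain of per-syllable hashes, one trie level each."""
    if len(hangul) != len(hanja):
        raise ValueError("Hangul and Hanja words must have equal length")
    node = trie
    for k, c in zip(hangul, hanja):
        node = node.setdefault(hash_pair(ks_code(k), ks_code(c)), {})
    node[END] = True


trie = {}
insert_word(trie, "학교", "學校")
insert_word(trie, "학생", "學生")  # shares the first-syllable node

first = hash_pair(ks_code("학"), ks_code("學"))
print(len(trie), len(trie[first]))  # prints: 1 2
```

Both words begin with the pair 학/學, so the trie holds a single first-level hash with two children, which is where the storage saving comes from.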

During Hangul-to-Hanja and Hanja-to-Hangul conversion, the electronic dictionary is searched with the values calculated by this hashing function. The search result consists of hashing values in which Hangul and Hanja are combined. The inverse of the hashing function is used to restore the original string from them.

FIG. 2 illustrates the process of restoring Hangul from a hashing value.

To restore Hangul from a hashing value, the hashing value, the hashing constant (0xA1A1), and the Hanja to be converted must be known.

That is, to restore Hangul from a hashing value, the hashing constant is added to the hashing value twice, and the code value of the input Hanja character is then subtracted from the result.

FIG. 3 illustrates the process of restoring Hanja from a hashing value. As in FIG. 2, the hashing constant is added to the hashing value twice, and the code value of the input Hangul syllable is then subtracted from the result.
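The restoration in FIGS. 2 and 3 is the same arithmetic in both directions, as the following Python sketch suggests. The function name `restore` is an illustrative choice; the values are those of the '사'/'社' example.

```python
HASH_CONST = 0xA1A1  # hashing constant from the description


def restore(hash_value: int, known_code: int) -> int:
    """Add the constant twice, then subtract the code that is already known.

    Passing the Hanja code yields the Hangul code, and vice versa.
    """
    return hash_value + 2 * HASH_CONST - known_code


print(hex(restore(0x5799, 0xDEE4)))  # Hangul from hash and Hanja: 0xbbf7
print(hex(restore(0x5799, 0xBBF7)))  # Hanja from hash and Hangul: 0xdee4
```

The hashing value alone does not identify the pair; the restoration only works because the input side of the conversion supplies one of the two codes.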

Expressed as formulas for the hashing function and the restoration function, this gives Equation 2.

[Equation 2]

\[ \mathit{Hash} = (\mathit{Hangul} - \mathtt{0xA1A1}) + (\mathit{Hanja} - \mathtt{0xA1A1}) \]

\[ \mathit{Hangul} = \mathit{Hash} + 2 \times \mathtt{0xA1A1} - \mathit{Hanja} \]

\[ \mathit{Hanja} = \mathit{Hash} + 2 \times \mathtt{0xA1A1} - \mathit{Hangul} \]

Here, Hash is the hashing value, Hangul is the Hangul code value, Hanja is the Hanja code value, and 0xA1A1 is the hashing constant.

FIG. 4 is a schematic block diagram of the Hangul-Hanja interconversion apparatus according to the present invention, in which '11' denotes the Hangul-Hanja mapping table, '12' the Hanja-Hangul conversion unit, '13' the Hangul/Hanja hashing value calculation unit, '14' the Hangul-Hanja hashing value storage unit, and '15' the Hangul/Hanja restoration unit.

The present invention applies to the interconversion of whole words, not single syllables; interconversion of a single syllable is performed as in the prior art.

Before Hangul and Hanja can be interconverted, the hashing values of the corresponding Hangul syllables and Hanja characters must first be calculated and stored, word by word, in a trie structure in the Hangul-Hanja hashing value storage unit (14).

First, a Hangul word is received and split into single syllables, and candidate Hanja for each syllable are obtained from the Hangul-Hanja mapping table (11), in the same way that word processors generally support. The Hangul/Hanja hashing value calculation unit (13) then calculates a hashing value for each input Hangul syllable paired with each of its candidate Hanja, producing multiple hashing values.

Of the hashing values calculated by the calculation unit (13), those for the first syllable of the input Hangul word are first used as index values to search the Hangul-Hanja hashing value storage unit (14). The hashing values linked in the trie to each value found are then searched for one that matches a hashing value of the second syllable.

Once the matching hashing values have been extracted in this way, the Hangul/Hanja restoration unit (15) uses the restoration function to output the original Hanja word to the user.

For a Hanja word, the Hanja-Hangul conversion unit (12) generates candidate Hangul for each character of the input Hanja word and passes them to the Hangul/Hanja hashing value calculation unit (13).

As in Hangul-to-Hanja conversion, the calculation unit (13) calculates hashing values for the characters of the input Hanja word and their candidate Hangul syllables, and the Hangul word is output to the user through the same process as Hangul-to-Hanja conversion.

The above operation is described below using the word '학교(學校)' ("school") as an example.

When the user enters the Hangul word '학교' for conversion to Hanja, the Hangul-Hanja interconversion control unit splits the input word into the syllables '학' and '교'.

Then, using the Hangul-Hanja mapping table (11), five candidate Hanja for '학' are obtained, and each is hashed together with the Hangul '학' to calculate five hashing values. Likewise, 25 candidate Hanja for '교' are obtained from the mapping table (11) and 25 hashing values are calculated.

In the Hangul-Hanja hashing value storage unit (14), the hashing value for '학' and the hashing value for '교' are linked in a trie structure, so only one link exists for the word '학교'. That is, there may be five hashing values for the first syllable '학', but only one of them is followed by a hashing value for '교' that fits the meaning of '학교'. If several Hanja words correspond to the input Hangul word, all of them are output to the user.
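The '학교' walkthrough can be sketched end to end in Python. Everything named here is an assumption for illustration: the `euc_kr` codec stands in for KSC-5601, `MAPPING` is an abbreviated, hypothetical stand-in for the mapping table (11) with far fewer candidates than the 5 and 25 mentioned above, and the nested-dictionary trie with its `END` marker stands in for the storage unit (14).

```python
HASH_CONST = 0xA1A1
END = "end"  # illustrative marker: a stored word ends at this node

# Hypothetical, abbreviated Hangul-Hanja mapping table (11).
MAPPING = {"학": "學鶴虐", "교": "校交橋巧"}


def ks_code(ch: str) -> int:
    return int.from_bytes(ch.encode("euc_kr"), "big")


def ks_char(code: int) -> str:
    return code.to_bytes(2, "big").decode("euc_kr")


def hash_pair(a: int, b: int) -> int:
    return (a - HASH_CONST) + (b - HASH_CONST)


def restore(hash_value: int, known_code: int) -> int:
    return hash_value + 2 * HASH_CONST - known_code


def insert_word(trie: dict, hangul: str, hanja: str) -> None:
    node = trie
    for k, c in zip(hangul, hanja):
        node = node.setdefault(hash_pair(ks_code(k), ks_code(c)), {})
    node[END] = True


def hangul_to_hanja(trie: dict, word: str) -> list:
    """Return every stored Hanja word whose reading is `word`."""
    results = []

    def walk(node: dict, pos: int, hashes: list) -> None:
        if pos == len(word):
            if END in node:
                # Restore each Hanja from its hash and the known Hangul code.
                results.append("".join(
                    ks_char(restore(h, ks_code(s)))
                    for h, s in zip(hashes, word)))
            return
        syllable = ks_code(word[pos])
        for candidate in MAPPING.get(word[pos], ""):
            h = hash_pair(syllable, ks_code(candidate))
            if h in node:  # follow only links that exist in the trie
                walk(node[h], pos + 1, hashes + [h])

    walk(trie, 0, [])
    return results


trie = {}
insert_word(trie, "학교", "學校")
insert_word(trie, "학생", "學生")
found = hangul_to_hanja(trie, "학교")  # found == ["學校"]
```

Several candidate hashes are tried at each level, but only the chain actually stored in the trie survives, which mirrors the statement that only one link exists for '학교'.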

Here, the inverted Hangul-Hanja mapping table and the hashing dictionary are stored in ordinary memory. This part can be built with various data structures and search algorithms, such as a syllable trie search or binary search.

In addition, the Hangul/Hanja hashing value calculation unit and the Hangul/Hanja restoration unit can be implemented with adder and subtractor circuits.

Furthermore, if the Hangul-Hanja mapping table and the hashing dictionary are stored in associative memory, Hangul-Hanja interconversion becomes faster than with other techniques, and the apparatus is simplified because no table search circuit is needed.

FIG. 5 is a flowchart of the Hangul-Hanja interconversion method according to the present invention.

First, a hashing value is calculated for each pair of corresponding Hangul and Hanja syllables (101), and each word is stored as hashing values in a trie structure using the calculated values (102).

When the user inputs Hangul to be converted (103, 104), the input Hangul word is split into syllables (105), and candidate Hanja for each syllable are obtained from the Hangul-Hanja mapping table (106).

Next, hashing values are calculated by combining each Hangul syllable with its candidate Hanja (109), and the Hangul-Hanja hashing dictionary is searched using the hashing values of the first syllable as the first index value (110).

If a matching value is found, the hashing values linked after it are searched for one equal to a hashing value of the second syllable (111).

The search is carried out in this way for every syllable of the input word (112). If at least one matching entry exists in the Hangul-Hanja hashing dictionary, it is extracted, its hashing values are restored to the original Hanja (113), and the restored Hanja is output to the screen (114).

The search process here is similar to the conventional one: as in the prior art, if no Hanja word corresponds to the whole of the input Hangul word, only the Hanja word that matches part of it is output to the user.
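The partial-match behaviour might be sketched as a longest-prefix walk over the trie. This is a simplified illustration, not the patent's procedure: the trie is a nested dictionary keyed by hashing value, the hash values are made up, word-end markers are ignored, and the function name `longest_match` is hypothetical.

```python
def longest_match(trie: dict, candidate_hashes: list) -> list:
    """Follow the trie as far as the per-syllable candidate hashes allow
    and return the hashes on the longest matching path."""
    best = []

    def walk(node: dict, pos: int, path: list) -> None:
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        if pos == len(candidate_hashes):
            return
        for h in candidate_hashes[pos]:
            if h in node:
                walk(node[h], pos + 1, path + [h])

    walk(trie, 0, [])
    return best


# Toy trie holding one two-syllable word, with made-up hash values.
trie = {0x1111: {0x2222: {}}}
# Three input syllables; the third has no continuation in the trie.
candidates = [[0x1111, 0x1234], [0x2222], [0x3333]]
print([hex(h) for h in longest_match(trie, candidates)])
# prints: ['0x1111', '0x2222']
```

Only the first two syllables would then be restored and output, leaving the unmatched third syllable unconverted.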

When the user inputs Hanja to be converted (103, 104), the input Hanja word is split into characters (107), and candidate Hangul for each character are obtained through Hanja-Hangul conversion (108).

Then, through the same steps as in Hangul-to-Hanja conversion (109 to 113), the Hangul for the input Hanja word is restored and output (114).
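The reverse direction can reuse the same trie, as the following Python sketch suggests. The names are assumptions for illustration: `READINGS` is a hypothetical stand-in for the Hanja-Hangul conversion unit (12), the `euc_kr` codec stands in for KSC-5601, and the sketch looks up one code per character, which glosses over how the real code set treats Hanja with several readings.

```python
HASH_CONST = 0xA1A1
END = "end"  # illustrative marker: a stored word ends at this node

# Hypothetical stand-in for the Hanja-Hangul conversion unit (12).
READINGS = {"敗": ["패"], "北": ["북", "배"]}


def ks_code(ch: str) -> int:
    return int.from_bytes(ch.encode("euc_kr"), "big")


def ks_char(code: int) -> str:
    return code.to_bytes(2, "big").decode("euc_kr")


def hash_pair(a: int, b: int) -> int:
    return (a - HASH_CONST) + (b - HASH_CONST)


def restore(hash_value: int, known_code: int) -> int:
    return hash_value + 2 * HASH_CONST - known_code


def insert_word(trie: dict, hangul: str, hanja: str) -> None:
    node = trie
    for k, c in zip(hangul, hanja):
        node = node.setdefault(hash_pair(ks_code(k), ks_code(c)), {})
    node[END] = True


def hanja_to_hangul(trie: dict, word: str) -> list:
    """Return every stored Hangul reading of the Hanja word `word`."""
    results = []

    def walk(node: dict, pos: int, hashes: list) -> None:
        if pos == len(word):
            if END in node:
                # Restore each Hangul from its hash and the known Hanja code.
                results.append("".join(
                    ks_char(restore(h, ks_code(ch)))
                    for h, ch in zip(hashes, word)))
            return
        hanja_code = ks_code(word[pos])
        for reading in READINGS.get(word[pos], []):
            h = hash_pair(ks_code(reading), hanja_code)
            if h in node:
                walk(node[h], pos + 1, hashes + [h])

    walk(trie, 0, [])
    return results


trie = {}
insert_word(trie, "패배", "敗北")
result = hanja_to_hangul(trie, "敗北")  # result == ["패배"]
```

The candidate reading 북 for 北 produces a hash that is not linked under 패/敗, so only the stored reading 배 is restored.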

Those of ordinary skill in the art to which the present invention pertains can make various substitutions, modifications, and changes without departing from the technical spirit of the invention, so the invention is not limited to the embodiments and drawings described above.

The present invention as described above has the following particular effects.

First, the Hangul-Hanja interconversion dictionary can be built as a single dictionary, and storage space can be reduced by a quarter or more compared with the prior art. Taking 패배 (敗北) and 패인 (敗因) as an example, 32 bytes of storage were conventionally required, whereas the present invention needs 6 bytes to represent the words in a trie structure plus 2 bytes to maintain the trie nodes, for a total of 8 bytes.
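The byte counts in this example can be checked with a small Python sketch. The accounting model is an assumption made for illustration: 2 bytes per character in each conventional dictionary, 2 bytes per stored hashing value, and the 2-byte node overhead taken directly from the text.

```python
BYTES_PER_CHAR = 2   # one KSC-5601 Hangul or Hanja character
BYTES_PER_HASH = 2   # one combined hashing value
NODE_OVERHEAD = 2    # figure given in the text for maintaining trie nodes

words = [("패배", "敗北"), ("패인", "敗因")]

# Conventional scheme: each pair is stored once in each of two dictionaries.
per_dictionary = sum((len(hangul) + len(hanja)) * BYTES_PER_CHAR
                     for hangul, hanja in words)
conventional = 2 * per_dictionary

# Trie scheme: one hash per distinct (Hangul prefix, Hanja prefix) pair,
# so the shared first syllable is stored only once.
nodes = {(hangul[:i], hanja[:i])
         for hangul, hanja in words
         for i in range(1, len(hangul) + 1)}
trie_total = len(nodes) * BYTES_PER_HASH + NODE_OVERHEAD

print(conventional, trie_total)  # prints: 32 8
```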

Second, Hangul-Hanja interconversion can be performed much faster than in the prior art.

Third, the Hangul-Hanja interconversion apparatus is easy to implement in hardware. When it is added to a personal terminal in this way, more Hangul-Hanja word pairs can be stored in a memory of limited size, which improves the performance of the interconverter.

Fourth, using associative memory further increases the Hangul-Hanja interconversion speed.

Claims

1. An apparatus for interconverting different language codes, comprising:

first storage means for storing words formed from hashing values, each calculated by combining a syllable of a first language code with the corresponding syllable of a second language code;

second storage means for storing a mapping from each syllable of the first language code to the corresponding syllables of the second language code;

conversion means for converting a syllable of the second language code input by a user into the corresponding syllable of the first language code;

hashing value calculation means for calculating syllable-by-syllable hashing values for an input word, which either receives a user-input word of the first language code syllable by syllable together with the candidate second-language syllables for each syllable from the second storage means, or receives a user-input word of the second language code syllable by syllable together with the candidate first-language syllables from the conversion means;

restoration means which receives the user-input word syllable by syllable, receives the hashing values extracted from the first storage means using the hashing values generated by the hashing value calculation means as index keys, and restores those hashing values to the original language code for output; and

control means for splitting the user-input word into syllables and for performing access control and memory management of the first and second storage means.

2. The apparatus of claim 1, wherein the first and second storage means comprise an associative memory.

3. The apparatus of claim 1 or 2, wherein the first storage means stores each word by storing hashing values in a syllable-unit trie structure.

4. The apparatus of claim 3, wherein the hashing value calculation means subtracts a hashing constant from each of the one-syllable code value of the first language code and the one-syllable code value of the second language code, and adds the two resulting values to calculate the hashing value.

5. The apparatus of claim 3, wherein the restoration means adds the hashing constant twice to a hashing value output from the first storage means and subtracts the code of the corresponding syllable of the word input by the user to restore the original language code.

6. A method for interconverting different language codes, applied to an apparatus that interconverts different language codes, comprising:

a first step of calculating a single hashing value by combining each pair of corresponding syllables of the different language codes, and storing each convertible word using these values;

a second step of, when a user inputs a word of the first language code to be converted, splitting it into syllables and generating candidate second language codes for each syllable;

a third step of calculating a hashing value for each syllable from the first language code of the syllable and its candidate second language codes;

a fourth step of searching the stored hashing values for a match, using the hashing value of the first syllable as an index key;

a fifth step of, if a match is found, searching the matching records for one whose next hashing value matches the hashing value of the second syllable; and

a sixth step of extracting the entries whose hashing values match for every syllable of the input word, restoring them to the original language code, and outputting the result.

7. The method of claim 6, wherein the hashing value is calculated by subtracting a hashing constant from each of the one-syllable code value of the first language code and the one-syllable code value of the second language code, and adding the two resulting values.

8. The method of claim 7, wherein, when the word is stored in the first step, the hashing values are stored in a syllable-unit trie structure.

9. The method of claim 8, wherein the hashing value is restored by adding the hashing constant twice to the extracted hashing value and subtracting the code of the corresponding syllable of the word input by the user to restore the original language code.