JPS6029989B2

JPS6029989B2 - Korean sorting control system

Info

Publication number: JPS6029989B2
Application number: JP56048282A
Authority: JP
Inventors: 澄佐々木; 正太郎喜柳; 敏夫斎藤; 秀昭柳
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1981-03-31
Filing date: 1981-03-31
Publication date: 1985-07-13
Also published as: JPS57162017A

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a Korean sorting control system. In particular, it relates to a system that orders a string of Korean words containing Hangul characters, Chinese characters, and alphanumeric characters according to their readings by preparing a dictionary in which the reading of each character is numerically coded, looking up that dictionary to convert each Korean word into a numeric code, and ordering the words by those codes.

In general, a Hangul character can be regarded as a phonetic character built from a "consonant, vowel, consonant" combination, drawn from the vowel group shown in Figure 1A and the consonant group shown in Figure 1B. In Figures 1A and 1B, each upper row shows the Hangul graphemes and each lower row shows the corresponding pronunciation. For example, when the pronunciation of each grapheme is assigned to the two characters shown in Figure 2A, as in Figure 2B, they are seen to be pronounced "han■el", as shown in Figure 2C.

A KEF (Korean Processing Extended Feature) code has recently been established for these commonly used Hangul characters, as well as for the Chinese characters and alphanumeric characters mentioned above.

However, the KEF code is not uniquely tied to the reading of each character. Ordering Korean words that contain Hangul characters by reading, for example to compile a name list, is therefore extremely cumbersome. The present invention aims to provide a system that solves this problem. Noting that each Hangul character is composed of one or more graphemes, as shown in Figure 2, it assigns each grapheme a numeric code that follows the order of the readings, and arranges words in ascending and/or descending order on the basis of those numeric codes.

To that end, the Korean sorting control system of the present invention orders a string of words containing Hangul characters, in which predetermined codes are assigned to Hangul characters, Chinese characters, and alphanumeric characters respectively, by using codes ordered in the sequence corresponding to the readings of the Hangul characters and Chinese characters. It is characterized by comprising: a character attribute dictionary that stores in numerically coded form, for each Hangul character, Chinese character, and alphanumeric character, at least the reading of the character and the stroke count given for Chinese characters; a numeric code generation unit that, for each word in the word string, looks up each character of the word in the character attribute dictionary, or in a sorting character attribute load module prepared from that dictionary, and generates a numeric code corresponding to the word; and a sort processing unit that arranges the words in the word string in ascending and/or descending order of the numeric codes generated by the numeric code generation unit. The corresponding word is extracted from each numeric code in the sequence of numeric codes arranged by the sort processing unit. The invention is described below with reference to the drawings.

Figure 3 is an overall block diagram of one embodiment of the system of the invention. Figures 4A to 4E illustrate the forms of the numeric codes used in the invention. Figure 5 shows one embodiment of the numeric codes stored in the character attribute dictionary of Figure 3. Figure 6 shows one embodiment of the numeric codes stored in the sorting character attribute load module of Figure 3. Figure 7 illustrates the sort data in which a numeric code has been attached to a given word to be sorted. Figure 8 shows one embodiment of the character attribute dictionary. Figure 9 shows one embodiment of the sorting character attribute load module. Figures 10A to 10D illustrate the initial sound rule referred to in the invention. Figures 11A to 11C illustrate the handling of the double consonants that exist in Korean. Figure 12 illustrates the sort result obtained when double consonants are handled correctly. Figure 13 illustrates the sort results without and with the initial sound rule applied.

In Figure 3, 1 is one given word X, 2 is the numeric code generation unit, 3 is the sort processing unit, 4 is the sort data, 5 is the sort result, 6 is the character attribute dictionary, and 7 is the sorting character attribute load module.

The embodiment shown in Figure 3 is described in more detail later. Conceptually, the ordering proceeds as follows.

(1) Suppose the word X 1 is input as a word to be sorted. For each of its characters "陸", "까", ..., the character attribute dictionary 6 and/or the sorting character attribute load module 7 is looked up with the KEF code as the key.

(2) A numeric code is then generated. It consists of a reading part, obtained by extracting the reading numeric codes "265602", "023400", ... for the characters "陸", ...; a double consonant indicator part, which indicates the presence of the double consonants described later; and a stroke count part, which gives the stroke count of each Chinese character.

(3) The numeric code is prepended to the word X 1 to form the sort data 4.

(4) The sort processing unit 3 treats the sort data 4 as a weighted numeric value and orders it in ascending or descending order in the conventional way.

(5) From the ordered sort data, the sort processing unit 3 deletes the numeric code attached to the head and outputs the sort result 5, which contains only the words.
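The five steps above amount to a decorate, sort, undecorate pattern. The following Python sketch illustrates it under stated assumptions: `sort_words`, `make_key`, `toy_key`, and the toy code table are hypothetical names introduced here for illustration and do not come from the patent; `make_key` merely stands in for the numeric code generation unit 2.

```python
from typing import Callable, Iterable, List


def sort_words(words: Iterable[str],
               make_key: Callable[[str], bytes],
               descending: bool = False) -> List[str]:
    """Attach a numeric code to each word, sort on it, then strip it."""
    # Steps (1)-(3): sort data = numeric code placed in front of the word.
    sort_data = [(make_key(w), w) for w in words]
    # Step (4): bytes compare left to right, so the leftmost byte
    # carries the greatest weight.
    sort_data.sort(key=lambda item: item[0], reverse=descending)
    # Step (5): drop the numeric code and keep only the words.
    return [w for _, w in sort_data]


# Hypothetical stand-in for the dictionary lookup (not the patent's codes).
TOY_CODES = {"a": bytes([0x02]), "b": bytes([0x04]), "c": bytes([0x06])}


def toy_key(word: str) -> bytes:
    return b"".join(TOY_CODES[ch] for ch in word)


print(sort_words(["cab", "abc", "bca"], toy_key))  # -> ['abc', 'bca', 'cab']
```

Keeping the key generation behind a single callable mirrors the separation between the numeric code generation unit 2 and the sort processing unit 3.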

The way in which words are numerically coded is described below.

Figure 4A shows how the readings of Hangul characters and Chinese characters are coded. As noted above, a Hangul character generally consists of a combination of three graphemes, and the reading of a Chinese character can likewise be mapped to a Hangul character. The three graphemes are called the initial sound, the medial sound, and the final sound. To code a reading, one byte is prepared for the initial sound, one byte for the medial sound, and one byte for the final sound, each holding a predetermined code. When there is no final sound, hexadecimal 00 is given.

Figure 4B shows how a blank is coded. Three bytes are prepared here as well: the first byte is hexadecimal 00, and the following two bytes hold the KEF code 4040, which denotes a blank.

Figure 4C shows how alphanumeric characters are coded. Again three bytes are prepared: the first byte is hexadecimal FF, and the following two bytes hold the KEF code of the alphanumeric character.

Korean has the double consonants "ㄲ", "ㄸ", "ㅃ", "ㅆ", and "ㅉ" shown in Figure 1B. Their handling is described later, but a double consonant can occur in the initial sound and in the final sound. For this reason, when a word consists of N characters, a number of bytes equal to the quotient of (N + 3) / 4 is prepared as the double consonant indicator part, which provides two bits for each character of the word.
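The three-byte reading codes and the size of the double consonant indicator part can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the tables hold only the two grapheme values quoted elsewhere in the description (02 for "ㄱ", 34 for "ㅏ"), and the KEF code passed to `alnum_reading` is a placeholder.

```python
from typing import Dict, Optional

# Illustrative subset only; the full tables are in Figures 5 and 6.
INITIAL: Dict[str, int] = {"ㄱ": 0x02}
MEDIAL: Dict[str, int] = {"ㅏ": 0x34}
FINAL: Dict[str, int] = {}  # no individual final values are assumed here


def hangul_reading(initial: str, medial: str,
                   final: Optional[str] = None) -> bytes:
    """Initial, medial, final: one byte each (0x00 when there is no final)."""
    last = FINAL[final] if final else 0x00
    return bytes([INITIAL[initial], MEDIAL[medial], last])


def blank_reading() -> bytes:
    """0x00 followed by the KEF code 0x4040 for a blank."""
    return bytes([0x00, 0x40, 0x40])


def alnum_reading(kef_code: int) -> bytes:
    """0xFF followed by the two-byte KEF code of the character."""
    return bytes([0xFF]) + kef_code.to_bytes(2, "big")


def indicator_bytes(n_chars: int) -> int:
    """Quotient of (N + 3) / 4: two bits per character, rounded up."""
    return (n_chars + 3) // 4


print(hangul_reading("ㄱ", "ㅏ").hex())   # -> 023400
print(blank_reading().hex())              # -> 004040
print(alnum_reading(0x1234).hex())        # placeholder KEF code -> ff1234
print(indicator_bytes(9), indicator_bytes(4))  # -> 3 1
```

Because blanks start with 00 and alphanumerics with FF, blanks sort before every reading code and alphanumerics after, provided the grapheme codes stay between those two values.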

As shown in Figure 4D, when the initial sound of a character is a double consonant the leading bit is set to "1", and when the final sound is a double consonant the following bit is set to "1". The two bits for a character therefore take one of the values "00", "10", "01", or "11". In addition, Chinese characters include so-called homophones, that is, different characters with the same reading.

To take this into account, a one-byte stroke count part is prepared for each character of the word to be sorted, as shown in Figure 4E. For a character that is not a Chinese character, hexadecimal 00 is given; for a Chinese character, its stroke count is given in hexadecimal.

In the sort data 4 shown in Figure 3, the following are given for the input word X 1.

(A) In the reading part, the reading numeric code "265602" is given for the Chinese character "陸", the reading numeric code "023400" for the Hangul character "까", the numeric code "004040" for the blank, and so on. The letter "A" and the digit "1" each receive a numeric code consisting of "FF" followed by the two-byte KEF code of the character.

(B) In the double consonant indicator part, the numeric code "230000" is given, as follows. Because the input word X 1 contains nine characters, (9 + 3) / 4 = 3, so a three-byte double consonant indicator part is prepared. The character "陸" has no double consonant, so its two bits of Figure 4D are "00". The character "까" has a double consonant in its initial sound, so its bits are "10". The blank has no double consonant, so its bits are "00". The fourth character has double consonants in both its initial and final sounds, so its bits are "11", and so on. Written in order, the bits are "0010, 0011, 0000, ...", which is "230000" in hexadecimal.

(C) In the stroke count part, the stroke counts "0B" and "06" are given for the Chinese characters "陸" and "肉" in the word X.

Figure 5 shows one embodiment of the numeric codes stored in the character attribute dictionary of Figure 3.
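The "230000" value worked out above for the double consonant indicator part can be reproduced with a short sketch. The function name is hypothetical. The sketch assumes that the first character occupies the most significant bit pair and that unused trailing bit pairs are zero, which is what the worked value implies; the flags of the fifth to ninth characters are taken to be "00" for the same reason.

```python
from typing import Iterable, Tuple


def pack_double_consonant_flags(flags: Iterable[Tuple[bool, bool]]) -> bytes:
    """Pack (initial_is_double, final_is_double) pairs, two bits each."""
    pairs = list(flags)
    n_bytes = (len(pairs) + 3) // 4          # quotient of (N + 3) / 4
    value = 0
    for initial_double, final_double in pairs:
        value = (value << 2) | (int(initial_double) << 1) | int(final_double)
    # Left-align: fill the unused trailing bit pairs with zeros.
    value <<= 2 * (4 * n_bytes - len(pairs))
    return value.to_bytes(n_bytes, "big")


# Nine characters: no double, initial double, blank, both, then five
# characters assumed to carry no double consonant.
word_flags = [(False, False), (True, False), (False, False), (True, True)]
word_flags += [(False, False)] * 5
print(pack_double_consonant_flags(word_flags).hex())  # -> 230000
```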

As described above, each Hangul character is assembled from graphemes. Each grapheme, or group of graphemes, is therefore assigned a numeric code whose order follows the readings: 02 for "ㄱ", 03 for "ㄲ", and so on, as shown. Because the character attribute dictionary is not used solely for the reading-based sort that is the subject of the invention, double consonants are given odd codes, as shown. Figure 6 shows the corresponding numeric codes stored in the sorting character attribute load module of Figure 3.

Because the load module is used only for sorting, it differs from Figure 5 in that a double consonant is given the same code as its base consonant: the same code 02 is given to the consonant "ㄱ" and to the double consonant "ㄲ". The reason is explained later. For the character "까" in the sort data shown in Figure 3, the numeric code 02 for the double consonant "ㄲ", the numeric code 34 for the vowel "ㅏ", and the code 00 indicating that there is no final sound together give the single numeric code 023400.

Figure 7 shows the format of the sort data generated when the word to be sorted is given as a four-character (eight-byte) "user-specified sort key".

For each of the four characters, a three-byte numeric code as described for Figures 4A to 4C is given, for a total of 12 bytes. For the double consonant indicator part, (4 + 3) / 4 = 1 with remainder 3, so one byte is given.

A further four bytes are given as the stroke count part. The numeric code generated in this way is attached to the head of the input word X, and the whole forms the sort data 4. The sort data 4 is a weighted numeric code whose leftmost position in the figure is the most significant digit. The sort processing unit 3 of Figure 3 arranges the items of sort data in ascending or descending order according to the magnitude of these values. Only the word X portion is then extracted from the result to give the sort result 5 of Figure 3.

Figure 8 shows one embodiment of the character attribute dictionary.
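The byte counts given for the Figure 7 format can be tallied with a small sketch. The function and key names are hypothetical, and the two-bytes-per-character word size is taken from the "four characters (eight bytes)" example.

```python
from typing import Dict


def sort_data_layout(n_chars: int, word_bytes_per_char: int = 2) -> Dict[str, int]:
    """Byte count of each part of the sort data for an N-character word."""
    layout = {
        "reading": 3 * n_chars,                  # three bytes per character
        "double_consonant": (n_chars + 3) // 4,  # two bits per character
        "stroke_count": n_chars,                 # one byte per character
        "word": word_bytes_per_char * n_chars,   # the original word itself
    }
    layout["total"] = sum(layout.values())
    return layout


print(sort_data_layout(4))
# -> {'reading': 12, 'double_consonant': 1, 'stroke_count': 4, 'word': 8, 'total': 25}
```

Since every key for words of the same length has the same width, comparing the byte strings left to right is equivalent to comparing them as weighted numbers.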

In the dictionary 6, an on-reading numeric code and a reading numeric code are given for each character. Double consonants are stored in the dictionary 6 with odd-valued codes. For Chinese characters and Hangul characters there is no distinction between on-readings and kun-readings. Korean does, however, have an initial sound rule, described later, and the reading numeric code shown in the figure stores the reading that results when the initial sound rule is applied. For characters that the initial sound rule does not affect, the on-reading numeric code field and the reading numeric code field hold the same numeric code. Consequently, as described later, when a character appears at the head of a word, the reading numeric code can simply be extracted as the code for that character, whether or not the initial sound rule affects it.

Figure 9 shows one embodiment of the sorting character attribute load module.

For each character, the module 7 stores, as shown in the figure, "total stroke count information", an "on-reading numeric code", "double consonant information", a "reading numeric code (initial sound rule applied)", and so on. The total stroke count information is also present in the character attribute dictionary 6 of Figure 8 and is the information described with reference to Figure 4E. The on-reading numeric code and the reading numeric code were described in connection with the character attribute dictionary 6 of Figure 8 and are not described again, except to note that for Hangul characters the numeric code of the reading with the initial sound rule applied is stored as the "reading numeric code".

In the module 7, however, the reading codes that correspond to double consonants have been rewritten to even-valued codes, as described in connection with Figure 6. To compensate, the module 7 holds one byte of "double consonant information" for each character. Depending on where a double consonant occurs, one of the patterns "00000000", "00000010", "00000001", or "00000011" is stored as the double consonant information.

In what follows, the characters stored in the sorting character attribute load module 7 of Figure 9 are called first-level characters, and the characters stored in the character attribute dictionary 6 of Figure 8 are called second-level characters.

When the numeric code generation unit 2 of Figure 3 looks up and uses a second-level character from the character attribute dictionary 6, it examines the numeric code of the reading from the head. If a code is odd, it subtracts 1 from the code value and sets the corresponding bit of the double consonant information to "1". The reason is explained later with reference to Figure 11.

The initial sound rule mentioned above is now described with reference to Figure 10.

In Korean, when one of the Hangul characters shown in the upper rows of Figures 10A, 10B, and 10C appears at the head of a word, the Hangul character shown in the corresponding lower row is used instead. That is, the pronunciation changes when the character appears at the head of a word. For example, as shown in Figure 10D, the word "良心" would by its characters correspond to the Hangul shown in the "incorrect" column, but it is changed to the Hangul shown in the "correct" column. The reading numeric code field of Figures 8 and 9 can therefore be thought of as storing, for the original character, the numeric code of the character it changes to at the head of a word.
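The two code fields described for Figures 8 and 9 suggest a simple selection step, sketched below. This is an illustration under assumptions: `Entry`, `reading_codes`, and the field names are hypothetical, and the byte values are placeholders rather than codes from the figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    on_reading: bytes    # reading code without the initial sound rule
    head_reading: bytes  # reading code with the rule applied; equal to
                         # on_reading for characters the rule does not affect


def reading_codes(entries: List[Entry], apply_rule: bool) -> bytes:
    """Concatenate per-character codes, using the rule-applied field only
    for the character at the head of the word."""
    parts = []
    for position, entry in enumerate(entries):
        at_head = position == 0
        parts.append(entry.head_reading if (apply_rule and at_head)
                     else entry.on_reading)
    return b"".join(parts)


# Placeholder codes: a character whose head-of-word reading differs.
changing = Entry(on_reading=bytes.fromhex("102030"),
                 head_reading=bytes.fromhex("122030"))
print(reading_codes([changing, changing], apply_rule=True).hex())   # -> 122030102030
print(reading_codes([changing, changing], apply_rule=False).hex())  # -> 102030102030
```

Because unaffected characters store the same value in both fields, the head-of-word branch needs no per-character test of whether the rule applies.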

The numeric code to be extracted is then selected according to whether the initial sound rule is applied.

Figures 11A to 11C illustrate how double consonants are handled in the sort processing.

When the characters "가", "까", and "기" are sorted, they should be arranged as shown in Figure 11B. In the character attribute dictionary 6, however, as is clear from Figure 5, the character "가" is given 023400, the character "까" is given 033400, and the character "기" is given 025C00. If only the numeric codes of the readings are compared, then 023400 < 025C00 < 033400, and the characters are arranged as shown in Figure 11C.

To remedy this, a double consonant information field is provided in the sort data 4, and the numeric code of a double consonant is made the same as that of the ordinary consonant.

That is, the character "가" is given 023400, ..., 0 (double consonant field); the character "까" is given 023400, ..., 2 (double consonant field); and the character "기" is given 025C00, ..., 0 (double consonant field). As a result, 023400...0 < 023400...2 < 025C00...0, and the characters can be arranged correctly as shown in Figure 11B.
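The difference between the two orderings of Figure 11 can be checked with a short sketch. The row layout and function names are hypothetical; the code values are those quoted in the discussion of Figure 11, with the third code read as 025C00 from the partly garbled source.

```python
from typing import List, Tuple

# (character, dictionary code with odd double consonant value,
#  load-module code, 2-bit double consonant value)
Row = Tuple[str, str, str, int]
ROWS: List[Row] = [
    ("기", "025C00", "025C00", 0b00),
    ("까", "033400", "023400", 0b10),
    ("가", "023400", "023400", 0b00),
]


def order_by_reading_only(rows: List[Row]) -> List[str]:
    """Compare dictionary reading codes alone (the Figure 11C result)."""
    ranked = sorted(rows, key=lambda r: bytes.fromhex(r[1]))
    return [r[0] for r in ranked]


def order_with_indicator(rows: List[Row]) -> List[str]:
    """Compare the shared consonant code first, then the double consonant
    value as a lower-weight tie-breaker (the Figure 11B result)."""
    ranked = sorted(rows, key=lambda r: (bytes.fromhex(r[2]), r[3]))
    return [r[0] for r in ranked]


print(order_by_reading_only(ROWS))  # -> ['가', '기', '까']
print(order_with_indicator(ROWS))   # -> ['가', '까', '기']
```

Placing the double consonant value after the whole reading part makes it act only when the readings are otherwise identical, which is why "까" falls between "가" and "기".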

Figure 12 illustrates the sort result obtained when double consonants are handled correctly in this way.

When the words in the left column of the figure are the words to be sorted, they are arranged correctly as shown in the right column. To achieve this, as described in connection with Figures 8 and 9, when a character present in the character attribute dictionary 6, that is, a second-level character, is looked up and the numeric code of its reading has an odd value, 1 is subtracted from the code value and a "1" bit is set in the double consonant information field.
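The adjustment applied to second-level characters can be sketched as follows. The function name is hypothetical, and the sketch assumes that only the initial and final bytes of a three-byte Hangul or Chinese character reading code can be odd; it is not meant for the blank or alphanumeric codes.

```python
from typing import Tuple


def normalize_reading(code: bytes) -> Tuple[bytes, int]:
    """Return (even-valued reading code, 2-bit double consonant flags)."""
    initial, medial, final = code
    flags = 0
    if initial % 2 == 1:   # odd initial code: double consonant
        initial -= 1
        flags |= 0b10      # leading bit marks the initial sound
    if final % 2 == 1:     # odd final code: double consonant
        final -= 1
        flags |= 0b01      # following bit marks the final sound
    return bytes([initial, medial, final]), flags


code, flags = normalize_reading(bytes.fromhex("033400"))
print(code.hex(), format(flags, "02b"))  # -> 023400 10
```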

Figure 13 shows the sort results of the invention without and with the initial sound rule applied.

When the words in the left column of the figure are the words to be sorted, they are arranged as shown in the upper part of the right column if the initial sound rule is not applied, and as shown in the lower part of the right column if it is applied.

As described above, the present invention makes it possible to arrange Korean words in an order that corresponds to their readings.

Furthermore, by holding the first-level characters in the sorting character attribute load module, which can be accessed at high speed, the numeric codes corresponding to the readings can be generated at high speed. In addition, because both the numeric code of the reading with the initial sound rule applied and the numeric code of the reading without it are provided for each Hangul character, words can be arranged correctly even when the initial sound rule is applied. And because a code is also stored in the reading numeric code field for Hangul characters that are not subject to the initial sound rule, control is simplified: when sorting with the initial sound rule applied, the value of the reading numeric code field is simply extracted for the character at the head of each word.

[Brief description of the drawings]

Figures 1 and 2 illustrate the problem underlying the invention. Figure 3 is an overall block diagram of one embodiment of the system of the invention. Figures 4A to 4E illustrate the forms of the numeric codes used in the invention. Figure 5 shows one embodiment of the numeric codes stored in the character attribute dictionary of Figure 3. Figure 6 shows one embodiment of the numeric codes stored in the sorting character attribute load module of Figure 3. Figure 7 illustrates the sort data in which a numeric code has been attached to a given word to be sorted. Figure 8 shows one embodiment of the character attribute dictionary. Figure 9 shows one embodiment of the sorting character attribute load module. Figures 10A to 10D illustrate the initial sound rule referred to in the invention. Figures 11A to 11C illustrate the handling of the double consonants that exist in Korean. Figure 12 illustrates the sort result obtained when double consonants are handled correctly. Figure 13 illustrates the sort results without and with the initial sound rule applied.

In the figures, 1 is a word, 2 is the numeric code generation unit, 3 is the sort processing unit, 4 is the sort data, 5 is the sort result, 6 is the character attribute dictionary, and 7 is the sorting character attribute load module.

Claims

1. A Korean sorting control system that orders a string of words containing Hangul characters, in which predetermined codes are assigned to Hangul characters, Chinese characters, and alphanumeric characters respectively, by using codes ordered in the sequence corresponding to the readings of the Hangul characters and Chinese characters, the system comprising: a character attribute dictionary that stores in numerically coded form, for each of the Hangul characters, Chinese characters, and alphanumeric characters, at least the reading of the character and the stroke count given for Chinese characters; a numeric code generation unit that, for each word in the word string, looks up each character of the word in the character attribute dictionary, or in a sorting character attribute load module prepared from the character attribute dictionary, and generates a numeric code corresponding to the word; and a sort processing unit that arranges the words in the word string in ascending and/or descending order of the numeric codes generated by the numeric code generation unit, wherein the corresponding word is extracted from each numeric code of the sequence of numeric codes arranged by the sort processing unit.

2. The Korean sorting control system according to claim 1, wherein the character attribute dictionary and/or the sorting character attribute load module holds an on-reading numeric code for each Chinese character in the words and, for each Hangul character, a numeric code corresponding to the reading without the initial sound rule applied and a numeric code corresponding to the reading with the initial sound rule applied.