JPS61128366A - 'kana'/'kanji' converter

Info

Publication number: JPS61128366A
Application number: JP59251204A
Authority: JP
Inventors: Hirokawa Hayashi; 林　大川
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1984-11-28
Filing date: 1984-11-28
Publication date: 1986-06-16

Abstract

PURPOSE: To reduce the memory occupied by a connection matrix table by constituting it from n connection matrix table bodies, obtained by extracting only the rows or columns whose element arrangements differ and registering them separately for each binary digit, and n connection matrix table indexes indicating the positions of the elements in those bodies. CONSTITUTION: The connection matrix table is divided by columns into four blocks, and each row of each block is regarded as one record. Each record is separated into its upper digit and its lower digit. An upper-digit connection matrix table body, which registers only the upper-digit bit strings whose arrangements differ, and a lower-digit connection matrix table body, which registers only the lower-digit bit strings whose arrangements differ, are built digit by digit in the order of the blocks. A lower-digit connection matrix table index and an upper-digit connection matrix table index are formed separately for the two bodies. The memory occupied is thus sharply reduced while retrieval remains easy.

Description

Technical field

The present invention relates to a kana-kanji conversion device, and more particularly to a kana-kanji conversion device suited to compressing the connection matrix table that indicates the connectivity between words.

Prior art

Input methods for kana-kanji conversion devices include (1) the word-unit method, (2) the kanji-part designation method, (3) the bunsetsu (phrase) unit method, and (4) the solid-writing (unsegmented) method. In every method other than the word-unit method (1), that is, in methods (2), (3), and (4), grammatical analysis must be performed on the input sentence. During this analysis the possibility of connection between words must be determined, and for that purpose a connection test table representing the connection information between words is widely used.

The connection test table is usually represented in matrix form and is called a connection matrix table.

FIG. 7 shows a conventional, general connection matrix table.

As shown in FIG. 7, the "preceding word" entries list the parts of speech of the preceding word and the "following word" entries list the parts of speech of the following word, so that the connectivity between a preceding word and a following word is judged at the part-of-speech level. In FIG. 7, a connection value of 0 indicates that connection is impossible, while connection values of 1, 2, and 3 indicate that connection is possible, a larger value indicating a higher probability of connection. For example, verb 1 can never follow verb 1. After noun 1, verbs 1, 2, 3, nouns 1, 2, and particles 1, 2 can follow; the most probable is particle 1 with connection value 3, the next is particle 2 with connection value 2, and third are verbs 1, 2, 3 and nouns 1, 2 with connection value 1.
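
The connection values quoted from FIG. 7 can be pictured as a small lookup keyed by part-of-speech pairs. The following Python sketch is illustrative only: it fills in just the entries stated in the text, and the names `CONNECTION`, `connection_value`, and `ranked_followers` are assumptions, not part of the patent.

```python
# Illustrative fragment of a connection matrix at part-of-speech level.
# Only entries stated in the text are included; the full FIG. 7 table is
# not reproduced here, so all other pairs are simply absent.
CONNECTION = {
    ("verb 1", "verb 1"): 0,      # verb 1 can never follow verb 1
    ("noun 1", "verb 1"): 1,
    ("noun 1", "verb 2"): 1,
    ("noun 1", "verb 3"): 1,
    ("noun 1", "noun 1"): 1,
    ("noun 1", "noun 2"): 1,
    ("noun 1", "particle 1"): 3,  # most probable follower of noun 1
    ("noun 1", "particle 2"): 2,
}


def connection_value(prev_pos, next_pos):
    """Return the connection value (0 = impossible, 1..3 = possible)."""
    return CONNECTION[(prev_pos, next_pos)]


def ranked_followers(prev_pos):
    """Connectable followers of prev_pos, highest connection value first."""
    followers = [
        (nxt, value)
        for (prev, nxt), value in CONNECTION.items()
        if prev == prev_pos and value > 0
    ]
    return sorted(followers, key=lambda item: -item[1])
```

Calling `ranked_followers("noun 1")` lists particle 1 first and particle 2 second, matching the ordering described for FIG. 7.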

Conventionally, connection matrix tables of about 256 rows × 128 columns have been used, as shown in the paper by Aizawa and Ehara, "Kana-Kanji Conversion on Computers," NHK Technical Research, Vol. 25, No. 5, pp. 261-298. If such a table is stored naively in table form, then even when the connectability of one word to another is expressed in 1 bit (0, 1), it requires 256 × 128 = 32768 bits = 4096 bytes, about 4 Kbytes of storage. Because the connection matrix table is normally placed in main memory, its main-memory occupancy is a problem. If the part-of-speech classification is further subdivided (340 rows × 256 columns) and connectability is expressed not as 0/1 but, as in FIG. 7, as a connection weight or connection probability representing the strength of connection (for example 0, 1, 2, 3 in 2 bits), then 340 × 256 × 2 = 174080 bits = 21760 × 8 bits = 21760 bytes, about 22 Kbytes of storage, is required.
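
The two storage figures above follow directly from rows × columns × bits per element. A minimal Python check, assuming the table is packed bit by bit with no padding (the function name `table_bytes` is hypothetical):

```python
def table_bytes(rows, cols, bits_per_element):
    """Bytes needed to store a rows x cols table naively, bit-packed."""
    total_bits = rows * cols * bits_per_element
    return (total_bits + 7) // 8  # round up to whole bytes


# 256 x 128 table, 1 bit per element (connectable or not)
small = table_bytes(256, 128, 1)

# 340 x 256 table, 2 bits per element (connection weight 0..3)
large = table_bytes(340, 256, 2)
```

Here `small` evaluates to 4096 bytes and `large` to 21760 bytes, the roughly 4 Kbyte and 22 Kbyte figures quoted in the text.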

Conventionally, the following methods have been used to compress this connection matrix table:

(1) Depending on the type of word, the range of words it can connect to is limited and the non-connecting 0 entries are clustered together. Words are therefore grouped into broad classes (for example nominals, particles, and auxiliary verbs) and the connection matrix table is divided accordingly to reduce its size.

(2) General techniques for handling matrices with many zero elements, such as using a non-zero element table that collects only the elements that are not 0, or dividing the matrix into several blocks and not storing any block whose elements are all 0.

However, method (1) increases the number of tables and complicates their handling, does not give a very large compression effect, and makes it troublesome to handle exceptional words that cannot be classified simply. With method (2), the procedure for reconstructing the original table is troublesome.

Object

The object of the present invention is to solve the above problems of the prior art and to provide a kana-kanji conversion device equipped with a connection matrix table that greatly reduces memory occupancy and can be searched easily.

Configuration

To achieve the above object, the present invention is a kana-kanji conversion device that has a word dictionary and a connection matrix table indicating connection information between words, and that performs kana-kanji conversion on a character string input in phonetic characters by using the word dictionary and a connection matrix table whose elements are expressed in n binary digits. It is characterized in that the connection matrix table is composed of the following two parts:

n connection matrix table bodies, in which, from among the rows or columns formed in each block when the connection matrix table is divided into a plurality of blocks in units of rows or columns, only those rows or columns whose arrangement of elements differs when the elements are viewed digit by digit are extracted and registered separately for each digit; and

n connection matrix table indexes, which indicate to which element of the connection matrix table each element of the n connection matrix table bodies corresponds.

The configuration of the present invention is described in detail below by way of an embodiment.

FIG. 2 is a block diagram of a kana-kanji conversion device according to an embodiment of the present invention. In FIG. 2, 1 is an input unit, 2 is an analysis-target character string creation unit, 3 is a dictionary search unit, 4 is a word dictionary, 5 is a connectability test unit, 6 is a connection matrix table, 7 is an evaluation unit, 8 is a backtrack control unit, and 9 is an output unit.

FIG. 3 shows an example of the specific contents of the word dictionary 4 in FIG. 2.

As shown in FIG. 3, the word dictionary 4 records the "reading", the "notation", the "part of speech", and the "rank" needed for homophone selection.

A Japanese sentence is input from the input unit 1 in phonetic characters (hiragana, katakana, or romaji), and the analysis-target character string creation unit 2 creates the character string that is to be looked up in the dictionary. The dictionary search unit 3 searches the word dictionary 4 from the head of the created string and extracts all conversion candidates that match its "reading".

The connectability test unit 5 uses the connection matrix table 6 to test whether each conversion candidate extracted by the dictionary search unit 3 can connect to the immediately preceding converted word (the conversion result), and thereby determines whether any connectable conversion candidate exists.

The evaluation unit 7 evaluates the connectable conversion candidates using an evaluation formula whose parameters include rank, reading length, and connection weight, and outputs the candidate with the highest evaluation value from the output unit 9 as the conversion result.

When the dictionary search finds no matching conversion candidate at all, or when no conversion candidate can connect to the immediately preceding converted word (the conversion result), the preceding analysis may have been wrong. In these cases the backtrack control unit 8 does not go straight to unregistered-word processing but redoes the immediately preceding analysis.

FIG. 1 is a diagram for explaining the process of compressing the connection matrix table according to an embodiment of the present invention.

FIG. 1(a) shows the connection matrix table before compression. It has 340 rows × 256 columns, and each element holds 2 bits of information (four levels: 0, 1, 2, 3).

FIG. 1(b) shows the four tables (blocks) of 340 rows × 64 columns obtained by dividing the connection matrix table of FIG. 1(a) vertically (by columns) into four equal parts. Each of the four tables has rows 64 columns long, and each element holds 2 bits of information.

The symbols (1), (2), (3), and (4) in the figure are marked for convenience to identify the four tables obtained by the division.

FIG. 1(c) shows the connection matrix table indexes and the connection matrix table bodies according to an embodiment of the present invention.

First, each row of each of the four tables (1) to (4) in FIG. 1(b) is regarded as one record. As shown in FIG. 1(d), each element of a record carries one of the four levels 0, 1, 2, 3 and is expressed in two binary digits as 00, 01, 10, or 11. When the records are separated into their upper digits and lower digits, some upper-digit bit strings turn out to be identical to one another, and likewise some lower-digit bit strings. For example, the "per-digit binary bit string" column of FIG. 1(d) shows the lower digit and the upper digit separately; in that figure the lower-digit bit string of the third record and that of another record are both 00101101... and are therefore identical.

Accordingly, only the upper-digit bit strings whose arrangements differ (called upper-digit distinct bit strings) and only the lower-digit bit strings whose arrangements differ (called lower-digit distinct bit strings) are registered, separately for each digit, in the order of tables (1) to (4) of FIG. 1(b). The table that registers only the upper-digit distinct bit strings is called the upper-digit connection matrix table body, and the table that registers only the lower-digit distinct bit strings is called the lower-digit connection matrix table body. The lower-digit and upper-digit connection matrix table bodies each have their own index that points to the respective record positions: the lower-digit connection matrix table index and the upper-digit connection matrix table index.
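
The construction just described (split into column blocks, view each record one binary digit at a time, keep only the distinct bit strings, and record which one each record uses) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `compress`, the 0-based numbering, and the use of tuples for bit strings are all choices made here.

```python
def compress(matrix, block_width, n_digits=2):
    """Split `matrix` into column blocks and deduplicate each bit plane.

    Returns (bodies, indexes):
      bodies[d]     - list of distinct bit strings for binary digit d
                      (d = 0 is the lower digit, d = 1 the upper digit)
      indexes[d][p] - row of bodies[d] used by record p
    Records are numbered block by block, so row i of block n is record
    p = i + n * number_of_rows (0-based).
    """
    cols = len(matrix[0])
    bodies = [[] for _ in range(n_digits)]
    seen = [{} for _ in range(n_digits)]      # bit string -> row in body
    indexes = [[] for _ in range(n_digits)]

    for start in range(0, cols, block_width):  # one pass per block
        for row in matrix:
            record = row[start:start + block_width]
            for d in range(n_digits):
                bits = tuple((value >> d) & 1 for value in record)
                if bits not in seen[d]:         # a new distinct bit string
                    seen[d][bits] = len(bodies[d])
                    bodies[d].append(bits)
                indexes[d].append(seen[d][bits])
    return bodies, indexes


# Tiny example: 3 rows x 4 columns, split into 2 blocks of 2 columns.
example = [
    [0, 1, 2, 3],
    [0, 1, 2, 3],
    [3, 2, 1, 0],
]
bodies, indexes = compress(example, block_width=2)
```

In this toy example the six records collapse to two distinct bit strings per digit, which is the effect the embodiment relies on at full scale.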

In measurements made with this scheme, there were 543 lower-digit distinct bit strings and 182 upper-digit distinct bit strings. Thus, whereas FIG. 1(a) requires 340 rows × 256 columns × 2 bits = 21760 bytes, compression by the method of FIG. 1 gives, in FIG. 1(c):

Lower-digit connection matrix table body: 64 columns × number of lower-digit distinct bit strings (543 rows) × 1 bit = 4344 bytes
Upper-digit connection matrix table body: 64 columns × number of upper-digit distinct bit strings (182 rows) × 1 bit = 1456 bytes

If each entry of the lower-digit connection matrix table index is expressed in 2 bytes and each entry of the upper-digit connection matrix table index in 1 byte:

Lower-digit connection matrix table index: 340 rows × 4 tables × 2 bytes = 2720 bytes
Upper-digit connection matrix table index: 340 rows × 4 tables × 1 byte = 1360 bytes

The whole connection matrix table therefore occupies 4344 bytes + 1456 bytes + 2720 bytes + 1360 bytes = 9880 bytes, about 9.9 Kbytes, which is a compression to roughly 1/2.
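
The size figures for the first embodiment can be re-derived mechanically. The Python sketch below only repeats the arithmetic from the text; the constant and function names are assumptions introduced for illustration.

```python
ROWS, BLOCKS, BLOCK_WIDTH = 340, 4, 64        # from FIG. 1(b)
LOWER_DISTINCT, UPPER_DISTINCT = 543, 182     # measured counts from the text


def body_bytes(distinct_rows, width=BLOCK_WIDTH):
    """One bit per element, `width` elements per distinct bit string."""
    return distinct_rows * width // 8


def index_bytes(bytes_per_entry):
    """One index entry per record; there are ROWS * BLOCKS records."""
    return ROWS * BLOCKS * bytes_per_entry


sizes = {
    "lower body": body_bytes(LOWER_DISTINCT),
    "upper body": body_bytes(UPPER_DISTINCT),
    "lower index": index_bytes(2),   # 543 rows need more than 1 byte
    "upper index": index_bytes(1),   # 182 rows fit in 1 byte
}
total = sum(sizes.values())
original = ROWS * BLOCKS * BLOCK_WIDTH * 2 // 8   # uncompressed, 2 bits each
ratio = total / original
```

The total comes to 9880 bytes against 21760 bytes uncompressed, a ratio of about 0.45, consistent with the "roughly 1/2" stated above.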

FIG. 4 shows the processing flow of the connectability test unit when it searches the lower-digit and upper-digit connection matrix table bodies by means of the lower-digit and upper-digit connection matrix table indexes shown in FIG. 1(c).

From the code indicating the part of speech of the preceding word (obtained from the word dictionary 4), the row address in the virtual, uncompressed connection matrix table (the regular connection matrix table) is set (401). Next, from the code indicating the part of speech of the following word, the column address in the virtual, uncompressed connection matrix table is set (402).

From these row and column addresses, the row addresses in the lower-digit and upper-digit connection matrix table indexes of this embodiment, and the column addresses in the lower-digit and upper-digit connection matrix table bodies, are obtained (403).

Now, in the regular connection matrix table, let i be the row address indicating the position of the preceding word and j be the column address indicating the position of the following word. If n is the integer part of the quotient j/64, the following word belongs to table (n+1) of FIG. 1(b). The corresponding row addresses p1 and p2 in the lower-digit and upper-digit connection matrix table indexes are therefore given by

$$p_1 = p_2 = i + n \times 340 \qquad (1)$$

Meanwhile, the column addresses q1 and q2 in the lower-digit and upper-digit connection matrix table bodies that correspond to the column address j of the following word are given by

$$q_1 = q_2 = j - n \times 64 \qquad (2)$$

Once the row addresses p1 and p2 in the lower-digit and upper-digit connection matrix table indexes have been obtained for the preceding word, the index entries at those addresses give the connection numbers into the lower-digit and upper-digit connection matrix table bodies (404). The rows of the lower-digit and upper-digit connection matrix table bodies corresponding to those connection numbers are then looked up, and one bit of information is obtained from each at its intersection with the column addresses q1 and q2 (405). The two 1-bit values are then combined into a two-digit binary value, which is used as the connection value (406).

In this way, in this embodiment, the original table (the regular connection matrix table) can be reconstructed by a simple procedure.
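
Steps 401 to 406 amount to two integer operations, two index reads, and two bit reads. The Python sketch below illustrates this under assumptions not fixed by the text: addresses are 0-based, bit strings are stored as tuples, and the small `bodies`/`indexes` data are hand-made stand-ins for a compressed 3 × 4 table split into two 2-column blocks.

```python
def lookup(i, j, bodies, indexes, rows, block_width):
    """Recover the connection value at row i, column j of the original table.

    bodies[d] / indexes[d] belong to binary digit d (0 = lower digit).
    """
    n = j // block_width        # which block column j falls in
    p = i + n * rows            # equation (1): address into the indexes
    q = j - n * block_width     # equation (2): column inside the body
    value = 0
    for d, (body, index) in enumerate(zip(bodies, indexes)):
        bit = body[index[p]][q]   # one bit from each digit's body
        value |= bit << d         # reassemble the binary value
    return value


# Hand-made compressed form of [[0,1,2,3],[0,1,2,3],[3,2,1,0]].
bodies = [
    [(0, 1), (1, 0)],            # lower-digit distinct bit strings
    [(0, 0), (1, 1)],            # upper-digit distinct bit strings
]
indexes = [
    [0, 0, 1, 0, 0, 1],          # lower-digit index, 3 rows x 2 blocks
    [0, 0, 1, 1, 1, 0],          # upper-digit index
]
rebuilt = [
    [lookup(i, j, bodies, indexes, rows=3, block_width=2) for j in range(4)]
    for i in range(3)
]
```

Looping `lookup` over every (i, j) rebuilds the original table, which is the sense in which the text says reconstruction is simple.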

FIG. 5 is a diagram for explaining a second embodiment. As in FIG. 1, this embodiment divides the connection matrix table into four, views the records separately by lower digit and upper digit, and registers only the distinct bit strings in the lower-digit and upper-digit connection matrix table bodies respectively.

It differs from FIG. 1 in that the lower-digit connection matrix table body is divided into pages of 256 records each, so that one record of the lower-digit connection matrix table index can be expressed in 10 bits (2 body-page selection bits plus an 8-bit in-page address), thereby compressing the lower-digit connection matrix table index.

Note that because one page is 256 records in this embodiment, when the lower-digit connection matrix table body has 543 rows in total as in FIG. 1, all of the information is in effect stored by row 31 of the third page, and everything after that is empty.

With this method, the lower-digit connection matrix table index occupies 10 bits × 340 rows × 4 tables = 1700 bytes. Together with the 1360 bytes of the upper-digit connection matrix table index, the 4344 bytes of the lower-digit connection matrix table body, and the 1456 bytes of the upper-digit connection matrix table body, the whole connection matrix table comes to about 8.9 Kbytes, a further reduction of about 1 Kbyte compared with the first embodiment.
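
A 10-bit index entry made of 2 page-selection bits and an 8-bit in-page address can be sketched as below. The text does not specify how the 10-bit entries are laid out in memory, so the tight big-endian bit packing in `pack_10bit` is purely an assumption, as are all the names.

```python
PAGE_BITS, OFFSET_BITS = 2, 8   # 2 page-selection bits + 8-bit page address


def split_entry(entry):
    """Split a 10-bit index entry into (page, offset), both 0-based."""
    return entry >> OFFSET_BITS, entry & ((1 << OFFSET_BITS) - 1)


def pack_10bit(entries):
    """Pack 10-bit entries back to back into bytes (assumed layout)."""
    out = bytearray()
    acc, nbits = 0, 0
    for entry in entries:
        acc = (acc << 10) | (entry & 0x3FF)
        nbits += 10
        while nbits >= 8:                 # emit every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1       # keep only the leftover bits
    if nbits:                             # pad the final partial byte
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


# The last of 543 body rows (0-based row 542) lies on the third page.
last_page, last_offset = split_entry(542)

# 340 rows x 4 tables = 1360 entries of 10 bits each.
packed_size = len(pack_10bit([k % 1024 for k in range(340 * 4)]))
total_second = packed_size + 1360 + 4344 + 1456
```

With 0-based numbering, row 542 falls on page 2 at offset 30, that is, the 31st row of the third page, and the packed index is 1700 bytes, giving 8860 bytes overall.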

FIG. 6 shows a connection matrix table according to a third embodiment.

Some of the lower-digit distinct bit strings in the lower-digit connection matrix table body shown in FIG. 1 are identical to upper-digit distinct bit strings in the upper-digit connection matrix table body. In this embodiment, therefore, duplication of these identical bit strings is avoided by registering all of the bit strings together in a single common connection matrix table body.

Specifically, the upper-digit distinct bit strings are stored first, in the front part of the common connection matrix table body, and the rear part stores the lower-digit distinct bit strings that remain after removing those identical to an upper-digit distinct bit string. Conversely, it is also possible to store the lower-digit distinct bit strings first, in the front part of the common connection matrix table body, and to store in the rear part the upper-digit distinct bit strings that remain after removing those identical to a lower-digit distinct bit string.

According to measurements, there were 30 identical bit strings, so the common connection matrix table body had 695 rows in total. Its memory requirement is therefore 64 columns × 695 rows × 1 bit = 5560 bytes. If the lower-digit connection matrix table index is compressed as in the second embodiment (1700 bytes) and each record of the upper-digit connection matrix table index is expressed in 1 byte (1360 bytes), the whole connection matrix table of this embodiment occupies 8620 bytes, a further reduction compared with the second embodiment.
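
Merging the two bodies while skipping duplicates can be sketched as follows. This Python example assumes each input body already contains only distinct bit strings; the function name `merge_bodies` and the returned position maps are illustrative additions, since the text does not say how index entries are rewritten after the merge.

```python
def merge_bodies(upper_body, lower_body):
    """Build a common body: upper rows first, then unseen lower rows.

    Returns (common, upper_map, lower_map), where each map gives the row
    of `common` that holds the corresponding row of the original body.
    """
    common = list(upper_body)
    position = {bits: k for k, bits in enumerate(common)}
    lower_map = []
    for bits in lower_body:
        if bits not in position:          # store only rows not yet present
            position[bits] = len(common)
            common.append(bits)
        lower_map.append(position[bits])
    upper_map = list(range(len(upper_body)))
    return common, upper_map, lower_map


# Toy data: (1, 1) appears in both bodies and is stored only once.
common, upper_map, lower_map = merge_bodies(
    upper_body=[(0, 0), (1, 1)],
    lower_body=[(0, 1), (1, 1), (1, 0)],
)

# Arithmetic from the text: 182 + 543 - 30 shared rows, 64 bits per row.
common_rows = 182 + 543 - 30
common_bytes = common_rows * 64 // 8
total_third = common_bytes + 1700 + 1360
```

The arithmetic reproduces the 695 rows, 5560 bytes, and 8620-byte total given above.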

Although each of the above embodiments divides the regular connection matrix table into four, the present invention is not limited to four divisions; the number of divisions may be chosen freely.

The description above used the example of connection matrix table elements of two binary digits (0 to 3). For elements expressed as multi-valued numbers of two or more binary digits, the memory required for the connection matrix table can likewise be greatly reduced by focusing on the distinct bit strings of each digit and constructing the table by the same technique as in the above embodiments. Further, although the description used division by columns, the same effect is obtained when the table is divided by rows. In a system where memory occupancy is a problem, the lower-digit and upper-digit connection matrix table bodies can be kept in an external file and searched by means of the lower-digit and upper-digit connection matrix table indexes held in internal memory. It is of course also possible to keep both the lower-digit and upper-digit connection matrix table bodies and the lower-digit and upper-digit connection matrix table indexes in external files. Finally, although the above embodiments were applied to a kana-kanji conversion device using the solid-writing input method, the present invention can obviously also be applied to kana-kanji conversion devices using the kanji-part designation method or the bunsetsu unit method.

Effects

As described above, the kana-kanji conversion device of the present invention makes it possible to realize a connection matrix table that greatly reduces memory occupancy and can be searched easily.

[Brief explanation of drawings]

FIG. 1 is a diagram for explaining a connection matrix table according to an embodiment of the present invention. FIG. 2 is a block diagram of a kana-kanji conversion device to which FIG. 1 is applied. FIG. 3 shows an example of the word dictionary in FIG. 2. FIG. 4 shows the processing flow of the connectability test unit in FIG. 2. FIG. 5 is a diagram for explaining a connection matrix table according to the second embodiment of the present invention. FIG. 6 shows a connection matrix table according to the third embodiment of the present invention. FIG. 7 shows a conventional, general connection matrix table.

1: input unit, 2: analysis-target character string creation unit, 3: dictionary search unit, 4: word dictionary, 5: connectability test unit, 6: connection matrix table, 7: evaluation unit, 8: backtrack control unit, 9: output unit.

Claims

(1) A kana-kanji conversion device that has a word dictionary and a connection matrix table indicating connection information between words, and that performs kana-kanji conversion on a character string input in phonetic characters by using the word dictionary and a connection matrix table whose elements are expressed in n binary digits, characterized in that the connection matrix table comprises n connection matrix table bodies, in which only the rows or columns whose arrangement of elements differs when viewed digit by digit are extracted and registered for each digit, and n connection matrix table indexes indicating to which element of the connection matrix table each element of the n connection matrix table bodies corresponds.

(2) The kana-kanji conversion device according to claim 1, wherein part or all of the n connection matrix table bodies are divided into pages, and the connection matrix table index corresponding to a connection matrix table body divided into pages includes page selection bits indicating the page number of that connection matrix table body.

(3) The kana-kanji conversion device according to claim 1, wherein the n connection matrix table bodies are integrated while avoiding duplication of arrangement of elements.