JPH06175893A

JPH06175893A - Character code converting device and document retrieving device using the same

Info

Publication number: JPH06175893A
Application number: JP4326423A
Authority: JP
Inventors: Katsumi Tada; 勝己多田; Hisamitsu Kawaguchi; 川口　　久光; Kanji Kato; 寛次加藤; Atsushi Hatakeyama; 敦畠山; Masatsugu Shinozaki; 雅継篠崎
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1992-12-07
Filing date: 1992-12-07
Publication date: 1994-06-24
Anticipated expiration: 2018-10-14
Also published as: JP3455981B2

Abstract

PURPOSE: To provide a character code conversion device and a character string retrieval device that perform collation processing at high speed even on text in which one-byte and two-byte characters are mixed. CONSTITUTION: The device comprises plural character code fetching means 410a and 410b (windows A and B), which fetch the bytes of the text two at a time with a one-byte offset between them; character code selecting means 420, which selects the character code to be output from among the fetched codes based on the preceding selection result and the code conversion result; and code converting means 430, which determines the character type (whether the character code is a one-byte or a two-byte character) and converts the code, according to that type, into a bit-compressed character code system.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to full-text search in information processing systems, particularly information retrieval systems. More specifically, it relates to a character code conversion device and a document retrieval device that achieve high-speed character string collation even for text in which 1-byte and 2-byte character codes are mixed.

[0002]

[Prior Art] In information processing systems, one important operation is to find, within a collection of documents consisting of character string data (hereinafter called texts), every document that contains a particular character string specified by the searcher (hereinafter called a search term).

[0003] Several document retrieval devices have been proposed to implement such a retrieval system. The configuration of a representative device is shown in FIG. 2 and described below.

[0004] (L. A. Hollaar, "Hardware Systems for Text Information Retrieval," ACM SIGIR 6th Conference, 1983.) In document retrieval device 1, search control means 101 controls the entire retrieval device and communicates with the host computer. It accepts a search request 201 sent from the host computer, analyzes it, and sends the result as search information 202 to character string collating means 102 and compound condition judging means 103. Search control means 101 also controls storage device control means 104 so that text 204 stored in character string storage means 105 is read out to character string collating means 102.

[0005] Character string collating means 102 checks whether text 204 contains a character string that matches search request 201, that is, a search term. If it does, it outputs information 205 identifying the character string that corresponds to that search term to compound condition judging means 103. Compound condition judging means 103 then checks whether the search term identification information 205 satisfies the logical conditions, composed of AND, OR and the like, specified in search request 201. When the specified compound condition is satisfied, the identification information of the corresponding document is returned to the host computer as search result 206.

[0006] Character string collating means 102 is the key component of document retrieval device 1. A known collation method for it uses a finite automaton to search for multiple search terms in a single scan of the text. A representative example is the method proposed by Aho et al. (A. V. Aho and M. J. Corasick, "Efficient String Matching," Communications of the ACM, Vol. 18, No. 6, 1975). Japanese Patent Application Laid-Open No. 60-105040 proposes hardware that uses this Aho automaton to perform high-speed collation on input text consisting of characters represented by 2-byte codes (hereinafter called 2-byte characters). The character string collating circuit described there is explained below with reference to the block diagram of FIG. 3.
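For reference, the multi-term automaton of Aho and Corasick can be sketched in software as below. This is a generic textbook formulation with goto, failure, and output functions, not the circuit of FIG. 3. The prior-art hardware instead stores the automaton as a dense (state, input code) table, as in FIG. 5. The names `build_automaton` and `find_all` are illustrative.

```python
from collections import deque


def build_automaton(terms):
    """Build goto/failure/output tables of an Aho-Corasick automaton."""
    goto = [{}]      # goto[state][symbol] -> next state (trie edges)
    fail = [0]       # failure link of each state
    out = [set()]    # search terms recognised on entering each state
    for term in terms:
        state = 0
        for sym in term:
            if sym not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][sym] = len(goto) - 1
            state = goto[state][sym]
        out[state].add(term)
    # Breadth-first pass: each failure link points to the longest proper
    # suffix of the current prefix that is also a prefix of some term.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for sym, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and sym not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(sym, 0)
            out[child] |= out[fail[child]]
    return goto, fail, out


def find_all(text, terms):
    """Scan text once; return (start index, term) for every occurrence."""
    goto, fail, out = build_automaton(terms)
    state, hits = 0, []
    for i, sym in enumerate(text):
        while state and sym not in goto[state]:
            state = fail[state]
        state = goto[state].get(sym, 0)
        for term in out[state]:
            hits.append((i - len(term) + 1, term))
    return sorted(hits)


print(find_all("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The same code also accepts `bytes` for both text and terms, because iterating over `bytes` yields integers. Run byte by byte over JIS text, it inherits the byte-shift problem discussed in [0019].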

[0007] This prior art consists of character code registers 211 and 213, switching circuit 212, state transition table 220, state number registers 214 and 250, and discrimination circuit 260.

[0008] The character string collation operation of this prior art is outlined below.

[0009] First, as initialization, an automaton for matching the specified search terms is stored in state transition table 220. The initial state of this automaton, 0, is then set in state number register 214.

[0010] Collation begins by fetching the 2-byte characters of text 204 into character code register 211, one character at a time. There, each fetched character code is split into upper byte 300 and lower byte 301. Switching circuit 212 then sets these bytes into character code register 213 via data line 302, one byte at a time, alternating upper byte then lower byte. State transition table 220 is addressed by output 303 of character code register 213 together with output 307 of state number register 214, which holds the state number of the current state (hereinafter the current state number). Each entry of state transition table 220, addressed by character code 303 and current state number 307, holds one of three things: the state number of the transition destination, a specific value indicating that a search term has been matched, or a specific value ordering a forced transition to initial state 0. The entry is output as state number 304. State number 304 is held in state number register 250 as the new state number 305 (hereinafter the next state number). Discrimination circuit 260 examines state number 305 output from state number register 250 to determine whether a search term has been matched. If state number 305 is the specific value indicating a match, collation result 205 is output. Otherwise, state number 305 is stored in state number register 214 as the new current state number 306. If the value is the one representing a forced reset to initial state 0, then 0 is stored in state number register 214 as the new current state number 306.

[0011] Repeating this sequence of operations implements character string collation.

[0012] The operation of this prior art is now described with a concrete example, using the automaton shown in FIG. 4.

[0013] This figure shows an automaton that finds the search term "海抜" ("above sea level") in the input text. In hexadecimal code the term is (0x3324)(0x4834). The character codes here are JIS (2-byte) codes. Circles represent states of the automaton and arrows represent state transitions. The character code attached to each arrow is the input that triggers that transition. The number inside each circle is that state's state number. State 0 is the initial state of the automaton. Any input character code for which no transition is drawn causes a transition to initial state 0. Arrow 270, which has a slash "/" drawn across it, represents the transition indicating that "海抜" has been matched. In other words, when the transition along arrow 270 occurs from state 3, "海抜" has been matched.

[0014] The character string collation operation of this prior art is described below with reference to the same figure. State transitions of this automaton start from initial state 0. In initial state 0, if the input character code is "0x33", the automaton moves to state 1. Although not shown in the figure, any other character code returns it to initial state 0. Likewise, in state 1 it moves to state 2 if the input character code is "0x24" and to state 1 if it is "0x33"; otherwise it returns to initial state 0. Transitions in the other states follow the same pattern. In state 3, if the input character code is "0x34", the transition along arrow 270, which stores the collation result, takes place, and "海抜" has been matched.

[0015] FIG. 5 shows an example of state transition table 220 storing the automaton of FIG. 4.

[0016] When current state number 307 is 0x00 and input character code 303 is 0x33, the entry for 0x00 and 0x33, namely 0x01, is output as the next state number 304 of the automaton. When current state number 307 is 0x03 and character code 303 is 0x34, state transition table 220 outputs the specific value 0xFF, which indicates that the specified search term has been matched. Discrimination circuit 260 uses this value to decide whether the specified search term has been matched. When output 305 of state number register 250 is 0xFF, it judges that the search term has been matched and outputs the collation result.
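The byte-serial operation of [0014] to [0016] can be imitated in software with a small lookup table. This is a sketch under assumptions. Only the transitions stated in the text are certain. The entries marked "assumed" are filled in by analogy, because the full table of FIG. 5 is not reproduced here: state 2 on 0x48, the 0x33 restarts in states 2 and 3, and the return to state 0 after a match. The names `TABLE`, `MATCH`, and `run` are hypothetical.

```python
MATCH = 0xFF  # specific value meaning "search term matched" ([0016])

# (current state, input byte) -> next state for the term "海抜" = 33 24 48 34.
TABLE = {
    (0, 0x33): 1,
    (1, 0x24): 2,
    (1, 0x33): 1,
    (2, 0x48): 3,      # assumed: third byte of the term
    (2, 0x33): 1,      # assumed by analogy with state 1
    (3, 0x34): MATCH,
    (3, 0x33): 1,      # assumed by analogy with state 1
}


def run(data: bytes) -> list[int]:
    """Feed bytes one at a time; return offsets where the term ends."""
    state, hits = 0, []
    for offset, byte in enumerate(data):
        nxt = TABLE.get((state, byte), 0)  # undefined input -> initial state 0
        if nxt == MATCH:
            hits.append(offset)
            nxt = 0                        # assumed: restart after a match
        state = nxt
    return hits


# "海抜" in JIS, fed upper byte then lower byte as switching circuit 212 does.
print(run(bytes([0x33, 0x24, 0x48, 0x34])))  # [3]
```

Falling back to state 0 for every missing key models the rule that any input code without a drawn transition returns the automaton to initial state 0.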

[0017] When output 307 of state number register 214 is the specific value 0xFE, the contents of state number register 250 are ignored and state number register 214 is forcibly reset to 0x00.

[0018] FIG. 6 shows a concrete example in which this forced reset is needed.

[0019] The text "山と岩" ("mountain and rock") is represented in JIS (2-byte) code as (0x3B33)(0x2448)(0x3464). This byte sequence contains the character codes (0x3324)(0x4834) that represent "海抜". Suppose "山と岩", i.e., (0x3B33)(0x2448)(0x3464), is split into single bytes and searched without distinguishing whether the search term "海抜", i.e., (0x3324)(0x4834), begins at the upper byte or the lower byte of a 2-byte character. The term is then matched one byte out of alignment, and a false collation result is output. Starting the collation of a character string from the lower byte of a 2-byte character, as if it were the start of a character, is called a byte shift.
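The false match can be reproduced directly on the byte sequence. The sketch below finds "海抜" at byte offset 1, which is the lower byte of "山". The helper name `byte_offsets` is illustrative. Keeping only even offsets removes the false hit here. However, that parity check works only when every character is 2 bytes long, which is exactly the limitation for mixed text raised in [0022].

```python
def byte_offsets(text: bytes, term: bytes) -> list[int]:
    """Return every byte offset where term occurs, ignoring character boundaries."""
    hits, start = [], text.find(term)
    while start != -1:
        hits.append(start)
        start = text.find(term, start + 1)
    return hits


yama_to_iwa = bytes.fromhex("3B33 2448 3464")  # "山と岩" in JIS (2-byte) code
kaibatsu = bytes.fromhex("3324 4834")          # search term "海抜"

hits = byte_offsets(yama_to_iwa, kaibatsu)
aligned = [h for h in hits if h % 2 == 0]      # a real match must start on an upper byte
print(hits, aligned)  # [1] []
```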

[0020] To prevent such false matches caused by byte shifts, this prior art embeds a special value "0xFE" in state transition table 220, as shown in FIG. 5, and forces a reset whenever discrimination circuit 260 detects "0xFE".

[0021]

[Problems to Be Solved by the Invention] As described above, the document retrieval device of this prior art keeps the state transition table from growing very large by splitting each 2-byte character into single bytes and collating one byte at a time. However, this halves the processing speed compared with collating each 2-byte character in a single step.

[0022] Furthermore, measures are taken against false matches caused by byte shifts in text consisting only of 2-byte character codes. However, the device cannot collate text in which 1-byte and 2-byte character codes appear together, as they do in real Japanese text.

[0023] A first object of the present invention is to provide a character code conversion device and a document retrieval device that can perform collation at high speed, even on text consisting of 2-byte characters, without greatly enlarging the state transition table.

[0024] A second object of the present invention is to provide a character code conversion device and a document retrieval device that can perform character string collation without false matches, even on text in which 1-byte and 2-byte characters are mixed.

[0025]

[Means for Solving the Problems] These problems are solved by placing character code conversion means immediately before the document retrieval device. This means converts all text containing a mix of 1-byte and 2-byte characters into a predetermined bit-compressed character code system composed of 2 bytes per character.

[0026] The character code conversion means consists of three parts:

- Two character code fetching means (hereinafter called windows). Each fetches two bytes at a time from the text containing mixed 1-byte and 2-byte characters, and the two are offset from each other by one byte.
- Character code selecting means. It cuts out a character code by selecting, from the two windows, the window whose first byte is a 1-byte character or the first byte of a 2-byte character (hereinafter called the boundary window). The selection is based on which window was selected in the preceding conversion step and on character type information indicating whether the code cut out by that window was a 1-byte or a 2-byte character.
- Code conversion means. It determines the character type of the code selected by the character code selecting means, i.e., whether it is a 1-byte or a 2-byte character, and converts the code according to that type.

[0027]

[Operation] In the character code conversion means of the present invention, the two windows fetch character codes from the input text two bytes at a time, offset from each other by one byte. A character code is cut out by selecting the output of the window whose first fetched byte is either a 1-byte character code or the upper byte of a 2-byte character code, that is, the window positioned on a character boundary. The code output by a window that has fetched from the lower byte of a 2-byte character (a window with a byte shift) is never cut out. As a result, character codes can be cut out without byte shifts even from input text in which 1-byte and 2-byte characters are mixed.
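The effect can be illustrated with Shift JIS text, where a trail byte may coincide with an ASCII code. "アイ" is 0x8341 0x8343, so a plain byte search for "A" (0x41) reports a false hit. The sketch below is a software stand-in for the two-window hardware, not the hardware itself: it tracks character boundaries with a simple sequential scan. It uses the lead-byte ranges given in [0039], and the function names are hypothetical.

```python
def is_lead_byte(b: int) -> bool:
    """Upper-byte ranges that mark a 2-byte character (per [0039])."""
    return 0x80 <= b <= 0x9F or 0xE0 <= b <= 0xEA


def split_chars(text: bytes) -> list[bytes]:
    """Cut text into whole characters, never starting at a trail byte."""
    chars, i = [], 0
    while i < len(text):
        size = 2 if is_lead_byte(text[i]) and i + 1 < len(text) else 1
        chars.append(text[i:i + size])
        i += size
    return chars


def find_aligned(text: bytes, term: bytes) -> list[int]:
    """Character indices where term starts on a character boundary."""
    chars, pattern = split_chars(text), split_chars(term)
    n = len(pattern)
    return [i for i in range(len(chars) - n + 1) if chars[i:i + n] == pattern]


katakana_ai = b"\x83\x41\x83\x43"                     # "アイ" in Shift JIS
print(b"A" in katakana_ai)                            # True: byte search is fooled
print(find_aligned(katakana_ai, b"A"))                # []: no aligned match
print(find_aligned(b"x" + katakana_ai + b"A", b"A"))  # [3]
```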

[0028] The code conversion unit then converts all of the cut-out character codes into a predetermined bit-compressed 2-byte character code system. The state transition table no longer needs to hold data for regions of the code space where no character code exists, so it can be made compact. As a result, a fast and inexpensive document retrieval device can be provided.

[0029] The detailed principle of the present invention is described below.

[0030] FIG. 7 shows an example of how character codes are fetched in the present invention.

[0031] This figure illustrates the character code fetching method concretely. The example input text is "AとBの・・・", i.e., (0x41)(0x82C6)(0x42)(0x82CC)... in hexadecimal.

[0032] In this example, 1-byte characters are represented in the JIS code system and 2-byte characters in the Shift JIS (SJIS) code system (together hereinafter called the JIS/SJIS code system).

[0033] Character codes are fetched two bytes at a time into two windows, window A and window B, which are offset from each other by one byte. In FIG. 7, the first fetch places (0x41) and (0x82) in window A and (0x82) and (0xC6) in window B.

[0034] From windows A and B, a character code selector selects the output of the window whose first byte is a 1-byte character or the upper byte of a 2-byte character (the boundary window). In the figure, the first byte (0x41) of window A, the leading window, is a 1-byte character, so window A is selected as the boundary window in the first conversion step. In the second and later conversion steps, the boundary window is determined from the result of the immediately preceding step:

- If the preceding step cut out a 1-byte character from window A or a 2-byte character from window B, window B is the next boundary window.
- If the preceding step cut out a 2-byte character from window A or a 1-byte character from window B, window A is the next boundary window.

[0035] In the example of the figure, the character code (0x41) cut out from window A in the first conversion step is a 1-byte character, so in the second step the character code is cut out from window B. The character code (0x82C6) cut out from window B in the second step is a 2-byte character, so in the third step the character code is again cut out from window B. No character code in the JIS/SJIS code system is longer than 2 bytes, so a boundary window always appears in at least one of window A and window B.
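The window-selection rule of [0034] can be traced in software as follows. This is a sketch, not the circuit of FIG. 1. The variable `pos` stands for the offset of window A's first byte, and window B starts one byte later. How far the hardware advances the windows in each step is an assumption here, chosen so that the selected window always starts at the next character. The lead-byte test uses the ranges given in [0039].

```python
def is_lead_byte(b: int) -> bool:
    # Upper-byte ranges of a 2-byte character, per [0039].
    return 0x80 <= b <= 0x9F or 0xE0 <= b <= 0xEA


def cut_with_windows(text: bytes) -> list[tuple[str, int, int]]:
    """Return (selected window, character code, byte length) for each step."""
    pos = 0          # offset of window A; window B covers pos + 1 .. pos + 2
    selected = "A"   # at the start of the text, window A is the boundary window
    steps = []
    while True:
        start = pos if selected == "A" else pos + 1
        if start >= len(text):
            break
        window = text[start:start + 2]
        if len(window) == 2 and is_lead_byte(window[0]):
            code, size = (window[0] << 8) | window[1], 2
        else:
            code, size = window[0], 1
        steps.append((selected, code, size))
        # Rule of [0034]: A+1-byte or B+2-byte -> B next; A+2-byte or B+1-byte -> A next.
        if selected == "A" and size == 1:
            selected = "B"                 # next character is window B's first byte
        elif selected == "A":
            pos += 2                       # A, 2-byte: next character at pos + 2 (window A)
        elif size == 2:
            pos += 2                       # B, 2-byte: next character at pos + 3 (window B)
        else:
            selected, pos = "A", pos + 2   # B, 1-byte: next character at pos + 2 (window A)
    return steps


text = bytes.fromhex("41 82C6 42 82CC")  # "AとBの" in JIS/SJIS
for window, code, size in cut_with_windows(text):
    print(window, hex(code), size)
# A 0x41 1
# B 0x82c6 2
# B 0x42 1
# A 0x82cc 2
```

The trace matches the example of FIG. 7: window A in step 1, then window B in steps 2 and 3.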

[0036] By always cutting out character codes from the first byte of a window while keeping track of their byte length in this way, 1-byte and 2-byte characters can be extracted correctly.

[0037] The character codes are fetched into two windows offset from each other by one byte, and each code is cut out by selecting the boundary window. Therefore, codes can be cut out correctly, without byte shifts, even from text in which 1-byte and 2-byte character codes are mixed. Next, the character type of each cut-out code (1-byte or 2-byte character) is determined. Code conversion suited to each type then converts the text written in the JIS/SJIS code system into a predetermined bit-compressed character code system.

[0038] Next, the principle for determining the character type in the JIS/SJIS code system is described.

[0039] In the JIS/SJIS code system, whether a cut-out character code is a 1-byte or a 2-byte character can be determined from the value of its upper (first) byte alone. If the upper byte is in the range 0x80 to 0x9F or 0xE0 to 0xEA, the code is a 2-byte character code. Otherwise it is a 1-byte character code.
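Because the decision depends only on the upper byte, it can be precomputed as a 256-entry table, the kind of small lookup memory a hardware implementation might use. This is an assumption: the document does not say how code conversion means 430 implements the test. The names below are hypothetical.

```python
# Character length (1 or 2 bytes) indexed by the upper byte, per the ranges in [0039].
CHAR_LENGTH = bytes(
    2 if 0x80 <= b <= 0x9F or 0xE0 <= b <= 0xEA else 1 for b in range(256)
)


def char_length(upper_byte: int) -> int:
    """Return 2 if upper_byte starts a 2-byte character, else 1."""
    return CHAR_LENGTH[upper_byte]


print(char_length(0x41), char_length(0x82), char_length(0xB1), char_length(0xEA))
# 1 2 1 2
```

Bytes 0xA1 to 0xDF (half-width katakana in Shift JIS, e.g. 0xB1) fall outside both ranges and are therefore classed as 1-byte characters.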

[0040] Then, based on this determination, each character code represented in the JIS/SJIS code system is converted into a predetermined 2-byte, bit-compressed character code system. FIG. 8 shows the conversion from the JIS/SJIS code system to this bit-compressed character code system (hereinafter called the compressed character code system).

[0041] JIS (1-byte) codes are mapped to the range 0x0000 to 0x00FF of the compressed character code system, and SJIS (2-byte) codes to the range 0x0100 to 0x1FA7.
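FIG. 8 itself is not reproduced here, so the exact mapping behind the range 0x0100 to 0x1FA7 cannot be recovered from the text. The sketch below shows one possible dense mapping of the same kind, and it is an assumption, not FIG. 8. 1-byte codes keep their value. Each SJIS code is numbered by its position in a grid of 42 valid lead bytes (0x81 to 0x9F and 0xE0 to 0xEA) × 188 valid trail bytes (0x40 to 0x7E and 0x80 to 0xFC), offset by 0x0100. Its upper bound (0x1FD7) differs from FIG. 8's 0x1FA7, but every result still fits in 13 bits. The lead value 0x80, which [0039] includes, is not a valid Shift JIS lead byte and is rejected here.

```python
def compress(code: int, size: int) -> int:
    """Map a JIS (1-byte) or SJIS (2-byte) code to a 13-bit code (illustrative mapping)."""
    if size == 1:
        return code                      # 0x0000-0x00FF: 1-byte codes unchanged
    lead, trail = code >> 8, code & 0xFF
    if 0x81 <= lead <= 0x9F:
        lead_idx = lead - 0x81           # 31 lead bytes
    elif 0xE0 <= lead <= 0xEA:
        lead_idx = lead - 0xE0 + 31      # 11 more lead bytes
    else:
        raise ValueError(f"not an SJIS lead byte: {lead:#04x}")
    if 0x40 <= trail <= 0x7E:
        trail_idx = trail - 0x40         # 63 trail bytes
    elif 0x80 <= trail <= 0xFC:
        trail_idx = trail - 0x80 + 63    # 125 trail bytes (0x7F is skipped)
    else:
        raise ValueError(f"not an SJIS trail byte: {trail:#04x}")
    return 0x0100 + lead_idx * 188 + trail_idx


print(hex(compress(0x41, 1)), hex(compress(0x82C6, 2)), hex(compress(0xEAFC, 2)))
# 0x41 0x241 0x1fd7
```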

[0042] Converting the input text into the bit-compressed code system by the method described above reduces the capacity of the state transition table, for the reason explained below.

[0043] The size of the state transition table needed for character string collation is generally given by the following expression.

[0044]

[Equation 1]

$$\text{state transition table capacity} = (\text{maximum number of states}) \times (\text{number of character code values}) \times (\text{bits per state number})$$

Accordingly, for text represented in the JIS/SJIS code system, the memory needed for a state transition table built for 2-byte character codes is given by the following expression.

[0045]

[Equation 2]

$$256 \times 2^{16} \times 8 = 128\ \text{Mbit} = 16\ \text{MByte}$$

where the maximum number of states is $256 = 2^{8}$.

[0046] The memory needed to store a state transition table for 2-byte character codes is therefore enormous. The reason is that the JIS/SJIS code system spreads its character codes over a very wide range, and a very large part of that range has no corresponding character (the unhatched regions in FIG. 8(a)). A very large state transition table is thus needed so that transitions can be described for every possible character code.

[0047] However, if the input text represented in the JIS/SJIS code system is compressed into the code system shown in FIG. 8(b), every character can be represented by a 13-bit code, since the largest compressed code, 0x1FA7, is below $2^{13} = \text{0x2000}$.

[0048] The memory needed for a state transition table built for the compressed character code system is therefore given by the following expression.

[0049]

[Equation 3]

$$256 \times 2^{13} \times 8 = 16\ \text{Mbit} = 2\ \text{MByte}$$

where the maximum number of states is $256 = 2^{8}$.

[0050] Compressing the character codes therefore reduces the memory for the state transition table from 16 MByte to 2 MByte, one eighth of the original.
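The arithmetic of Equations 1 to 3 can be checked in a few lines. `table_bits` is a hypothetical helper that simply evaluates Equation 1, taking M = 2^20. The last line also confirms that the top compressed code, 0x1FA7, fits in 13 bits.

```python
def table_bits(max_states: int, code_bits: int, state_bits: int) -> int:
    """Equation 1: states x number of code values x bits per state number."""
    return max_states * (1 << code_bits) * state_bits


MEBI = 1 << 20
full = table_bits(256, 16, 8)        # JIS/SJIS: 16-bit character codes
compressed = table_bits(256, 13, 8)  # compressed code system: 13-bit codes

print(full // MEBI, "Mbit =", full // 8 // MEBI, "MByte")              # 128 Mbit = 16 MByte
print(compressed // MEBI, "Mbit =", compressed // 8 // MEBI, "MByte")  # 16 Mbit = 2 MByte
print(full // compressed, 0x1FA7 < 1 << 13)                            # 8 True
```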

[0051] As described above, the device is provided with three elements: two character code fetching means (windows) that each fetch two bytes at a time, offset by one byte; character code selecting means that determines whether each window is the boundary window and cuts out 1-byte or 2-byte characters one character at a time; and code conversion means that converts the characters into a bit-compressed character code system. Together these greatly reduce the capacity of the state transition table, which in turn makes an inexpensive, high-speed document retrieval device possible.

[0052] Moreover, the JIS/SJIS code system needs 16-bit codes to represent all characters including kanji, whereas after conversion into the bit-compressed code system every character can be represented by a 13-bit code. Since a 13-bit code is 3/16 (about 19%) shorter than a 16-bit code, the amount of data that can be stored on a storage medium such as a magnetic disk device effectively increases by about 20%, making better use of resources. Writing and reading speeds for the storage medium likewise improve effectively by about 20%, and communication means such as lines can be used more effectively.
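The storage saving assumes that the 13-bit codes are stored packed rather than padded to 16 bits. The document does not describe a packing format, so the big-endian bit packing below is an illustrative assumption, and `pack13`/`unpack13` are hypothetical names. Under it, sixteen codes occupy 26 bytes instead of 32.

```python
def pack13(codes: list[int]) -> bytes:
    """Pack 13-bit codes into bytes, most significant bit first (illustrative format)."""
    acc, nbits, out = 0, 0, bytearray()
    for code in codes:
        if not 0 <= code < 1 << 13:
            raise ValueError(f"code does not fit in 13 bits: {code:#x}")
        acc = (acc << 13) | code
        nbits += 13
        while nbits >= 8:                 # emit every complete byte
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1           # keep only the leftover bits
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)  # zero-pad the last byte
    return bytes(out)


def unpack13(data: bytes, count: int) -> list[int]:
    """Inverse of pack13: read count 13-bit codes back from data."""
    acc, nbits, codes = 0, 0, []
    it = iter(data)
    while len(codes) < count:
        while nbits < 13:
            acc = (acc << 8) | next(it)
            nbits += 8
        nbits -= 13
        codes.append((acc >> nbits) & 0x1FFF)
        acc &= (1 << nbits) - 1
    return codes


codes = [0x0041, 0x0241, 0x1FA7, 0x0000]
packed = pack13(codes)
print(len(packed), unpack13(packed, len(codes)) == codes)  # 7 True
```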

[0053]

[Embodiments] A first embodiment of the present invention is described below with reference to FIG. 9.

[0054] This figure is a block diagram showing the configuration of this embodiment.

[0055] In this embodiment, character code conversion means 400 is placed between character string storage means 105 and character string collating means 102. Input text 204, read from character string storage means 105, contains a mix of 1-byte and 2-byte characters. Character code conversion means 400 converts all of it into a code-compressed internal character code system of 2 bytes per character and sends the result to character string collating means 102. Character string collating means 102 takes the character codes 207 compressed by character code conversion means 400 as input and collates character strings one character at a time.

[0056] The input text in this embodiment is represented in the JIS/SJIS code system.

[0057] FIG. 1 is a block diagram showing the configuration of the character code conversion means 400.

[0058] The character code conversion means 400 consists of character code fetch windows 410a and 410b, a character code selection means 420, and a code conversion means 430.

[0059] The input text 204 read from the character string storage means 105 is fetched 2 bytes at a time into character code fetch window A 410a and character code fetch window B 410b, which are offset from each other by 1 byte. The character code selection means 420 examines the character codes 401 and 402 output by the two windows. It selects the one whose first byte is either a 1-byte character code or the upper byte of a 2-byte character code, that is, the output of the window whose first byte lies on a character boundary. Characters are cut out by selecting character code 401 when window A is on the character boundary and character code 402 when window B is.

Besides 1-byte and 2-byte characters, the character code 403 cut out by the character code selection means 420 may be any of the following:

- a control code indicating double-width display, shaded display, and the like. Such codes are unnecessary for retrieval and are deleted during code conversion, so they are hereafter called deletion codes;
- binary data such as a document number, page number, or indent amount;
- a specific control code placed immediately before binary data to indicate that the following code is binary data (hereafter called a binary data marker);
- a control code with a special meaning, such as one marking the end of the document being searched.

The code conversion means 430 determines which of these character types applies to the character code 403 cut out by the character code selection means 420. It then performs the code conversion appropriate to that type, converting the code into the bit-compressed 13-bit internal character code system shown in FIG. 8(b), and sends the result to the character string collating means 102 as the compressed character code 207. Of the character type information determined here, the flag indicating a 2-byte character is returned to the character code selection means 420 as the 2-byte character flag 404. This flag is used to determine the boundary window in the next conversion step.

[0060] In this embodiment, a document number serves as the example of binary data mixed into the input text. A code indicating the end of the document being searched (hereafter called the EOF code) serves as the example of a control code with a special meaning. A document number is assumed to consist of a 2-byte code following a 1-byte binary data marker.

[0061] FIG. 10 shows the configuration of character code fetch window A 410a, character code fetch window B 410b, and the character code selection means 420 in this embodiment.

[0062] First, the configuration and operation of character code fetch windows A 410a and B 410b are described.

[0063] As an example, consider what happens when the text "AとBの..." (roughly "A and B's ...") is input. In hexadecimal this is (0x41)(0x82C6)(0x42)(0x82CC)...

[0064] On the first input, the first 2 bytes (0x4182) are fetched.

[0065] Of the fetched 2-byte code, the upper byte (0x41) is stored in upper-byte registers 411a and 411b, and the lower byte (0x82) is stored in lower-byte registers 412a and 412b.

[0066] On the second input, the (0x41) held in upper-byte register 411a moves to register 413. The (0x82) held in lower-byte registers 412a and 412b moves to registers 414 and 415, respectively. The newly fetched 2 bytes (0xC642) are then split: the upper byte (0xC6) is stored in upper-byte registers 411a and 411b, and the lower byte (0x42) is stored in lower-byte registers 412a and 412b.

[0067] The 2-byte code 401 cut out as window A consists of the outputs of character code registers 413 and 414, that is, (0x41) and (0x82). The 2-byte code 402 cut out as window B consists of the outputs of character code registers 415 and 411b, that is, (0x82) and (0xC6).

[0068] Thus, in this embodiment, the 2-byte code 401 (0x4182) output from window A and the 2-byte code 402 (0x82C6) output from window B reach the character code selection means 420 offset from each other by 1 byte.
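The register transfers in [0065] to [0068] can be mirrored in a short Python model. This is an illustrative sketch only. The class and method names are hypothetical, and each call to `clock` stands for one 2-byte input cycle of FIG. 10.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Byte = Optional[int]  # None models a register that has not been loaded yet


@dataclass
class Fig10Windows:
    r411a: Byte = None  # upper byte of the latest 2-byte input
    r411b: Byte = None  # duplicate of 411a
    r412a: Byte = None  # lower byte of the latest input
    r412b: Byte = None  # duplicate of 412a
    r413: Byte = None   # receives 411a on the next input
    r414: Byte = None   # receives 412a on the next input
    r415: Byte = None   # receives 412b on the next input

    def clock(self, upper: int, lower: int) -> None:
        # Move the previous pair one stage along, then latch the new pair.
        self.r413, self.r414, self.r415 = self.r411a, self.r412a, self.r412b
        self.r411a = self.r411b = upper
        self.r412a = self.r412b = lower

    def window_a(self) -> Tuple[Byte, Byte]:
        return (self.r413, self.r414)   # 2-byte code 401

    def window_b(self) -> Tuple[Byte, Byte]:
        return (self.r415, self.r411b)  # 2-byte code 402


w = Fig10Windows()
w.clock(0x41, 0x82)
w.clock(0xC6, 0x42)
print([hex(b) for b in w.window_a()], [hex(b) for b in w.window_b()])
```

Feeding the example text as (0x41, 0x82) and then (0xC6, 0x42) leaves window A holding (0x41, 0x82) and window B holding (0x82, 0xC6), as in [0067].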

[0069] This completes the description of the operation of character code fetch windows 410a and 410b.

[0070] Next, the character code selection means 420 is described.

[0071] The character code selection means 420 consists of a character code selector 425, an ENOR (exclusive-NOR) circuit 423, and a register 424.

[0072] Suppose that in the previous conversion step a 1-byte character was fetched from window A, or a 2-byte character was fetched from window B. In that case the character code selection means 420 cuts out the character code fetched into window B in the next conversion step, so that the following character is cut out correctly as exactly one character. Conversely, suppose that in the previous conversion step a 2-byte character was fetched from window A, or a 1-byte character was fetched from window B. In that case, for the same reason, it cuts out the character code fetched into window A in the next conversion step.

[0073] The operation is described concretely below.

[0074] The character code selector 425 receives the output 401 of character code fetch window A 410a and the output 402 of character code fetch window B 410b. It selects the character code output from the window whose first byte is a 1-byte character or the upper byte of a 2-byte character, that is, the code from the boundary window.

[0075] The boundary window is selected from two inputs. The first is the boundary window flag 421, which indicates the window selected in the immediately preceding conversion step. The second is the 2-byte character flag 404, which indicates whether the character input at that step was a 1-byte or a 2-byte character. A boundary window flag 421 of 0 means window A was selected in the previous conversion step, and a value of 1 means window B was selected. A 2-byte character flag 404 of 0 means the character input in the previous conversion step was a 1-byte character, and a value of 1 means it was a 2-byte character.

[0076] FIG. 11 shows the truth table of the logic that uses this information to select either the output 401 of window A or the output 402 of window B.

[0077] In the figure, I₁ is the boundary window flag 421 of the previous conversion step, I₂ is the 2-byte character flag 404 of the previous conversion step, and I₀ is the boundary window flag 422 of the current conversion step.

[0078] This truth table is implemented by the ENOR circuit 423 and register 424 of FIG. 10.

[0079] Register 424 stores the boundary window flag 422. When the boundary window is selected in the next conversion step, this stored value is referenced as the previous-step boundary window flag 421.

[0080] Next, the specific operation of the character code selection means 420 is described.

[0081] When the boundary window flag I₀ is 0, the character code selector 425 selects input 401, that is, the character code fetched into window A. When I₀ is 1, it selects input 402, that is, the character code fetched into window B.

[0082] Suppose that in the previous conversion step a 1-byte character was cut out from window A (I₁ = 0 and I₂ = 0), or a 2-byte character was fetched from window B (I₁ = 1 and I₂ = 1). Then I₀ becomes 1, and the character code selector 425 selects input 402, cutting out the character code fetched into window B.

[0083] Suppose instead that in the previous conversion step a 2-byte character was fetched from window A (I₁ = 0 and I₂ = 1), or a 1-byte character was input to window B (I₁ = 1 and I₂ = 0). Then I₀ becomes 0, and the character code selector 425 selects input 401, cutting out the character code fetched into window A. In other words, I₀ is the exclusive NOR of I₁ and I₂.
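The selection rule of [0082] and [0083] reduces to a single exclusive NOR. One way to see why (our reading, not stated in the source) is to track byte offsets. Let window A start at byte offset p and window B at p + 1. The previous character started at p + I₁ and was 1 + I₂ bytes long, so the next one starts at p + I₁ + 1 + I₂. Its parity relative to the window pair is I₁ ⊕ I₂ ⊕ 1. The function name below is hypothetical.

```python
def select_window(i1: int, i2: int) -> int:
    """Return I0: 0 selects window A (input 401), 1 selects window B (input 402).

    i1: previous-step boundary window flag 421 (0 = A, 1 = B)
    i2: previous-step 2-byte character flag 404 (0 = 1-byte, 1 = 2-byte)
    """
    return 1 - (i1 ^ i2)  # exclusive NOR, as realised by ENOR circuit 423


# Print the four rows described in [0082] and [0083].
for i1 in (0, 1):
    for i2 in (0, 1):
        i0 = select_window(i1, i2)
        print(f"I1={i1} I2={i2} -> I0={i0} (window {'B' if i0 else 'A'})")
```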

[0084] The specific operation of the character code selection means 420 is now described using the earlier example.

[0085] As initial values, the previous-step boundary window flag I₁ is set to 0 and the previous-step character type flag I₂ (the 2-byte character flag) is set to 1.

[0086] In the first conversion step, I₁ = 0 and I₂ = 1, so I₀ = 0. Window A is selected, and the 1-byte character (0x41) is cut out from it. In the second conversion step, I₂ = 0 because the character cut out last time was the 1-byte code (0x41). With I₁ = 0, this gives I₀ = 1, so the output of window B is selected. The 2-byte character (0x82C6) is cut out from window B in this second step, so I₂ = 1. With I₁ = 1, this gives I₀ = 1, and the output of window B is selected again in the third conversion step.

[0087] Proceeding in the same way, the device distinguishes 1-byte characters from 2-byte characters and correctly cuts out each character code from text in which the two are mixed.
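The complete cut-out loop can be sketched in software as follows. This is a behavioural model under stated assumptions, not the patented circuit:

- a cursor `base` stands in for the window pair;
- the 2-byte ranges are those given later in [0126];
- the pair is assumed not to advance after a 1-byte character is cut from window A (I₁ = 0, I₂ = 0).

The last assumption is what the worked example in [0086] implies, since window B still yields (0x82C6) in the second step.

```python
from typing import List


def is_two_byte_lead(b: int) -> bool:
    # Upper-byte ranges treated as 2-byte characters in [0126].
    return 0x80 <= b <= 0x9F or 0xE0 <= b <= 0xEA


def select_window(i1: int, i2: int) -> int:
    # ENOR of the previous-step flags: 0 -> window A, 1 -> window B.
    return 1 - (i1 ^ i2)


def cut_characters(text: bytes) -> List[bytes]:
    padded = text + b"\x00"  # hypothetical pad so window B never reads past the end
    base = -2                # offset of window A before the first step
    i1, i2 = 0, 1            # initial flag values from [0085]
    chars: List[bytes] = []
    while True:
        # Assumption: the window pair advances 2 bytes per step, except after a
        # 1-byte character cut from window A (I1 = 0, I2 = 0), when it stays put.
        if not (i1 == 0 and i2 == 0):
            base += 2
        i0 = select_window(i1, i2)
        start = base + i0    # window A starts at base, window B at base + 1
        if start >= len(text):
            break
        window = padded[start:start + 2]
        two_byte = is_two_byte_lead(window[0])
        chars.append(window if two_byte else window[:1])
        i1, i2 = i0, int(two_byte)  # these become I1 and I2 for the next step
    return chars


print(cut_characters(bytes([0x41, 0x82, 0xC6, 0x42, 0x82, 0xCC])))
```

For the example bytes 41 82 C6 42 82 CC, `cut_characters` returns `[b'A', b'\x82\xc6', b'B', b'\x82\xcc']`, choosing windows A, B, B, A in turn. The first three choices match [0086].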

[0088] This completes the detailed description of the operation of the character code selection means 420.

[0089] Finally, the code conversion means 430 in FIG. 1 is described.

[0090] The code conversion means 430 first determines the character type of the character code 403 cut out by the character code selection means 420. It then performs the code conversion appropriate to that type, converting the character codes in the input text into the bit-compressed internal character code system with the 2-byte structure shown in FIG. 8(b).

[0091] The configuration and operation of the code conversion means 430 in this embodiment are described below.

[0092] FIG. 12 shows the configuration of the code conversion means 430 in this embodiment.

[0093] The code conversion means 430 consists of a multiplexer 431, a character code conversion block 432, a character type determination block 433, an output gate 434, a binary data storage register 435, registers 436 and 437, and an OR circuit 438.

[0094] First, the operation of the multiplexer 431 and the binary data storage register 435 is described.

[0095] The multiplexer 431 separates character codes from binary data according to the value of the binary data marker flag 439 from the previous processing step. When the previous-step binary data marker flag 439 is 1, the code fetched immediately before was a binary data marker. In that case the multiplexer outputs the following 2 bytes, here a document number, to the binary data storage register 435. The binary data held in the binary data storage register 435 is then output as the document number to the compound condition judging means 103.

[0096] When the previous-step binary data marker flag 439 is 0, the code fetched immediately before was something other than a binary data marker. In that case the multiplexer outputs the following 2-byte code to the character code conversion block 432 and the character type determination block 433.

[0097] This completes the description of the operation of the multiplexer 431 and the binary data storage register 435.

[0098] Next, the configuration and operation of the character type determination block 433, registers 436 and 437, and the OR circuit 438 are described.

[0099] In this embodiment, the character type determination block 433 is implemented as a memory called the character type table.

[0100] The character type table 433 is addressed by the 2-byte code output from the multiplexer 431. Each entry consists of the 4-bit flag shown in FIG. 13: a 2-byte character flag, a deletion code flag, a binary data marker flag, and an EOF flag.

[0101] In other words, for each code output from the multiplexer 431, the character type table 433 stores a value corresponding to that code's character type. When the code is a 2-byte character, the table outputs 1 on the 2-byte character flag. Likewise, it outputs 1 on the deletion code flag for a deletion code, on the binary data marker flag for a binary data marker, and on the EOF flag for the EOF code.
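A character type table of this kind can be generated programmatically. Several details in the sketch below are hypothetical; FIG. 13 defines the actual layout:

- the bit positions of the four flags;
- the control-code values;
- the assumption that deletion codes, the binary data marker, and the EOF code are all 1-byte codes, so only the upper byte of the 16-bit address is compared, as with the (0x41??) addressing described in [0138].

```python
from typing import Iterable, List

# Hypothetical bit assignment for the four flags of FIG. 13.
TWO_BYTE, DELETION, BINARY_MARKER, EOF = 0b0001, 0b0010, 0b0100, 0b1000


def build_type_table(deletion_codes: Iterable[int], marker: int, eof: int) -> List[int]:
    """Return one 4-bit flag word per 16-bit address (the 2-byte code from multiplexer 431)."""
    deletion = set(deletion_codes)
    table: List[int] = []
    for addr in range(0x10000):
        upper = addr >> 8  # assumed: control codes are 1-byte, so only the upper byte matters
        flags = 0
        if 0x80 <= upper <= 0x9F or 0xE0 <= upper <= 0xEA:  # ranges from [0126]
            flags |= TWO_BYTE
        if upper in deletion:
            flags |= DELETION
        if upper == marker:
            flags |= BINARY_MARKER
        if upper == eof:
            flags |= EOF
        table.append(flags)
    return table


table = build_type_table(deletion_codes=[0x1C], marker=0x1E, eof=0x1A)  # hypothetical values
```

With 65,536 addresses of 4 flag bits each, the table occupies 4 × 64 Kbit = 256 Kbit, the figure quoted in [0129].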

[0102] The 2-byte character flag passes through the OR gate 438 into register 436. From there it is output to the character code selection means 420 as the previous-step 2-byte character flag 404, which is used to determine the boundary window in the next step.

[0103] The deletion code flag is output to the output gate 434, where it controls the output of the converted character code 440 produced by the character code conversion block 432 (described later). When the deletion code flag is 1, output of the converted character code 440 is suppressed. When it is 0, the converted character code 440 is output to the character string collating means 102.

[0104] The binary data marker flag is stored in register 437 and, in the next step, output to the multiplexer 431 and the OR circuit 438. When the previous-step binary data marker flag is 1, a binary data marker has just been input. The multiplexer 431 then outputs the following 2 bytes, that is, the document number, to the binary data storage register 435. The OR circuit 438 forces to 1 the 2-byte character flag for the code following the binary data marker, that is, for the binary data. As a result, the document number is processed as a single 2-byte code, and the next boundary window is chosen as if a 2-byte character had been cut out.
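The routing performed by the multiplexer 431, register 437, and OR circuit 438 can be modelled with a simple cursor in place of the two windows. The following details are assumptions, since the text does not give them:

- the marker and EOF values are hypothetical;
- the document number is read big-endian;
- the marker itself is not forwarded.

```python
from typing import List, Tuple

MARKER = 0x1E  # hypothetical 1-byte binary data marker
EOF = 0x1A     # hypothetical 1-byte EOF code


def is_two_byte_lead(b: int) -> bool:
    return 0x80 <= b <= 0x9F or 0xE0 <= b <= 0xEA  # ranges from [0126]


def route(text: bytes) -> Tuple[List[bytes], List[int]]:
    """Split input into characters (for blocks 432/433) and document numbers (for register 435)."""
    chars: List[bytes] = []
    doc_numbers: List[int] = []
    pos = 0
    marker_flag = 0  # plays the role of register 437
    while pos < len(text):
        if marker_flag:
            # Multiplexer 431 sends the 2 bytes after a marker to register 435;
            # OR circuit 438 forces the 2-byte flag so both bytes are consumed.
            doc_numbers.append(int.from_bytes(text[pos:pos + 2], "big"))
            pos += 2
            marker_flag = 0
            continue
        b = text[pos]
        if b == EOF:
            break  # EOF flag: end signal to the search control means 101
        if b == MARKER:
            marker_flag = 1  # the marker itself is not forwarded (assumption)
            pos += 1
            continue
        width = 2 if is_two_byte_lead(b) else 1
        chars.append(text[pos:pos + width])
        pos += width
    return chars, doc_numbers
```

The forced 2-byte flag guarantees that both document-number bytes are consumed whatever their values. Without it, a document number such as (0x00)(0x41) would be read as two 1-byte codes, and one such as (0x82)(0x01) would be taken as a lead byte plus trail byte only by coincidence.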

[0105] When the EOF code is input, the EOF flag is output to the search control means 101 as a signal that ends the code conversion process.

[0106] This completes the description of the operation of the character type determination block 433, registers 436 and 437, and the OR circuit 438.

[0107] Finally, the configuration and operation of the character code conversion block 432 and the output gate 434 of FIG. 12 are described.

[0108] In this embodiment, the character code conversion block 432 is implemented as a memory called the character code conversion table.

[0109] Like the character type table 433, the character code conversion table 432 is addressed by the 2-byte character code output from the multiplexer 431. For each address, the table stores the values shown in FIG. 39.

[0110] That is, the character code conversion table 432 stores the compressed character code for each 2-byte code output from the multiplexer 431. When the deletion code flag is 1, the output gate 434 suppresses the output of the corresponding code conversion block 432. This deletes control codes that are not subject to retrieval. For any character code other than a deletion code, the value output from the character code conversion table 432 is sent unchanged to the character string collating means 102.
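The combination of conversion table 432 and output gate 434 can be sketched with dictionaries standing in for the 64K-entry memories. Only two entries come from the document: (0x41??) → (0x0041) and (0x8260) → (0x01E0), both described in [0138]. Every other value, including the deletion code (0x1C), is hypothetical.

```python
from typing import Dict, List, Set

# Stand-ins for conversion table 432. 1-byte characters are keyed by their
# upper byte only, since the lower byte of their address is "don't care".
ONE_BYTE: Dict[int, int] = {0x41: 0x0041, 0x42: 0x0042, 0x1C: 0x001C}
TWO_BYTE: Dict[int, int] = {0x8260: 0x01E0, 0x82C6: 0x0123}  # 0x0123 is a made-up value
DELETION_CODES: Set[int] = {0x1C}  # hypothetical deletion code (upper byte)


def convert_and_gate(addresses: List[int]) -> List[int]:
    """Look up each 16-bit address and drop those flagged as deletion codes."""
    out: List[int] = []
    for addr in addresses:
        upper = addr >> 8
        code_440 = TWO_BYTE[addr] if addr in TWO_BYTE else ONE_BYTE[upper]
        if upper in DELETION_CODES:
            continue  # deletion code flag = 1: output gate 434 suppresses code 440
        assert code_440 < (1 << 13)  # internal codes fit in 13 bits ([0052])
        out.append(code_440)
    return out
```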

[0111] This completes the description of the operation of the character code conversion block 432 and the output gate 434.

[0112] As described above, in this embodiment the character code conversion means 400 (FIG. 10) is made up of three parts:

- plural windows, each fetching 2 bytes while offset from one another by 1 byte;
- a character code selection means that determines whether each window is the boundary window and cuts out character codes one character at a time;
- a character code conversion means that determines the character type of each character code cut out by the character code selection means and performs the code conversion appropriate to that type.

With this arrangement, character codes are cut out correctly, without byte misalignment, from text in which 1-byte characters, 2-byte characters, and various control codes are mixed. Converting them into the compressed code system then markedly reduces the capacity of the state transition memory in the character string collating means, while also enabling flexible retrieval.

[0113] Character code fetch windows A 410a and B 410b shown in FIG. 10 can also be realized by the configuration shown in FIG. 14.

[0114] In FIG. 10, several registers always hold the same character code: upper-byte registers 411a and 411b, lower-byte registers 412a and 412b, and registers 414 and 415.

[0115] By omitting these redundant registers from FIG. 10, character code fetch windows 410a and 410b can be given the simpler configuration shown in FIG. 14. In FIG. 14, register 416a holds the code corresponding to registers 411a and 411b of FIG. 10, and register 416b holds the code corresponding to registers 412a and 412b. Likewise, register 417a holds the code corresponding to register 413 of FIG. 10, and register 417b holds the code corresponding to registers 414 and 415.

[0116] The operation of character code fetch windows 410a and 410b in FIG. 14 is described using the same example text "AとBの...". In hexadecimal this is (0x41)(0x82C6)(0x42)(0x82CC)...

[0117] On the first input, the first 2 bytes (0x4182) are fetched. The upper byte (0x41) is stored in register 416a and the lower byte (0x82) in register 416b.

[0118] On the second input, the (0x41) held in register 416a moves to register 417a, and the (0x82) held in register 416b moves to register 417b. The newly fetched 2 bytes (0xC642) are then split: the upper byte (0xC6) is stored in register 416a and the lower byte (0x42) in register 416b.

[0119] The 2-byte code 401 output as window A consists of the outputs of character code registers 417a and 417b, that is, (0x41) and (0x82). The 2-byte code 402 output as window B consists of the outputs of character code registers 417b and 416a, that is, (0x82) and (0xC6). Windows A and B have thus fetched the character codes offset from each other by 1 byte.
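The reduced register set of FIG. 14 can be modelled the same way. As before, the class and method names are hypothetical. The sketch shows that two 2-register stages are enough to produce both windows.

```python
from typing import Optional, Tuple

Byte = Optional[int]  # None models a register that has not been loaded yet


class Fig14Windows:
    def __init__(self) -> None:
        self.r416a: Byte = None  # upper byte of the latest input
        self.r416b: Byte = None  # lower byte of the latest input
        self.r417a: Byte = None  # previous upper byte
        self.r417b: Byte = None  # previous lower byte

    def clock(self, upper: int, lower: int) -> None:
        # Shift the latest pair into the 417 stage, then latch the new pair.
        self.r417a, self.r417b = self.r416a, self.r416b
        self.r416a, self.r416b = upper, lower

    def window_a(self) -> Tuple[Byte, Byte]:
        return (self.r417a, self.r417b)  # 2-byte code 401

    def window_b(self) -> Tuple[Byte, Byte]:
        return (self.r417b, self.r416a)  # 2-byte code 402


w = Fig14Windows()
w.clock(0x41, 0x82)
w.clock(0xC6, 0x42)
print(w.window_a(), w.window_b())
```

The register 417b feeds both windows, which is exactly the sharing that lets FIG. 14 drop the duplicates of FIG. 10.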

[0120] This completes the description of the operation of the character code fetch windows shown in FIG. 14.

[0121] The character type determination block 433 in the character code conversion means 430 shown in FIG. 12 can also be realized by the configuration shown in FIG. 15.

[0122] The character type table 433 of FIG. 12 stores one flag bit per character type. By contrast, the character type table 433a in FIG. 15 stores these flag values encoded into 2 bits.

[0123] FIG. 16 shows the structure of the character type table values in this case.

[0124] During character type determination, the value read from character type table 433a is decoded by decoder 433b, as shown in FIG. 15, to obtain each character type flag. This reduces the capacity required for the character type table.

[0125] The character type determination block 433 in the character code conversion means 430 shown in FIG. 12 can also be realized by the configuration shown in FIG. 17.

[0126] In the JIS/SJIS code system, whether a code is a 1-byte or a 2-byte character can be determined from the upper byte of the 2-byte code output from the multiplexer 431. If the upper byte lies in the range 0x80 to 0x9F or 0xE0 to 0xEA, the code is treated as a 2-byte character; otherwise it is treated as a 1-byte character. In the circuit of FIG. 17, comparators 441a, 441b, 441c, and 441d make this determination for the 2-byte code output from the multiplexer 431.

[0127] For deletion codes, the code values that count as deletion codes are stored in advance in registers 442-1 to 442-n. Comparators 443-1 to 443-n compare these values with the 2-byte code output from the multiplexer 431. This determines whether the character code 403 cut out by the character code selection means 420 is a deletion code unnecessary for retrieval.

[0128] Similarly, the code values of the binary data marker and the EOF code are stored in registers 444 and 446, respectively. Comparators 445 and 447 compare them with the 2-byte code output from the multiplexer 431. This determines whether the character code 403 cut out by the character code selection means 420 is a binary data marker or the EOF code.
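The comparator-based block of FIG. 17 replaces the table lookup with range checks and equality tests. In this sketch the register contents are hypothetical. A register value of 0xFF or less is assumed to denote a 1-byte code and is compared against the upper byte only; larger values are compared against all 16 bits. The source does not say how 1-byte control codes are aligned for comparison.

```python
from dataclasses import dataclass
from typing import Dict, List


def _matches(code: int, reg: int) -> bool:
    # Assumption: reg <= 0xFF is a 1-byte code (compare the upper byte only);
    # anything larger is a 2-byte code compared against all 16 bits.
    return (code >> 8) == reg if reg <= 0xFF else code == reg


@dataclass
class ComparatorTypeBlock:
    deletion_regs: List[int]  # registers 442-1 .. 442-n
    marker_reg: int           # register 444
    eof_reg: int              # register 446

    def classify(self, code: int) -> Dict[str, bool]:
        upper = code >> 8
        return {
            # comparators 441a-441d: two closed ranges on the upper byte ([0126])
            "two_byte": 0x80 <= upper <= 0x9F or 0xE0 <= upper <= 0xEA,
            "deletion": any(_matches(code, r) for r in self.deletion_regs),  # 443-1 .. 443-n
            "binary_marker": _matches(code, self.marker_reg),                # 445
            "eof": _matches(code, self.eof_reg),                             # 447
        }


block = ComparatorTypeBlock(deletion_regs=[0x1C, 0x81A0], marker_reg=0x1E, eof_reg=0x1A)
```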

[0129] As a result, the character type determination circuit 433 of FIG. 12 needs 4 × 64 Kbit = 256 Kbit of memory for its character type table, whereas the determination block of FIG. 17 needs no character type table at all. The hardware for character type determination can therefore be made extremely small.

[0130] In the code conversion means 430 shown in FIG. 12, character codes not specified in the search term can also be deleted. This reduces the number of character codes sent to the character string collating means 102, which lightens its load and, in turn, improves the retrieval throughput of the document retrieval device as a whole.

[0131] This is explained concretely below.

[0132] In the character type table 433 shown in FIG. 13, the deletion code flag was set to 1 only for control codes not subject to retrieval (deletion codes). Whenever the deletion code flag was 1, the output gate 434 gated off the output 440 of the corresponding code conversion block 432, thereby performing deletion.

[0133] Here, the character type table 433 also gives a deletion code flag of 1 to every character code that is not specified in a search term. Then, just as for deletion codes, suppressing the output of the output gate 434 for those character codes deletes them from the stream.
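The deletion path of [0132] and [0133] can be sketched in Python as below. This assumes the flags and codes are modelled as plain integers. `build_delete_flags` and `output_gate` are hypothetical names. A real character type table 433 would hold the deletion flag alongside its other character type bits rather than in a separate array.

```python
from typing import Iterable, Iterator, Set, Tuple

TABLE_SIZE = 0x10000  # one entry per 2-byte code, like the character type table


def build_delete_flags(control_codes: Iterable[int], search_codes: Set[int]) -> bytearray:
    """Return a flag table: 1 = delete (gate off), 0 = pass through."""
    flags = bytearray(TABLE_SIZE)
    for code in range(TABLE_SIZE):
        # [0133]: codes that appear in no search term are flagged for deletion
        if code not in search_codes:
            flags[code] = 1
    for code in control_codes:
        # [0132]: control codes that are never search targets (deletion codes)
        flags[code] = 1
    return flags


def output_gate(pairs: Iterable[Tuple[int, int]], flags: bytearray) -> Iterator[int]:
    """Emit the converted code only when the flag for its input code is 0."""
    for raw_code, converted in pairs:
        if not flags[raw_code]:
            yield converted
```

The set of deletable codes depends on the search terms, so in this model the flag table is rebuilt for each query.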

[0134] In the document retrieval device of this embodiment (FIG. 9), the character code conversion means 400 handles variant characters, which the JIS/SJIS code system represents with different codes. Examples are half-width vs. full-width forms, uppercase vs. lowercase, katakana vs. hiragana, and old-style vs. new-style kanji. Converting such variants into the same code in the compressed code system makes it possible to retrieve documents containing any of the variant notations in a single search.

[0135] This is described below, taking as an example the half-width/full-width variants of an alphanumeric character (half-width "A" and full-width "Ａ").

[0136] In the code conversion means 430 shown in FIG. 12, the character code conversion table 432 normally stores the values shown in FIG. 40.

[0137] That is, for each 2-byte code 403 cut out by the character code selection means 420 (FIG. 1), the character code conversion table 432 stores the corresponding character code after compression conversion.

[0138] For example, in the character code conversion table 432 shown in FIG. 40, 0x0041 is stored at the addresses 0x41?? (where ? denotes any value) corresponding to half-width "A" (0x41). For full-width "Ａ" (0x8260), 0x01E0 is stored. Half-width "A" and full-width "Ａ" are therefore converted into different codes. Consequently, a search using half-width "A" as the search term does not match full-width "Ａ", and a search using full-width "Ａ" does not match half-width "A".

[0139] By instead storing the values shown in FIG. 41 in the character code conversion table 432, half-width "A" and full-width "Ａ" can be converted into the same code.

[0140] That is, the same value (0x01E0) is stored at the addresses 0x41?? corresponding to half-width "A" and at the address 0x8260 corresponding to full-width "Ａ". Both characters are then converted into the same code (0x01E0).
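The difference between the FIG. 40 and FIG. 41 table contents can be sketched in Python. This assumes a dictionary stands in for the 64K-entry character code conversion table 432, and only the entries named in [0138] to [0140] are filled in. `make_table` is a hypothetical helper.

```python
from typing import Dict

HALF_A_UPPER = 0x41   # half-width "A": the upper byte decides, the low byte is "don't care"
FULL_A = 0x8260       # full-width "Ａ" in SJIS


def make_table(fold_variants: bool) -> Dict[int, int]:
    """Build a partial conversion table keyed by the cut-out 2-byte code 403."""
    table: Dict[int, int] = {}
    half_value = 0x01E0 if fold_variants else 0x0041   # FIG. 41 vs. FIG. 40
    for low in range(0x100):
        table[(HALF_A_UPPER << 8) | low] = half_value  # every address 0x41??
    table[FULL_A] = 0x01E0
    return table


fig40 = make_table(fold_variants=False)
fig41 = make_table(fold_variants=True)
print(fig40[0x4120] == fig40[FULL_A])  # False: "A" and "Ａ" stay distinct
print(fig41[0x4120] == fig41[FULL_A])  # True: both become 0x01E0
```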

[0141] The character string collating means 102 then performs the search with the compressed code 0x01E0 that corresponds to full-width "Ａ". In the JIS/SJIS system, these characters have different codes because of their half-width/full-width notation, but they can now be retrieved together. That is, both "A" and "Ａ" are found.

[0142] Other variant characters can be retrieved together in the same way, by converting them into the same compressed code. This covers English uppercase/lowercase, katakana/hiragana, and old/new kanji forms (for example "Ａ" and "ａ", "あ" and "ア", "斎" and "齋").

[0143] However, half-width/full-width katakana variants include an exceptional case. In the JIS (1-byte) code system, the voiced syllable "ハ゛" is expressed by two codes, "ハ" (0xCA) and "゛" (0xDE). In the SJIS (2-byte) code system, full-width "バ" is a single 2-byte character (0x836F). In general, voiced (dakuon) and semi-voiced (handakuon) half-width katakana are written as two 1-byte codes. The corresponding full-width characters are written as one 2-byte code.

[0144] Even in this case, the variants can be retrieved together by converting them into the same code. That is, half-width "ハ゛" and full-width "バ" are converted into the same compressed code.

[0145] The processing for this case is explained using the example in FIG. 42.

[0146] The address in the character code conversion table 432 that corresponds to half-width "ハ゛" is 0xCADE. At this address, the table stores 0x02AF, the same value as the entry for full-width "バ". Similarly, for half-width "ハ゜" (0xCADF) it stores 0x02B0, the same value as the entry for full-width "パ". The character codes 0xCADE and 0xCADF must then be treated as 2-byte characters. To do so, the 2-byte character flag for 0xCADE and 0xCADF is set to 1 in the character type table 433. Half-width/full-width katakana variants can then also be retrieved together.
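The FIG. 42 handling can be sketched in Python under two assumptions. First, the 2-byte character flag is approximated by an SJIS lead-byte test plus the two extra entries 0xCADE and 0xCADF from [0146]. Second, the conversion table holds only the four values quoted there. `TWO_BYTE_TABLE`, `two_byte_flag` and `convert_text` are hypothetical names. The cut-out is a simple sequential scan rather than the two-window hardware.

```python
from typing import List

# Excerpt of the conversion table 432 (values quoted in [0146]).
TWO_BYTE_TABLE = {
    0x836F: 0x02AF,  # full-width "バ"
    0x8370: 0x02B0,  # full-width "パ"
    0xCADE: 0x02AF,  # half-width "ハ" + "゛" treated as one 2-byte character
    0xCADF: 0x02B0,  # half-width "ハ" + "゜"
}


def two_byte_flag(code: int) -> bool:
    """Stand-in for the 2-byte character flag of the character type table 433."""
    upper = code >> 8
    if 0x81 <= upper <= 0x9F or 0xE0 <= upper <= 0xFC:
        return True                      # ordinary SJIS lead byte
    return code in (0xCADE, 0xCADF)      # flag forced to 1 for these pairs


def convert_text(data: bytes) -> List[int]:
    out: List[int] = []
    i = 0
    while i < len(data):
        low = data[i + 1] if i + 1 < len(data) else 0x00
        code = (data[i] << 8) | low
        if two_byte_flag(code):
            out.append(TWO_BYTE_TABLE[code])  # only the excerpt above is covered
            i += 2
        else:
            out.append(data[i])               # assumed 1-byte mapping 0x00XX
            i += 1
    return out
```

A lone half-width "ハ" (0xCA) that is not followed by 0xDE or 0xDF still takes the 1-byte path.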

[0147] The document retrieval device of this embodiment consists of three parts:

- two character code capture windows, each of which takes in two bytes of the input text, offset from each other by one byte;
- character code selection means, which cuts out character codes by selecting the output of whichever window lies on a character boundary;
- code conversion means, which converts each cut-out character code by referring to a character type table (giving its character type) and a character code conversion table (giving its converted code).

All input text, in which 1-byte and 2-byte characters are mixed, is converted into a bit-compressed character code system of 2-byte codes. As a result, the state transition table of the document retrieval device can be kept compact. In addition, flexible and sophisticated retrieval can be performed using the various control codes embedded in the text. An inexpensive, fast, and highly functional document retrieval device can thus be realized.
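The two-window cut-out can be modelled in software as below. This is a sketch under stated assumptions, not the patent's circuit:

- The text advances two bytes per step, with window A holding bytes k and k+1 and window B holding bytes k+1 and k+2.
- A `phase` bit plays the role of the "preceding selected result".
- The character type is a plain SJIS lead-byte test.

`is_lead` and `cut_out` are hypothetical names. For a 1-byte character, the returned code keeps the window's second byte, like the 0x41?? addresses in [0138].

```python
from typing import List, Tuple


def is_lead(byte: int) -> bool:
    """SJIS lead-byte test, standing in for the character type judgement."""
    return 0x81 <= byte <= 0x9F or 0xE0 <= byte <= 0xFC


def cut_out(data: bytes) -> List[Tuple[int, int]]:
    """Return (window_code, width) for each character, width 1 or 2 bytes."""
    padded = data + b"\x00\x00"        # keep both windows full at the end
    out: List[Tuple[int, int]] = []
    phase = 0                          # 0: next character starts in window A, 1: in window B
    for k in range(0, len(data), 2):
        win_a = (padded[k] << 8) | padded[k + 1]
        win_b = (padded[k + 1] << 8) | padded[k + 2]
        if phase == 0:
            if is_lead(padded[k]):
                out.append((win_a, 2))  # 2-byte character fills window A
                continue                # next boundary: window A of the next step
            out.append((win_a, 1))      # 1-byte character; boundary moves to window B
        if k + 1 >= len(data):
            break                       # no byte left for window B
        if is_lead(padded[k + 1]):
            out.append((win_b, 2))
            phase = 1                   # character straddles the step; next one starts in B
        else:
            out.append((win_b, 1))
            phase = 0
    return out
```

In this model, a step emits one or two characters depending on their widths. How the patent's hardware schedules its output is not stated in this passage, so that detail is an assumption.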

[0148] In the first embodiment described above, the character code conversion block 432 (FIG. 12) of the code conversion means 430 performed code conversion using a character code conversion table. This table is addressed by the 2-byte code output from the multiplexer 431 and outputs the character code after compression conversion.

[0149] However, converting codes in this way requires 64 KW × 16 bit = 1 Mbit of memory for the character code conversion table.

[0150] A second embodiment of the present invention makes the hardware more compact. The character code conversion table in the character code conversion block 432 (FIG. 12) of the first embodiment is replaced by two tables, both addressed by the upper byte of the 2-byte code output from the multiplexer 431:

- a 1-byte character code conversion table, which outputs the converted value for 1-byte characters;
- a code conversion offset table, which outputs offset information for converting 2-byte characters.

Code conversion is then performed using the values obtained from these tables. An example is described below.

[0151] Before the configuration and operation of the code conversion means in this embodiment are described, the code conversion algorithm for the JIS/SJIS code system is briefly explained.

[0152] The character code conversion of the SJIS (2-byte) codes shown in FIG. 8 can be expressed by the following equations.

[0153] (1) When the lower byte is 0x7F or less:

[0154] [Formula 4]

$$\mathrm{SJIS}' = (\mathrm{SJIS\_H} \mathbin{\&} \mathtt{0xBF}) \times \mathtt{0xBC} + \mathrm{SJIS\_L} - \mathtt{0x5DFC}$$

(2) When the lower byte is 0x80 or more:

[0155] [Formula 5]

$$\mathrm{SJIS}' = (\mathrm{SJIS\_H} \mathbin{\&} \mathtt{0xBF}) \times \mathtt{0xBC} + \mathrm{SJIS\_L} - \mathtt{0x5DFD}$$

In these equations, SJIS′ is the character code after compression conversion. SJIS_H and SJIS_L are the upper and lower bytes of the SJIS code, and "&" denotes the bitwise logical AND.
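Formulas 4 and 5 translate directly into Python. This sketch only makes the arithmetic concrete. `sjis_to_compressed` is a hypothetical name, and no range checking of the input is done.

```python
def sjis_to_compressed(code: int) -> int:
    """Formulas 4 and 5: map an SJIS 2-byte code to the gap-free compressed code."""
    upper, lower = code >> 8, code & 0xFF
    # & 0xBF clears bit 6, so lead bytes from 0xE0 continue directly after 0x9F
    base = (upper & 0xBF) * 0xBC + lower
    # Lower bytes 0x80 and above are shifted down by one to close the unused 0x7F slot
    return base - (0x5DFC if lower <= 0x7F else 0x5DFD)


for c in (0x8140, 0x817E, 0x8180, 0x81FC, 0x8240):
    print(hex(c), "->", hex(sjis_to_compressed(c)))
```

The five sample codes map to 0x100, 0x13e, 0x13f, 0x1bb and 0x1bc. Consecutive characters therefore receive consecutive compressed codes, both across the skipped 0x7F trail byte and across a lead-byte boundary.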

[0156] That is, the device determines whether the lower byte of the SJIS (2-byte) code is 0x80 or greater. If it is less than 0x80, the conversion of Formula 4 is applied; otherwise, the conversion of Formula 5 is applied. This converts the code into the compressed character code system shown in FIG. 8(b).

[0157] This is the code conversion algorithm for the JIS/SJIS code system.

[0158] Next, the configuration and operation of the character code conversion block 432 in this embodiment are described.

[0159] FIG. 18 shows the configuration of the character code conversion block 432 and the character type determination block 433 in this embodiment.

[0160] The character code conversion block 432 in this embodiment consists of a 1-byte character code conversion circuit 450, a 2-byte character code conversion circuit 460, and a conversion mode selector 470.

[0161] First, the configuration and operation of the 1-byte character code conversion circuit 450 are described.

[0162] In this embodiment, the 1-byte character code conversion circuit 450 consists of a memory called the 1-byte character code conversion table.

[0163] The 1-byte character code conversion table is addressed by the value 471 of the upper byte of the 2-byte code output from the multiplexer 431. At each address it stores the values shown in FIG. 43.

[0164] That is, the 1-byte character code conversion table 450 stores the compressed character code for each upper-byte value 471 of the 2-byte code output from the multiplexer 431. 1-byte character code conversion is performed simply by outputting the table value as is.

[0165] This concludes the description of the 1-byte character code conversion circuit 450.

[0166] Next, the configuration and operation of the 2-byte character code conversion circuit 460 are described.

[0167] FIG. 19 shows the configuration of the 2-byte character code conversion circuit 460 in this embodiment.

[0168] The 2-byte character code conversion circuit 460 in this embodiment consists of a code conversion offset table 461, an adder 462, a decrementer 463, a comparator 464, and a selector 465.

[0169] The operation of each component of the 2-byte character code conversion circuit 460 is described concretely below.

[0170] First, the code conversion offset table 461 is described.

[0171] Like the 1-byte character code conversion table 450, the code conversion offset table 461 is addressed by the value 471 of the upper byte of the 2-byte code output from the multiplexer 431.

[0172] FIG. 44 shows an example of the values stored in the code conversion offset table 461.

[0173] That is, at each address the code conversion offset table 461 stores the value given by the following equation.

[0174] [Formula 6]

$$\mathrm{OFST} = (\mathrm{SJIS\_H} \mathbin{\&} \mathtt{0xBF}) \times \mathtt{0xBC} - \mathtt{0x5DFC}$$

In this equation, SJIS_H is the value 471 of the upper byte of the 2-byte code output from the multiplexer 431. OFST is the corresponding table value of the code conversion offset table 461.

[0175] The value given by Formula 6 is the offset term of the conversion equation in Formula 4. For the upper-byte value 471 of the 2-byte code output from the multiplexer 431, the code conversion offset table 461 outputs the offset for 2-byte character code conversion. The 2-byte converted value according to Formula 4 is therefore obtained by adding this table output to the lower byte of the 2-byte character.

[0176] Next, the operations of the adder 462, the decrementer 463, the comparator 464, and the selector 465 are described in turn.

[0177] The adder 462 adds the value output from the code conversion offset table 461 to the value 472 of the lower byte of the 2-byte code output from the multiplexer 431. The result is the 2-byte character code conversion value 475 according to Formula 4.

[0178] The decrementer 463 subtracts 1 from the conversion value 475 of Formula 4. The result is the 2-byte character code conversion value 476 according to Formula 5.

[0179] The comparator 464 determines whether the value 472 of the lower byte of the 2-byte code output from the multiplexer 431 is 0x80 or greater. The selector 465 then selects the conversion value 475 of Formula 4 when the value 472 is less than 0x80, and the conversion value 476 of Formula 5 otherwise. The selected value is output as the 2-byte character code conversion value 474.

[0180] This concludes the description of the operation of the 2-byte character code conversion circuit 460.
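The second-embodiment datapath of [0170] to [0179] can be sketched as a 256-entry list plus a few integer operations. The table here is filled for every SJIS lead byte (0x81 to 0x9F, 0xE0 to 0xFC); this is an assumption, since FIG. 44 is described as holding values only up to 0xEA. `build_offset_table` and `convert_two_byte` are hypothetical names.

```python
from typing import List

LEAD_BYTES = list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD))


def build_offset_table() -> List[int]:
    """Code conversion offset table 461: Formula 6 for each SJIS lead byte."""
    table = [0] * 256
    for lead in LEAD_BYTES:
        table[lead] = (lead & 0xBF) * 0xBC - 0x5DFC
    return table


OFST = build_offset_table()


def convert_two_byte(code: int) -> int:
    upper, lower = code >> 8, code & 0xFF    # values 471 and 472
    v475 = OFST[upper] + lower               # adder 462: Formula 4 value
    v476 = v475 - 1                          # decrementer 463: Formula 5 value
    return v476 if lower >= 0x80 else v475   # comparator 464 + selector 465 -> value 474
```

The table replaces the multiplication, and the decrementer and selector reproduce the Formula 4/5 branch. The result is the same as evaluating the formulas directly.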

[0181] Finally, the operation of the conversion mode selector 470 is described.

[0182] The conversion mode selector 470 chooses between 1-byte and 2-byte code conversion according to the 2-byte character flag output from the character type determination block 433. When the 2-byte code output from the multiplexer 431 is judged to be a 2-byte character (2-byte character flag = 1), the selector chooses the 2-byte character code conversion value 474. When it is judged to be a 1-byte character (flag = 0), the selector chooses the 1-byte character code conversion value 473. This completes the character code conversion.

[0183] This is the operation of the character code conversion block 432 in this embodiment.

[0184] As described above, the character code conversion block 432 of this embodiment converts character codes by looking up the upper byte of the 2-byte code output from the multiplexer 431 in two tables:

- a 1-byte character code conversion table, which outputs the 1-byte converted value;
- a code conversion offset table, which outputs the offset for 2-byte conversion.

The code conversion means of the first embodiment needed 64 KW × 16 bit = 1 Mbit of memory for its character code conversion table. The code conversion means of this embodiment needs only two memories of 256 W × 16 bit = 4 Kbit each, one for each of these tables. This reduces the hardware.

[0185] In this embodiment, the 1-byte character code conversion circuit 450 was built from a memory called the 1-byte character code conversion table. However, as shown in FIG. 8, JIS (1-byte) codes map to the range 0x0000 to 0x00FF of the compressed character code system. The 1-byte character code conversion circuit 450 can therefore also be realized with the configuration shown in FIG. 20.

[0186] That is, the 1-byte character code conversion circuit 450 shown in this figure uses a 2-byte register 451. It loads 0x00 as the upper byte of the register and loads the upper byte of the 2-byte code output from the multiplexer 431 as the lower byte. The register contents are the 1-byte character code conversion value 473.

[0187] Shifting the upper byte into the lower byte position in this way yields the 1-byte character code conversion value 473 without any memory. By contrast, the 1-byte character code conversion circuit 450 in FIG. 18 needed 256 W × 16 bit = 4 Kbit of memory for its table. The hardware of the 1-byte conversion section is reduced accordingly.

[0188] In the character code conversion block 432 shown in FIG. 18, two tables are addressed by the upper byte 471 of the 2-byte code output from the multiplexer 431: the 1-byte character code conversion table of the 1-byte character code conversion circuit 450, and the code conversion offset table 461 in the 2-byte character code conversion circuit 460. Moreover, in the JIS/SJIS code system, two ranges do not overlap:

- the range in which 1-byte character codes exist (0x00 to 0x7F, 0xA0 to 0xDF);
- the range in which the upper bytes of 2-byte character codes exist (0x80 to 0x9F, 0xE0 to 0xFF).

Accordingly, the ranges in which the two tables hold values are also mutually exclusive. The 1-byte character code conversion table 450 of FIG. 43 holds values in 0x00 to 0x7F and 0xA0 to 0xDF. The code conversion offset table 461 of FIG. 44 holds values in 0x81 to 0x9F and 0xE0 to 0xEA.

[0189] For these reasons, in the character code conversion block 432 shown in FIG. 18, the contents of the 1-byte character code conversion table of the 1-byte character code conversion circuit 450 can be merged into the code conversion offset table 461 of the 2-byte character code conversion circuit 460.

[0190] FIG. 21 shows the configuration of the character code conversion block 432 in this case.

[0191] In the character code conversion block 432 shown in this figure, the code conversion offset table 461 stores the values shown in FIG. 45.

[0192] That is, the output of the code conversion offset table 461 shown in this figure depends on where the address 471 falls:

- If the address 471 is in the 1-byte character code range (0x00 to 0x7F, 0xA0 to 0xDF), the upper byte 471 of the 2-byte code output from the multiplexer 431 is a 1-byte character code. The table then outputs the 1-byte character code conversion value 473.
- If the address 471 is in the range of 2-byte upper bytes (0x80 to 0x9F, 0xE0 to 0xFF), the table outputs the offset for 2-byte character code conversion.

[0193] The adder 462, decrementer 463, comparator 464, and selector 465 work as in the 2-byte character code conversion circuit 460 of FIG. 19. They compute the 2-byte character code conversion value 474 from two inputs: the 2-byte conversion offset output by the code conversion offset table 461, and the value 472 of the lower byte of the 2-byte code output from the multiplexer 431. The conversion mode selector 470 then chooses between the values 473 and 474 as before.

[0194] In this way, the contents of the 1-byte character code conversion table of circuit 450 are merged into the code conversion offset table 461 of circuit 460. The character code conversion block of FIG. 18 required two memories of 256 W × 16 bit = 4 Kbit each; this arrangement needs only one such memory, which further reduces the hardware.
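The merged table of FIG. 21 can be sketched in Python under two assumptions. First, the 1-byte entries hold 0x00XX, the mapping implied by [0185] and [0186]; the actual FIG. 45 contents are not reproduced here. Second, offsets are filled for all SJIS lead bytes. `build_merged_table` and `convert` are hypothetical names. Because the two address ranges are disjoint, writing both into one list never overwrites an entry.

```python
from typing import List

ONE_BYTE_RANGES = (range(0x00, 0x80), range(0xA0, 0xE0))
LEAD_RANGES = (range(0x81, 0xA0), range(0xE0, 0xFD))


def build_merged_table() -> List[int]:
    """One 256-entry table: 1-byte values and 2-byte offsets share the address space."""
    table = [0] * 256
    for rng in ONE_BYTE_RANGES:
        for b in rng:
            table[b] = b                              # assumed 1-byte value 0x00XX
    for rng in LEAD_RANGES:
        for b in rng:
            table[b] = (b & 0xBF) * 0xBC - 0x5DFC     # Formula 6 offset
    return table


TABLE = build_merged_table()


def convert(code: int, two_byte_flag: bool) -> int:
    upper, lower = code >> 8, code & 0xFF
    entry = TABLE[upper]                              # single lookup at address 471
    if not two_byte_flag:                             # conversion mode selector 470
        return entry                                  # value 473
    value = entry + lower                             # adder 462
    return value - 1 if lower >= 0x80 else value      # decrementer/comparator/selector -> 474
```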

[0195] In the second embodiment, input text in the JIS/SJIS code system was converted into a bit-compressed code system with no gaps at all. As FIG. 8 shows, the SJIS code system has no characters mapped at positions where the lower byte is 0x7F. The conversion removes these unused positions to produce a gap-free compressed code system. As a result, the 2-byte character code conversion circuit 460 (FIG. 19) of the code conversion block 432 becomes complicated.

[0196] A third embodiment of the present invention simplifies the 2-byte character code conversion circuit 460 of the second embodiment by allowing some gaps in the compressed code system. It is described below.

[0197] First, the character code conversion algorithm of this embodiment is described.

[0198] In this embodiment, 2-byte character codes in the SJIS code system are converted by the following equation.

[0199] [Formula 7]

$$\mathrm{SJIS}'' = (\mathrm{SJIS\_H} \mathbin{\&} \mathtt{0xBF}) \times \mathtt{0xBD} + \mathrm{SJIS\_L} - \mathtt{0x5E7D}$$

FIG. 22 shows the correspondence between the JIS/SJIS code system and the character codes after compression conversion by this equation.

[0200] As the figure shows, the compressed code system contains unassigned positions corresponding to the JIS/SJIS codes whose lower byte is 0x7F. Nevertheless, every SJIS (2-byte) character code is converted into the range 0x0100 to 0x1FA7, so every character can be represented by a 13-bit code.
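Formula 7 can be written as a single branch-free expression. `sjis_to_gapped` is a hypothetical name, and the sample codes are chosen only to show where the gap appears.

```python
def sjis_to_gapped(code: int) -> int:
    """Formula 7: one equation for all lower bytes, leaving the 0x7F slot unused."""
    upper, lower = code >> 8, code & 0xFF
    return (upper & 0xBF) * 0xBD + lower - 0x5E7D


for c in (0x817E, 0x8180, 0x81FC, 0x8240):
    print(hex(c), "->", hex(sjis_to_gapped(c)))
```

The output is 0x13e, 0x140, 0x1bc and 0x1bd. The compressed code 0x13f is skipped because it would correspond to lower byte 0x7F. Successive lead-byte rows still follow each other without a gap, because the multiplier 0xBD (189) covers every lower-byte value from 0x40 to 0xFC.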

[0201] This is the code conversion algorithm of this embodiment.

[0202] Next, the configuration and operation of the 2-byte character code conversion circuit 460 are described for the case where code conversion is based on Formula 7.

[0203] FIG. 23 shows the configuration of the 2-byte character code conversion circuit 460 in this embodiment.

[0204] The 2-byte character code conversion circuit 460 shown in the figure consists of a code conversion offset table 461 and an adder 462.

[0205] First, the code conversion offset table 461 is described.

[0206] The code conversion offset table 461 is addressed by the value 471 of the upper byte of the 2-byte code output from the multiplexer 431. Each address stores the value given by the following equation.

[0207] [Formula 8]

$$\mathrm{ofst} = (\mathrm{SJIS\_H} \mathbin{\&} \mathtt{0xBF}) \times \mathtt{0xBD} - \mathtt{0x5E7D}$$

FIG. 46 shows the values of the code conversion offset table 461 in this case.

[0208] The value given by Formula 8 is the offset term of the conversion equation in Formula 7. The 2-byte character code conversion value 474 is therefore obtained simply by having the adder 462 add two values: the output of the code conversion offset table 461, and the value 472 (SJIS_L) of the lower byte of the 2-byte character.
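The FIG. 23 datapath reduces to one table lookup and one addition. In the sketch below, the table is assumed to be filled for every SJIS lead byte. `OFST8` and `convert_two_byte_gapped` are hypothetical names.

```python
from typing import List

LEAD_BYTES = list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD))

OFST8: List[int] = [0] * 256
for lead in LEAD_BYTES:
    OFST8[lead] = (lead & 0xBF) * 0xBD - 0x5E7D   # Formula 8


def convert_two_byte_gapped(code: int) -> int:
    """Table lookup on the upper byte (471) plus adder 462 on the lower byte (472)."""
    return OFST8[code >> 8] + (code & 0xFF)
```

Compared with the second-embodiment sketch, there is no decrement and no comparison on the lower byte. This mirrors the removal of the decrementer 463, comparator 464 and selector 465.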

[0209] Thus, by allowing some gaps in the compressed character code system, the 2-byte character code conversion circuit 460 of FIG. 23 can omit the decrementer 463, the comparator 464, and the selector 465 that the 2-byte character code conversion circuit 460 of FIG. 19 requires. This reduces the hardware.

[0210] Also, in the second embodiment, the 2-byte character code conversion circuit 460 shown in FIG. 18 performed code conversion using a code conversion offset table. This table is addressed by the upper byte 471 of the 2-byte code output from the multiplexer 431 and outputs the offset for 2-byte character code conversion.

[0211] However, converting character codes this way requires 4 Kbit of memory for the code conversion offset table.

[0212] A fourth embodiment of the present invention modifies the 2-byte character code conversion circuit 460 in the code conversion means 430 of the second embodiment. It performs code conversion by arithmetic, without using the code conversion offset table 461.

[0213] The configuration and operation of the 2-byte character code conversion circuit 460 in the fourth embodiment are described below.

[0214] First, FIG. 24 shows the configuration of the 2-byte character code conversion circuit 460 in this embodiment.

[0215] The 2-byte character code conversion circuit 460 shown in this figure calculates the 2-byte character code conversion value according to Expressions 4 and 5. The circuit takes the bitwise AND of the upper byte value 471 (SJIS_H) and 0xBF, multiplies it by 0xBC, and adds the lower byte value 472 (SJIS_L) to obtain an intermediate result 477. It then obtains conversion result 475 by subtracting 0x5DFC from the intermediate result 477, and conversion result 476 by subtracting 0x5DFD from it. In the figure, conversion result 475 is therefore the code conversion value given by Expression 4, and conversion result 476 is the value given by Expression 5.

[0216] The comparator 464 determines whether the lower byte of the 2-byte code output from the multiplexer 431 is greater than 0x80, and thereby decides whether the conversion of Expression 4 or that of Expression 5 applies. When the lower byte value 472 is less than 0x80, the selector 465 selects conversion result 475. When it is greater than 0x80, the selector selects conversion result 476. The selected value is output as the 2-byte character code conversion value 474. (SJIS does not use 0x7F as a trailing byte, so Expression 5 subtracts one more than Expression 4 to close that gap.)
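The arithmetic of FIG. 24 can be checked with a short Python model. It covers only the datapath of paragraphs [0215] and [0216], not the gate-level circuit. The constants 0xBF, 0xBC, 0x5DFC, and 0x5DFD come from the text. Masking with 0xBF folds lead bytes 0xE0 to 0xEF onto 0xA0 to 0xAF, so they follow 0x81 to 0x9F without a gap. The multiplier 0xBC (188) is the number of valid SJIS trail bytes per lead byte. The text specifies only the cases below and above 0x80. Because 0x80 is itself a valid SJIS trail byte, the sketch assumes it takes Expression 5. Otherwise 0x8180 and 0x8181 would collide. The function name `sjis_to_compressed` is hypothetical.

```python
def sjis_to_compressed(sjis_h: int, sjis_l: int) -> int:
    """Arithmetic model of 2-byte code conversion circuit 460 (FIG. 24)."""
    intermediate_477 = (sjis_h & 0xBF) * 0xBC + sjis_l  # AND, multiply, add
    result_475 = intermediate_477 - 0x5DFC              # Expression 4
    result_476 = intermediate_477 - 0x5DFD              # Expression 5
    # Comparator 464 / selector 465: trail bytes at or above 0x80 take
    # Expression 5 (treating 0x80 this way is an assumption)
    return result_476 if sjis_l >= 0x80 else result_475

# Usage: enumerate every JIS X 0208 position in SJIS order
LEADS = list(range(0x81, 0xA0)) + list(range(0xE0, 0xF0))
TRAILS = list(range(0x40, 0x7F)) + list(range(0x80, 0xFD))
codes = [sjis_to_compressed(h, l) for h in LEADS for l in TRAILS]
```

Under these assumptions, the 47 × 188 SJIS positions map onto consecutive codes starting at 0x0100, with no gaps.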

[0217] This completes the description of the operation of the 2-byte character code conversion circuit 460.

[0218] The 2-byte character code conversion block 432 of this embodiment thus performs character code conversion by arithmetic operations. The 2-byte character code conversion circuit 460 of the second embodiment (FIG. 19) needed a 256 W × 16 bit = 4 Kbit memory for the code conversion offset table. This embodiment performs the conversion without any memory, so the hardware configuration becomes smaller.

[0219] As in the third embodiment, this embodiment can also allow partial gaps in the code system after compression conversion, which simplifies the configuration of the 2-byte character code conversion value calculation circuit 470. FIG. 25 shows the configuration of the code conversion means 430 in this case.

[0220] The 2-byte character code conversion circuit 460 shown in this figure calculates the 2-byte code conversion value according to the 2-byte character code conversion formula of Expression 7. The circuit multiplies the value 481, the bitwise AND of the upper byte value 471 (SJIS_H) and 0xBF, by 0xBC. It then subtracts from this product the value 483, obtained by subtracting the lower byte value 472 (SJIS_L) from 0x5E75. The result is the 2-byte character code conversion value 474. This embodiment allows a smaller hardware configuration than the 2-byte character code conversion circuit of FIG. 24.

[0221] The preceding embodiments described the configuration and operation of the character code conversion means when the documents searched are expressed in the JIS/SJIS code system. Some documents instead express 1-byte characters in the EBCDIK code system and 2-byte characters in the KEIS code system (hereinafter, the EBCDIK/KEIS code system). For these documents, both the method of distinguishing 1-byte from 2-byte characters and the algorithm for conversion to the compressed code system differ, so a character code conversion means of different configuration is required.

[0222] As a fifth embodiment, the configuration and operation of the code conversion means for documents expressed in the EBCDIK/KEIS code system are described by way of example.

[0223] Before the configuration and operation of the code conversion means in this embodiment are described, two points are briefly explained: how 1-byte and 2-byte characters are distinguished in the EBCDIK/KEIS code system, and the algorithm for conversion to the compressed code system.

[0224] First, the method of distinguishing 1-byte and 2-byte characters in the EBCDIK/KEIS code system is described.

[0225] The EBCDIK/KEIS code system has two kinds of stages: stages in which only 1-byte characters are input (hereinafter, 1-byte stages) and stages in which only 2-byte characters are input (hereinafter, 2-byte stages). The two kinds of stages are separated by specific control codes (hereinafter, escape sequences). When the control code indicating a transition to the 1-byte stage (0x0A41) is input, the stream remains in the 1-byte stage, and only 1-byte characters are input, until the control code indicating a transition to the 2-byte stage (0x0A42) is input. Conversely, when the control code 0x0A42 is input, the stream remains in the 2-byte stage, and only 2-byte characters are input, until the control code 0x0A41 is input. Whether an input character code is a 1-byte or a 2-byte character therefore depends on whether the current stage is a 1-byte or a 2-byte stage, and the current stage is identified by detecting the escape sequences.

[0226] This is the method of distinguishing 1-byte and 2-byte characters in the EBCDIK/KEIS code system.
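The stage rule above amounts to a two-state machine. The following Python sketch splits an EBCDIK/KEIS byte stream into characters under these assumptions:

- the stream starts in the 1-byte stage;
- the escape codes 0x0A41 and 0x0A42 are recognized in either stage;
- each escape is reported as a 2-byte code, as the detection circuit of paragraph [0238] does.

The function name and the sample bytes are hypothetical.

```python
ESC_TO_1BYTE = 0x0A41  # switch to the 1-byte (EBCDIK) stage
ESC_TO_2BYTE = 0x0A42  # switch to the 2-byte (KEIS) stage

def split_ebcdik_keis(data: bytes, two_byte_stage: bool = False):
    """Return (code, is_two_byte) for each character; escapes count as 2-byte."""
    out = []
    i = 0
    while i < len(data):
        pair = (data[i] << 8) | data[i + 1] if i + 1 < len(data) else None
        if pair in (ESC_TO_1BYTE, ESC_TO_2BYTE):
            two_byte_stage = pair == ESC_TO_2BYTE  # stage transition
            out.append((pair, True))
            i += 2
        elif two_byte_stage:
            if pair is None:
                raise ValueError("truncated 2-byte character")
            out.append((pair, True))
            i += 2
        else:
            out.append((data[i], False))
            i += 1
    return out
```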

[0227] Next, the algorithm for conversion from the EBCDIK/KEIS code system to the compressed code system is described.

[0228] FIG. 26 shows the scheme for conversion from the EBCDIK/KEIS code system to the compressed code system.

[0229] In the character code system after compression conversion, EBCDIK (1-byte) codes are mapped to the range 0x0000 to 0x00FF, and KEIS codes are mapped to the range 0x0100 to 0x1F7D.

[0230] This is the algorithm for conversion from the EBCDIK/KEIS code system to the compressed code system.

[0231] For documents expressed in the EBCDIK/KEIS code system, both the character type determination method and the scheme for conversion to the compressed character code system differ from those used for the JIS/SJIS code system. A code conversion block 432 and a character type determination block 433 of different configuration are therefore required.

[0232] FIG. 27 shows the configuration of the code conversion block 432 and the character type determination block 433 in this embodiment.

[0233] In this embodiment, the code conversion block 432 and the character type determination block 433 consist of the following circuits:

- an escape sequence detection circuit 510;
- a character type determination circuit 520 for 1-byte characters;
- a character type determination circuit 530 for 2-byte characters;
- a determination mode selector 540;
- a code conversion circuit 450 for 1-byte characters;
- a code conversion circuit 460 for 2-byte characters;
- a conversion mode selector 470.

[0234] The operation of the code conversion block 432 and the character type determination block 433 in this embodiment is described below.

[0235] First, the operation of the escape sequence detection circuit 510 is described.

[0236] The escape sequence detection circuit 510 detects escape sequences among the 2-byte codes output from the multiplexer 431. From these it determines whether the stream is in the 1-byte or the 2-byte stage, and hence whether the input character is a 1-byte or a 2-byte character.

[0237] FIG. 28 shows the configuration of the 2-byte character determination circuit (escape sequence detection circuit) 510.

[0238] The comparator 511 determines whether the 2-byte code output from the multiplexer 431 is the escape sequence to the 1-byte stage. The comparator 512 determines whether it is the escape sequence to the 2-byte stage. Based on the outputs of the comparators 511 and 512, the flip-flop 513 records whether the stream is in the 1-byte or the 2-byte stage:

- When the escape sequence to the 2-byte stage is input, the comparator 512 outputs 1 and the flip-flop 513 is set to 1, which indicates a transition to the 2-byte stage.
- When the escape sequence to the 1-byte stage is input, the comparator 511 outputs 1 and the flip-flop 513 is reset, which indicates a transition to the 1-byte stage.

The escape sequences themselves are treated as 2-byte codes. The OR circuit 514 therefore sets the 2-byte character flag by ORing the output of the flip-flop 513 with the outputs of the comparators 511 and 512.

[0239] This is the operation of the 2-byte character determination circuit 510.
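A cycle-level software model of FIG. 28 makes the timing concrete. This Python sketch makes two assumptions: the OR circuit 514 sees the flip-flop state from before the current code is processed (the usual behaviour of a clocked flip-flop), and the initial stage is the 1-byte stage. The function name and the sample 2-byte window values are hypothetical.

```python
ESC_TO_1BYTE = 0x0A41  # matched by comparator 511
ESC_TO_2BYTE = 0x0A42  # matched by comparator 512

def two_byte_flags(codes, ff_513: int = 0):
    """Per-step 2-byte character flag from comparators 511/512, FF 513, OR 514."""
    flags = []
    for code in codes:
        c511 = int(code == ESC_TO_1BYTE)
        c512 = int(code == ESC_TO_2BYTE)
        flags.append(ff_513 | c511 | c512)  # OR 514: escapes count as 2-byte
        if c512:
            ff_513 = 1  # set: now in the 2-byte stage
        elif c511:
            ff_513 = 0  # reset: now in the 1-byte stage
    return flags
```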

[0240] Next, the operation of the character type determination circuit 520 for 1-byte characters, the character type determination circuit 530 for 2-byte characters, and the determination mode selector 540 is described.

[0241] The character type determination circuit 520 for 1-byte characters takes as input the upper byte of the 2-byte code output from the multiplexer 431. It determines the character type of that byte in the EBCDIK (1-byte) code system: whether it is a deletion character, whether it is a binary data marker, and whether it is an EOF code.

[0242] The character type determination circuit 530 for 2-byte characters takes as input the 2-byte code output from the multiplexer 431 and determines its character type in the KEIS (2-byte) code system.

[0243] In this embodiment, the character type determination circuits 520 and 530 are implemented as tables, called the 1-byte character type determination table and the 2-byte character type determination table, respectively. The 1-byte character type determination table is addressed by the upper byte of the 2-byte code output from the multiplexer 431. It outputs the 3-bit flag shown in FIG. 29, consisting of a deletion character flag, a binary code marker flag, and an EOF flag. The 2-byte character type determination table is addressed in the same way by the full 2-byte code output from the multiplexer 431 and stores the 3-bit flag shown in FIG. 30.

[0244] The determination mode selector 540 determines the character type by selecting one of the two outputs. When the 2-byte character flag is 0, it selects the output of the character type determination circuit 520 for 1-byte characters. When the flag is 1, it selects the output of the character type determination circuit 530 for 2-byte characters.

[0245] This is the operation of the character type determination circuit 520 for 1-byte characters, the character type determination circuit 530 for 2-byte characters, and the determination mode selector 540.
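The two table lookups and the determination mode selector 540 can be sketched in Python as follows. The flag assignments of FIGS. 29 and 30 are not reproduced in this section. The table entries below (for example, which byte is treated as the EOF code), the bit order of the 3-bit flag, and all names are therefore placeholders chosen for illustration. A dict stands in for the 64K-entry 2-byte table.

```python
from typing import Dict

# 3-bit character type flag (assumed bit order)
DELETE_FLAG = 0b100
BINARY_MARKER_FLAG = 0b010
EOF_FLAG = 0b001

# Placeholder contents; the real tables are given in FIGS. 29 and 30.
TYPE_TABLE_1BYTE = [0] * 256           # addressed by the upper byte (circuit 520)
TYPE_TABLE_1BYTE[0x07] = DELETE_FLAG   # hypothetical entry
TYPE_TABLE_1BYTE[0x3F] = EOF_FLAG      # hypothetical entry
TYPE_TABLE_2BYTE: Dict[int, int] = {0xFFFF: BINARY_MARKER_FLAG}  # circuit 530 stand-in

def character_type(code16: int, two_byte_flag: int) -> int:
    """Both tables are read in parallel; selector 540 picks one by the flag."""
    out_520 = TYPE_TABLE_1BYTE[(code16 >> 8) & 0xFF]
    out_530 = TYPE_TABLE_2BYTE.get(code16, 0)  # unlisted addresses read as 0
    return out_530 if two_byte_flag else out_520
```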

[0246] Finally, the operation of the code conversion circuit 450 for 1-byte characters, the code conversion circuit 460 for 2-byte characters, and the conversion mode selector 470 is described.

[0247] In this embodiment, the code conversion circuits 450 and 460 are implemented as tables, called the 1-byte character code conversion table and the 2-byte character code conversion table, respectively.

[0248] The 1-byte character code conversion table is addressed by the upper byte of the 2-byte code output from the multiplexer 431. Each address holds the value shown in FIG. 47.

[0249] The 1-byte character code conversion table stores the compressed character code for each address, so converting a 1-byte character simply means outputting the table value as is.

[0250] The 2-byte character code conversion table is addressed by the 2-byte code output from the multiplexer 431. Each address holds the value shown in FIG. 48.

[0251] Like the 1-byte table, the 2-byte character code conversion table stores the compressed character code for each address, so converting a 2-byte character simply means outputting the table value as is.

[0252] The conversion mode selector 470 supplies the applicable code conversion value. When the input character code is a 1-byte character (the 2-byte character flag is 0), it selects the output of the code conversion circuit 450 for 1-byte characters. When it is a 2-byte character (the flag is 1), it selects the output of the code conversion circuit 460 for 2-byte characters.

[0253] This is the operation of the code conversion circuit 450 for 1-byte characters, the code conversion circuit 460 for 2-byte characters, and the conversion mode selector 470.
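Table-driven conversion with the conversion mode selector 470 reduces to two lookups and a choice, as in this Python sketch. The contents of FIGS. 47 and 48 are not reproduced here. The sketch therefore assumes an identity mapping for the 1-byte table and uses two made-up KEIS entries for the 2-byte table. The names are hypothetical, and a dict stands in for the 64K-word memory.

```python
from typing import Dict

# 1-byte table: 256 entries addressed by the upper byte (circuit 450).
# Assumed identity mapping into 0x0000-0x00FF, for illustration only.
CONV_TABLE_1BYTE = list(range(256))

# 2-byte table addressed by the full 16-bit code (circuit 460);
# the entries below are made up.
CONV_TABLE_2BYTE: Dict[int, int] = {0xA1A1: 0x0100, 0xA1A2: 0x0101}

def convert_code(code16: int, two_byte_flag: int) -> int:
    """Both tables are read in parallel; selector 470 picks one by the flag."""
    out_450 = CONV_TABLE_1BYTE[(code16 >> 8) & 0xFF]
    out_460 = CONV_TABLE_2BYTE.get(code16, 0)  # unlisted addresses read as 0
    return out_460 if two_byte_flag else out_450
```

Filling both tables with the compressed codes used for JIS/SJIS is what lets the two code systems share one compressed code system.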

[0254] In this embodiment, code conversion of text expressed in the EBCDIK/KEIS code system is implemented by referring to the 1-byte and 2-byte character code conversion tables. If the values stored in these tables are made to correspond to the compressed character codes used for the JIS/SJIS code system, text expressed in these two different code systems can be converted into the same compressed code system.

[0255] In this embodiment, the character type determination circuits 520 and 530 are implemented as the 1-byte and 2-byte character type determination tables. They could equally be implemented with decoders, as shown in the first embodiment. Likewise, the code conversion circuits 450 and 460 are implemented as the 1-byte and 2-byte character code conversion tables. Their hardware configuration could be simplified by reducing the table capacity, as shown in the second, third, and fourth embodiments.

[0256] The character type determination block of the code conversion means is thus provided with an escape sequence detection circuit that determines whether the stream is in the 1-byte or the 2-byte stage. With this circuit, character codes can be compression-converted even for documents expressed in the EBCDIK/KEIS code system.

[0257] The preceding embodiments described the configuration and operation of the character code conversion means for documents expressed in the JIS/SJIS and EBCDIK/KEIS code systems. Some documents, however, express both 1-byte and 2-byte characters in JIS code (hereinafter, the JIS/JIS code system). These documents contain control codes expressed as 3-byte codes. Both the configuration of the character code acquisition windows and the algorithm for conversion to the compressed code system therefore differ, and the character code conversion means is configured differently.

[0258] As a sixth embodiment, the configuration and operation of the code conversion means for documents expressed in the JIS/JIS code system are described by way of example.

[0259] Before the configuration and operation of the code conversion means in this embodiment are described, two points are briefly explained: how 1-byte and 2-byte characters are distinguished in the JIS/JIS code system, and the algorithm for conversion to the compressed code system.

[0260] First, the method of distinguishing 1-byte and 2-byte characters in the JIS/JIS code system is described.

[0261] In the JIS/JIS code system, 1-byte stages and 2-byte stages are separated by 3-byte escape sequences. When the escape sequence indicating a transition to the 1-byte stage (0x1B284A, i.e. ESC ( J) is input, the stream remains in the 1-byte stage, and only 1-byte characters are input, until the escape sequence indicating a transition to the 2-byte stage (0x1B2442, i.e. ESC $ B) is input. Conversely, when the escape sequence 0x1B2442 is input, the stream remains in the 2-byte stage, and only 2-byte characters are input, until the escape sequence 0x1B284A is input. Whether an input character code is a 1-byte or a 2-byte character therefore depends on whether the current stage is a 1-byte or a 2-byte stage, and the current stage is identified by detecting the escape sequences.

[0262] This is the method of distinguishing 1-byte and 2-byte characters in the JIS/JIS code system.
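The same stage rule, now with 3-byte escapes, is sketched below in Python. The sketch assumes the stream starts in the 1-byte stage. It reports each escape sequence as a 3-byte item, matching the 3-byte character flag used by the hardware. The function name is hypothetical. The sample input encodes "AとB" (A = 0x41, と = 0x2448 in JIS X 0208, B = 0x42).

```python
ESC_1BYTE = b"\x1b\x28\x4a"  # ESC ( J : transition to the 1-byte stage
ESC_2BYTE = b"\x1b\x24\x42"  # ESC $ B : transition to the 2-byte stage

def split_jis_jis(data: bytes, two_byte_stage: bool = False):
    """Split a JIS/JIS byte stream into (code, length_in_bytes) items."""
    out = []
    i = 0
    while i < len(data):
        head = data[i:i + 3]
        if head in (ESC_1BYTE, ESC_2BYTE):
            two_byte_stage = head == ESC_2BYTE  # stage transition
            out.append((int.from_bytes(head, "big"), 3))
            i += 3
        elif two_byte_stage:
            if i + 1 >= len(data):
                raise ValueError("truncated 2-byte character")
            out.append((int.from_bytes(data[i:i + 2], "big"), 2))
            i += 2
        else:
            out.append((data[i], 1))
            i += 1
    return out

sample = bytes.fromhex("411B244224481B284A42")  # "AとB"
```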

[0263] Next, the algorithm for conversion from the JIS/JIS code system to the compressed code system is described. FIG. 31 shows the scheme for conversion from the JIS/JIS code system to the compressed code system.

[0264] In the character code system after compression conversion, JIS (1-byte) codes are mapped to the range 0x0000 to 0x00FF, and JIS (2-byte) codes are mapped to the range 0x0100 to 0x1F7D.

[0265] This is the algorithm for conversion from the JIS/JIS code system to the compressed code system.

[0266] In documents expressed in the JIS/JIS code system, 3-byte codes (the escape sequences) appear intermixed with 1-byte and 2-byte characters. A code conversion means 400 of different configuration is therefore required.

[0267] FIG. 32 is a block diagram of the character code conversion means 400 in this embodiment.

[0268] The configuration of the character code conversion means 400 is shown in the block diagram of FIG. 1.

[0269] The character code conversion means 400 consists of a bit width conversion means 710, character code acquisition windows 720a, 720b, and 720c, a character code selection means 420, and a code conversion means 430.

[0270] The input text 204 is read from the character string storage means 105 16 bits at a time and converted to a 24-bit width by the bit width conversion means 710. It is then captured 3 bytes at a time into character code acquisition window A 720a, window B 720b, and window C 720c, each offset from the next by one byte. From the character codes 702, 703, and 704 output by the windows, the character code selection means 420 selects the output of the window whose first byte is a character boundary. A character boundary is a 1-byte character code, the upper byte of a 2-byte character code, or the most significant byte of a 3-byte code.

[0271] The code conversion means 430 determines the character type of the character code 403 extracted by the character code selection means 420. It then performs the code conversion appropriate to each character type. This converts the code into the bit-compressed 13-bit internal character code system shown in FIG. 31(b), and the result is sent to the character string collating means 102 as the compressed character code 207. Two items of the character type information determined here are returned to the character code selection means 420: the flag indicating a 2-byte character, as the 2-byte character flag 404, and the flag indicating a 3-byte character, as the 3-byte character flag 705. The selection means uses these flags to determine the boundary window in the next conversion step.

[0272] Next, the configuration and operation of the bit width conversion means 710 are described.

[0273] FIG. 33 shows the configuration of the bit width conversion means 710.

[0274] As an example, consider the operation when the text "AとB...", which is (0x41)(0x1B2442)(0x2448)(0x1B284A)(0x42)... in hexadecimal, is input.

[0275] The first input captures the first 2 bytes (0x411B).

[0276] Of the captured 2-byte code, the upper byte (0x41) is stored in the upper byte register 711a, and the lower byte (0x1B) is stored in the lower byte register 711b.

[0277] At the second input, the value (0x41) held in the upper byte register 711a moves to register 712a, and the value (0x1B) held in the lower byte register 711b moves to register 712b. Of the newly captured 2 bytes (0x2442), the upper byte (0x24) is stored in the upper byte register 711a and the lower byte (0x42) in the lower byte register 711b.

[0278] The selector 715 receives two inputs: the 3-byte code 713 output from registers 712a, 712b, and 711a, and the 3-byte code 714 output from registers 712b, 711a, and 711b. By selecting these alternately, it converts the character codes, which arrive 16 bits wide, into a 24-bit width.

[0279] In this example, the selector 715 receives (0x411B24) as 3-byte code 713 and (0x1B2442) as 3-byte code 714. As the initial condition it selects 3-byte code 713 and outputs (0x411B24).

[0280] At the third input, the value (0x24) held in the upper byte register 711a moves to register 712a, and the value (0x42) held in the lower byte register 711b moves to register 712b. Of the newly captured 2 bytes (0x2448), the upper byte (0x24) is stored in the upper byte register 711a and the lower byte (0x48) in the lower byte register 711b.

[0281] The selector 715 now receives (0x244224) as 3-byte code 713 and (0x422448) as 3-byte code 714. It selected the 713 side in the previous step, so in this step it selects 3-byte code 714 and outputs (0x422448).

[0282] This is the operation of the bit width conversion means 710.
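The register transfers of paragraphs [0275] to [0281] can be replayed in software. In this Python sketch the register names follow FIG. 33, but the output schedule is an assumption. The text describes only the first two outputs (713, then 714). Three 16-bit inputs carry exactly two 24-bit words, so the sketch assumes one idle cycle after every 713/714 pair. The function name `widen_16_to_24` is hypothetical.

```python
def widen_16_to_24(words):
    """Model of bit width conversion means 710: 16-bit words in, 24-bit words out."""
    r711a = r711b = r712a = r712b = 0
    out = []
    for k, w in enumerate(words, start=1):
        # Shift the previous pair into 712a/712b, load the new word into 711a/711b
        r712a, r712b = r711a, r711b
        r711a, r711b = (w >> 8) & 0xFF, w & 0xFF
        code_713 = (r712a << 16) | (r712b << 8) | r711a
        code_714 = (r712b << 16) | (r711a << 8) | r711b
        phase = k % 3
        if phase == 2:
            out.append(code_713)  # selector 715 picks 713
        elif phase == 0:
            out.append(code_714)  # selector 715 picks 714
        # phase == 1: assumed idle cycle (no three new bytes available)
    return out
```

With the first three inputs of the example (0x411B, 0x2442, 0x2448), the model produces 0x411B24 and then 0x422448, as in the text.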

[0283] Next, FIG. 34 shows the configuration of character code acquisition window A 720a, window B 720b, and window C 720c, and their operation is described.

[0284] First, the configuration of character code acquisition windows A 720a, B 720b, and C 720c is shown in FIG. 34.

[0285] The first input captures the first 3 bytes (0x411B24).

[0286] Of the captured 3-byte code:

- the first byte (0x41) is stored in the upper registers 721a, 721b, and 721c;
- the second byte (0x1B) is stored in the middle registers 722a, 722b, and 722c;
- the third byte (0x24) is stored in the lower registers 723a, 723b, and 723c.

[0287] Then, when the second input is made, the (0x41) stored in the upper register 721a is moved to register 724; the (0x1b) stored in the middle registers 722a and 722b is moved to registers 725 and 727, respectively; and the (0x24) stored in the lower registers 723a, 723b and 723c is moved to registers 726, 728 and 729, respectively. Of the newly fetched 3 bytes (0x422448), the first byte (0x42) is stored in the upper registers 721a, 721b and 721c, the second byte (0x24) in the middle registers 722a, 722b and 722c, and the third byte (0x48) in the lower registers 723a, 723b and 723c.

[0288] The 3-byte code 702 cut out as window A consists of the outputs of registers 724, 725 and 726, that is, (0x41), (0x1b) and (0x24). The 3-byte code 703 cut out as window B consists of the outputs of registers 727, 728 and 721b, that is, (0x1b), (0x24) and (0x42), and the 3-byte code 704 cut out as window C consists of the outputs of registers 729, 721c and 722c, that is, (0x24), (0x42) and (0x24).

[0289] In other words, in this embodiment the 3-byte code 702 (0x411b24) output from window A, the 3-byte code 703 (0x1b2442) output from window B and the 3-byte code 704 (0x244224) output from window C are taken into the character code selection means 420 each shifted by 1 byte from the next.

[0290] The above is the operation of the character code acquisition windows 720a, 720b and 720c.

[0291] Next, the character code selection means 420 is described.

[0292] FIG. 35 shows the configuration of the character code selection means 420.

[0293] As in the first embodiment, the character code selection means 420 determines and selects the window from which a character code is cut out in the current conversion step, based on which window the character code was cut out from in the previous conversion step and on whether the character cut out by that window was a 1-byte, 2-byte or 3-byte character.

[0294] Which window's character code is selected is determined by the values of the window A attention flag 733, the window B attention flag 734 and the window C attention flag 735.

[0295] These flags are mutually exclusive. The character code selector 731 cuts out a character code by selecting the character code 702 fetched in window A when the window A attention flag 733 is 1, the character code 703 fetched in window B when the window B attention flag 734 is 1, and the character code 704 fetched in window C when the window C attention flag 735 is 1.

[0296] The decoder 732 determines from which window the character code is cut out in the current step, that is, the values of the window A attention flag 733, the window B attention flag 734 and the window C attention flag 735. It does so from the previous step 2-byte character flag 739, which indicates that the character code fetched in the previous step was a 2-byte character; the previous step 3-byte character flag 740, which indicates that it was a 3-byte character; and the previous step window A attention flag 741, previous step window B attention flag 742 and previous step window C attention flag 743.

[0297] For example, when a 1-byte character code was fetched in window A in the previous step, that is, when the previous step window A attention flag 741 is 1 and the previous step 2-byte character flag 739 and previous step 3-byte character flag 740 are both 0, the character code is cut out from the window shifted by 1 byte from window A, namely window B. When a 2-byte character code was fetched in window A, that is, when the previous step window A attention flag 741 is 1 and the previous step 2-byte character flag 739 is 1, the character code is cut out from the window shifted by 2 bytes from window A, namely window C.

[0298] In this way, the decoder 732 outputs the window A attention flag 733, the window B attention flag 734 and the window C attention flag 735 according to the truth table shown in FIG. 36.
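The text gives three cases of the decoder's rule: window A followed by a 1-byte character selects window B, window A followed by a 2-byte character selects window C, and window B followed by a 3-byte character selects window B again. All three fit the rule "advance the window index by the character length, modulo 3". The sketch below assumes that rule for every row; FIG. 36 is not reproduced here, so the table it prints is an illustration under that assumption, not a copy of the figure. The function names `decode` and `select` are hypothetical.

```python
WINDOWS = "ABC"


def decode(prev_2byte, prev_3byte, prev_flags):
    """Hypothetical model of decoder 732.

    prev_2byte, prev_3byte: flags 739 and 740 (both 0 means a 1-byte character).
    prev_flags: one-hot tuple of the previous step window flags (741, 742, 743).
    Returns the one-hot tuple of current window flags (733, 734, 735).
    """
    if sum(prev_flags) != 1 or (prev_2byte and prev_3byte):
        raise ValueError("window flags must be one-hot; 739 and 740 exclusive")
    length = 3 if prev_3byte else 2 if prev_2byte else 1
    new_index = (prev_flags.index(1) + length) % 3  # assumed rule
    return tuple(int(i == new_index) for i in range(3))


def select(flags, code_702, code_703, code_704):
    """Model of character code selector 731: one-hot flags pick one window."""
    return (code_702, code_703, code_704)[flags.index(1)]


# Print the truth table implied by the assumed rule (not a copy of FIG. 36).
for prev_index in range(3):
    for prev_2byte, prev_3byte in ((0, 0), (1, 0), (0, 1)):
        prev_flags = tuple(int(i == prev_index) for i in range(3))
        flags = decode(prev_2byte, prev_3byte, prev_flags)
        print(WINDOWS[prev_index], prev_2byte, prev_3byte, "->",
              WINDOWS[flags.index(1)])
```

Because the flags are one-hot, the selector reduces to an indexed pick, which corresponds to a 3-input multiplexer in hardware.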

[0299] The specific operation of the character code selection means 420 is now described using the example above.

[0300] As initial values, the window A attention flag 741 is set to 1, the window B attention flag 742 to 0 and the window C attention flag 743 to 0.

[0301] In the first conversion step, window A is selected because the value of the window A attention flag 741 is 1, and the 1-byte character (0x41) is cut out from it. In the second conversion step, the 1-byte character (0x41) was cut out from window A in the previous step; that is, the previous step window A attention flag 741 is 1 and the previous step 2-byte character flag 739 and previous step 3-byte character flag 740 are both 0, so the window B attention flag 734 becomes 1 and the 3-byte character (0x1B2442) is cut out from window B. In the third conversion step, the 3-byte character (0x1B2442) was cut out from window B in the previous step; that is, the previous step window B attention flag 742 is 1, the previous step 2-byte character flag 739 is 0 and the previous step 3-byte character flag 740 is 1, so the window B attention flag 734 becomes 1 and the 3-byte character (0x1B2442) is cut out from window B.

[0302] Proceeding in the same way, character codes are correctly cut out from text in which 1-byte, 2-byte and 3-byte characters are mixed, while each character is identified as a 1-byte, 2-byte or 3-byte character.

[0303] The above is the detailed operation of the character code selection means 420.

[0304] Finally, the code conversion means 430 is described.

[0305] FIG. 37 shows the configuration of the code conversion means 430 in this embodiment.

[0306] In the code conversion means 430 of this embodiment, the character type determination block 433 detects escape sequences from the 24-bit value output by the character code selection means and, when an escape sequence is detected, outputs 1 as the previous step 3-byte character flag 705. For this reason, the configuration of the escape sequence detection circuit differs from that in the character type determination block 433 of the fifth embodiment (FIG. 27).

[0307] FIG. 38 shows the configuration of the escape sequence detection circuit in this embodiment.

[0308] Comparator 511 determines whether the 3-byte code output from the character code selection means 431 is an escape sequence to the 1-byte stage, and comparator 512 determines whether it is an escape sequence to the 2-byte stage. From the outputs of comparators 511 and 512, flip-flop 513 determines whether the current stage is the 1-byte stage or the 2-byte stage. The OR circuit 515 generates the 3-byte character flag by ORing the outputs of comparators 511 and 512, and the AND circuit 516 generates the 2-byte character flag on the condition that the current stage is the 2-byte stage and the code is not an escape sequence.

[0309] The above is the operation of the escape sequence detection circuit 510 in this embodiment.
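The gate structure of the escape sequence detection circuit can be sketched as a small state machine. This section does not list the comparator constants, so the sketch assumes the ISO-2022-JP escape sequences ESC ( B and ESC ( J for comparator 511 and ESC $ @ and ESC $ B for comparator 512 (the example text does contain 0x1B2442, i.e. ESC $ B). It also assumes the flip-flop starts in the 1-byte stage. The class and method names are hypothetical.

```python
# Assumed comparator constants (ISO-2022-JP escapes; not given in this section).
ESC_TO_1BYTE = {0x1B2842, 0x1B284A}  # ESC ( B, ESC ( J -> comparator 511
ESC_TO_2BYTE = {0x1B2440, 0x1B2442}  # ESC $ @, ESC $ B -> comparator 512


class EscapeSequenceDetector:
    """Hypothetical gate-level model of escape sequence detection circuit 510."""

    def __init__(self):
        self.two_byte_stage = False  # flip-flop 513; assumed to start in 1-byte stage

    def step(self, code24):
        """Return (two_byte_flag, three_byte_flag) for one 24-bit code."""
        cmp_511 = code24 in ESC_TO_1BYTE
        cmp_512 = code24 in ESC_TO_2BYTE
        # Flip-flop 513: reset by comparator 511, set by comparator 512.
        if cmp_511:
            self.two_byte_stage = False
        elif cmp_512:
            self.two_byte_stage = True
        three_byte_flag = cmp_511 or cmp_512  # OR circuit 515
        two_byte_flag = self.two_byte_stage and not three_byte_flag  # AND 516
        return int(two_byte_flag), int(three_byte_flag)


detector = EscapeSequenceDetector()
print([detector.step(c) for c in (0x411B24, 0x1B2442, 0x244824)])
# [(0, 0), (0, 1), (1, 0)]
```

A result of (0, 0) marks a 1-byte character, (0, 1) an escape sequence treated as a 3-byte character, and (1, 0) a 2-byte character, matching the roles of the two flags in the text.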

[0310] In this embodiment, as in the fifth embodiment, the 1-byte character type determination circuit 520 and the 2-byte character type determination circuit 530 are realized by a 1-byte character type determination table and a 2-byte character type determination table, respectively.

[0311] Likewise, as in the fifth embodiment, the 1-byte character code conversion circuit 450 and the 2-byte character code conversion circuit 460 are realized by a 1-byte character code conversion table and a 2-byte character code conversion table, respectively.

[0312] The 1-byte character code conversion table stores the values shown in FIG. 49 for each address.

[0313] The 2-byte character code conversion table stores the values shown in FIG. 50 for each address.

[0314] In this embodiment, code conversion of text represented in the JIS/JIS code system is realized by referring to the 1-byte character code conversion table and the 2-byte character code conversion table, respectively. By making the values stored in these tables correspond to the compressed character codes of the JIS/SJIS code system, text represented in these two different code systems can be converted into the same compressed code system.
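To illustrate why shared table values let two code systems converge, the sketch below maps a JIS X 0208 two-byte code and the corresponding Shift JIS code to the same compressed value. Two simplifications are assumptions of this sketch: arithmetic stands in for the lookup tables of FIGS. 49 and 50, and the compressed code is a hypothetical dense index (row - 1) * 94 + (cell - 1) rather than the format of FIG. 31. Only two-byte characters are handled; `sjis_to_jis` uses the standard Shift JIS to JIS X 0208 arithmetic.

```python
def jis_to_compressed(code):
    """Map a JIS X 0208 code (0x2121-0x7E7E) to a hypothetical dense index."""
    row, cell = (code >> 8) - 0x20, (code & 0xFF) - 0x20  # both 1..94
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"not a JIS X 0208 code: {code:#06x}")
    return (row - 1) * 94 + (cell - 1)


def sjis_to_jis(code):
    """Standard Shift JIS -> JIS X 0208 conversion for two-byte codes."""
    s1, s2 = code >> 8, code & 0xFF
    if s1 >= 0xE0:
        s1 -= 0x40  # lead bytes 0xE0-0xEF continue after 0x81-0x9F
    j1 = (s1 - 0x81) * 2 + 0x21  # odd JIS row
    if s2 >= 0x9F:  # trail byte in the upper half: next (even) JIS row
        return ((j1 + 1) << 8) | (s2 - 0x9F + 0x21)
    # lower half: odd JIS row; 0x7F is never used as a trail byte
    j2 = s2 - 0x40 + 0x21 if s2 < 0x7F else s2 - 0x41 + 0x21
    return (j1 << 8) | j2


def sjis_to_compressed(code):
    return jis_to_compressed(sjis_to_jis(code))


# Hiragana "a" is 0x2422 in JIS X 0208 and 0x82A0 in Shift JIS.
print(jis_to_compressed(0x2422), sjis_to_compressed(0x82A0))  # 283 283
```

In a table-driven design, the same effect comes from filling each system's conversion table so that equivalent characters store the same compressed value.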

[0315] In this embodiment, the 1-byte character type determination circuit 520 and the 2-byte character type determination circuit 530 are configured with a 1-byte character type determination table and a 2-byte character type determination table, but they can clearly also be configured with decoders, as shown in the first embodiment. Likewise, the 1-byte character code conversion circuit 450 and the 2-byte character code conversion circuit 460 are configured with a 1-byte character code conversion table and a 2-byte character code conversion table, but, as shown in the second, third and fourth embodiments, the hardware configuration can clearly be simplified by reducing the capacity of these tables.

[0316] As described above, providing an escape sequence detection circuit that detects 3-byte escape sequences in the character type determination block of the code conversion means makes it possible to compress and convert character codes even for documents represented in the JIS/JIS code system.

[0317]

EFFECTS OF THE INVENTION: By providing two character code fetching means (windows) that each fetch 2 bytes at a time while shifted by 1 byte from each other, character code selection means that determines whether a window is a boundary window and cuts out 1-byte or 2-byte characters one character at a time, and code conversion means that converts them into a bit-compressed character code system, the capacity of the state transition table can be reduced considerably. In addition, an inexpensive character string search device can be realized that performs high-speed character string matching without erroneous matches caused by byte shifts, even in text in which 1-byte and 2-byte characters are mixed.

[Brief description of drawings]

FIG. 1 is a configuration diagram showing an embodiment of the present invention.

FIG. 2 is a diagram showing the configuration of a conventional character string search device.

FIG. 3 is a block diagram showing the configuration of the character string collating means of a cited reference.

FIG. 4 is a diagram showing an example of an automaton for matching search terms.

FIG. 5 is a diagram showing an example of the contents of a state transition table used in the cited reference.

FIG. 6 is a diagram showing an example of erroneous matching caused by a byte shift.

FIG. 7 is a diagram showing an example of the character code fetching method according to the present invention.

FIG. 8 is a diagram showing an example of a conversion format from the SJIS code system to the compressed character code system.

FIG. 9 is a diagram showing the configuration of an embodiment of a character string search device according to the present invention.

FIG. 10 is a configuration diagram showing an embodiment of the character code acquisition windows and character code selection means according to the present invention.

FIG. 11 is a truth table describing the operation of an embodiment of the character code selection means according to the present invention.

FIG. 12 is a configuration diagram showing an embodiment of the code conversion means according to the present invention.

FIG. 13 is a diagram showing an embodiment of the character type determination block according to the present invention.

FIG. 14 is a diagram showing an embodiment of a simplified character code acquisition window according to the present invention.

FIG. 15 is a diagram showing an embodiment of the character type determination block according to the present invention.

FIG. 16 is a diagram showing an embodiment of a simplified character type determination block according to the present invention.

FIG. 17 is a diagram showing an embodiment of the character type determination circuit according to the present invention.

FIG. 18 is a diagram showing an embodiment of the code conversion block and character type determination block according to the second embodiment of the present invention.

FIG. 19 is a diagram showing an embodiment of the 2-byte character code conversion circuit according to the second embodiment of the present invention.

FIG. 20 is a diagram showing an embodiment of the 1-byte character code conversion circuit according to the second embodiment of the present invention.

FIG. 21 is a diagram showing an embodiment of a simplified character code conversion block according to the second embodiment of the present invention.

FIG. 22 is a diagram showing an example of a conversion format from the SJIS code system to the incompletely compressed character code system.

FIG. 23 is a diagram showing an embodiment of the 2-byte character code conversion circuit according to the third embodiment of the present invention.

FIG. 24 is a diagram showing an embodiment of the 2-byte character code conversion circuit according to the fourth embodiment of the present invention.

FIG. 25 is a diagram showing an embodiment of a simplified 2-byte character code conversion circuit according to the fourth embodiment of the present invention.

FIG. 26 is a diagram showing an example of a conversion format from the EBCDIK/KEIS code system to the compressed character code system.

FIG. 27 is a diagram showing an embodiment of the code conversion block and character type determination block according to the fifth embodiment of the present invention.

FIG. 28 is a diagram showing an embodiment of the escape sequence detection circuit according to the fifth embodiment of the present invention.

FIG. 29 is a diagram showing an embodiment of the 1-byte character type determination table according to the fifth embodiment of the present invention.

FIG. 30 is a diagram showing an embodiment of the 2-byte character type determination table according to the fifth embodiment of the present invention.

FIG. 31 is a diagram showing an example of a conversion format from the JIS/JIS code system to the compressed character code system.

FIG. 32 is a diagram showing an embodiment of the code conversion means according to the sixth embodiment of the present invention.

FIG. 33 is a diagram showing an embodiment of the bit width conversion means according to the sixth embodiment of the present invention.

FIG. 34 is a diagram showing an embodiment of the character code acquisition windows according to the sixth embodiment of the present invention.

FIG. 35 is a diagram showing an embodiment of the character code selection means according to the sixth embodiment of the present invention.

FIG. 36 is a diagram showing an embodiment of the character code selection means according to the sixth embodiment of the present invention.

FIG. 37 is a diagram showing the operation of an embodiment of the code conversion means according to the sixth embodiment of the present invention.

FIG. 38 is a diagram showing an embodiment of the escape sequence detection circuit according to the sixth embodiment of the present invention.

FIG. 39 is a diagram showing the values in a character code conversion table.

FIG. 40 is a diagram showing the values in a character code conversion table when variant-notation search is not performed.

FIG. 41 is a diagram showing the values in a character code conversion table when variant-notation search between half-width and full-width characters is performed.

FIG. 42 is a diagram showing the values in a character code conversion table when variant-notation search between half-width and full-width characters is performed for katakana.

FIG. 43 is a diagram showing the values in a 1-byte character code conversion table.

FIG. 44 is a diagram showing the values in a code conversion offset table.

FIG. 45 is a diagram showing the values in a code conversion offset table.

FIG. 46 is a diagram showing the values in the code conversion offset table in the third embodiment.

FIG. 47 is a diagram showing the values in the 1-byte character code conversion table in the fifth embodiment.

FIG. 48 is a diagram showing the values in the 2-byte character code conversion table in the fifth embodiment.

FIG. 49 is a diagram showing the values in the 1-byte character code conversion table in the sixth embodiment.

FIG. 50 is a diagram showing the values in the 2-byte character code conversion table in the sixth embodiment.

[Explanation of symbols]

1 ... character string search device, 105 ... character string storage means, 400 ... character code conversion means, 102 ... character string collation means, 410a, 410b ... character code fetching means (windows), 420 ... character code selection means, 430 ... code conversion means

Continuation of the front page: (72) Inventor Atsushi Hatakeyama, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji, Tokyo. (72) Inventor Masatsugu Shinozaki, c/o Office Systems Division, Hitachi, Ltd., 810 Shimoimaizumi, Ebina, Kanagawa Prefecture.

Claims

[Claims]

1. A character code conversion device used for exchanging text data between devices having different character code systems, comprising: a plurality of character code fetching means for fetching input character codes in a state shifted by 1 byte from one another; character code selecting means for selecting the character code to be output from among the character codes output by the plurality of character code fetching means, based on the immediately preceding selection result and the code conversion result; and code conversion means for determining the character type of the character code selected by the character code selecting means, such as whether it is a 1-byte character or a 2-byte character, and performing the character code conversion corresponding to each character type; wherein input text written in a first character code system is converted into a second character code system different from the first character code system.

2. The character code conversion device according to claim 1, wherein the code conversion means comprises complete compression type code conversion means that, when converting input text written in the first character code system, compresses and converts it into a second character code system in which codes are allocated without gaps.

3. The character code conversion device according to claim 2, wherein the code conversion means comprises complete compression type code conversion means for compressing and converting input text written in the first character code system into a second character code system in which codes are allocated without gaps, and character code reverse conversion means for converting the complete compression type code system back into a non-compressed character code system.

4. The character code conversion device according to claim 1, wherein the code conversion means comprises incomplete compression type code conversion means that, when converting input text written in the first character code system, compresses and converts it into a second character code system in which codes are allocated with gaps partially allowed.

5. The character code conversion device according to claim 4, wherein the code conversion means comprises incomplete compression type code conversion means for compressing and converting input text written in the first character code system into a second character code system in which codes are allocated with gaps partially allowed, and character code reverse conversion means for converting the incomplete compression type code system back into a non-compressed character code system.

6. The character code conversion device according to claim 1, wherein the code conversion means is conversion mode selection type code conversion means that has a code conversion circuit for 1-byte characters and a code conversion circuit for 2-byte characters independently and converts a character code by selecting the code conversion value corresponding to the conversion mode.

7. The character code conversion device according to claim 1, wherein the code conversion means is batch processing type code conversion means that performs code conversion for 1-byte characters and code conversion for 2-byte characters in the same circuit.

8. The character code conversion device according to claim 1, wherein the code conversion means has character type information storage means that stores at least one type of character type information for determining whether the character code selected by the character code selecting means is a 1-byte character or a 2-byte character, and performs character code conversion according to the character type information obtained from the character type information storage means.

9. The character code conversion device according to claim 8, wherein the character type information storage means stores, in addition to character type information such as whether the input character code is a 1-byte character or a 2-byte character, conversion processing information needed to convert the input character code into a different character code system, and character code conversion is performed using the character type information and the conversion processing information obtained from the character type information storage means.

10. The character code conversion device according to claim 8, wherein, when input character codes are converted into another character code system, codes are converted dynamically for input text in which character strings represented in a plurality of code systems are mixed.

11. A document retrieval apparatus for retrieving, from a document database stored as character codes, documents containing a search word specified in a search condition expression, comprising: a plurality of character code fetching means for fetching input character codes in a state shifted by 1 byte from one another; character code selecting means for selecting the character code to be output from among the character codes output by the plurality of character code fetching means, based on the immediately preceding selection result and the code conversion result; and code conversion means for determining the character type of the character code selected by the character code selecting means, such as whether it is a 1-byte character or a 2-byte character, and performing the character code conversion corresponding to each character type; wherein input text written in a first character code system in which 1-byte and 2-byte characters are mixed is converted into a second character code system different from the first character code system, after which matching against the specified search word is performed.

12. The document retrieval apparatus according to claim 11, wherein the code conversion means comprises complete compression type code conversion means that, when converting input text written in the first character code system, compresses and converts it into a second character code system in which codes are allocated without gaps.

13. The document retrieval apparatus according to claim 11, wherein the code conversion means comprises incomplete compression type code conversion means that, when converting input text written in the first character code system, compresses and converts it into a second character code system in which codes are allocated with gaps partially allowed.

14. The document retrieval apparatus according to claim 11, wherein the code conversion means is conversion mode selection type code conversion means that has a code conversion circuit for 1-byte characters and a code conversion circuit for 2-byte characters independently and converts a character code by selecting the code conversion value corresponding to the conversion mode.

15. The document retrieval apparatus according to claim 11, wherein the code conversion means is batch processing type code conversion means that performs code conversion for 1-byte characters and code conversion for 2-byte characters in the same circuit.

16. The document retrieval apparatus according to claim 11, wherein the code conversion means has character type information storage means that stores at least one kind of character type information for determining whether the character code selected by the character code selecting means is a 1-byte character or a 2-byte character, and the matching process for the specified retrieval word is performed after character code conversion according to the character type information obtained from the character type information storage means.

17. The document retrieval apparatus according to claim 16, comprising code conversion information storage type code conversion means, wherein the character type information storage means stores, in addition to character type information such as whether the input character code is a 1-byte character or a 2-byte character, the conversion processing information needed to convert the input character code into a different character code system, and character code conversion is performed using the character type information and the conversion processing information obtained from the character type information storage means.
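Claims 16 and 17 move the character-type decision, and the data the conversion needs, into a lookup table. A minimal Python sketch, assuming Shift_JIS input and a 256-entry table indexed by the first byte of each character; the entry layout (type, conversion information) and the output code space are hypothetical.

```python
SINGLE, LEAD = 0, 1


def build_type_table() -> list[tuple[int, int]]:
    """One entry per byte value: (character type, conversion information).

    For a lead byte, the conversion information is the base code of its row in
    a hypothetical output code space; for any other byte it is the converted
    1-byte code (here simply the byte itself).
    """
    table, row = [], 0
    for b in range(256):
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            table.append((LEAD, 0x100 + row * 0x100))  # row base for 2-byte codes
            row += 1
        else:
            table.append((SINGLE, b))
    return table


TYPE_TABLE = build_type_table()


def convert(data: bytes) -> list[int]:
    out, pos = [], 0
    while pos < len(data):
        ctype, info = TYPE_TABLE[data[pos]]   # one lookup gives type and conversion data
        if ctype == LEAD and pos + 1 < len(data):
            out.append(info + data[pos + 1])  # row base + trail byte
            pos += 2
        elif ctype == SINGLE:
            out.append(info)                  # converted 1-byte code
            pos += 1
        else:
            out.append(data[pos])             # lead byte with no trail byte: pass through
            pos += 1
    return out
```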

18. The document retrieval apparatus according to claim 16, wherein the character type information storage means stores, in addition to character type information such as whether the input character code is a 1-byte character or a 2-byte character, specific character type information corresponding to control codes that are unnecessary for retrieval, namely specific control codes representing double-width display, halftone display, and the like; and the apparatus comprises, as the code conversion means, control code deletion type code conversion means that performs code conversion according to each kind of character type information and deletes the character code when the character type information obtained from the character type information storage means represents the specific character type.
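For claim 18, the character-type table gains a "delete" type for display controls. A hedged Python sketch; the control code values 0x0E and 0x0F are placeholders, since the claim does not give them, and Shift_JIS is assumed for all other codes.

```python
SINGLE, LEAD, DELETE = "single", "lead", "delete"

# Hypothetical byte values for display controls such as double-width or halftone.
DISPLAY_CONTROLS = {0x0E, 0x0F}

TYPE_TABLE = [
    DELETE if b in DISPLAY_CONTROLS
    else LEAD if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC
    else SINGLE
    for b in range(256)
]


def convert(data: bytes) -> list[int]:
    out, pos = [], 0
    while pos < len(data):
        ctype = TYPE_TABLE[data[pos]]
        if ctype == DELETE:
            pos += 1                                    # irrelevant to retrieval: drop it
        elif ctype == LEAD and pos + 1 < len(data):
            out.append(data[pos] << 8 | data[pos + 1])  # 2-byte character
            pos += 2
        else:
            out.append(data[pos])                       # 1-byte character
            pos += 1
    return out
```

Because the display control is dropped, a retrieval word interrupted by one in the stored text still matches: `convert(b"A\x0eB")` equals `convert(b"AB")`.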

19. The document retrieval apparatus according to claim 16, wherein the character type information storage means stores, in addition to character type information such as whether the input character code is a 1-byte character or a 2-byte character, specific character type information corresponding to a specific control code placed before binary data such as a document number, a page number, or an indent amount; and the apparatus comprises, as the code conversion means, binary data detection type code conversion means that performs code conversion according to each kind of character type information and detects the binary data when the character type information obtained from the character type information storage means represents the specific character type.
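A sketch of claim 19 under a hypothetical framing: a control code 0x1C followed by a 2-byte big-endian value such as a page number, with Shift_JIS assumed for the text. Detecting the prefix lets the converter consume the payload as binary data instead of misreading it as characters.

```python
SINGLE, LEAD, BINARY = "single", "lead", "binary"

BINARY_PREFIX = 0x1C  # hypothetical control code that precedes binary data
BINARY_LENGTH = 2     # hypothetical: the payload is a 2-byte big-endian value

TYPE_TABLE = [
    BINARY if b == BINARY_PREFIX
    else LEAD if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC
    else SINGLE
    for b in range(256)
]


def convert(data: bytes) -> tuple[list[int], list[int]]:
    """Return (converted character codes, binary values found in the text)."""
    text, binaries, pos = [], [], 0
    while pos < len(data):
        ctype = TYPE_TABLE[data[pos]]
        if ctype == BINARY:
            field = data[pos + 1:pos + 1 + BINARY_LENGTH]
            binaries.append(int.from_bytes(field, "big"))  # e.g. a page number
            pos += 1 + BINARY_LENGTH                       # skip prefix and payload
        elif ctype == LEAD and pos + 1 < len(data):
            text.append(data[pos] << 8 | data[pos + 1])
            pos += 2
        else:
            text.append(data[pos])
            pos += 1
    return text, binaries
```

In `b"A\x1c\x83\x41B"` the payload 0x83 0x41 would otherwise be read as the Shift_JIS character ア; here it is returned as the number 0x8341, and the converted text is just "AB".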

20. The document retrieval apparatus according to claim 16, wherein the character type information storage means stores code conversion information only for the character codes contained in the retrieval word specified in the retrieval condition expression, and stores specific character type information for all other character codes; and the apparatus comprises, as the code conversion means, unnecessary code deletion type code conversion means that deletes a character code as unrelated to the retrieval word when the character type information obtained from the character type information storage means represents the specific character type.

21. The document retrieval apparatus according to claim 16, wherein the character type information storage means stores, in addition to character type information such as whether the input character code is a 1-byte character or a 2-byte character, specific character type information corresponding to a specific control code with a special meaning, such as the end of the file to be searched; and the apparatus comprises, as the code conversion means, control information detection type code conversion means that outputs control information, such as the end of the document retrieval process, when the character type information obtained from the character type information storage means represents the specific character type.
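For claim 21, one table entry marks an end-of-file code, and the converter reports it as control information instead of converting it. The sketch assumes 0x1A (Ctrl-Z) as that code and Shift_JIS for everything else; both choices are assumptions, not details from the claim.

```python
SINGLE, LEAD, END_OF_FILE = "single", "lead", "eof"

EOF_CODE = 0x1A  # assumption: Ctrl-Z marks the end of the file to be searched

TYPE_TABLE = [
    END_OF_FILE if b == EOF_CODE
    else LEAD if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC
    else SINGLE
    for b in range(256)
]


def convert(data: bytes) -> tuple[list[int], bool]:
    """Return (converted codes, True if the end-of-file code was reached)."""
    out, pos = [], 0
    while pos < len(data):
        ctype = TYPE_TABLE[data[pos]]
        if ctype == END_OF_FILE:
            # Control information rather than a character: end the search of this file.
            return out, True
        if ctype == LEAD and pos + 1 < len(data):
            out.append(data[pos] << 8 | data[pos + 1])
            pos += 2
        else:
            out.append(data[pos])
            pos += 1
    return out, False
```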

22. The document retrieval apparatus according to claim 16, wherein, when a character string that the input text represents with various character codes because of differing notations (full-width and half-width forms of alphanumeric characters, uppercase and lowercase letters, katakana and hiragana, old and new forms of kanji, and the like) is converted into a character code system different from the input character code system, it is converted into one fixed character code string, so that the retrieval process covers all of these notations at once.
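Claim 22 folds notational variants into one code string before matching. The sketch below works on decoded Unicode strings rather than raw character codes, which is an assumption. It uses the standard `unicodedata.normalize` with NFKC for full-width and half-width forms, `str.lower` for case, a fixed offset of 0x60 code points for katakana to hiragana, and a hypothetical two-entry table standing in for a real old-form/new-form kanji list.

```python
import unicodedata

# Hypothetical old-form -> new-form kanji table; a real system needs a full list.
OLD_TO_NEW_KANJI = {"國": "国", "學": "学"}

# Katakana U+30A1..U+30F6 sit exactly 0x60 code points above their hiragana.
KATA_TO_HIRA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}


def normalize(text: str) -> str:
    text = unicodedata.normalize("NFKC", text)  # full-/half-width forms
    text = text.lower()                         # uppercase/lowercase
    text = text.translate(KATA_TO_HIRA)         # katakana -> hiragana
    return "".join(OLD_TO_NEW_KANJI.get(ch, ch) for ch in text)


def contains(text: str, word: str) -> bool:
    # Both sides go through the same folding, so any notation of the word matches.
    return normalize(word) in normalize(text)
```

NFKC also folds other compatibility characters (circled digits, for example), which a real system may or may not want.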

23. The document retrieval apparatus according to claim 16, comprising code conversion means that, when converting input character codes into another, different character code system, converts the codes dynamically for input text in which character strings represented in a plurality of code systems are mixed.
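One reading of claim 23 is input whose code system changes mid-stream, as in ISO-2022-JP, where escape sequences switch between ASCII (ESC ( B) and JIS X 0208 (ESC $ B). A minimal Python sketch of such dynamic conversion follows; the output code space (ASCII codes as they are, JIS X 0208 byte pairs packed into 16 bits) is an assumption, and other ISO-2022-JP designations are not handled.

```python
# Escape sequences that switch the current code system (ISO-2022-JP subset).
SWITCHES = {b"\x1b(B": "ascii", b"\x1b$B": "jis0208"}


def convert(data: bytes) -> list[int]:
    mode, out, pos = "ascii", [], 0
    while pos < len(data):
        seq = data[pos:pos + 3]
        if seq in SWITCHES:
            mode = SWITCHES[seq]                        # switch code system mid-stream
            pos += 3
        elif mode == "jis0208" and pos + 1 < len(data):
            out.append(data[pos] << 8 | data[pos + 1])  # 2-byte JIS X 0208 character
            pos += 2
        else:
            out.append(data[pos])                       # 1-byte ASCII character
            pos += 1
    return out
```

For example, ア followed by "A" is encoded as `b'\x1b$B%"\x1b(BA'`, which converts to `[0x2522, 0x41]`: the escape sequences produce no output and only set the mode for the bytes after them.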