JPH03223923A

JPH03223923A - Control system for input of unfixed byte length character

Info

Publication number: JPH03223923A
Application number: JP2018239A
Authority: JP
Inventors: Yuji Ogawa; 雄司小川
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1990-01-29
Filing date: 1990-01-29
Publication date: 1991-10-02
Anticipated expiration: 2012-01-08
Also published as: JP2569857B2

Abstract

PURPOSE: To process text code simply and efficiently by providing a text code generating means, which inserts control characters when generating the text code, and a character attribute identifying means, which identifies the attribute defined for each character of the text code. CONSTITUTION: When the byte length of the characters in a character string changes partway through the string, a text code generating means 1 generates a text code in which a control character having the byte length of the characters before the change is inserted. A character attribute identifying part 2 manages the 1-byte character attribute information 5 and the 2-byte character attribute information 6 stored in a storage area 3. Each set of character attribute information comprises byte code tables 7-10, which are arranged hierarchically, one level per byte of a character, to manage the attribute set for each character of each byte length. As a result, a system processing text code that mixes characters of different byte lengths never has to handle a control character whose byte length differs from that of the characters it is currently reading.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to an indefinite byte length character input control method for processing, in a computer system, character strings in which characters of different byte lengths are mixed.

(Prior Art) In recent information processing systems, the processing of Japanese data has become indispensable. However, various problems arise in processing Japanese characters that cannot be represented in one byte. These problems stem from extending the 1-byte code system, which was built around alphanumeric characters, with a Japanese code system in which one character is represented by two bytes. For example, when processing a character string in which a 2-byte character follows a 1-byte character, there is the problem of how to tell the system processing the string that the byte length of a character has changed, and the problem that the '1' of the ASCII code and the '１' of the JIS kanji code must be recognized as the same digit. These problems have a large impact on systems that process text code, such as language processing systems.

FIG. 2 shows the configuration of a conventional indefinite byte length character input control method. In the text code 208 generated by the text code generation unit 201, each 2-byte character string portion is enclosed by two control characters of fixed byte length that mark the start and the end of the 2-byte characters. For example, in the character string 'ABC日本語XYZ', as shown in FIG. 3, the control characters <KT> and <KO>, which mark the start and the end of the 2-byte characters respectively, are inserted between 'C' and '日' and between '語' and 'X'. Both control characters are two bytes long, and their hexadecimal codes are 3F75 and 3F76, respectively.

The 1-byte input processing unit 202, on receiving the text code 208 generated by the text code generation unit 201, passes each 1-byte character in the text code 208 to the 1-byte character attribute identification unit 204. The 1-byte character attribute identification unit 204 looks up the 1-byte code table 206 with the given character code and obtains the attribute bit table 209, which holds the attribute information for that 1-byte character. If the 1-byte input processing unit 202 encounters the 1-byte code 3F, it reads the next byte; if that byte is 75, it transfers control to the 2-byte input processing unit 203, and otherwise it continues 1-byte input processing.

When control is transferred to the 2-byte input processing unit 203, it passes each 2-byte character to the 2-byte character attribute identification unit 205, which looks up the 2-byte code table 207 with the 2-byte code and obtains the attribute bit table 210 for that character. Because the control character <KO>, which marks the end of a 2-byte character string, is itself two bytes long, the 2-byte input processing unit 203 can recognize it without any special processing; on finding <KO>, it returns control to the 1-byte input processing unit 202.

Character attributes were identified as follows. For example, to represent the presence or absence of 32 kinds of attributes for characters of one 8-bit byte, a byte code table with 256 elements of 4 bytes each is prepared; if the character code of '1' is 241, the bit representing the digit attribute in the attribute bit table at element 241 of that byte code table is set to ON. For 2-byte characters, the same is done with a table of 65536 elements.
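The conventional lookup described above can be sketched as follows. This is only an illustration: the name `ATTR_DIGIT` and the choice of bit 0 for the digit attribute are assumptions, since the source says only that one of the 32 bits represents the digit attribute.

```python
# Hypothetical bit position: the source does not say which of the
# 32 attribute bits marks a digit.
ATTR_DIGIT = 1 << 0

# One 32-bit attribute word (4 bytes) per 1-byte character code.
one_byte_table = [0] * 256
one_byte_table[241] |= ATTR_DIGIT  # character code of '1' is 241 (hex F1)


def is_digit(code: int) -> bool:
    """Look up the attribute word for a 1-byte code and test the digit bit."""
    return bool(one_byte_table[code] & ATTR_DIGIT)


# A flat table for 2-byte characters needs one element per 16-bit code.
two_byte_table_elements = 1 << 16

print(is_digit(241), two_byte_table_elements)
```

The flat 2-byte table is where the cost shows: 65536 elements of 4 bytes each, regardless of how few characters actually carry special attributes.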

(Problem to be Solved by the Invention) In the conventional indefinite byte length character input control method described above, the two control characters placed before and after the 2-byte characters to mark their start and end have a fixed byte length. A system processing text code that contains characters of different byte lengths therefore had to handle, while reading 1-byte characters or while reading 2-byte characters, a control character whose byte length differs from that of the characters being processed. In addition, the text code processing system needed two input processing units, one for 1-byte characters and one for 2-byte characters, which made the processing system complex and inefficient.

Thus, the conventional indefinite byte length character input control method had problems to be solved.

(Means for Solving the Problems) The present invention is an indefinite byte length character input control method in a computer system that processes character strings in which English characters represented in one byte and Japanese characters represented in multiple bytes are mixed. The method defines, for each of the different character byte lengths mixed in a character string, control characters that have the same byte length as those characters and that indicate both that the byte length of the following characters in the string changes and what that byte length is; and character attribute information in which the byte code tables corresponding to the individual bytes of a character are arranged hierarchically to manage the attributes of each character of each byte length. The method has text code generation means that receives a character string in which characters of different byte lengths are mixed and generates a text code by inserting, at the notional position between adjacent characters of different byte lengths, the control character that has the same byte length as the preceding character and indicates the byte length of the following character; and character attribute identification means that receives the text code generated by the text code generation means and, by referring to the character attribute information, identifies the character attribute defined for each character of the text code.

(Embodiment) Next, the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the present invention.

When the byte length of the characters in an input character string changes partway through the string, the text code generation unit 1 generates a text code in which a control character having the byte length of the characters before the change is inserted.

The character attribute identification unit 2 manages the 1-byte character attribute information 5 and the 2-byte character attribute information 6 held in the storage area 3. Each set of character attribute information comprises several byte code tables (7, 8, 9, 10) and manages the attributes of each character of the corresponding byte length through a hierarchy of byte code tables, one level per byte of the character. Each element of the byte code table corresponding to the lowest-order byte of a character holds either the attribute bit table that describes the attributes of that character or, for a control character announcing a change of byte length, a pointer to the byte code table for the highest-order byte of the other character attribute information.

Register CT 4 points to the byte code table for the highest-order byte of the character attribute information for the byte length of the characters currently being input. Its initial value is a pointer to the byte code table 7 of the 1-byte character attribute information 5.

Consider byte code tables in which one byte is 8 bits and each element is 4 bytes.

Let the code of the 1-byte control character <2S>, which indicates a change from 1-byte characters to 2-byte characters, be 3F, and let the code of the 2-byte control character <1S>, which indicates a change from 2-byte characters to 1-byte characters, be 3F76. For the character string '123456', the text code generation unit 1 then generates the text code shown in FIG. 4. Here '1', '2', '5', and '6' are 1-byte characters with the hexadecimal codes F1, F2, F5, and F6, respectively, and '3' and '4' are 2-byte characters with the hexadecimal codes 7BF3 and 7BF4, respectively. The control characters <2S> and <1S> are inserted between '2' and '3' and between '4' and '5', respectively. Because the control character inserted before and after the run of 2-byte characters always has the same byte length as the character immediately preceding the change, every control character that signals a change of byte length is expressed in the byte length of the characters currently being input, and so can be recognized while reading at that byte length.
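A minimal sketch of the text code generation for the '123456' example, using the codes given in the text. The function name `generate_text_code`, the `CONTROL` dictionary, and the representation of each character as a `bytes` object are assumptions made for illustration; the sketch also assumes that input starts in the 1-byte state, matching the initial value of register CT.

```python
# Control characters from the example: keyed by (byte length before the
# change, byte length after the change). Each one is as long as the
# characters that precede it.
CONTROL = {
    (1, 2): bytes([0x3F]),        # <2S>: 1-byte code 3F
    (2, 1): bytes([0x3F, 0x76]),  # <1S>: 2-byte code 3F76
}


def generate_text_code(chars):
    """Concatenate character codes, inserting a control character
    wherever the byte length of the characters changes."""
    out = bytearray()
    current_len = 1  # assumption: the stream starts in the 1-byte state
    for code in chars:
        if len(code) != current_len:
            out += CONTROL[(current_len, len(code))]
            current_len = len(code)
        out += code
    return bytes(out)


# '1', '2', '5', '6' are 1-byte characters; '3', '4' are 2-byte characters.
chars = [b"\xF1", b"\xF2", b"\x7B\xF3", b"\x7B\xF4", b"\xF5", b"\xF6"]
text_code = generate_text_code(chars)
print(text_code.hex(" ").upper())
# F1 F2 3F 7B F3 7B F4 3F 76 F5 F6
```

A reader at 1-byte granularity meets 3F as a single byte, and a reader at 2-byte granularity meets 3F76 as a single 2-byte unit, so neither has to look ahead across a character boundary.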

Next, the processing of the text code shown in FIG. 4, generated by the text code generation unit 1, is described. FIG. 5 is a flowchart of the processing performed by the character attribute identification unit 2. The character attribute identification unit 2 first assigns the contents of register CT 4 to register P 15 (step 501). At this point register CT 4 holds a pointer to the byte code table 7 of the 1-byte character attribute information 5, and that pointer value is assigned to register P 15. Next, one byte of the text code is read and assigned to register Q 16 (step 502). Register Q 16 thus receives F1, the character code of '1', and the contents of element F1 of the byte code table 7 pointed to by register CT 4 are assigned to register P 15 (step 503) (FIG. 7(a)).

The contents of a byte code table element are one of the following: a pointer to another byte code table; an attribute bit table; or, when the character is a control character indicating a change of byte length, a pointer to the byte code table for the highest-order byte of the other character attribute information. They are distinguished as follows. Every byte code table has the same size, which is a multiple of 4 bytes, so by aligning the tables in the storage area 3 the low-order two bits of the start address of every byte code table can be made 0. Taking the least significant bit of an element as bit 0 and the most significant bit as bit 31, an element is then identified, as shown in FIG. 6, by the value of bits 1 and 0: 00 means a pointer to another byte code table (601); 01 means an attribute bit table (602); and 11 means the pointer to the byte code table for the highest-order byte of the other character attribute information, with 3 added to it.
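The tagging scheme can be sketched as a small classifier over 32-bit element values. The constant and function names are assumptions, as is the example address 0x1000; the source does not say what the remaining tag value 10 means, so the sketch rejects it.

```python
TAG_TABLE_POINTER = 0b00   # pointer to another byte code table
TAG_ATTRIBUTE_BITS = 0b01  # attribute bit table
TAG_LENGTH_CHANGE = 0b11   # pointer to the other top-level table, plus 3


def classify(element: int) -> int:
    """Return the tag held in bits 1 and 0 of a byte code table element."""
    tag = element & 0b11
    if tag == 0b10:
        # Not defined in the source; treated as an error in this sketch.
        raise ValueError("tag 10 is not defined")
    return tag


def length_change_target(element: int) -> int:
    """Recover the table address from a length-change element (subtract 3)."""
    assert classify(element) == TAG_LENGTH_CHANGE
    return element - 3


table_address = 0x1000  # hypothetical 4-byte-aligned table address
print(classify(table_address) == TAG_TABLE_POINTER)
print(hex(length_change_target(table_address + 3)))
```

Because every table address has 00 in its low two bits, adding 3 sets both bits without disturbing the rest of the address, and subtracting 3 restores the pointer exactly.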

Because the byte length of the characters currently being input is 1, the contents of register P 15 are not a pointer to another byte code table, and because the character is '1' it is not a control character either, so attribute processing is performed according to the contents of the attribute bit table 12 now held in register P 15 (steps 504, 505, 506); given the meaning of the character '1', it is processed as a digit. Processing then returns to step 501, and the character '2' is handled in the same way. For the control character <2S>, step 503 loads register P 15 with the contents of element 3F of the byte code table 7 of the 1-byte character attribute information 5. This value is the pointer to the byte code table 8, for the high-order byte of the 2-byte character attribute information 6, with 3 added, so processing passes through steps 504 and 505 to step 507. There the contents of register P 15 minus 3 are assigned to register CT 4 (step 507), and processing returns to step 501.

From then on, register CT 4 holds the pointer to the byte code table for the high-order byte of the 2-byte character attribute information 6. When the high-order byte of the 2-byte character code of '3' is read from the text code 11, element 7B of the byte code table 8 pointed to by register CT 4 holds a pointer to another byte code table 9, so step 504 sends processing back to step 502; for the low-order byte, the attribute bit table 14 is obtained from element F3 of the low-order byte code table 10, and attribute processing is performed (step 506) (FIG. 7(b)). The character '4' is handled in the same way as '3'. The control character <1S> is a 2-byte control character that marks the start of 1-byte characters. Element 76 of the byte code table 9, which is pointed to by element 3F of the byte code table 8 pointed to by register CT 4, holds the pointer to the byte code table 7 of the 1-byte character attribute information 5 with 3 added, so step 507 returns register CT 4 to the state for identifying 1-byte characters. The characters '5' and '6' are then input and processed in the same way as '1' and '2'.
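The walk through steps 501 to 507 can be simulated end to end. This is a sketch under stated assumptions, not the patented implementation: table addresses (`TABLE_7` and so on), the `memory` dictionary standing in for storage area 3, the `DIGIT` attribute word, and the default attribute word for unlisted codes are all hypothetical. In particular, the sketch assumes element 7B of table 8 points to table 10 and element 3F of table 8 points to table 9.

```python
# Hypothetical 4-byte-aligned addresses for byte code tables 7 to 10.
TABLE_7, TABLE_8, TABLE_9, TABLE_10 = 0x100, 0x200, 0x300, 0x400

NO_ATTR = 0b01             # attribute word (tag 01) with no attribute bits
DIGIT = (1 << 2) | 0b01    # attribute word with a hypothetical digit bit

# Stand-in for storage area 3: table address -> {byte value: element}.
memory = {
    TABLE_7: {0xF1: DIGIT, 0xF2: DIGIT, 0xF5: DIGIT, 0xF6: DIGIT,
              0x3F: TABLE_8 + 3},            # <2S>
    TABLE_8: {0x7B: TABLE_10, 0x3F: TABLE_9},
    TABLE_9: {0x76: TABLE_7 + 3},            # second byte of <1S>
    TABLE_10: {0xF3: DIGIT, 0xF4: DIGIT},
}


def identify(text_code: bytes):
    """Return (character code, attribute word) pairs for a text code."""
    ct = TABLE_7               # register CT: initial state is 1-byte
    p = ct                     # step 501
    char = bytearray()
    results = []
    for q in text_code:        # step 502: read one byte into Q
        char.append(q)
        p = memory[p].get(q, NO_ATTR)   # step 503: table lookup
        tag = p & 0b11
        if tag == 0b00:        # step 504: pointer, read the next byte
            continue
        if tag == 0b11:        # step 505 -> 507: byte length changes
            ct = p - 3
        elif tag == 0b01:      # step 506: attribute processing
            results.append((bytes(char), p))
        else:
            raise ValueError("tag 10 is not defined")
        p = ct                 # back to step 501
        char.clear()
    return results


results = identify(bytes.fromhex("F1F23F7BF37BF43F76F5F6"))
print([code.hex().upper() for code, _ in results])
# ['F1', 'F2', '7BF3', '7BF4', 'F5', 'F6']
```

A single loop handles both byte lengths: the only state that changes when a control character arrives is the table that register CT points to.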

The embodiment described above is limited to 1-byte and 2-byte characters, but the present invention is easily applied to a system handling characters of 1 to N bytes by defining, for each character byte length, N-1 control characters that indicate a change of byte length. For example, a system handling characters of 1 to 3 bytes needs six control characters: two 1-byte control characters indicating a change to 2-byte and to 3-byte characters, two 2-byte control characters indicating a change to 1-byte and to 3-byte characters, and two 3-byte control characters indicating a change to 1-byte and to 2-byte characters. The present invention may at first appear to require many byte code tables, but in practice only a very small number of characters have special attributes, so many byte code tables tend to have exactly the same contents. The problem of needing a large number of byte code tables can therefore be solved by preparing only one copy of each such table and arranging for it to be referenced from several places.
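The count of control characters follows directly from pairing every byte length with every other byte length. A short sketch (the function name `control_characters` is an assumption) enumerates the pairs for N = 3:

```python
from itertools import permutations


def control_characters(n: int):
    """List (from_len, to_len) pairs for byte lengths 1..n.
    The control character for a pair is from_len bytes long."""
    return list(permutations(range(1, n + 1), 2))


pairs = control_characters(3)
print(len(pairs))  # 6
print(pairs)
# [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
```

In general there are N-1 control characters for each of the N byte lengths, so N(N-1) in total.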

(Effects of the Invention) As described in detail above, with the indefinite byte length character input control method of the present invention, the handling of control characters is never disrupted, whether the system processing the text code is reading 1-byte characters or 2-byte characters. Furthermore, the text code processing unit does not need separate input processing units for 1-byte characters and for 2-byte characters, so a simple and efficient text code processing unit can be realized.

The present invention has the effects described above.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention. FIG. 2 is a block diagram showing a conventional indefinite byte length character input control method. FIG. 3 shows the text code generated by the conventional text code generation unit 201. FIG. 4 shows the text code generated by the text code generation unit 1 of the present invention. FIG. 5 is a flowchart showing the processing of the character attribute identification unit 2 of the present invention. FIG. 6 shows the kinds of byte code table elements in the present invention. FIG. 7 shows the attribute bit tables managed hierarchically by the byte code tables of the present invention.

1, 201: text code generation unit; 2: character attribute identification unit; 3: storage area; 4: register CT; 5: 1-byte character attribute information; 6: 2-byte character attribute information; 7, 8, 9, 10: byte code tables; 11, 208: text code; 12, 13, 14, 209, 210: attribute bit tables; 15: register P; 16: register Q; 202: 1-byte input processing unit; 203: 2-byte input processing unit; 204: 1-byte character attribute identification unit; 205: 2-byte character attribute identification unit; 206: 1-byte code table; 207: 2-byte code table; 601, 602, 603: byte code table…

Claims

[Claims] An indefinite byte length character input control method in a computer system that processes character strings in which English characters represented in one byte and Japanese characters represented in multiple bytes are mixed, characterized in that: there are defined, for each of the different character byte lengths mixed in a character string, control characters that have the same byte length as those characters and that indicate both that the byte length of the following characters in the string changes and what that byte length is, and character attribute information in which the byte code tables corresponding to the individual bytes of a character are arranged hierarchically to manage the attributes of each character of each byte length; and the method has text code generation means that receives a character string in which characters of different byte lengths are mixed and generates a text code by inserting, at the notional position between adjacent characters of different byte lengths, the control character that has the same byte length as the preceding character and indicates the byte length of the following character, and character attribute identification means that receives the text code generated by the text code generation means and identifies, by referring to the character attribute information, the character attribute defined for each character of the text code.