JP2569857B2

JP2569857B2 - Variable byte length character input control method

Info

Publication number: JP2569857B2
Application number: JP2018239A
Authority: JP
Inventors: 雄司小川
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1990-01-29
Filing date: 1990-01-29
Publication date: 1997-01-08
Anticipated expiration: 2012-01-08
Also published as: JPH03223923A

Description

[Detailed Description of the Invention] (Field of Industrial Application) The present invention relates to a variable-byte-length character input control method for processing, in a computer system, character strings in which characters of different byte lengths are mixed.

(Prior Art) Processing of Japanese-language data has become indispensable in modern information processing systems. However, Japanese characters cannot be represented in one byte, and handling them raises a number of problems. These problems stem from extending the standard one-byte code system for alphanumeric characters by adding a Japanese code system in which each character is represented by two bytes. For example, when processing a string in which a two-byte character follows a one-byte character, there is the question of how to tell the processing system that the number of bytes per character has changed. Likewise, the ASCII digit '1' and the JIS kanji-code digit '1' must be recognized as the same numeral. These problems seriously affect systems that process text codes, such as language processors.

FIG. 2 shows the configuration of a conventional variable-byte-length character input control method. In the text code 208 generated by the text code generation unit 201, each run of two-byte characters is enclosed by two control characters of fixed byte length that mark the start and the end of the two-byte run. For example, in the string 'ABC日本語XYZ', as shown in FIG. 3, the control characters <KI> and <KO>, marking the start and the end of two-byte characters, are inserted between 'C' and '日' and between '語' and 'X', respectively. Both control characters are two bytes long; their hexadecimal codes are 3F75 and 3F76, respectively.

The one-byte input processing unit 202, given the text code 208 generated by the text code generation unit 201, passes each one-byte character of the text code to the one-byte character attribute identification unit 204. Unit 204 looks up the one-byte code table 206 with the given character code and obtains the attribute bit table 209 that stores the attribute information for that character. If the one-byte input processing unit 202 encounters the one-byte code 3F, it reads the next byte: if that byte is 75, control passes to the two-byte input processing unit 203; otherwise one-byte input processing simply continues. Once control has passed to unit 203, each two-byte character is handed to the two-byte character attribute identification unit 205, which looks up the two-byte code table 207 with the two-byte code and obtains the attribute bit table 210 for that character. Because the control character <KO> that ends a two-byte run is itself two bytes long, unit 203 can recognize <KO> without any special handling; on finding it, unit 203 returns control to the one-byte input processing unit 202.
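The conventional scheme can be sketched as follows. This is a minimal Python illustration, not the original implementation: the character values (including the two-byte codes used in the test string) are hypothetical, and closing a trailing two-byte run with <KO> at the end of input is an assumption. Note that the one-byte loop must peek at a second byte to recognize the two-byte <KI>, which is exactly the length mismatch the invention removes.

```python
# Conventional scheme of FIGS. 2 and 3 (illustrative sketch only).
# A run of two-byte characters is bracketed by two fixed two-byte controls.
KI = bytes([0x3F, 0x75])  # <KI>: start of a two-byte run
KO = bytes([0x3F, 0x76])  # <KO>: end of a two-byte run


def encode_conventional(chars):
    """chars: list of int codes; values above 0xFF are treated as two-byte."""
    out = bytearray()
    in_double = False
    for code in chars:
        double = code > 0xFF
        if double and not in_double:
            out += KI
        elif in_double and not double:
            out += KO
        in_double = double
        out += code.to_bytes(2 if double else 1, "big")
    if in_double:  # assumption: close a trailing two-byte run
        out += KO
    return bytes(out)


def decode_conventional(data):
    """Return (code, nbytes) pairs using two separate input loops."""
    chars = []
    i = 0
    while i < len(data):
        # One-byte input processing (unit 202): on 3F, peek at the next byte.
        if data[i] == 0x3F and i + 1 < len(data) and data[i + 1] == 0x75:
            i += 2
            # Two-byte input processing (unit 203) until <KO> is read.
            while i + 1 < len(data):
                code = int.from_bytes(data[i:i + 2], "big")
                i += 2
                if code == 0x3F76:
                    break
                chars.append((code, 2))
            continue
        chars.append((data[i], 1))
        i += 1
    return chars
```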

Character attributes were identified as follows. For example, to record the presence or absence of 32 kinds of attributes for 8-bit one-byte characters, a byte code table of 256 elements, each 4 bytes long, is prepared; since the character code of '1' is 241 (hexadecimal F1), the bit representing the numeric attribute in element 241 of that table (its attribute bit table) is set to 'ON', and so on. For two-byte characters, the same processing is performed with a table of 65536 elements.
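A minimal sketch of this conventional flat lookup, assuming Python lists stand in for the tables of 4-byte elements and that the numeric attribute occupies bit 0 (that bit assignment, and the two-byte code used for a full-width '1', are hypothetical). It also makes the storage cost concrete: 1,024 bytes for the one-byte table versus 262,144 bytes for the two-byte table.

```python
ELEMENT_BYTES = 4           # each element holds 32 attribute bits
NUMERIC = 1 << 0            # hypothetical bit position of the numeric attribute

one_byte_table = [0] * 256      # indexed directly by the one-byte code
two_byte_table = [0] * 65536    # indexed directly by the two-byte code


def set_attribute(table, code, bit):
    """Turn the given attribute bit ON for a character code."""
    table[code] |= bit


def has_attribute(table, code, bit):
    """Test whether the given attribute bit is ON for a character code."""
    return (table[code] & bit) != 0


# '1' is code 241 (hex F1): turn its numeric bit ON.
set_attribute(one_byte_table, 241, NUMERIC)
# Hypothetical two-byte code for a full-width '1'.
set_attribute(two_byte_table, 0x7BF1, NUMERIC)

one_byte_size = len(one_byte_table) * ELEMENT_BYTES   # bytes of storage
two_byte_size = len(two_byte_table) * ELEMENT_BYTES
```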

(Problem to Be Solved by the Invention) In the conventional variable-byte-length character input control method described above, the two control characters placed before and after a run of two-byte characters to mark its start and end have a fixed byte length. Consequently, a system processing text code that mixes characters of different byte lengths must, whether it is inputting one-byte or two-byte characters, cope with control characters whose byte length differs from that of the characters being processed. In addition, the text code processing system needs two input processing units, one for one-byte characters and one for two-byte characters, which makes it complicated and inefficient.

As described above, the conventional variable-byte-length character input control method has problems to be solved.

(Means for Solving the Problem) The present invention is a variable-byte-length character input control method in a computer system that processes character strings in which English characters represented by one byte and Japanese characters represented by multiple bytes are mixed. The method has: control character generation means for generating, for each byte length of the characters mixed in a character string, control characters that have the same byte length as those characters and that indicate both that the byte length of the following characters in the string changes and what that new byte length is; character attribute information management means for managing attribute information for each character of each byte length by hierarchically organizing byte code tables corresponding to the individual bytes that make up a character; text code generation means for receiving a character string in which characters of different byte lengths are mixed and generating a text code by inserting, at each virtual position between adjacent characters of different byte lengths, the control character that has the same byte length as the immediately preceding character and indicates the byte length of the character that follows; and character attribute identification means for receiving the text code generated by the text code generation means and identifying, by referring to the character attribute information, the character attributes defined for each character of the text code.

(Embodiment) Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the present invention. When the byte length of the characters in an input string changes partway through, the text code generation unit 1 generates a text code to which a control character is added whose byte length equals that of the character before the change. The character attribute identification unit 2 manages the one-byte character attribute information 5 and the two-byte character attribute information 6 held in the storage area 3. The character attribute information has byte code tables 7, 8, 9 and 10, and the attributes of each character of each byte length are managed by a hierarchy of byte code tables, one level per byte of the character. The byte code table for the lowest-order (last) byte of each character stores either an attribute bit table giving the attributes of that character or, for a control character announcing a change of byte length, a pointer to the byte code table for the highest-order (first) byte of the other character attribute information. Register CT 4 points to the highest-order-byte byte code table of the character attribute information for the byte length of the character currently being input. The initial value of register CT 4 is a pointer to byte code table 7 of the one-byte character attribute information 5.

Consider byte code tables in which a byte is 8 bits and each element is 4 bytes. Let the code of the one-byte control character <2S>, which signals a change from one-byte to two-byte characters, be 3F, and the code of the two-byte control character <1S>, which signals a change from two-byte to one-byte characters, be 3F76. For the string '123456', the text code generation unit 1 then generates the text code shown in FIG. 4. Here '1', '2', '5' and '6' are one-byte characters with hexadecimal codes F1, F2, F5 and F6, and '3' and '4' are two-byte characters with hexadecimal codes 7BF3 and 7BF4. The control characters <2S> and <1S> are inserted between '2' and '3' and between '4' and '5', respectively, so the text code is F1 F2 3F 7B F3 7B F4 3F 76 F5 F6. Because each control character inserted before or after the run of two-byte characters has the same byte length as the character immediately preceding the change, every control character announcing a change of byte length is expressed in the byte length of the characters currently being input, and can therefore be recognized using that same byte length.
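The generation rule can be sketched in a few lines of Python. This is an illustrative sketch, not the patented implementation: characters are supplied as (code, byte length) pairs, the function name is hypothetical, and only the two control characters of this embodiment, <2S> = 3F and <1S> = 3F76, are defined. The key property is that each control character is as long as the characters it follows.

```python
# Control characters of the embodiment, keyed by
# (byte length of current chars, byte length of following chars).
CONTROL = {
    (1, 2): bytes([0x3F]),        # <2S>: one byte, "two-byte chars follow"
    (2, 1): bytes([0x3F, 0x76]),  # <1S>: two bytes, "one-byte chars follow"
}


def generate_text_code(chars):
    """chars: list of (code, nbytes) pairs; returns the text code bytes."""
    out = bytearray()
    current = 1  # start in one-byte mode (matches the initial CT value)
    for code, nbytes in chars:
        if nbytes != current:
            # The control character has the length of the preceding chars.
            out += CONTROL[(current, nbytes)]
            current = nbytes
        out += code.to_bytes(nbytes, "big")
    return bytes(out)


# The string '123456' of FIG. 4: '3' and '4' are two-byte characters.
example = [(0xF1, 1), (0xF2, 1), (0x7BF3, 2), (0x7BF4, 2), (0xF5, 1), (0xF6, 1)]
```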

Next, the processing of the text code of FIG. 4 generated by the text code generation unit 1 is described. FIG. 5 is a flowchart of the processing performed by the character attribute identification unit 2. First, the character attribute identification unit 2 copies the contents of register CT 4 into register P 15 (step 501). At this point register CT 4 holds the pointer to byte code table 7 of the one-byte character attribute information 5, and that pointer is copied into register P 15. Next, one byte of the text code is read into register Q 16 (step 502). Register Q 16 thus receives F1, the code of the character '1', and the contents of element F1 of byte code table 7, pointed to by register CT 4, are stored in register P 15 (step 503) (FIG. 7(a)).

Each byte code table element holds one of the following: a pointer to another byte code table; an attribute bit table; or, when the character is a control character announcing a change of byte length, a pointer to the highest-order-byte byte code table of the other character attribute information. These are distinguished as follows. Every byte code table has the same fixed size, a multiple of 4 bytes, so by aligning the tables in storage area 3 the low 2 bits of each table's start address can be made 0. Numbering the bits of an element from bit 0 (least significant) to bit 31 (most significant), bits 1 and 0 then identify the element type, as shown in FIG. 6: 00 means a pointer to another byte code table (601); 01 means an attribute bit table (602); and 11 means the pointer to the highest-order-byte byte code table of the other character attribute information plus 3.
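The tag scheme of FIG. 6 can be modeled as follows. This sketch assumes table addresses are plain integers that are multiples of 4 and, as an assumption not stated in the text, that an attribute element (tag 01) carries its attribute bits in bits 2 to 31. Tag 10 is not assigned in the description, so the sketch rejects it. All function names are hypothetical.

```python
TAG_MASK = 0b11
TAG_TABLE = 0b00    # pointer to the byte code table for the next byte
TAG_ATTR = 0b01     # attribute bit table
TAG_SWITCH = 0b11   # pointer (+3) to another byte length's top-byte table


def make_table_pointer(addr):
    """Tables are 4-byte aligned, so a bare address already has tag 00."""
    if addr & TAG_MASK:
        raise ValueError("byte code tables must be 4-byte aligned")
    return addr


def make_attributes(bits):
    # Assumption: attribute bits are stored above the 2-bit tag.
    return (bits << 2) | TAG_ATTR


def make_switch(addr):
    """Control-character element: aligned pointer plus 3 (tag 11)."""
    return make_table_pointer(addr) + 3


def classify(element):
    """Decode an element into (kind, payload) from its low two bits."""
    tag = element & TAG_MASK
    if tag == TAG_TABLE:
        return ("table", element)
    if tag == TAG_ATTR:
        return ("attributes", element >> 2)
    if tag == TAG_SWITCH:
        return ("switch", element - 3)
    raise ValueError("tag 10 is not defined")
```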
By arranging in the storage area 3, the lower 2 bits of the head address of each byte code table can be set to 0. Therefore, assuming that the least significant bit of the byte code table element is the 0th bit and the most significant bit is the 31st bit, if the values of the first and 0th bits are 00 as shown in FIG. Can be identified as a value obtained by adding 3 to the pointer value to the byte-coded table for the most significant byte of the other character attribute information if the pointer bit is (601), 01 is the attribute bit table (602), and 11 is the attribute bit table.

The character now being input is one byte long, so the content of register P 15 is not a pointer to another byte code table; and since the character is '1' rather than a control character, attribute processing is carried out according to the attribute bit table 12 now held in register P 15 (steps 504, 505 and 506). Given the meaning of the character '1', it is processed as a numeral. Processing then returns to step 501, and the character '2' is handled in the same way. For the control character <2S>, step 503 places in register P 15 the contents of element 3F of byte code table 7 of the one-byte character attribute information 5. This value is the pointer to byte code table 8 for the high-order byte of the two-byte character attribute information 6 plus 3, so processing passes through steps 504 and 505 to step 507. There, the contents of register P 15 minus 3 are stored in register CT 4 (step 507), and processing returns to step 501. From then on, register CT 4 holds the pointer to the byte code table for the high-order byte of the two-byte character attribute information 6. When the high-order byte of the two-byte code of '3' is read from the text code 11, element 7B of byte code table 8, pointed to by register CT 4, holds a pointer to another byte code table, so step 504 sends processing back to step 502. For the low-order byte, element F3 of the low-order-byte byte code table 10 yields the attribute bit table 14, and attribute processing is carried out (step 506) (FIG. 7(b)). The character '4' is handled in the same way as '3'. The control character <1S> is a two-byte control character marking the start of one-byte characters: element 76 of byte code table 9, which is pointed to by element 3F of byte code table 8 (the table pointed to by register CT 4), is set to the pointer to byte code table 7 of the one-byte character attribute information 5 plus 3, so step 507 returns register CT 4 to the state of identifying one-byte characters. The characters '5' and '6' are then input and processed in the same way as '1' and '2'.
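The flow of FIG. 5 can be sketched as a single loop over the text code. This is a minimal model, not the original implementation, and every name and address in it is hypothetical: storage area 3 is a dictionary from 4-byte-aligned table addresses to 256-element lists; tag-01 elements are assumed to carry their attribute bits above the tag; the numeric attribute is assumed to be bit 0; element 3F of table 8 is taken to lead to table 9 and element 7B to table 10; and the unused high-byte entries of table 8 all share one blank low-byte table, in the spirit of the table-sharing remark below. Only well-formed text code is handled.

```python
NUMERIC = 1  # hypothetical attribute bit
# Hypothetical, 4-byte-aligned addresses standing in for tables 7 to 10.
T7, T8, T9, T10, T_BLANK = 0x1000, 0x1400, 0x1800, 0x1C00, 0x2000


def attr(bits):
    return (bits << 2) | 0b01  # tag 01: attribute bits (assumed layout)


def build_storage():
    mem = {t: [attr(0)] * 256 for t in (T7, T9, T10, T_BLANK)}
    mem[T8] = [T_BLANK] * 256           # unused high bytes share one table
    for c in (0xF1, 0xF2, 0xF5, 0xF6):  # '1', '2', '5', '6'
        mem[T7][c] = attr(NUMERIC)
    mem[T7][0x3F] = T8 + 3              # <2S>: switch to two-byte info
    mem[T8][0x3F] = T9
    mem[T9][0x76] = T7 + 3              # <1S>: switch back to one-byte info
    mem[T8][0x7B] = T10
    for c in (0xF3, 0xF4):              # low bytes of '3' and '4'
        mem[T10][c] = attr(NUMERIC)
    return mem


def identify(text, mem, ct):
    """Return ([(code, attribute bits)], final CT) for a text code."""
    results = []
    i = 0
    while i < len(text):
        p = ct                        # step 501: P <- CT
        code = 0
        while True:
            q = text[i]               # step 502: Q <- next byte
            i += 1
            code = (code << 8) | q
            p = mem[p][q]             # step 503: P <- element Q of table P
            if (p & 0b11) != 0b00:    # step 504: stop unless P is a pointer
                break
        if (p & 0b11) == 0b11:        # step 505 -> 507: control character
            ct = p - 3
        elif (p & 0b11) == 0b01:      # step 506: attribute processing
            results.append((code, p >> 2))
        else:
            raise ValueError("undefined element tag")
    return results, ct
```

A single code path serves every byte length: the control characters simply retarget CT, so no separate one-byte and two-byte input loops are needed.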

Although the embodiment above is limited to one-byte and two-byte characters, the invention is easily applied to a system handling characters of 1 to N bytes by defining, for each byte length, N−1 control characters announcing a change of byte length. For example, a system handling characters of 1 to 3 bytes needs six control characters: two one-byte control characters announcing a change to two-byte and to three-byte characters, two two-byte control characters announcing a change to one-byte and to three-byte characters, and two three-byte control characters announcing a change to one-byte and to two-byte characters. Further, although the invention may at first appear to require many byte code tables, in practice only a very limited number of characters have special attributes, so many byte code tables tend to have exactly the same contents. The problem of needing many byte code tables can therefore be solved by preparing only one copy of such a table and arranging for it to be referenced from different places.
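Both generalizations can be sketched briefly in Python; the function names are hypothetical and this is an illustration, not the patented implementation. The first function lists the control characters needed for characters of 1 to N bytes as (control character length, following character length) pairs, N(N−1) in total; the second maps each table to a single shared copy of any table with identical contents.

```python
from itertools import permutations


def control_characters(n):
    """(length of control char, length of following chars) for 1..n bytes.

    Each control character is as long as the characters it follows, so
    every length L gets n - 1 control characters, n * (n - 1) overall.
    """
    return list(permutations(range(1, n + 1), 2))


def share_tables(tables):
    """Map each table name to the first table with identical contents."""
    pool = {}
    return {name: pool.setdefault(tuple(t), name) for name, t in tables.items()}
```

For N = 3 the first function yields the six control characters listed above: (1, 2), (1, 3), (2, 1), (2, 3), (3, 1) and (3, 2).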

(Effect of the Invention) As described in detail above, with the variable-byte-length character input control method of the present invention, a system processing text code handles control characters without disruption, whether it is currently inputting one-byte or two-byte characters. Moreover, the text code processing unit need not provide separate input processing units for one-byte and two-byte characters, so a simple and efficient text code processing unit can be realized.

The present invention has the effects described above.

[Brief description of the drawings]

FIG. 1 is a block diagram of an embodiment of the present invention; FIG. 2 is a block diagram of a conventional variable-byte-length character input control method; FIG. 3 shows the text code generated by the conventional text code generation unit 201; FIG. 4 shows the text code generated by the text code generation unit 1 of the present invention; FIG. 5 is a flowchart of the processing of the character attribute identification unit 2 of the present invention; FIG. 6 shows the types of byte code table elements of the present invention; and FIG. 7 shows the attribute bit tables managed hierarchically by the byte code tables of the present invention.

1, 201: text code generation unit; 2: character attribute identification unit; 3: storage area; 4: register CT; 5: one-byte character attribute information; 6: two-byte character attribute information; 7, 8, 9, 10: byte code tables; 11, 208: text code; 12, 13, 14, 209, 210: attribute bit tables; 15: register P; 16: register Q; 202: one-byte input processing unit; 203: two-byte input processing unit; 204: one-byte character attribute identification unit; 205: two-byte character attribute identification unit; 206: one-byte code table; 207: two-byte code table; 601, 602, 603: byte code table elements.

Claims

(57) [Claims]

A variable-byte-length character input control method in a computer system that processes character strings in which English characters represented by one byte and Japanese characters represented by a plurality of bytes are mixed, the method having: control character generation means for generating, for each byte length of the characters mixed in a character string, control characters having the same byte length as those characters and indicating both that the byte length of the following characters in the string changes and what that byte length is; character attribute information management means for managing attribute information for each character of each byte length by hierarchically organizing byte code tables corresponding to the respective bytes constituting a character; text code generation means for receiving a character string in which characters of different byte lengths are mixed and generating a text code by inserting, at each virtual position between characters of different byte lengths, the control character that has the same byte length as the immediately preceding character and indicates how many bytes the following character has; and character attribute identification means for receiving the text code generated by the text code generation means and identifying, with reference to the character attribute information, the character attribute defined for each character of the text code.