JP2643330B2

JP2643330B2 - Conversion method of internal representation format of character string

Info

Publication number: JP2643330B2
Application number: JP63183139A
Authority: JP
Inventors: 政富稲垣; 峰明横山; 徹荒谷
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1988-07-22
Filing date: 1988-07-22
Publication date: 1997-08-20
Anticipated expiration: 2012-08-20
Also published as: JPH0233222A

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a method for converting the internal representation format of character strings in a computer system.

[Conventional technology]

Conventionally, some computer systems have two internal representation formats for character strings:

- a one-byte character string, composed only of characters that each require one byte of storage (such as alphanumeric characters);
- a two-byte character string, which can contain characters that each require two bytes of storage (such as kanji).

In such systems, the programmer entering a program had to specify explicitly, at the time each character string was created, whether the one-byte or the two-byte internal representation was to be used.

That is, the compilers and interpreters currently known provide data types of at least one-byte and two-byte length. Every character string used in a program must have its data type declared.

When writing a program, therefore, the programmer must follow two rules:

- A character string consisting only of one-byte characters, such as alphanumerics, goes in an array declared with a one-byte data type.
- A character string consisting of two-byte characters, such as kanji, goes in an array declared with a two-byte data type.
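The conventional constraint can be illustrated with a minimal Python sketch. The sketch rests on several illustrative assumptions:

- Python's `bytearray` stands in for an array declared with a one-byte data type.
- `array("H")`, with unsigned 16-bit elements, stands in for a two-byte array.
- Unicode code points stand in for the system's character codes.

The function names are hypothetical. The sketch is not the behaviour of any particular compiler.

```python
from array import array


def store_one_byte(text: str) -> bytearray:
    """Store text in a one-byte array (hypothetical stand-in for a 1-byte type)."""
    buf = bytearray()
    for ch in text:
        buf.append(ord(ch))  # bytearray.append raises ValueError for codes > 255
    return buf


def store_two_byte(text: str) -> array:
    """Store text in a two-byte array (hypothetical stand-in for a 2-byte type)."""
    return array("H", (ord(ch) for ch in text))


one = store_one_byte("abcd")
two = store_two_byte("ab亜cd")
print(len(one), "bytes;", two.itemsize * len(two), "bytes")

try:
    store_one_byte("ab亜cd")  # the programmer picked the wrong declaration
except ValueError as err:
    print("cannot store in a one-byte array:", err)
```

The failure in the last call is the situation the description criticises: the programmer, not the system, must know in advance which declaration each string needs.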

[Problems to be solved by the invention]

As a result, the programmer must always keep the internal representation format of each character string in mind while writing a program. The same applies to programs that do either of the following:

- insert a character string of one internal representation format into a character string of another format;
- replace the internal representation format of some characters with a different format.

The internal representation format of character strings thus acts as a kind of shackle. It complicates the programmer's work and lowers programming efficiency. In some cases it also leads to input errors.

Accordingly, an object of the present invention is to solve the above problems of the prior art and make programming easier. To this end, the invention lets the programmer write a program without being aware of the internal representation format of each character string, even when character strings with different internal representation formats are present.

[Means for solving the problem]

To solve the above problems, the present invention provides an internal representation format conversion method for a computer system. The system handles two kinds of character string and processes them differently:

- X-byte character strings, in which each character is internally represented by a character code X bytes long;
- Y-byte character strings, in which each character is internally represented by a character code Y bytes long, Y being greater than X.

The method is characterized by the following step. While a character string internally represented with the X-byte character code length is being processed, a character internally represented by a Y-byte character code may be detected. When this happens, the string represented with the X-byte character code length is converted into the internal representation format with the Y-byte character code length.
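A minimal Python sketch of this first method follows. It relies on these assumptions:

- Character codes are unsigned big-endian integers.
- An X-byte code is carried into a Y-byte code unit by zero extension. This mapping happens to be exact for Latin-1 to UTF-16, but the description does not name the actual character sets or code mapping.
- Appending a character is used as the example of "processing".

`widen` and `append_char` are hypothetical names.

```python
def widen(data: bytes, x: int, y: int) -> bytes:
    """Convert a string of X-byte character codes into Y-byte character codes.

    Assumption: each code is an unsigned big-endian integer that is carried
    over unchanged (zero-extended) into the wider code unit.
    """
    if y <= x:
        raise ValueError("Y must be greater than X")
    if len(data) % x:
        raise ValueError("length is not a multiple of X")
    return b"".join(
        int.from_bytes(data[i:i + x], "big").to_bytes(y, "big")
        for i in range(0, len(data), x)
    )


def append_char(data: bytes, width: int, code: int) -> tuple[bytes, int]:
    """Append one character code, widening the whole string first if needed."""
    needed = max(1, (code.bit_length() + 7) // 8)  # bytes this code requires
    if needed > width:                              # a Y-byte character is detected
        data, width = widen(data, width, needed), needed
    return data + code.to_bytes(width, "big"), width


data, width = append_char(b"abc", 1, ord("亜"))
print(width, data.decode("utf-16-be"))
```

Appending 亜 to the one-byte string `b"abc"` returns a width of 2. Appending an ASCII letter leaves the string in its one-byte form.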

The present invention also provides a second internal representation format conversion method for a computer system. As above, the system handles X-byte character strings and Y-byte character strings (Y > X) and processes them differently.

The method accepts a program consisting of a character string built from two parts:

- a first character string, internally represented with the X-byte character code length;
- a second character string, internally represented with the Y-byte character code length, inserted at a predetermined position of the first.

When characters with different internal representations are detected, the method performs the following steps:

1. It converts the internal representation of each character element of the first character string defined by the program into a Y-byte character having the Y-byte character code length.
2. It inserts the second character string into the converted first character string to generate a third character string.
3. It performs the processing for Y-byte character strings on the generated third character string.
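The order of operations in this second method can be sketched in Python. The first string is converted to Y-byte characters, and only then is the second string inserted to form the third string. The sketch rests on these assumptions:

- Strings are fixed-width, with big-endian character codes.
- The X-to-Y code mapping is zero extension.
- `insert_wide` is a hypothetical name.

```python
def insert_wide(first: bytes, x: int, second: bytes, y: int, pos: int) -> bytes:
    """Insert a Y-byte string into an X-byte string at character position pos.

    Step 1: convert every character of the first string to Y bytes.
    Step 2: insert the second string to obtain the third string.
    The zero-extension code mapping is an illustrative assumption.
    """
    if len(first) % x or len(second) % y:
        raise ValueError("string length does not match its character width")
    if not 0 <= pos <= len(first) // x:
        raise IndexError("insertion position out of range")
    converted = b"".join(
        int.from_bytes(first[i:i + x], "big").to_bytes(y, "big")
        for i in range(0, len(first), x)
    )
    cut = pos * y  # character position -> byte offset in the converted string
    return converted[:cut] + second + converted[cut:]  # the third string


third = insert_wide(b"abcd", 1, "亜".encode("utf-16-be"), 2, 2)
print(third.decode("utf-16-be"))  # Y-byte processing of the third string
```

Converting before inserting means the insertion offset is computed in a single, uniform code-unit width. A mixed-width byte sequence is never created.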

[Action]

When an input program is executed, the internal representation format of its character data is converted according to the processing being performed. The programmer can therefore handle all character strings with the same operations, without being aware of the internal representation of the characters when entering the data.

[Example]

Embodiments of the present invention are described below with reference to the drawings.

FIG. 2 is a block diagram showing the configuration of a computer system for carrying out the method of converting the internal representation format of character strings according to the present invention. The computer system consists of the following units:

- a keyboard 1, which serves as the input means for data and commands;
- a display device 2, which serves as the output means and displays data and other input from the keyboard 1;
- a memory 3, which stores a program based on the flowchart of the processing procedure described later, together with control programs for executing various processes;
- an input buffer 4, which stores data and other input from the keyboard 1;
- a CPU 5, which performs various processes on the data input from the keyboard 1, based on the programs stored in the memory 3, and controls the units above.

The system has two internal representation formats for character strings:

- a one-byte character string, composed only of characters that each require one byte of storage (such as alphanumeric characters);
- a two-byte character string, which can contain characters that each require two bytes of storage (such as kanji).

Next, FIG. 1 is used to describe an example in which this computer system executes a program containing one particular process: inserting a character with a different internal representation format into a character string.

FIG. 1 shows the procedure for processing a program that inserts a two-byte character into a character string composed of characters with a one-byte internal representation.

In the figure, the two-byte character (c) "亜" is to be inserted into the character string S "abcd", which has the one-byte internal representation (S₁). The procedure has two steps:

1. The internal representation of S is dynamically converted into the two-byte internal representation of "abcd" (S₂).
2. The character (c) "亜" is inserted into S, which is now a two-byte character string. This yields the character string S "ab亜cd" in the two-byte internal representation (S₂′).
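The three states S₁, S₂ and S₂′ of FIG. 1 can be reproduced in Python. The patent does not name its encodings, so the sketch stands in for them with two of Python's standard codecs:

- Latin-1 plays the role of the one-byte representation.
- UTF-16-BE plays the role of the two-byte representation.

```python
# Hypothetical stand-ins: the description names neither encoding.
ONE_BYTE = "latin-1"
TWO_BYTE = "utf-16-be"

s1 = "abcd".encode(ONE_BYTE)                # S1: one byte per character
s2 = s1.decode(ONE_BYTE).encode(TWO_BYTE)   # S2: dynamically converted, two bytes each
c = "亜".encode(TWO_BYTE)                   # the two-byte character (c)
pos = 2                                     # insert before the third character
s2_prime = s2[:pos * 2] + c + s2[pos * 2:]  # S2': "ab亜cd" in two-byte form

for name, data in (("S1", s1), ("S2", s2), ("S2'", s2_prime)):
    print(name, len(data), "bytes:", data.hex(" "))
print(s2_prime.decode(TWO_BYTE))
```

Under these stand-in encodings, S₁ occupies 4 bytes, S₂ occupies 8 bytes and S₂′ occupies 10 bytes.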

Thus, when writing the program, the programmer does not need to know that the character string S is a one-byte character string or that the character C is a two-byte character. The computer system absorbs this difference in internal representation format, which makes programming easier.

Next, the flowchart of FIG. 3 is used to describe the procedure by which the CPU 5 of FIG. 2 converts the internal representation format of a character string.

In FIG. 3, the CPU 5 proceeds as follows:

- Step 101: The CPU 5 determines whether the character string S is a one-byte character string. If S is not a one-byte string, the process goes to step 104. If S is a one-byte string, the process goes to step 102.
- Step 102: The CPU 5 determines whether the character C is a two-byte character. If C is not a two-byte character, the process goes to step 104. If C is a two-byte character, the process goes to step 103.
- Step 103: The internal representation format of S is dynamically converted into that of a two-byte character string.
- Step 104: The character C is inserted into S as it is.
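The four steps of the FIG. 3 flowchart map directly onto a small Python function. The sketch makes these assumptions:

- `CharString`, `is_two_byte_char` and `to_two_byte` are hypothetical names.
- A "two-byte character" is any code above 0xFF.
- Conversion to the two-byte format is zero extension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharString:
    """Hypothetical string object: width 1 = one-byte string, 2 = two-byte string."""
    data: bytes
    width: int


def is_two_byte_char(code: int) -> bool:
    # Assumption: a code that does not fit in one byte is a two-byte character.
    return code > 0xFF


def to_two_byte(s: CharString) -> CharString:
    # Zero-extend each one-byte code to two bytes (illustrative mapping).
    return CharString(b"".join(b.to_bytes(2, "big") for b in s.data), 2)


def insert_char(s: CharString, pos: int, code: int) -> CharString:
    """Insert a character code at character position pos, following FIG. 3."""
    if not 0 <= pos <= len(s.data) // s.width:
        raise IndexError("insertion position out of range")
    if s.width == 1:                # step 101: is S a one-byte string?
        if is_two_byte_char(code):  # step 102: is C a two-byte character?
            s = to_two_byte(s)      # step 103: convert S dynamically
    cut = pos * s.width             # step 104: insert C
    return CharString(s.data[:cut] + code.to_bytes(s.width, "big") + s.data[cut:], s.width)


print(insert_char(CharString(b"abcd", 1), 2, ord("亜")).data.decode("utf-16-be"))
```

The two "no" branches of the flowchart both fall straight through to step 104. For that reason, the conversion in step 103 is the only statement guarded by both tests.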

The above embodiment uses the insertion of a two-byte character into a one-byte character string as its example, but the present invention is not limited to this case. It can also be applied when a character string is created, and to operations such as deleting or converting characters or character strings.

If a default internal representation format is specified in advance, a predetermined byte-string format can be selected as the internal representation format used when a character string is created.

Furthermore, character strings with a different internal representation format can easily be inserted into an existing program written with one-byte character strings. Two-byte or three-byte character strings can also be applied as they are, without modification.
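String creation with a preset default internal representation can be sketched as follows. The selection rule is a hypothetical reading, not one the description spells out. The string starts in the default width and is widened to the smallest width that fits every character.

The sketch allows up to three bytes, matching the mention of three-byte strings. Unicode code points stand in for the character codes, and `create` is a hypothetical name.

```python
def create(text: str, default_width: int = 1) -> tuple[bytes, int]:
    """Create a fixed-width string, starting from a default representation.

    Hypothetical rule: keep default_width unless some character needs more
    bytes; then widen the whole string. Three bytes cover every Unicode
    code point, which is used here as a stand-in character code.
    """
    width = default_width
    for ch in text:
        needed = max(1, (ord(ch).bit_length() + 7) // 8)  # bytes this code needs
        width = max(width, needed)
    data = b"".join(ord(ch).to_bytes(width, "big") for ch in text)
    return data, width


print(create("abc"))                   # one-byte default is kept
print(create("abc", default_width=2))  # default preset to two bytes
print(create("a亜")[1])                # widened because of a two-byte character
```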

[The invention's effect]

As described above, the present invention handles the case in which a character internally represented by a Y-byte character code is detected while a character string internally represented with the X-byte character code length is being processed. In that case, the string represented with the X-byte character code length is converted into the internal representation format with the Y-byte character code length. The programmer can therefore program easily without being aware of the internal representation format of character strings, which improves programming efficiency.

The present invention also accepts a program consisting of a character string in which a second character string is inserted at a predetermined position of a first character string. The second string is internally represented with the Y-byte character code length, and the first with the X-byte character code length.

When characters with different internal representations are detected, the method proceeds as follows:

1. Each character element of the first character string defined by the program is converted into a Y-byte character having the Y-byte character code length.
2. The second character string is inserted into the converted first character string to generate a third character string.
3. The processing for Y-byte character strings is performed on the third character string.

Consequently, even when character strings with different internal representation formats are combined in a program, the programmer can write it without being aware of the internal representation format of each string.

[Brief description of the drawings]

FIG. 1 is an explanatory diagram showing an embodiment of the present invention. FIG. 2 is a block diagram showing the configuration of a computer system for carrying out the present invention. FIG. 3 is a flowchart showing the processing procedure of the CPU.

1 ... keyboard, 2 ... display device, 3 ... memory, 4 ... input buffer, 5 ... CPU.

Continuation of the front page

(72) Inventor: Toru Araya, c/o Fuji Xerox Co., Ltd., Granforet, 3-57-6 Yoyogi, Shibuya-ku, Tokyo

(56) References cited: JP-A-61-163424 (JP, A); JP-A-63-88626 (JP, A)

Claims

(57) [Claims]

1. An internal representation format conversion method for a computer system that handles X-byte character strings and Y-byte character strings and processes the two kinds differently, where:

- in an X-byte character string, each character is internally represented by a character code X bytes long;
- in a Y-byte character string, each character is internally represented by a character code Y bytes long, Y being greater than X;

the method comprising: upon detecting a character internally represented by a Y-byte character code during processing of a character string internally represented with the X-byte character code length, converting the character string represented with the X-byte character code length into the internal representation format with the Y-byte character code length.

2. An internal representation format conversion method for a computer system that handles X-byte character strings and Y-byte character strings and processes the two kinds differently, where:

- in an X-byte character string, each character is internally represented by a character code X bytes long;
- in a Y-byte character string, each character is internally represented by a character code Y bytes long, Y being greater than X;

the method comprising:

(a) accepting a program consisting of a character string in which a second character string, internally represented with the Y-byte character code length, is inserted at a predetermined position of a first character string, internally represented with the X-byte character code length;
(b) upon detecting characters with different internal representations, converting the internal representation format of each character element of the first character string defined by the program into a Y-byte character having the Y-byte character code length;
(c) then inserting the second character string into the converted first character string to generate a third character string; and
(d) performing the processing for Y-byte character strings on the generated third character string.