JP5325309B2

JP5325309B2 - Method for optimizing character string processing during program execution, and computer system and computer program thereof

Info

Publication number: JP5325309B2
Application number: JP2012000276A
Authority: JP
Inventors: 一則緒方; 清久仁河内谷; 一明石崎
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2012-01-04
Filing date: 2012-01-04
Publication date: 2013-10-23
Anticipated expiration: 2029-05-29
Also published as: JP2012064252A

Abstract

PROBLEM TO BE SOLVED: To provide a mechanism for producing programs with high execution efficiency without lowering programming productivity. SOLUTION: A method is provided for optimizing the processing of character strings during program execution, in a computer system that executes a program including processing of character strings, by using characteristic information that is associated with the character strings and represents their characteristics. The method includes the steps of: determining, from a characteristic of a first character string and an operation relating to the first character string, characteristic information for at least one of the first character string and a second character string that is the result of the operation; and associating the determined characteristic information with the at least one character string. The present invention also relates to a computer system that executes a program including processing of character strings and optimizes that processing during program execution using the characteristic information, and to a corresponding computer program. COPYRIGHT: (C)2012, JPO&INPIT

Description

The present invention relates to a method for optimizing character string processing during program execution, and to a computer system and a computer program therefor.

The class libraries of many programming languages provide a class for holding character strings (hereinafter, a string class) as a standard class. Programmers improve their productivity by using the extensive class libraries provided with the system. However, the classes in a class library are written for general-purpose use, so a program built with them is not necessarily efficient at run time.

For example, the Java (trademark) language provides the java.lang.String class (hereinafter, the String class) as one of its string classes. The String class provides a replaceFirst() method, which replaces the part of a character string that matches a specified regular expression with another character string. The regular expression passed to replaceFirst() does not necessarily contain regular-expression metacharacters. For example, the regular expression “set”, which contains no metacharacters, may be specified. In that case the character string “set” is replaced with, for example, the character string “get”.

This replacement can also be implemented by, for example, combining the String.indexOf() method and the String.substring() method. An implementation that uses this combination runs faster than one that uses the replaceFirst() method. On the other hand, the productivity and generality of the program are lower than when the replaceFirst() method is used.
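The two implementations contrasted above can be illustrated with a minimal Java sketch. The class name `LiteralReplace` and the helper `replaceFirstLiteral` are hypothetical names introduced here for illustration; they do not appear in the specification. The helper is only equivalent to replaceFirst() when the pattern contains no metacharacters and the replacement contains no `$` or `\`.

```java
final class LiteralReplace {
    // Replaces the first occurrence of a literal target using indexOf() and
    // substring(); no regular-expression engine is involved.
    static String replaceFirstLiteral(String s, String target, String replacement) {
        int i = s.indexOf(target);
        if (i < 0) {
            return s; // target not found: nothing to replace
        }
        return s.substring(0, i) + replacement + s.substring(i + target.length());
    }

    public static void main(String[] args) {
        String viaRegex = "setName".replaceFirst("set", "get");         // general regex path
        String viaIndex = replaceFirstLiteral("setName", "set", "get"); // literal path
        System.out.println(viaRegex + " " + viaIndex); // prints "getName getName"
    }
}
```

The literal version is shorter to execute but must be written and maintained separately, which is the productivity cost the text describes.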

When execution speed takes priority, the programmer could, for example, write code that chooses between the replaceFirst() method and the above combination depending on whether the regular expression contains metacharacters. However, if whether the regular expression contains metacharacters is not known until run time, it is not realistic for the programmer to prepare both the replaceFirst() implementation and the combination-based implementation and to maintain both.

Programmers improve their productivity by using the string classes provided with the system. However, because the string classes are written for general-purpose use, the resulting programs are not necessarily efficient. A mechanism is therefore needed for producing programs with high execution efficiency without reducing programming productivity.

The present invention provides a method for optimizing, in a computer system that executes a program for processing character strings, the processing of a character string during program execution by using characteristic information that is associated with the character string and represents its characteristics. The method includes causing the computer system to execute the following steps:
determining, from a characteristic of a first character string and an operation relating to the first character string, characteristic information for at least one of the first character string and a second character string that is the result of the operation; and
associating the determined characteristic information with the at least one character string.

In one embodiment of the present invention, the method further includes causing the computer system to execute the following step: when processing relating to a character string with which characteristic information is associated is to be executed, selecting an optimized process according to the associated characteristic information and executing the processing with it.

In one embodiment of the present invention, the operation includes generating at least one second character string to which at least one first character string is assigned and, during the generation, sequentially processing the at least one first character string one character at a time, and
the determining step further includes executing the sequential processing and, together with the sequential processing, inspecting each character being processed, whereby the characteristic information of the at least one second character string is determined.
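The idea of inspecting each character during the same pass that processes it could be sketched in Java as follows. This is an illustration only: the class `CopyAndInspect`, the nested `Result` type, the bit assignments, and the ASCII-only classification are all assumptions made for the sketch and are not taken from the specification.

```java
final class CopyAndInspect {
    // Hypothetical bit assignments, one bit per character set.
    static final int UPPER = 1 << 0;
    static final int LOWER = 1 << 1;
    static final int DIGIT = 1 << 2;
    static final int OTHER = 1 << 3; // characters in none of the sets above

    /** A generated string together with the flags found while generating it. */
    static final class Result {
        final String value;
        final int flags;

        Result(String value, int flags) {
            this.value = value;
            this.flags = flags;
        }
    }

    /** Copies src one character at a time and classifies each character in the same loop. */
    static Result copy(CharSequence src) {
        char[] out = new char[src.length()];
        int flags = 0;
        for (int i = 0; i < out.length; i++) {
            char c = src.charAt(i);
            out[i] = c; // the sequential processing itself
            if (c >= 'A' && c <= 'Z') {
                flags |= UPPER;
            } else if (c >= 'a' && c <= 'z') {
                flags |= LOWER;
            } else if (c >= '0' && c <= '9') {
                flags |= DIGIT;
            } else {
                flags |= OTHER;
            }
        }
        return new Result(new String(out), flags);
    }
}
```

Because the characters are already being visited one by one, the classification adds only a few comparisons per character rather than a second pass over the string.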

In one embodiment of the present invention, the operation includes sequentially processing, one character at a time, at least one first character string whose characteristic information is unknown or has not been associated, and
the determining step further includes executing the sequential processing and, together with the sequential processing, inspecting each character being processed, whereby the at least one piece of characteristic information is determined.

In one embodiment of the present invention, the operation is an operation on at least one first character string, and
the determining step includes determining the characteristic information of at least one second character string from the characteristic information of each of the at least one first character string and from the operation.

In one embodiment of the present invention, the characteristic information representing the characteristics of the character string includes at least one piece of first information indicating that the character string contains a character belonging to a character set, does not contain one, or that it is unknown whether it contains one, and second information indicating that the character string contains a character not belonging to the character set, does not contain one, or that it is unknown whether it contains one.

In one embodiment of the present invention, the character set is a second character set that collectively represents at least two first character sets, and the characters contained in the at least two first character sets do not overlap. Each piece of first information indicates that the character string contains a character belonging to the corresponding first character set, does not contain one, or that it is unknown whether it contains one. The second information indicates that the character string contains a character belonging to none of the first character sets, does not contain one, or that it is unknown whether it contains one.

In one embodiment of the present invention, the character set is a second character set that collectively represents at least two first character sets, and the characters contained in the at least two first character sets do not overlap. The first information indicates that the character string contains a character belonging to the second character set, does not contain one, or that it is unknown whether it contains one. The second information indicates that the character string contains a character not belonging to the second character set, does not contain one, or that it is unknown whether it contains one.

In one embodiment of the present invention, the first information and the second information are represented as a bit string, and each bit corresponds to one character set or to the set of characters belonging to none of the character sets.
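One way such a bit string might be laid out is sketched below in Java. The specification states only that each bit corresponds to a character set or to the set of remaining characters; the class name `TriStateFlags`, the particular bit assignments, and the use of two masks (`known` and `present`) to express the three states "contains", "does not contain", and "unknown" are assumptions made for this sketch.

```java
final class TriStateFlags {
    enum State { CONTAINS, DOES_NOT_CONTAIN, UNKNOWN }

    // Hypothetical bit assignments: one bit per character set (first information)
    // plus one bit for characters outside all of those sets (second information).
    static final int UPPER = 1 << 0;
    static final int LOWER = 1 << 1;
    static final int DIGIT = 1 << 2;
    static final int OTHER = 1 << 3;

    final int known;   // bit set: the answer for that character set has been determined
    final int present; // bit set (and known): the string contains such a character

    TriStateFlags(int known, int present) {
        this.known = known;
        this.present = present & known; // a bit cannot be "present" unless it is known
    }

    /** Returns the three-valued answer for the character set identified by bit. */
    State query(int bit) {
        if ((known & bit) == 0) {
            return State.UNKNOWN;
        }
        return (present & bit) != 0 ? State.CONTAINS : State.DOES_NOT_CONTAIN;
    }
}
```

For instance, `new TriStateFlags(UPPER | LOWER, LOWER)` would describe a string known to contain lowercase letters and known to contain no uppercase letters, with nothing yet determined about digits or other characters.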

In one embodiment of the present invention, the operation concatenates a first character string associated with first characteristic information and a first character string associated with second characteristic information, and
the determining step includes determining the characteristic information obtained by the logical OR of the first characteristic information and the second characteristic information as the characteristic information of the second character string.
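The concatenation rule above can be illustrated with a short Java sketch. The class `TaggedString`, its fields, and the bit assignments are hypothetical; the sketch assumes a set bit means that the string contains (or may contain) a character of the corresponding set.

```java
final class TaggedString {
    // Hypothetical bit assignments, one bit per character set.
    static final int UPPER = 1 << 0;
    static final int LOWER = 1 << 1;
    static final int DIGIT = 1 << 2;
    static final int OTHER = 1 << 3;

    final String value; // the character string itself
    final int flags;    // characteristic information associated with it

    TaggedString(String value, int flags) {
        this.value = value;
        this.flags = flags;
    }

    /** Concatenates two strings; the result's flags are the bitwise OR of the operands' flags. */
    TaggedString concat(TaggedString other) {
        return new TaggedString(value + other.value, flags | other.flags);
    }
}
```

The result's characteristic information is obtained without looking at any character of the result, since a concatenation contains exactly the characters of its two operands.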

In one embodiment of the present invention, the operation deletes a first character string associated with second characteristic information from a first character string associated with first characteristic information, and
the determining step includes determining the first characteristic information as the characteristic information of the second character string.

In one embodiment of the present invention, the operation deletes a first character string associated with second characteristic information from a first character string associated with first characteristic information, and, when as a result of the operation the first character string associated with the first characteristic information no longer contains any character belonging to the character set represented by the second characteristic information,
the determining step includes determining the characteristic information obtained by the logical AND of the first characteristic information and the second characteristic information as the characteristic information of the first character string.

In one embodiment of the present invention, the step of determining characteristic information for at least one of the first character string and the second character string that is the result of the operation includes
determining the characteristic information by combining at least two of the following steps: determining the characteristic information obtained by the logical OR of the first characteristic information and the second characteristic information as the characteristic information of the second character string; determining the first characteristic information as the characteristic information of the second character string; and determining the characteristic information obtained by the logical AND of the first characteristic information and the second characteristic information as the characteristic information of the first character string.

In one embodiment of the present invention, the character set represents at least one selected from the group consisting of a set of uppercase letters, a set of lowercase letters, a set of digits, a set of two-byte characters, a set of Roman letters, a set of characters of the same character code, a set of regular-expression special characters, a set of characters representable in 8 bits, and a set of special characters in URL encoding.

In one embodiment of the present invention, the operation includes generating at least one second character string to which at least one first character string is assigned, and
the determining step includes determining the characteristic information of the second character string from the characteristic information of the at least one first character string.

In one embodiment of the present invention, the operation includes generating a character string and, during the generation, assigning at least one first character string to at least one second character string one character at a time, and
the determining step further includes executing the assignment and, together with the assignment, inspecting each character being assigned, whereby the characteristic information of the at least one second character string is determined.

In one embodiment of the present invention, in a programming language in which the character string is handled as an object, the operation relating to the character string is an operation relating to the object, and the characteristic information is contained in the object.

The present invention also provides a computer program that executes a program for processing character strings. The computer program causes a computer system to execute each step of the method described above.

The present invention also provides a computer system that executes a program for processing character strings and optimizes the processing of a character string during program execution by using characteristic information that is associated with the character string and represents its characteristics. The computer system includes:
a determination unit that determines, from a characteristic of a first character string and an operation relating to the first character string, characteristic information for at least one of the first character string and a second character string that is the result of the operation; and
an association unit that associates the determined characteristic information with the at least one character string.

In one embodiment of the present invention, the computer system further includes an execution unit that, when processing relating to a character string with which characteristic information is associated is to be executed, selects an optimized process according to the associated characteristic information and executes the processing with it.

In one embodiment of the present invention, the operation includes generating at least one second character string to which at least one first character string is assigned and, during the generation, sequentially processing the at least one first character string one character at a time, and
the determination unit executes the sequential processing and, together with the sequential processing, inspects each character being processed, whereby the characteristic information of the at least one second character string is determined.

In one embodiment of the present invention, the operation includes sequentially processing, one character at a time, at least one first character string whose characteristic information is unknown or has not been associated, and
the determination unit executes the sequential processing and, together with the sequential processing, inspects each character being processed, whereby the at least one piece of characteristic information is determined.

In one embodiment of the present invention, the operation is an operation on at least one first character string, and
the determination unit determines the characteristic information of at least one second character string from the characteristic information of each of the at least one first character string and from the operation.

In one embodiment of the present invention, the operation concatenates a first character string associated with first characteristic information and a first character string associated with second characteristic information, and
the determination unit determines the characteristic information obtained by the logical OR of the first characteristic information and the second characteristic information as the characteristic information of the second character string.

In one embodiment of the present invention, the operation deletes a first character string associated with second characteristic information from a first character string associated with first characteristic information, and
the determination unit determines the first characteristic information as the characteristic information of the second character string.

In one embodiment of the present invention, the operation deletes a first character string associated with second characteristic information from a first character string associated with first characteristic information, and, when as a result of the operation the first character string associated with the first characteristic information no longer contains any character belonging to the character set represented by the second characteristic information,
the determination unit determines the characteristic information obtained by the logical AND of the first characteristic information and the second characteristic information as the characteristic information of the first character string.

In one embodiment of the present invention, the determination unit determines the characteristic information by combining at least two of the following: determining the characteristic information obtained by the logical OR of the first characteristic information and the second characteristic information as the characteristic information of the second character string; determining the first characteristic information as the characteristic information of the second character string; and determining the characteristic information obtained by the logical AND of the first characteristic information and the second characteristic information as the characteristic information of the first character string.

In one embodiment of the present invention, the operation includes generating at least one second character string to which at least one first character string is assigned, and
the determination unit determines the characteristic information of the second character string from the characteristic information of the at least one first character string.

In one embodiment of the present invention, the operation includes generating a character string and, during the generation, assigning at least one first character string to at least one second character string one character at a time, and
the determination unit executes the assignment and, together with the assignment, inspects each character being assigned, whereby the characteristic information of the at least one second character string is determined.

According to embodiments of the present invention, optimized processes that run quickly for particular string contents or string values are switched dynamically and executed at program run time. Because the optimized processes are provided in the class library, the implementation speed of the program improves without reducing programmer productivity.

Shows the structure of a character string object when implemented in Java (trademark), in an embodiment of the present invention.
A diagram for explaining the determination of the characteristic information shown in FIG. 1, in an embodiment of the present invention.
Shows an implementation example for propagating the characteristic information contained in the character string object shown in FIG. 1 to another character string object, in an embodiment of the present invention.
Shows a flowchart of the process of generating the character string object shown in FIG. 1, in an embodiment of the present invention.
Shows an example flowchart of an operation that includes sequentially processing, one character at a time, the character string contained in the character string object shown in FIG. 1, in an embodiment of the present invention.
Shows an example flowchart of an operation that includes processing the character string contained in the character string object shown in FIG. 1 as a whole, in an embodiment of the present invention.
Shows program code for explaining Example 1, in an embodiment of the present invention.
Shows program code for explaining Example 2, in an embodiment of the present invention.
Shows a state transition diagram for the program in the description of FIG. 4D below, in an embodiment of the present invention.
Shows program code for explaining Example 3, in an embodiment of the present invention.
Shows program code for explaining Example 4, in an embodiment of the present invention.
Shows an example of program code implementing the processing of the flowchart shown in FIG. 3A, in an embodiment of the present invention.
Shows an example functional block diagram illustrating the functions of a computer system that optimizes character string processing during program execution, in an embodiment of the present invention.
Shows a hardware block diagram of the system shown in FIG. 5, in an embodiment of the present invention.

In embodiments of the present invention, a “character string” is a sequence of one or more characters. A character is character data in a character code used to represent characters in a program. Examples of character codes include, but are not limited to, EUC-JP, Shift_JIS, and UTF-8. A character may also be one that carries a special meaning, such as a tab character, a newline character, an escape character, or a regular-expression special character. A regular-expression special character is a character used to express character strings of a certain pattern collectively. Examples in the regular expression “[a-z]+[0-9]” include, but are not limited to, “+”, which denotes one or more repetitions, and “[” and “]”, which denote the start and end of a character range.

In embodiments of the present invention, a “program for processing character strings” is a program that performs some processing on character strings when it is executed. The processing also includes processing on, for example, a variable, structure, or object in which the character string is stored. Examples of the processing include, but are not limited to, generating a character string, copying a character string, searching for an arbitrary character in a character string, converting a character in a character string to another character, extracting an arbitrary character from a character string, concatenating character strings, comparing character strings, and obtaining information about a character string.

本発明の実施態様において、「文字列の特性」とは、文字列に含まれる少なくとも１つの文字が持つ特徴をいう。上記特徴は、上記文字が、例えば大文字であること、小文字であること、数字であること、２バイト文字、例えば日本語、中国語であること、アルファベットであること、同一の文字コード文字であること、正規表現の特殊文字であること、８ビットで表現できる文字であること、ＵＲＬエンコードにおける特殊文字であること又は上記例の文字でないことであるが、これらに限定されない。
本発明の実施態様において、「文字集合」は、共通の上記特徴をもつ文字の集まりである。文字集合は、例えば大文字の集合、小文字の集合、数字の集合、２バイト文字の集合、アルファベットの集合、同一の文字コード文字の集合、正規表現の特殊文字の集合、８ビットで表現できる文字の集合、ＵＲＬエンコードにおける特殊文字の集合又はある集合に属さない文字の集合であるが、これらに限らない。文字集合に含まれる文字が重複していない複数の文字集合をまとめて、１つの文字集合として表してもよい。 In the embodiment of the present invention, the “character string characteristic” refers to a characteristic of at least one character included in the character string. The above characteristics are that the characters are, for example, uppercase letters, lowercase letters, numbers, double-byte characters such as Japanese, Chinese, alphabets, and the same character code characters. It is a special character of regular expression, a character that can be expressed by 8 bits, a special character in URL encoding, or a character that is not the above example, but is not limited thereto.
In the embodiment of the present invention, the “character set” is a set of characters having the above-mentioned common characteristics. For example, a set of uppercase letters, a set of lowercase letters, a set of numbers, a set of 2-byte characters, a set of alphabets, a set of the same character code characters, a set of special characters of regular expressions, a character that can be expressed in 8 bits A set, a set of special characters in URL encoding, or a set of characters that do not belong to a certain set, but is not limited thereto. A plurality of character sets in which characters included in the character set do not overlap may be collectively represented as one character set.

In an embodiment of the present invention, “characteristic information” is information representing the characteristic of a character string. Characteristic information is prepared for each character string. The characteristic information includes at least one piece of first information, indicating that the character string contains a character belonging to a character set, does not contain one, or that it is unknown whether it contains one, and second information, indicating that the character string contains a character not belonging to the character set, does not contain one, or that it is unknown whether it contains one.

In an embodiment of the present invention, “optimizing character string processing” means that, when the character string processing includes a plurality of processes, the computer system selects and executes, at the time the character string processing is executed, the process among them that is efficient for the characteristic information of the character string. Suppose, for example, that a first process is applicable to any character string and a second process is applicable only to character strings associated with certain characteristic information. A process that is efficient for the characteristic information is then, without limitation, the second process, which runs faster than the first process and yields the same result. The optimized processing includes such a process that is efficient for the characteristic information of the character string.
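
As a rough illustration of this selection, the following Java sketch pairs a string with hypothetical characteristic information and uses it to choose between a general process and a faster one that gives the same result. The class `TaggedText`, the enum `LowercaseInfo`, and the restriction to ASCII characters are assumptions made for this example; they do not come from the embodiment.

```java
import java.util.Locale;

/** Hypothetical pairing of a string with characteristic information. */
final class TaggedText {
    /** Assumed states: not inspected, ASCII only with no a-z, or anything else. */
    enum LowercaseInfo { UNKNOWN, ASCII_WITHOUT_LOWERCASE, OTHER }

    final String value;
    final LowercaseInfo info;

    TaggedText(String value, LowercaseInfo info) {
        this.value = value;
        this.info = info;
    }

    /** Scans the string until a character outside the fast-path class is found. */
    static TaggedText inspect(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F || (c >= 'a' && c <= 'z')) {
                return new TaggedText(s, LowercaseInfo.OTHER);
            }
        }
        return new TaggedText(s, LowercaseInfo.ASCII_WITHOUT_LOWERCASE);
    }

    String toUpperCase() {
        if (info == LowercaseInfo.ASCII_WITHOUT_LOWERCASE) {
            // Second process: no character can change, so return the string as is.
            return value;
        }
        // First process: applicable to any string.
        return value.toUpperCase(Locale.ROOT);
    }
}
```

In this sketch the fast path skips both the scan and the allocation of a new string, while a string whose characteristic is unknown still gets the correct result through the general path.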

In an embodiment of the present invention, “arithmetic processing relating to a character string” is any operation that a program performs on a character string or on the area in which the character string is stored. Examples of operations on a character string include, but are not limited to, generating a character string, copying a character string, searching for an arbitrary character in a character string, converting a character in a character string into another character, extracting an arbitrary character from a character string, concatenating character strings, comparing character strings, and obtaining information about a character string. An example of an operation on the area in which a character string is stored is, without limitation, securing a memory area in which to allocate the character string.

In an embodiment of the present invention, “sequentially processing a character string one character at a time” means that the computer system performs processing individually on each character included in the character string. For example, in a search for a given character in a character string, if the computer system checks every character in the string and returns all matching characters as the search result, the search counts as sequentially processing the character string one character at a time. If, on the other hand, the computer system returns the matching character as soon as a match is found, without checking the remaining characters, the search does not count as sequentially processing the character string one character at a time.

In an embodiment of the present invention, “inspecting a sequentially processed character” means determining the characteristic of that character. Examples of the characteristic include, but are not limited to, that the sequentially processed character is an uppercase letter, a lowercase letter, a digit, a double-byte character, an alphabetic character, a character of the same character code, a regular-expression special character, a character representable in 8 bits, a special character in URL encoding, or none of the above.

Embodiments of the present invention are described below with reference to the drawings. It should be understood that the embodiments are intended to describe preferred aspects of the invention and are not intended to limit the scope of the invention to what is shown here. Throughout the following drawings, the same reference numerals denote the same objects unless otherwise specified.
In the following, a case where the embodiment of the present invention is implemented in Java (trademark) is described as an example. In Java (trademark), character strings are manipulated through character string objects. Accordingly, in the case of Java (trademark), a character string in the claims refers to the character string object itself or to the character string included in the character string object.

FIG. 1 shows the structure of a character string object when implemented in Java (trademark) in an embodiment of the present invention.
The character string object (101) includes a character string (102) and characteristic information (103). The characteristic information (103) may instead be held in an area of memory that is separate from, but associated with, the character string object (101). The character string object (101) is stored in memory. The character string object (101) is created from a character string class provided in a class library. The class library is, for example, the Java (trademark) class library in Java (trademark), but is not limited thereto. The character string class is, for example, the java.lang.String class (String class) in Java (trademark), but is not limited thereto.

The character string (102) is immutable or mutable character string data containing at least one character. Immutable means that the character string does not change until the character string object (101) is released. Mutable means that the character string may change until the character string object (101) is released.

The characteristic information (103) is information indicating what characteristic the characters included in the character string (102) have. A characteristic is, for example, that a character is an uppercase letter, a lowercase letter, or a digit, or a combination of these, but is not limited thereto. The characteristic information (103) includes first information (103a) and second information (103b). The first information (103a) indicates that the character string (102) contains a character belonging to the character set (104), does not contain such a character, or that it is unknown whether it contains such a character. The second information (103b) indicates that the character string (102) contains a character not belonging to the character set (104), does not contain such a character, or that it is unknown whether it contains such a character. The character set (104) is a collection of characters having the same characteristic. The character set (104) is, for example, the set of uppercase letters, the set of lowercase letters, or the set of digits, or a combination of these, but is not limited thereto. For example, several character sets whose characters do not overlap may together form one character set.
The characteristic information (103) may be stored at another location in memory that is associated with the character string object (101).

The characteristic information (103) is used in methods that process the character string object (101) to decide whether faster processing can be executed.
Through the combination of the first information (103a) and the second information (103b), the characteristic information (103) indicates, for example, one of cases 1 to 4 below.
Case 1. All characters in the character string (102) belong to the character set (104). In case 1, the first information (103a) indicates that the string contains a character belonging to the character set (104), and the second information (103b) indicates that the string contains no character outside the character set (104).
Case 2. None of the characters in the character string (102) belongs to the character set (104). In case 2, the first information (103a) indicates that the string contains no character belonging to the character set (104), and the second information (103b) indicates that the string contains a character outside the character set (104).
Case 3. The character string (102) contains both characters that belong to the character set (104) and characters that do not. In case 3, the first information (103a) indicates that the string contains a character belonging to the character set (104), and the second information (103b) indicates that the string contains a character outside the character set (104).
Case 4. The characters in the character string (102) have not been inspected. In case 4, the first information (103a) indicates that it is unknown whether the string contains a character belonging to the character set (104), and the second information (103b) indicates that it is unknown whether the string contains a character outside the character set (104).

The characteristic information (103) may be represented by a bit string.
Each bit corresponds to the first information (103a) or the second information (103b). The first information (103a) can be represented, for example, by a bit that is “1” when the character string (102) contains a character belonging to the character set (104), and “0” when it contains no such character or when it is unknown whether it contains one. The second information (103b) can be represented, for example, by a bit that is “1” when the character string (102) contains a character not belonging to the character set (104), and “0” when it contains no such character or when it is unknown whether it contains one.
When the first information (103a) and the second information (103b) are represented by bits, case 1 is represented by the bit string “10”, case 2 by “01”, case 3 by “11”, and case 4 by “00”.
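
The two-bit encoding of cases 1 to 4 can be sketched in Java as follows. This is a minimal sketch: the class name `TwoBitInfo`, its constants, and the choice of the ASCII digits as the character set are assumptions made for this example only.

```java
/** Hypothetical two-bit characteristic information for one character set. */
final class TwoBitInfo {
    static final int HAS_MEMBER = 0b10;     // first information (103a)
    static final int HAS_NON_MEMBER = 0b01; // second information (103b)
    static final int UNINSPECTED = 0b00;    // case 4

    /** Inspects a string against the set of ASCII digits (assumed character set). */
    static int inspectDigits(String s) {
        int bits = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            bits |= (c >= '0' && c <= '9') ? HAS_MEMBER : HAS_NON_MEMBER;
        }
        // An empty string would yield 00, which is indistinguishable from
        // "uninspected"; the text assumes at least one character.
        return bits;
    }

    /** Maps the two bits back to the case numbers used in the text. */
    static String describe(int bits) {
        switch (bits) {
            case 0b10: return "case 1";
            case 0b01: return "case 2";
            case 0b11: return "case 3";
            default:   return "case 4";
        }
    }
}
```

Because a “0” bit stands for both “does not contain” and “unknown”, the value 00 is reserved for a string that has not been inspected, which is why the sketch only returns 10, 01, or 11 for a non-empty inspected string.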

In the bit-string representation above, when the characters of several character sets are mutually exclusive, the character sets can be handled together. In that case, one bit is assigned to each character set as first information (103a), and one bit is assigned, as second information (103b), to the set of characters that belong to none of the character sets.
For example, the set of uppercase letters (hereinafter, the uppercase set), the set of lowercase letters (hereinafter, the lowercase set), and the set of digits (hereinafter, the digit set) are mutually exclusive. One bit can therefore be assigned as first information to each of the uppercase set, the lowercase set, and the digit set. One further bit can be assigned as second information to the set of characters that are neither uppercase letters, lowercase letters, nor digits (hereinafter, the other set). As a result, the characteristic information (103) is represented by a total of four bits: three bits for the first information and one bit for the second information. In the examples below, the four bits are written, starting from the right, in the order: the bit for the uppercase set, the bit for the lowercase set, the bit for the digit set, and the bit for the other set.
・The characteristic information “0001” indicates that the character string contains uppercase letters.
・The characteristic information “0010” indicates that the character string contains lowercase letters.
・The characteristic information “0100” indicates that the character string contains digits.
・The characteristic information “1000” indicates that the character string contains characters other than uppercase letters, lowercase letters, and digits (hereinafter, other characters).
More than one bit may be set to “1”. For example, the characteristic information “1110” indicates that the character string contains lowercase letters, digits, and other characters.
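
The four-bit layout can be sketched in Java as below. The class `CharClassBits`, its method names, and the use of ASCII ranges to decide what counts as an uppercase letter, lowercase letter, or digit are assumptions for illustration; the text does not fix how each set is defined.

```java
/** Hypothetical four-bit characteristic information for grouped character sets. */
final class CharClassBits {
    static final int UPPER = 0b0001; // rightmost bit: uppercase set
    static final int LOWER = 0b0010; // lowercase set
    static final int DIGIT = 0b0100; // digit set
    static final int OTHER = 0b1000; // leftmost bit: other set

    /** Returns the single bit for the set that the character belongs to. */
    static int classify(char c) {
        if (c >= 'A' && c <= 'Z') return UPPER;
        if (c >= 'a' && c <= 'z') return LOWER;
        if (c >= '0' && c <= '9') return DIGIT;
        return OTHER;
    }

    /** ORs together the bits of every character in the string. */
    static int inspect(String s) {
        int bits = 0;
        for (int i = 0; i < s.length(); i++) {
            bits |= classify(s.charAt(i));
        }
        return bits;
    }

    /** Formats the bits as a four-character string such as "1110". */
    static String toBitString(int bits) {
        String b = Integer.toBinaryString(bits & 0b1111);
        return "0000".substring(b.length()) + b;
    }
}
```

Under these assumptions, `CharClassBits.toBitString(CharClassBits.inspect("a1-"))` gives `"1110"`, matching the example of a string that contains lowercase letters, digits, and other characters.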

By handling several character sets together, the computer system can reduce the number of bits needed to represent the characteristic information compared with handling them separately.
When several character sets are handled together, four bits are needed in the example above. When they are not handled together, two bits are needed as characteristic information for each character set, as shown in cases 1 to 4 above. Without grouping, the three character sets (the uppercase set, the lowercase set, and the digit set) therefore require 2 bits * 3 = 6 bits.

FIG. 2A is a diagram for explaining the determination of the characteristic information shown in FIG. 1 in an embodiment of the present invention.
From the characteristic (202) of a first character string and the arithmetic processing (205) relating to the first character string (203), the computer system determines characteristic information (204 or 207) for at least one of the first character string (203) and a second character string (206) that is the result of the arithmetic processing (205).
The characteristic (202) of the first character string either has already been determined and stored as first characteristic information (204), or is obtained by the computer system, for example by inspecting the characters of the first character string (203). When the computer system obtains the characteristic (202) of the first character string, it takes the obtained characteristic as the first characteristic information (204).
The first character string (203) is the character string on which the arithmetic processing (205) operates. The first character string (203) may be a single character string or a plurality of character strings. As shown in specific example 1 (201a), when the arithmetic processing (205) is, for example, concatenation of the character string “ABC” (203a) and the character string “DEF” (203b), both the character string “ABC” (203a) and the character string “DEF” (203b) correspond to the first character string (203).
The first characteristic information (204) is information associated with the first character string (203). The first characteristic information (204) may already be associated with the first character string (203), or may be a characteristic that the computer system obtained by inspecting the characters of the first character string (203) and took as the first characteristic information (204).
The arithmetic processing (205) is, for example, generating a character string, copying a character string, searching for an arbitrary character in a character string, converting a character in a character string into another character, extracting an arbitrary character from a character string, concatenating character strings, comparing character strings, or obtaining information about a character string, but is not limited thereto.
The second character string (206) is the character string in which the result of the operation is stored. As shown in specific example 1 (201a), when the arithmetic processing (205) is, for example, concatenation of the character string “ABC” (203a) and the character string “DEF” (203b), the resulting character string “ABCDEF” (206a) corresponds to the second character string (206). The second character string (206) may be a single character string or a plurality of character strings. As shown in specific example 2 (201b), when the arithmetic processing (205) is, for example, extraction of two-character strings from the character string “abcdef” (203c), each of the extracted character strings “ab” (206b), “cd” (206c), and “ef” (206d) corresponds to the second character string.
The second characteristic information (207) is information associated with the second character string (206). The second characteristic information (207) can be determined from the characteristic (202) of the first character string and the arithmetic processing (205).
In specific example 1 (201a), the first characteristic information (204a and 204b) is “all uppercase”. The characteristics (202a and 202b) are therefore also “all uppercase”. In an operation (205a) that concatenates “all uppercase” character strings, the resulting character string is also “all uppercase”. The second characteristic information (207a) can therefore be determined to be “all uppercase”.
In specific example 2 (201b), the first characteristic information (204c) is “all lowercase”. The characteristic (202c) is therefore also “all lowercase”. In an operation (205b) that extracts character strings from an “all lowercase” character string, the resulting character strings are also “all lowercase”. The second characteristic information (207b to 207d) can therefore be determined to be “all lowercase”.

The determination is performed during program execution at the timings described in the first to third aspects below. The characteristic information thus determined is attached to the character string object (FIG. 1, 101). Attaching it associates the character string with the characteristic information.
A. First aspect
Characteristic information (FIG. 1, 103) can be attached to a character string object (101) when the character string object (101) is created from the character string class. When the characteristic information of the character string (FIG. 1, 102) in the character string object (101) being created can be determined from the source character string object (101), or can be inspected simultaneously with sequential processing of the source character string, the computer system attaches the characteristic information (103) obtained by that determination or inspection to the created character string object (101).
Characteristic information can be attached both to character string objects that hold a mutable character string and to character string objects that hold an immutable character string. In particular, for a character string object that holds an immutable character string, such as a String object in Java (trademark), the character string in the character string object (101) does not change once the object has been created. The characteristic information (103) of that character string therefore does not change either. Consequently, for a character string object that holds an immutable character string, the computer system can keep using the characteristic information (103) attached at object creation until the character string object (101) is garbage collected.

B. Second aspect
Characteristic information (103) can be attached to a character string object (101) that does not yet have characteristic information (103) when the characteristic is inspected simultaneously with sequential processing of the character string (102) in that object.
The benefit of a character string object (101) having characteristic information (103) is that processing optimized by means of the characteristic information (103) can be executed. If optimized processing is never executed for a given character string object, the cost of the inspection is wasted. To keep this cost down, when a method that processes a character string includes processing that goes through the contents of the string one character at a time, the computer system inspects the characteristic at the same time as that processing.
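
A minimal Java sketch of this piggybacked inspection follows. It assumes a hypothetical wrapper class `LazyTaggedString`, a four-bit layout with ASCII-based sets, and a hash computation as the method that already visits every character; none of these names or choices is taken from the embodiment.

```java
/** Hypothetical string wrapper whose characteristic information is filled in lazily. */
final class LazyTaggedString {
    static final int UPPER = 0b0001;
    static final int LOWER = 0b0010;
    static final int DIGIT = 0b0100;
    static final int OTHER = 0b1000;

    private final String value;
    private int info; // 0 means "not inspected yet"

    LazyTaggedString(String value) {
        this.value = value;
    }

    int info() {
        return info;
    }

    private static int classify(char c) {
        if (c >= 'A' && c <= 'Z') return UPPER;
        if (c >= 'a' && c <= 'z') return LOWER;
        if (c >= '0' && c <= '9') return DIGIT;
        return OTHER;
    }

    /**
     * Computes a hash over every character and, in the same loop,
     * inspects each character so that no separate pass is needed.
     */
    int hash() {
        int h = 0;
        int bits = 0;
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            h = 31 * h + c;      // the sequential processing itself
            bits |= classify(c); // inspection piggybacked on the same pass
        }
        info = bits;
        return h;
    }
}
```

The design point is that the inspection adds only a few operations per character to a loop that had to run anyway, so the characteristic information is obtained without a dedicated scan of the string.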

C. Third aspect
When arithmetic processing is executed on at least one character string object to which characteristic information (103) is attached (hereinafter, an operation source object), characteristic information (103) can be attached to at least one character string object in which the result of the arithmetic processing is stored (hereinafter, a result object). When the characteristic information to be attached to the result object can be determined from the operation source object and the content of the arithmetic processing, the computer system attaches the determined characteristic information (103) to the result object. The arithmetic processing is, for example, concatenating character strings, extracting a character from a character string, or converting a character string into another character string, but is not limited thereto.

The third aspect is described below with examples. Suppose that a first character string object contains a first character string consisting entirely of uppercase letters, and that a second character string object contains a second character string consisting entirely of uppercase letters.
When the first character string and the second character string are concatenated, the computer system can determine that the resulting character string also consists entirely of uppercase letters. The computer system can therefore set the characteristic information attached to the result object to, for example, information representing “all uppercase”.
When a character is extracted from the first character string, the computer system can determine that both the first character string after part of it has been extracted (hereinafter, the first result character string) and the extracted part (hereinafter, the second result character string) consist entirely of uppercase letters. Extracting a character from the first character string corresponds, for example, to applying the Java (trademark) String.substring() method, which extracts part of a character string, to the first character string object. The computer system can therefore set the characteristic information attached to the result object containing the first result character string and/or the result object containing the second result character string to, for example, information representing “all uppercase”. The result object containing the first result character string may be the first character string object itself.
When the first character string is converted into another character string, for example a lowercase character string, the computer system can determine that the converted character string consists entirely of lowercase letters. The computer system can therefore set the characteristic information attached to the result object to, for example, information representing “all lowercase”.
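
The three examples above can be sketched in Java as follows. The class `AttrString`, the four-bit layout with ASCII-based sets, and the conservative rule of falling back to “unknown” (0) whenever the result cannot be derived safely are all assumptions of this sketch rather than details stated in the text.

```java
import java.util.Locale;

/** Hypothetical string object that carries four-bit characteristic information. */
final class AttrString {
    static final int UPPER = 0b0001;
    static final int LOWER = 0b0010;
    static final int DIGIT = 0b0100;
    static final int OTHER = 0b1000;

    final String value;
    final int attr; // 0 means "unknown" in this sketch

    AttrString(String value, int attr) {
        this.value = value;
        this.attr = attr;
    }

    /** Concatenation: the result contains exactly the characters of both operands. */
    AttrString concat(AttrString other) {
        boolean known = attr != 0 && other.attr != 0;
        return new AttrString(value + other.value, known ? (attr | other.attr) : 0);
    }

    /** Extraction: safe to propagate only when every character is in one set. */
    AttrString substring(int begin, int end) {
        String sub = value.substring(begin, end);
        boolean singleSet = Integer.bitCount(attr) == 1 && !sub.isEmpty();
        return new AttrString(sub, singleSet ? attr : 0);
    }

    /** Conversion to lowercase: uppercase letters become lowercase letters. */
    AttrString toLowerCase() {
        String lower = value.toLowerCase(Locale.ROOT);
        if (attr == 0 || (attr & OTHER) != 0) {
            // Characters outside the ASCII sets may map into them, so give up.
            return new AttrString(lower, 0);
        }
        int out = attr;
        if ((out & UPPER) != 0) {
            out = (out & ~UPPER) | LOWER;
        }
        return new AttrString(lower, out);
    }
}
```

In each method the characteristic information of the result is derived from the operands' bits alone, without scanning the result string, which is the point of the third aspect.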

FIG. 2B shows an implementation example for propagating the characteristic information included in the character string object shown in FIG. 1 to another character string object, in an embodiment of the present invention.
Adding characteristic information to a result object according to the third aspect is referred to as “propagation of characteristic information from the operand object to the result object”. The propagation process is determined by what the string-manipulating method does and by the characteristic information (hereinafter attr1, attr2, ..., attrn) included in the one or more character string objects (hereinafter s1, s2, ..., sn) that the method processes. The characteristic information is assumed to be represented as a bit string. The bit string may be assigned so that several character sets are handled together, or 2 bits may be assigned to each character set. In most cases, the propagation process is one of the processes indicated by the first code (211) to the third code (213) below, or a combination of two or more of them.
The process indicated by the first code (211) calculates the characteristic information added to the result object when a character string object carrying characteristic information attr1 is concatenated with a character string object carrying characteristic information attr2. In this process, the characteristic information added to the result object is the logical sum (OR) of attr1 and attr2.
The process indicated by the second code (212) calculates the characteristic information added to the result object when the character string in the object carrying attr2 is deleted from the character string in the object carrying attr1. The deleted character string may be a dynamic character string, for example one expressed by a regular expression. In this process, characters of the character set corresponding to attr2 may remain in the character string object carrying attr1. attr1 is therefore used as the characteristic information added to the result object.
The process indicated by the third code (213) calculates the characteristic information added to the result object when every occurrence of the character string in the object carrying attr2 is deleted from the character string in the object carrying attr1. In this process, the characteristic information added to the result object is the logical product (AND) of attr1 and attr2.
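
The three propagation operations can be written down as plain bit operations. The following is a minimal sketch, not the code of FIG. 2B itself: the class and method names are hypothetical, and the attribute is assumed to fit in an `int`. The text fixes only the operations (OR, keep attr1, AND), not the meaning of the individual bits.

```java
/** Hypothetical helper: the three propagation operations described for FIG. 2B. */
final class AttrPropagation {
    private AttrPropagation() {}

    /** First code (211): concatenation, logical sum (bitwise OR) of both attributes. */
    static int onConcat(int attr1, int attr2) {
        return attr1 | attr2;
    }

    /** Second code (212): partial deletion; characters of attr2's set may remain, so attr1 is kept. */
    static int onDeletePart(int attr1, int attr2) {
        return attr1;
    }

    /** Third code (213): deletion of every occurrence, logical product (bitwise AND). */
    static int onDeleteAll(int attr1, int attr2) {
        return attr1 & attr2;
    }
}
```

None of the three operations reads the characters of the result string, which is the reason propagation is cheaper than inspecting the result.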

The class library creator can determine the combination of the first to third codes (211 to 213) from the original behavior of a string-manipulating method and add the propagation process to the program. With this addition, characteristic information is propagated and added to the result object at program execution time without, for example, sequentially inspecting the character string included in the result object.

The following takes methods of the String class in Java (trademark) as examples of the added code.
The String.concat() method appends a character string to a character string. The added code is therefore the first code (211).
The String.substring() method cuts out part of a character string. The added code is therefore the second code (212).
The String.replaceFirst() method replaces part of a character string. The added code is therefore a combination of the first code (211) and the second code (212).
The String.toUpperCase() method converts lowercase letters to uppercase letters. The added code is therefore a combination of the first code (211) and the third code (213).
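
One way to read the two combined cases is sketched below. This is an interpretation under stated assumptions, not the patent's code: a set bit is assumed to mean “may contain characters of this set”, the names `MAY_HAVE_UPPER` and `MAY_HAVE_LOWER` are hypothetical, and for toUpperCase() the second operand of the logical product is assumed to be a mask that clears the lowercase bit.

```java
/** Hypothetical sketch: combining the propagation codes for two String methods. */
final class CombinedPropagation {
    // Assumed encoding (not from the text): a set bit means "may contain characters of this set".
    static final int MAY_HAVE_UPPER = 0b01;
    static final int MAY_HAVE_LOWER = 0b10;

    private CombinedPropagation() {}

    /** replaceFirst: second code (keep the target's attribute), then first code (OR in the replacement's). */
    static int onReplaceFirst(int targetAttr, int replacementAttr) {
        int afterRemoval = targetAttr;          // second code (212)
        return afterRemoval | replacementAttr;  // first code (211)
    }

    /** toUpperCase: third code (AND with a mask dropping lowercase), then first code (OR in uppercase). */
    static int onToUpperCase(int attr) {
        int withoutLower = attr & ~MAY_HAVE_LOWER; // third code (213); the mask is an assumption
        return withoutLower | MAY_HAVE_UPPER;      // first code (211); conservative "may contain"
    }
}
```

Under this encoding the toUpperCase() result is an over-approximation: the uppercase bit is set even when the input held no letters, which is safe for a “may contain” flag but loses some precision.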

When a character string object has specific characteristic information, the computer system executes the optimized version of the code in the above method. When the character string object does not have that characteristic information, the computer system executes the original, unoptimized normal version of the code in the method.
The normal version of the code is needed to run the program whether or not an optimized version exists. By always processing character string objects that have no characteristic information with the normal version, the computer system keeps the program running correctly. In addition, if the characteristic information is inspected while the normal version runs, the optimized version can be executed according to that characteristic information the second and subsequent times the character string object is processed by the method.

The following shows example variations of the characteristic information added to a character string object. The variations are not limited to Examples 1 to 4 below.

Example 1: “contains no uppercase letters” and “contains no lowercase letters”
Example 2: “contains no regular-expression special character strings, such as * and []”
Example 3: “every character can be represented in 8 bits”
Example 4: “contains neither ‘%’ nor ‘+’, the special characters of URL encoding”
When the characteristic information represents Example 1, the conversion of the character string to lowercase or to uppercase can be omitted when, for example, the String.toLowerCase() method or the String.toUpperCase() method is executed. Also, when for example the String.compareToIgnoreCase() method or the String.equalsIgnoreCase() method is executed and the two character strings being compared both have the characteristic information “contains no uppercase letters”, or both have “contains no lowercase letters”, the comparison can become a simple comparison of 16-bit values. This holds even if one of the two character strings has the characteristic information “contains digits” and the other has “contains no digits”. Simplifying the comparison can be expected to speed up processing. In other words, the condition is that the two character strings have the same flag, for example that neither of them contains uppercase letters.
When the characteristic information represents Example 2, creating a java.util.regex.Matcher object becomes unnecessary when, for example, the String.replaceFirst() method or the split() method is executed. Fewer objects are therefore generated, and faster program execution can be expected.
When the characteristic information represents Example 3 and, for example, a regular expression is compiled into a state machine, the programmer can implement the state machine with a table indexed by the character codes 0 to 255, which are the characters representable in 8 bits, instead of, for example, a switch statement with many branches. This implementation can speed up processing at program execution time. Compiling a regular expression into a state machine here means creating a program that performs the state transitions corresponding to the regular expression.
When the characteristic information represents Example 4, both the character-by-character examination of the character string and the use of the StringBuffer work buffer can be omitted in, for example, the java.net.URLDecoder.decode() method that takes as its argument a character string object carrying that characteristic information. This omission can speed up processing at program execution time.
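
The Example 4 shortcut can be illustrated with a small wrapper around `java.net.URLDecoder.decode(String, String)`. This is a sketch under assumptions: the class `FastUrlDecode`, the flag `NO_URL_SPECIAL` and the helper `inspect` are hypothetical, and the attribute is passed as a separate `int` because the standard `String` class has no such field.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Hypothetical sketch of the Example 4 fast path. */
final class FastUrlDecode {
    /** Assumed flag: the string contains neither '%' nor '+'. */
    static final int NO_URL_SPECIAL = 0b1;

    private FastUrlDecode() {}

    /** Computes the flag with one scan; in the described scheme this would happen when the string is created. */
    static int inspect(String s) {
        return (s.indexOf('%') < 0 && s.indexOf('+') < 0) ? NO_URL_SPECIAL : 0;
    }

    static String decode(String s, int attr) {
        if ((attr & NO_URL_SPECIAL) != 0) {
            return s; // optimized version: nothing to decode, no scan and no work buffer
        }
        try {
            return URLDecoder.decode(s, "UTF-8"); // normal version
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

The fast path is valid because the decoder changes only ‘%’ escapes and ‘+’ signs, so a string without either is returned unchanged by the normal version as well.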

FIG. 3A shows a flowchart of the process of generating the character string object shown in FIG. 1, in an embodiment of the present invention.
The flowchart shown in FIG. 3A corresponds to the first aspect. Step 304 in FIG. 3A corresponds to the case in the first aspect where the characteristic information can be determined from the source character string object (101), and steps 309 and 310 correspond to the case where it can be inspected at the same time as the sequential processing of the source character string.

Step 301 represents the start of the process of generating a character string object.
In step 302, the computer system determines whether the character string to be assigned to the generated character string object includes characteristic information. If it does, the process proceeds to step 303. If it does not, the process proceeds to step 306.
In step 303, the computer system generates a character string object from the character string. When the generation ends, the process proceeds to step 304.
In step 304, the computer system calculates the characteristic information to be propagated to the generated character string object. In one embodiment, the computer system uses, for example, the characteristic information included in the character string as is. In another embodiment, the computer system uses the logical product of the characteristic information included in the character string and the characteristic information given to the character string object as an initial value. After the calculation, the process proceeds to step 305. The process also proceeds to step 305 when the propagated characteristic information is calculated as “unknown”.
In step 305, the computer system adds the calculated characteristic information to the generated character string object (result object). After the addition, the process proceeds to step 311, and the process of generating the character string object ends.

In step 306, the computer system determines whether the character string needs to be converted to the char type. In the Java (trademark) language, for example, a character string literal is stored in memory as a sequence of byte-type data. The computer system therefore determines that such a literal is a character string that needs to be converted to the char type. If conversion to the char type is needed, the process proceeds to step 309. If it is not needed, the process proceeds to step 307.
In step 307, the computer system generates a character string object from the character string. In this generation, the computer system assigns the character string to the character string object as a whole string rather than character by character. When the generation ends, the process proceeds to step 308.
In step 308, the computer system adds characteristic information indicating “unchecked” to the generated character string object (result object). When the addition ends, the process proceeds to step 311, and the process of generating the character string object ends.

In step 309, the computer system generates a character string object from the character string. In this generation, the computer system converts the character string to the char type one character at a time and assigns it to the character string object. In parallel with the conversion, the computer system inspects the characteristics of each character and obtains the characteristic information of the character string object from the result of the inspection. When the generation ends, the process proceeds to step 310.
In step 310, the computer system adds the obtained characteristic information to the generated character string object (result object). When the addition ends, the process proceeds to step 311, and the process of generating the character string object ends.
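
The three branches of the generation flow can be sketched as three factory methods. This is an illustration only: the class `AttrString`, its fields and the bit values are hypothetical, and the byte input is assumed to hold one character per byte (ISO-8859-1), which the text does not specify.

```java
/** Hypothetical sketch of the FIG. 3A generation flow. */
final class AttrString {
    static final int UNCHECKED = 0b100; // assumed marker for "not yet inspected"
    static final int NO_UPPER  = 0b010; // assumed: contains no uppercase letters
    static final int NO_LOWER  = 0b001; // assumed: contains no lowercase letters

    final char[] chars;
    final int attr;

    private AttrString(char[] chars, int attr) {
        this.chars = chars;
        this.attr = attr;
    }

    /** Steps 303 to 305: the source already carries characteristic information, so it is propagated as is. */
    static AttrString from(AttrString source) {
        return new AttrString(source.chars.clone(), source.attr);
    }

    /** Steps 307 and 308: char data is copied in bulk without inspection and marked unchecked. */
    static AttrString from(char[] source) {
        return new AttrString(source.clone(), UNCHECKED);
    }

    /** Steps 309 and 310: bytes are widened one at a time, and each character is inspected in the same loop. */
    static AttrString from(byte[] latin1) {
        char[] out = new char[latin1.length];
        int attr = NO_UPPER | NO_LOWER;
        for (int i = 0; i < latin1.length; i++) {
            char c = (char) (latin1[i] & 0xFF); // assumes one byte per character
            out[i] = c;
            if (Character.isUpperCase(c)) attr &= ~NO_UPPER;
            if (Character.isLowerCase(c)) attr &= ~NO_LOWER;
        }
        return new AttrString(out, attr);
    }
}
```

Only the byte branch pays for an inspection, and it does so inside a loop that has to visit every character anyway.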

FIGS. 3B and 3C show example flowcharts of processes that perform operations on the character string included in the character string object shown in FIG. 1, in an embodiment of the present invention.

FIG. 3B shows an example flowchart of an operation that includes sequentially processing, one character at a time, the character string included in the character string object shown in FIG. 1 (hereinafter, the first character string process), in an embodiment of the present invention.
The first character string process is a process in which the class library creator has added, to conventional processing provided in the library, processing optimized according to characteristic information and processing that adds characteristic information. The conventional processing is, for example, processing that sequentially processes, one character at a time, the character string included in at least one character string object received as an argument (original object) and makes the processed character string the character string included in at least one other character string object (result object).
Steps 326 to 328 in FIG. 3B correspond to the second aspect, and step 324 corresponds to the third aspect.

Step 321 represents the start of the first character string process.
In step 322, the computer system determines whether the characteristic information added to the original object is set to a value that allows the optimized process to be executed. If such a value is set, the process proceeds to step 323. If not, the process proceeds to step 326.
In step 323, the computer system executes the process optimized according to the characteristic information. When the execution ends, the process proceeds to step 324.
In step 324, the computer system calculates the characteristic information to be propagated to the result object. The calculation is performed using the third aspect, and its method depends on what the optimized process does. For example, when the optimized process includes converting uppercase letters to lowercase letters, the computer system calculates characteristic information in which information representing “contains no uppercase letters” is added to the characteristic information included in the original object. After the calculation, the process proceeds to step 325.
In step 325, the computer system adds the calculated characteristic information to the result object. After the addition, the process proceeds to step 330, and the first character string process ends.

In step 326, the computer system executes the conventional sequential process, which handles the character string included in the original object one character at a time. In parallel with the sequential process, the computer system inspects the characteristics of each character of the character string included in the original object (hereinafter, the first inspection). The computer system also inspects the characteristic information of the result object (hereinafter, the second inspection). When the sequential process, the first inspection and the second inspection end, the process proceeds to step 327.
In step 327, the computer system determines whether there is an original object to which characteristic information meaning “unchecked” has been added (hereinafter, the first original object). The computer system also determines whether there is an original object for which the first inspection produced characteristic information different from the characteristic information already added (hereinafter, the second original object). If there is a first original object or a second original object, the process proceeds to step 328. If there is neither, the process proceeds to step 329.
In step 328, the computer system determines the characteristic information to be added to the original object from the result of the first inspection, and adds the determined characteristic information to the original object. When the addition ends, the process proceeds to step 329.
In step 329, the computer system obtains the characteristic information of the result object from the results of the first inspection and the second inspection, and adds the obtained characteristic information to the result object. When the addition ends, the process proceeds to step 330, and the first character string process ends.
Step 330 represents the end of the first character string process.
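
The flow of steps 321 to 330 can be sketched for a lower-casing operation as follows. The class `TaggedString`, its fields and the bit values are hypothetical; the sketch inspects the converted characters directly instead of assuming what the conversion produces, which is one possible reading of the second inspection.

```java
/** Hypothetical sketch of the FIG. 3B flow for a lower-casing operation. */
final class TaggedString {
    static final int UNCHECKED = 0b100; // assumed marker for "not yet inspected"
    static final int NO_UPPER  = 0b010; // assumed: contains no uppercase letters
    static final int NO_LOWER  = 0b001; // assumed: contains no lowercase letters

    final char[] chars;
    int attr; // mutable so that step 328 can record the inspection result on the original object

    TaggedString(char[] chars, int attr) {
        this.chars = chars;
        this.attr = attr;
    }

    TaggedString toLowerCase() {
        // Step 322: is a value set that allows the optimized process?
        if ((attr & UNCHECKED) == 0 && (attr & NO_UPPER) != 0) {
            return this; // steps 323 to 325: nothing to convert, the attribute carries over
        }
        // Step 326: conventional per-character loop with both inspections folded in.
        char[] out = new char[chars.length];
        int srcAttr = NO_UPPER | NO_LOWER;
        int resAttr = NO_UPPER | NO_LOWER;
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (Character.isUpperCase(c)) srcAttr &= ~NO_UPPER; // first inspection (original)
            if (Character.isLowerCase(c)) srcAttr &= ~NO_LOWER;
            char r = Character.toLowerCase(c);
            if (Character.isUpperCase(r)) resAttr &= ~NO_UPPER; // second inspection (result)
            if (Character.isLowerCase(r)) resAttr &= ~NO_LOWER;
            out[i] = r;
        }
        // Steps 327 and 328: update the original if it was unchecked or its attribute differs.
        if (attr != srcAttr) {
            attr = srcAttr;
        }
        // Step 329: attach the computed attribute to the result object.
        return new TaggedString(out, resAttr);
    }
}
```

After one pass through the conventional loop both objects carry checked attributes, so a second `toLowerCase()` call on the result takes the fast path and returns the same object.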

FIG. 3C shows an example flowchart of an operation that includes processing the character string included in the character string object shown in FIG. 1 as a whole (hereinafter, the second character string process), in an embodiment of the present invention.
The second character string process is a process in which the class library creator has added, to conventional processing provided in the library, processing optimized according to characteristic information and processing that adds characteristic information. The conventional processing here is, for example, processing that handles as a whole the character string included in at least one character string object received as an argument (original object) and makes the processed character string the character string included in at least one other character string object (result object). Step 344 in FIG. 3C corresponds to the third aspect.

Step 341 represents the start of the second character string process.
In step 342, the computer system determines whether the characteristic information added to the original object is set to a value that allows the optimized process to be executed. If such a value is set, the process proceeds to step 343. If not, the process proceeds to step 346.
In step 343, the computer system executes the process optimized according to the characteristic information. When the execution ends, the process proceeds to step 344.
In step 344, the computer system calculates the characteristic information to be propagated to the result object. The calculation is performed according to the third aspect, and its method depends on what the optimized process does. For example, when the optimized process includes extracting the first five characters and the characteristic information included in the original object indicates “contains only uppercase letters”, the computer system uses the characteristic information included in the original object as the characteristic information to be propagated. After the calculation, the process proceeds to step 345.
In step 345, the computer system adds the calculated characteristic information to the result object. After the addition, the process proceeds to step 347, and the second character string process ends.

In step 346, the computer system executes the conventional process, which handles the character string included in the original object as a whole. When the execution ends, the process proceeds to step 347, and the second character string process ends.
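
The prefix-extraction example from step 344 can be sketched as below. The class `BulkString` and the flags are hypothetical. The text does not say which characteristic information the result carries on the conventional path (step 346), so marking it “unchecked” there is an assumption of this sketch.

```java
/** Hypothetical sketch of the FIG. 3C flow for a bulk prefix extraction. */
final class BulkString {
    static final int UNCHECKED  = 0b10; // assumed marker for "not yet inspected"
    static final int UPPER_ONLY = 0b01; // assumed: consists solely of uppercase letters

    final String value;
    final int attr;

    BulkString(String value, int attr) {
        this.value = value;
        this.attr = attr;
    }

    /** Returns the first n characters (or the whole string if it is shorter). */
    BulkString prefix(int n) {
        // The string is handled as a whole; there is no per-character loop to piggyback an inspection on.
        String head = value.substring(0, Math.min(n, value.length()));
        // Step 342: does the original carry checked characteristic information?
        if ((attr & UNCHECKED) == 0) {
            // Steps 343 to 345: any part of an all-uppercase string is all uppercase, so the flag propagates.
            return new BulkString(head, attr & UPPER_ONLY);
        }
        // Step 346: conventional path; nothing was inspected, so the result stays unchecked (assumption).
        return new BulkString(head, UNCHECKED);
    }
}
```

Propagating the flag unchanged is only sound for properties that every substring inherits, such as “contains only uppercase letters”.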

FIGS. 4A to 4F show program code and state transition diagrams for describing examples of the present invention.

FIG. 4A shows program code for describing an example of Example 1 above, in an embodiment of the present invention.

The following uses the program code (401) to describe how characteristic information is added when an object is generated, and how a process optimized by that addition is executed.
When the computer system executes the program code (401), new String("ABC") is executed first. Because new String("ABC") is a process that generates an object, the steps of the flowchart shown in FIG. 3A are executed.
In step 302, the character string "ABC" to be assigned to the character string object does not include characteristic information, so the process proceeds to step 306.
In step 306, the character string "ABC" is a literal and needs to be converted to the char type, so the process proceeds to step 309.
In step 309, the computer system generates a character string object s to which the character string "ABC" is assigned. In this generation, the computer system places the character string "ABC" on the heap memory while converting it to the char type one character at a time. In parallel with the conversion, the computer system inspects the characteristics of the character string "ABC" one character at a time. Assume that the inspected characteristic is “contains no lowercase letters”. Because "A", "B" and "C" are not lowercase letters, the computer system determines that the character string "ABC" contains no lowercase letters. From this determination, a value representing “no character is a lowercase letter” is obtained as the characteristic information of the character string object s. When the generation ends, the process proceeds to step 310.
In step 310, assuming that the value representing “no character is a lowercase letter” is, for example, “01”, the computer system adds the characteristic information “01” to the character string object s. When the addition ends, the process proceeds to step 311, and the process of generating the object ends.

When the process of generating the object ends, the toUpperCase() method is executed. The toUpperCase() method converts every character in a character string to uppercase.
Suppose that the class library creator has added to the implementation of the toUpperCase() method a path that, when a character string carrying the characteristic information “01” is passed to the method, returns the given character string unchanged as the return value. With this path added, the computer system does not need to perform the conventional uppercase conversion at all during the execution. The computer system can return the character string "ABC", which carries the characteristic information “01”, as is as the return value of the toUpperCase() method. The toUpperCase() method therefore runs faster.

FIG. 4B shows program code for explaining an implementation of Example 2 above in an embodiment of the present invention.
Program code (411) is the implementation of the StringRepleaceFirst() method. In the conventional process (412) of the StringRepleaceFirst() method, the Pattern.compile() method is executed, and a large number of objects may be generated.
If the character string passed as the argument of the StringRepleaceFirst() method contains no regular-expression special characters, the processing performed by Pattern.compile() can be replaced with processing that uses other methods. Regular-expression special characters are, for example, in the regular expression “[a-z]+[0-9]”, the “+” that denotes one or more repetitions and the “[” and “]” that denote the start and end of a character range, but they are not limited to these. The processing that uses other methods is, for example, code that combines the indexOf() method with substring(). The processing performed by the combined code does not generate a large number of objects and can run faster than the processing performed by Pattern.compile().
The class library author therefore adds to the StringRepleaceFirst() method, for example, code (412) for the case where the characteristic information of the character string passed as the argument indicates that it “contains no regular-expression special characters”. With this addition, when the character string passed as the argument contains no regular expression, the StringRepleaceFirst() method can run faster than when the conventional process is executed.
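The indexOf() plus substring() replacement path can be sketched as below. This is an illustration under stated assumptions, not the code of FIG. 4B: the class LiteralReplace and its helper are hypothetical, and the sketch scans the pattern on every call, whereas the described approach would read the answer from characteristic information already attached to the string. The sketch also requires the replacement text to be free of `\` and `$`, because String.replaceFirst() treats those specially.

```java
/** Hypothetical helper showing a literal fast path in front of the regex-based replaceFirst. */
final class LiteralReplace {
    // Characters that have a special meaning in a java.util.regex pattern.
    private static final String PATTERN_META = "\\^$.|?*+()[]{}";
    // Characters that have a special meaning in a replacement string.
    private static final String REPLACEMENT_META = "\\$";

    /** Stand-in for looking up a "contains no special characters" characteristic. */
    static boolean containsNone(String s, String chars) {
        for (int i = 0; i < s.length(); i++) {
            if (chars.indexOf(s.charAt(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    static String replaceFirst(String input, String pattern, String replacement) {
        if (containsNone(pattern, PATTERN_META) && containsNone(replacement, REPLACEMENT_META)) {
            // Fast path: plain substring search, no Pattern or Matcher objects.
            int at = input.indexOf(pattern);
            if (at < 0) {
                return input;
            }
            return input.substring(0, at) + replacement + input.substring(at + pattern.length());
        }
        // Conventional path: compiles the pattern internally.
        return input.replaceFirst(pattern, replacement);
    }
}
```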

FIG. 4C shows a state transition diagram for the program discussed in the description of FIG. 4D below, in an embodiment of the present invention.
The state transition diagram (421) represents the transitions of program code that performs pattern matching for the regular expression “[a-z]+[0-9]” contained in a character string. By writing a state machine that implements this program code, a programmer can create a pattern-matching program that processes the regular expression “[a-z]+[0-9]” faster than a conventional pattern-matching program.

FIG. 4D shows program code for explaining an implementation of Example 3 above in an embodiment of the present invention.
Program code (431) is an example implementation of the state machine (see FIG. 4C) for the case where the character string passed as the argument is limited to 8-bit values. Program code (432) is an example implementation of the state machine for the case where the character string passed as the argument is not limited to 8-bit values. When the string is not limited to 8-bit values, the programmer matches the characters of the string against the regular expression with, for example, the case-statement implementation shown in program code (432). When the string is limited to 8-bit values, the programmer can instead perform the matching with, for example, the simple for-statement loop shown in program code (431).
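The figures' code is not reproduced in this text, so the following Java sketch is only one plausible reading of the two variants. It assumes that the matcher tests the whole string against “[a-z]+[0-9]”, and that the 8-bit variant uses 256-entry lookup tables; the class name LowerThenDigit, the state numbering, and both method names are hypothetical.

```java
/** Hypothetical state machine for the regular expression [a-z]+[0-9] (whole-string match). */
final class LowerThenDigit {
    // States: 0 = start, 1 = one or more [a-z] seen, 2 = accepted, 3 = dead.

    /** General variant: works for any char value, using case statements and range checks. */
    static boolean matchesGeneral(String s) {
        int state = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (state) {
                case 0:
                    state = (c >= 'a' && c <= 'z') ? 1 : 3;
                    break;
                case 1:
                    if (c >= 'a' && c <= 'z') state = 1;
                    else if (c >= '0' && c <= '9') state = 2;
                    else state = 3;
                    break;
                default:
                    return false;   // a character after acceptance, or already dead
            }
        }
        return state == 2;
    }

    // Character classes for 8-bit values: 0 = other, 1 = [a-z], 2 = [0-9].
    private static final byte[] CLASS = new byte[256];
    // NEXT[state][class] gives the following state.
    private static final byte[][] NEXT = {
        {3, 1, 3},   // start
        {3, 1, 2},   // letters seen
        {3, 3, 3},   // accepted: any further character fails
        {3, 3, 3},   // dead
    };

    static {
        for (int c = 'a'; c <= 'z'; c++) CLASS[c] = 1;
        for (int c = '0'; c <= '9'; c++) CLASS[c] = 2;
    }

    /** 8-bit variant: the caller must guarantee every char is at most 0xFF. */
    static boolean matchesLatin1(String s) {
        int state = 0;
        for (int i = 0; i < s.length(); i++) {
            state = NEXT[state][CLASS[s.charAt(i)]];
        }
        return state == 2;
    }
}
```

The table-driven loop is only valid when the “8-bit values only” characteristic holds, since a wider character would index past the end of CLASS. That precondition is what the characteristic information is meant to supply without rescanning the string.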

FIG. 4E shows program code for explaining an implementation of Example 4 above in an embodiment of the present invention.

Program code (441) is an example of program code in which an embodiment of the present invention is applied to the java.net.URLDecoder.decode() method. The conventional code (442) corresponds to the conventional process, and the additional code (443 to 446) corresponds to the processing added to apply the embodiment.
The java.net.URLDecoder.decode() method reads a character string one character at a time and restores the URL-encoded special characters in it to their original characters. The java.net.URLDecoder.decode() method corresponds to the first character string processing.
When the java.net.URLDecoder.decode() method is executed, each process of the flowchart shown in FIG. 3B is executed.
The additional code (444) corresponds to step 322. The additional code (444) determines whether the characteristic information contained in the character string object s passed as the argument holds a value that allows the optimized process to be executed.
If the characteristic information is, for example, a value representing “contains no URL-encoding special characters”, it is determined that the optimized process can be executed. When the character string contains no URL-encoding special characters, the java.net.URLDecoder.decode() method has no processing corresponding to steps 323 and 324, so “return s”, which corresponds to step 325, is executed and the method ends.

If it is determined that the optimized process cannot be executed, the conventional code (442), which is the conventional process corresponding to step 326, is executed. The code (442) checks, one character at a time, whether the URL-encoding special characters “+” and “%” appear in the character string held by the character string object s. The result of the check is stored in the variable needToChange. In the java.net.URLDecoder.decode() method, this check can double as the first inspection. Accordingly, in the additional code (445), if no characteristic information has been added to the character string object s, the variable needToChange is used to set the characteristic information of the character string object s. The additional code (445) corresponds to steps 327 and 328.
It is also self-evident that the result of the java.net.URLDecoder.decode() method contains no URL-encoding special characters. Accordingly, in the additional code (446), characteristic information representing “contains no URL-encoding special characters” is set on the character string generated by sb.toString() as the return value. The additional code (446) corresponds to step 329.

In step 328, characteristic information is added to the character string object s. Therefore, when the URLDecoder.decode() method is executed again with the character string object s as the argument, the optimized process may be executed.
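The flow through steps 322 to 329 might look like the following Java sketch. The wrapper class UrlText and its constants are hypothetical stand-ins for characteristic information stored in the string object, and the real java.net.URLDecoder.decode(String, String) is used only as the conventional decoder. One deliberate difference from the description: the sketch re-inspects the decoded output before tagging it, because decoding an input such as `%2B` produces `+`, which is itself one of the inspected characters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Hypothetical wrapper pairing a string with lazily filled characteristic information. */
final class UrlText {
    static final int UNKNOWN = 0;
    static final int NO_URL_SPECIALS = 1;

    final String value;
    int attr = UNKNOWN;   // may be filled in as a by-product of decode()

    UrlText(String value) {
        this.value = value;
    }

    private static boolean hasUrlSpecials(String s) {
        return s.indexOf('+') >= 0 || s.indexOf('%') >= 0;
    }

    static UrlText decode(UrlText s) {
        if (s.attr == NO_URL_SPECIALS) {
            return s;                                     // steps 322 and 325: optimized path
        }
        boolean needToChange = hasUrlSpecials(s.value);   // step 326: conventional scan
        if (!needToChange) {
            s.attr = NO_URL_SPECIALS;                     // steps 327 and 328: remember the result
            return s;
        }
        String decoded;
        try {
            decoded = URLDecoder.decode(s.value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
        UrlText out = new UrlText(decoded);
        if (!hasUrlSpecials(decoded)) {
            out.attr = NO_URL_SPECIALS;                   // step 329, applied only when it holds
        }
        return out;
    }
}
```

A second call to `UrlText.decode(s)` on a string that was already found to be free of `+` and `%` returns immediately, without scanning a single character.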

FIG. 4F shows an example of program code that implements the processing of the flowchart shown in FIG. 3A in an embodiment of the present invention.
The following uses program code (451) as an example to describe how characteristic information is added when an object is generated and how characteristic information propagates.
When the computer system executes program code (451), new String("ABC") is executed first. Because new String("ABC") is a process that generates an object, each process of the flowchart shown in FIG. 3A is executed.
In step 302, the character string “ABC” to be assigned to the character string object carries no characteristic information, so the process proceeds to step 306.
In step 306, the character string “ABC” is a literal and must be converted to the char type, so the process proceeds to step 309.
In step 309, the computer system generates a character string object s1 to which the character string “ABC” is assigned. During generation, the computer system places the character string “ABC” in heap memory while converting it character by character into the char type. In parallel with the conversion, the computer system inspects the characteristics of the character string “ABC” one character at a time. Assume here that the characteristic being inspected is “contains no lowercase letters”. Because “A”, “B”, and “C” are not lowercase letters, the computer system determines that “ABC” is a character string that contains no lowercase letters. From this determination, a value representing “no character is a lowercase letter” is obtained as the characteristic information of the character string object s1. When generation ends, the process proceeds to step 310.
In step 310, if the value representing “no character is a lowercase letter” is, for example, “01”, the computer system adds the characteristic information “01” to the character string object s1. When the addition ends, the process proceeds to step 311 and the process of generating the object s1 ends.

Next, new String("XYZ") is executed. Because new String("XYZ") is a process that generates an object, each process of the flowchart shown in FIG. 3A is executed.
In step 302, the character string “XYZ” to be assigned to the character string object carries no characteristic information, so the process proceeds to step 306.
In step 306, the character string “XYZ” is a literal and must be converted to the char type, so the process proceeds to step 309.
In step 309, the computer system generates a character string object s2 to which the character string “XYZ” is assigned. During generation, the computer system places the character string “XYZ” in heap memory while converting it character by character into the char type. In parallel with the conversion, the computer system inspects the characteristics of the character string “XYZ” one character at a time. Assume here that the characteristic being inspected is “contains no lowercase letters”. Because “X”, “Y”, and “Z” are not lowercase letters, the computer system determines that “XYZ” is a character string that contains no lowercase letters. From this determination, a value representing “no character is a lowercase letter” is obtained as the characteristic information of the character string object s2. When generation ends, the process proceeds to step 310.
In step 310, if the value representing “no character is a lowercase letter” is, for example, “01”, the computer system associates the characteristic information “01” with the character string object s2. When the association ends, the process proceeds to step 311 and the process of generating the object s2 ends.

Finally, the concat() method is executed. Executing concat() generates a String object containing the character string “ABCXYZ”, formed by concatenating the character string held by the character string object s1 with the character string held by the character string object s2. During this generation, mergeAttr(attr, attr), which the class library author prepared in advance in the concat() method, is called. The characteristic information held by both s1 and s2 is “01”, representing “no character is a lowercase letter”. The logical sum (OR) of “01” and “01”, which is “01”, is therefore added to the String object containing “ABCXYZ”, and that object is returned as the return value of s1.concat(s2).
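The propagation through concat() can be sketched as follows. The method name mergeAttr comes from the description; everything else (the wrapper class AttrString, the bit constants, and the reading of each bit as “a character of this set is present”) is an assumption made so that a logical OR is the correct merge. Under that reading, “01” means uppercase letters are present and lowercase letters are absent.

```java
/** Hypothetical wrapper showing how characteristic bits propagate through concat(). */
final class AttrString {
    // Assumed encoding: each bit records that a character of the given set is present.
    static final int HAS_UPPER = 0b01;
    static final int HAS_LOWER = 0b10;

    final String value;
    final int attr;

    private AttrString(String value, int attr) {
        this.value = value;
        this.attr = attr;
    }

    /** Inspects the string once, at creation time. */
    static AttrString of(String value) {
        int attr = 0;
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            if (Character.isUpperCase(cp)) attr |= HAS_UPPER;
            if (Character.isLowerCase(cp)) attr |= HAS_LOWER;
            i += Character.charCount(cp);
        }
        return new AttrString(value, attr);
    }

    /** The characters of a concatenation are the union of both inputs, so the bits are OR-ed. */
    static int mergeAttr(int a, int b) {
        return a | b;
    }

    /** Builds the result without re-inspecting any character (unpaired surrogates are ignored here). */
    AttrString concat(AttrString other) {
        return new AttrString(value.concat(other.value), mergeAttr(attr, other.attr));
    }
}
```

With `AttrString.of("ABC").concat(AttrString.of("XYZ"))`, the result holds “ABCXYZ” and inherits the merged bits without a new scan, so a later upper-casing call could still skip its conversion.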

FIG. 5 shows an example of a functional block diagram illustrating the functions of a computer system that optimizes character string processing during program execution, in an embodiment of the present invention.
The computer system (501) includes a memory (502), a determination unit (503), an association unit (504), and an execution unit (505).

The memory (502) is a main storage device with a storage area that applications can use freely. The storage area can store a program (509), at least one character string (508a to 508n), and at least one piece of characteristic information (507a to 507n) corresponding to each of the at least one character string (508a to 508n).

The program (509) is a program that includes at least one character string process (510). The program (509) is loaded into the memory (502) by the computer system (501) and executed by the central processing unit (CPU).
The character string process (510) includes at least one optimized process (511) and a conventional process (512).
The optimized process (511) and the conventional process (512) are processes that perform operations using at least one character string (508a to 508n).
The optimized process (511) can run faster than the conventional process (512) when it is executed on a character string associated with particular characteristic information.

The character strings (508a to 508n) are data allocated in memory when the program (509) is executed. The character strings (508a to 508n) can be, for example, generated, referenced, updated, or deleted while the character string process (510) is executed.
The characteristic information (507a to 507n) is information representing the characteristics of the character strings (508a to 508n). Each piece of characteristic information (507a to 507n) corresponds to one of the character strings (508a to 508n).

When arithmetic processing related to the first character string (508a) is executed, the determination unit (503) determines, from the characteristics of the first character string (508a) and the arithmetic processing, the characteristic information corresponding to at least one of the first character string (508a) and the second character string (508b), which is the data resulting from the arithmetic processing.
The determination unit (503) includes an inspection unit (506). When sequential processing is performed on the character strings (508a to 508n) in the arithmetic processing, the inspection unit (506) obtains the characteristics of the sequentially processed characters one character at a time. From the character characteristics obtained by the inspection unit (506), the determination unit (503) determines the characteristic information (507a to 507n) corresponding to the character strings (508a to 508n) on which the sequential processing was performed.

The association unit (504) receives the determined characteristic information from the determination unit (503) and associates it with the first character string (508a) or the second character string (508b).

When a character string process (510) that includes the optimized process (511) is executed, the execution unit (505) refers to the characteristic information (507a to 507n) corresponding to the character strings (508a to 508n) handled by the character string process (510). If the referenced characteristic information (507a to 507n) satisfies the condition for executing the optimized process (511), the execution unit (505) causes the optimized process (511) to be executed. If the referenced characteristic information (507a to 507n) does not satisfy that condition, the execution unit (505) causes, for example, the conventional process (512) to be executed.

FIG. 6 shows a hardware block diagram of the system shown in FIG. 5 in an embodiment of the present invention.
The computer system (601) includes a CPU (602) and a main memory (603), which are connected to a bus (604). The CPU (602) is preferably based on a 32-bit or 64-bit architecture; for example, Intel's Xeon (trademark) series, Core (trademark) series, Atom (trademark) series, Pentium (trademark) series, or Celeron (trademark) series, or AMD's Phenom (trademark) series, Athlon (trademark) series, Turion (trademark) series, or Sempron (trademark) can be used. A display (606) such as an LCD monitor is connected to the bus (604) via a display controller (605). For management of the computer system, the display (606) is used to show, through a suitable graphical interface, information about a computer system connected to a network via a communication line and information about the software running on that computer system. A hard disk or silicon disk (608) and a CD-ROM, DVD, or BD drive (609) are also connected to the bus (604) via an IDE or SATA controller (607).

The hard disk (608) stores an operating system, a program that provides a Java (trademark) processing environment such as J2EE, and other programs and data, all loadable into the main memory.

The CD-ROM, DVD, or BD drive (609) is used as needed to install additional programs onto the hard disk from a CD-ROM, DVD-ROM, or BD. A keyboard (611) and a mouse (612) are also connected to the bus (604) via a keyboard/mouse controller (610).

The communication interface (614) follows, for example, the Ethernet (trademark) protocol. The communication interface (614) is connected to the bus (604) via a communication controller (613), physically connects the computer system to a communication line (615), and provides the network interface layer for the TCP/IP communication protocol of the communication function of the computer system's operating system. The communication line may be a wired LAN environment or a wireless LAN environment based on a wireless LAN connection standard such as IEEE 802.11a/b/g/n.

Claims

A method, in a computer system that executes a program for processing character strings, of selecting, when the processing of a character string includes a plurality of processes, a process with good execution efficiency from among the plurality of processes during execution of the program and processing the character string with it, wherein, when the program is executed,
a determination unit of the computer system executes a step of determining, from a characteristic of at least one first character string and from processing on the at least one first character string or processing on an area in which the at least one first character string is stored (hereinafter referred to as arithmetic processing related to the first character string), characteristic information of at least one of the at least one first character string and at least one second character string that is a result of the arithmetic processing, and storing the determined characteristic information in a memory included in the computer system;
an association unit of the computer system executes a step of associating the characteristic information stored in the memory with the at least one first character string if the characteristic information is characteristic information about the first character string, or with the at least one second character string if the characteristic information is characteristic information about the second character string; and
an execution unit of the computer system executes a step of, when processing a character string with which the determined characteristic information is associated, selecting from among the plurality of processes a process with good execution efficiency in accordance with the determined characteristic information, and performing the processing of the character string with the selected process,
wherein the characteristic is a characteristic of at least one character included in the character string, namely that the character is an uppercase letter, a lowercase letter, a digit, a two-byte character, an alphabetic character, a character of a character code, a special character in URL encoding, or none of the characters listed above,
the characteristic information includes at least one piece of first information indicating that the character string includes, or does not include, a character belonging to a character set that is a collection of characters sharing one of the above characteristics, and second information indicating that the character string includes, or does not include, a character that does not belong to the character set, and
the arithmetic processing related to the at least one first character string is arithmetic processing on a character string executed in the program, being a copy of the first character string, a search for an arbitrary character included in the first character string, a conversion of a character included in the first character string into another character, an extraction of an arbitrary character from the first character string, a concatenation of a plurality of the first character strings, or a comparison between a plurality of the first character strings.

The arithmetic processing related to the at least one first character string converts at least one first character string into a char type, and generates at least one second character string into which the converted char type is substituted; Sequentially processing the at least one first character string character by character at the time of generation;
The determining step further includes a step of executing the sequential processing and a step of inspecting the character that is sequentially processed together with the sequential processing, whereby characteristic information of the at least one second character string is determined. The method according to claim 1.

The arithmetic processing related to the at least one first character string is arithmetic processing related to at least one first character string;
The determining step determines the characteristic information of at least one second character string from the characteristic information about each of the at least one first character string and the arithmetic processing related to the at least one first character string. The method of claim 1, comprising steps.

The character set is a second character set that collectively represents at least two first character sets, the characters included in each of the at least two first character sets do not overlap, and each of the first information is , Indicating that the character string includes or does not include characters belonging to each of the first character sets, and the second information includes characters that the character string does not belong to all of the first character sets. The method according to claim 1, wherein the method represents being or not being included.

The character set is a second character set that collectively represents at least two first character sets, the characters included in each of the at least two first character sets do not overlap, and the first information is: The character string includes or does not include a character belonging to the second character set, and the second information includes a character that does not belong to the second character set, or The method according to claim 1, wherein the method is not included.

The method according to claim 1, wherein the first information and the second information are represented by a bit string, and each bit corresponds to a character set or a character set that does not belong to any character set.

The arithmetic processing related to the at least one first character string combines the first character string associated with the first characteristic information and the first character string associated with the second characteristic information. Arithmetic processing,
The step of determining includes the step of determining, as characteristic information of a second character string, characteristic information obtained by a logical sum of the first characteristic information and the second characteristic information. Method.

The arithmetic processing related to the at least one first character string is an arithmetic processing for deleting the first character string associated with the second characteristic information from the first character string associated with the first characteristic information. And
The method according to claim 6, wherein the determining includes determining the first characteristic information as characteristic information of a second character string.

The arithmetic processing related to the at least one first character string is an arithmetic processing for deleting the first character string associated with the second characteristic information from the first character string associated with the first characteristic information. And a character belonging to the character set represented by the second characteristic information is not included in the first character string associated with the first characteristic information by the calculation process of deleting the first character string. ,
The step of determining includes the step of determining, as characteristic information of a first character string, characteristic information obtained by a logical product of the first characteristic information and the second characteristic information. Method.

The method, wherein the step of determining the characteristic information of a character string for at least one of the at least one first character string and the second character string that is a result of the arithmetic processing related to the at least one first character string includes determining the characteristic information by combining at least two of the determining step according to claim 7, the determining step according to claim 8, and the determining step according to claim 9.

The method according to claim 1, wherein the arithmetic processing related to the at least one first character string includes converting the at least one first character string into a char type and generating at least one second character string into which the converted char type is substituted, and
the determining includes determining characteristic information of the second character string from characteristic information about the at least one first character string.

The method of claim 1, wherein the arithmetic processing related to the at least one first character string includes a process of generating a character string and, at the time of generation, substituting the at least one first character string one character at a time into the at least one second character string, and
the step of determining further includes a step of executing the substituting process and a step of inspecting each character being substituted together with the substitution, whereby characteristic information of the at least one second character string is determined.
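
A minimal Java sketch of inspecting characters during the same loop that copies them is given below. All names are hypothetical. In a real runtime the copy loop would sit inside the string class itself; here it is written out by hand, so the final `new String(dst)` performs an extra copy that exists only to keep the sketch self-contained.

```java
// Illustrative sketch only: all names here are assumptions for demonstration.
final class InspectingCopy {
    static final int UPPER = 1, LOWER = 2, DIGIT = 4, OTHER = 8;

    /** A string paired with its characteristic bits (hypothetical type). */
    static final class Tagged {
        final String value;
        final int bits;
        Tagged(String value, int bits) {
            this.value = value;
            this.bits = bits;
        }
    }

    static int classify(char c) {
        if (c >= 'A' && c <= 'Z') return UPPER;
        if (c >= 'a' && c <= 'z') return LOWER;
        if (c >= '0' && c <= '9') return DIGIT;
        return OTHER;
    }

    /** Builds a tagged string from src, inspecting each character as it is copied. */
    static Tagged fromChars(char[] src) {
        char[] dst = new char[src.length];
        int bits = 0;
        for (int i = 0; i < src.length; i++) {
            char c = src[i];
            dst[i] = c;          // the substitution, one character at a time
            bits |= classify(c); // the inspection shares the same loop iteration
        }
        return new Tagged(new String(dst), bits);
    }

    public static void main(String[] args) {
        Tagged t = fromChars("Order42".toCharArray());
        System.out.println(t.value + " bits=" + t.bits);
    }
}
```

Because the characters are already being read for the copy, the inspection adds no second pass over the data.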

The method according to claim 1, wherein, in a programming language in which a character string is treated as an object, the arithmetic processing related to the first character string is arithmetic processing related to the object, and the characteristic information is included in the object.

A computer system that executes a program for processing character strings and that, when processing of a character string during execution of the program includes a plurality of processes, selects a process with good execution efficiency from among the plurality of processes and processes the character string, the computer system comprising:
a determination unit that, when the program is executed, determines characteristic information of a character string for at least one of the at least one first character string and at least one second character string that is a result of arithmetic processing related to the at least one first character string, from a characteristic of the at least one first character string and from either processing of the at least one first character string or processing of an area in which the at least one first character string is stored (hereinafter referred to as arithmetic processing related to the first character string);
an association unit that, when the program is executed, associates the determined characteristic information with the at least one first character string if the characteristic information is characteristic information about the first character string, or with the at least one second character string if the characteristic information is characteristic information about the second character string; and
an execution unit that, when processing a character string with which the determined characteristic information is associated, selects from among the plurality of processes a process with good execution efficiency corresponding to the determined characteristic information, and executes the processing of the character string with the selected process,
wherein the characteristic is a characteristic of at least one character included in the character string, namely that the character is an uppercase letter, a lowercase letter, a number, a two-byte character, an alphabetic character, a character of a particular character code, a special character in URL encoding, or none of the characters listed above,
the characteristic information includes at least one piece of first information indicating that the character string includes, or does not include, a character belonging to a character set that is a collection of characters sharing one of the above characteristics, and second information indicating that the character string includes, or does not include, a character that does not belong to the character set, and
the arithmetic processing related to the at least one first character string is arithmetic processing on a character string executed in the program, and is a copy of the first character string, a search for an arbitrary character included in the first character string, conversion of a character included in the first character string into another character, extraction of an arbitrary character from the first character string, concatenation of a plurality of the first character strings, or comparison between a plurality of the first character strings.
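
The interplay of the determination, association, and execution units can be sketched in Java using upper-case conversion as the processing with two alternative paths. Every name below is hypothetical, and the choice of `toUpperCase` as the optimized operation is an assumption made for illustration. The fast path is taken only when neither the lowercase bit nor the "other" bit is set, because characters outside the ASCII sets (for example accented letters) may also change under case conversion.

```java
import java.util.Locale;

// Illustrative sketch only: all names here are assumptions for demonstration.
final class FastPathDemo {
    static final int UPPER = 1, LOWER = 2, DIGIT = 4, OTHER = 8;

    /** A string paired with its characteristic bits (hypothetical type). */
    static final class Tagged {
        final String value;
        final int bits;
        Tagged(String value, int bits) {
            this.value = value;
            this.bits = bits;
        }
    }

    /** Determination: one pass over s that collects the bits of its characters. */
    static int scan(String s) {
        int bits = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') bits |= UPPER;
            else if (c >= 'a' && c <= 'z') bits |= LOWER;
            else if (c >= '0' && c <= '9') bits |= DIGIT;
            else bits |= OTHER;
        }
        return bits;
    }

    /** Association: attaches the determined bits to the string. */
    static Tagged of(String s) {
        return new Tagged(s, scan(s));
    }

    /** Execution: picks the cheaper process when the bits allow it. */
    static Tagged toUpperCase(Tagged t) {
        if ((t.bits & (LOWER | OTHER)) == 0) {
            return t; // fast path: nothing in the string can change
        }
        String up = t.value.toUpperCase(Locale.ROOT); // general path
        return new Tagged(up, scan(up));
    }

    public static void main(String[] args) {
        Tagged id = of("ORDER42");
        System.out.println(toUpperCase(id) == id);           // fast path reuses the object
        System.out.println(toUpperCase(of("order")).value);  // general path converts
    }
}
```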

A method, in a computer system that executes a program for processing a character string included in a character string object, for selecting, when processing of the character string during execution of the program includes a plurality of processes, a process with good execution efficiency from among the plurality of processes and processing the character string, the method comprising, when the program is executed:
a step in which the determination unit of the computer system determines, from characteristic information about at least one first character string, or from a characteristic of the at least one first character string and either processing of the at least one first character string or processing of an area in which the at least one first character string is stored (hereinafter referred to as arithmetic processing related to the first character string), characteristic information of at least one of the at least one first character string and at least one second character string that is a result of the arithmetic processing related to the at least one first character string, and stores the determined characteristic information in a memory included in the computer system;
a step in which the association unit of the computer system associates the characteristic information stored in the memory with the at least one first character string if the characteristic information is characteristic information about the first character string, or with the at least one second character string if the characteristic information is characteristic information about the second character string; and
a step in which the execution unit of the computer system, when processing a character string with which the determined characteristic information is associated, selects from among the plurality of processes a process with good execution efficiency corresponding to the determined characteristic information of the character string, and executes the processing of the character string with the selected process,
wherein the characteristic is a characteristic of at least one character included in the character string, namely that the character is an uppercase letter, a lowercase letter, a number, a two-byte character, an alphabetic character, a character of a particular character code, a special character in URL encoding, or none of the characters listed above,
the characteristic information includes at least one piece of first information indicating that the character string includes, or does not include, a character belonging to a character set that is a collection of characters sharing one of the above characteristics, and second information indicating that the character string includes, or does not include, a character that does not belong to the character set,
the arithmetic processing related to the at least one first character string includes converting the at least one first character string into a char type, generating a character string object including at least one second character string into which the converted char type is substituted, and sequentially processing the at least one first character string character by character at the time of generation, and
the determining step further includes a step of executing the sequential processing and a step of inspecting each sequentially processed character together with the sequential processing, whereby characteristic information of the at least one second character string is determined.

A method, in a computer system that executes a program for processing a character string included in a character string object, for selecting, when processing of the character string during execution of the program includes a plurality of processes, a process with good execution efficiency from among the plurality of processes and processing the character string, the method comprising:
a step in which the association unit of the computer system, when the program is executed, generates a character string object in a memory included in the computer system and adds characteristic information to the character string object, the step including at least one of:
(1) when the characteristic information of the character string included in the generated character string object can be determined from the generation-source character string object, or can be inspected simultaneously with sequential processing of the character string included in the generation-source character string object, adding the characteristic information obtained by the determination or the inspection to the generated character string object;
(2) when the characteristics of the character string are inspected simultaneously with sequential processing of a character string included in a character string object that does not include characteristic information, adding the characteristic information obtained by the inspection to the generated character string object; or
(3) in response to arithmetic processing of at least one character string object to which characteristic information has been added (hereinafter referred to as the calculation source object), adding characteristic information determined from the calculation source object and the content of the arithmetic processing to at least one character string object in which the result of the arithmetic processing is stored (hereinafter referred to as a result object); and
a step in which the execution unit of the computer system, when processing a character string with which the determined characteristic information is associated, selects from among the plurality of processes a process with good execution efficiency corresponding to the determined characteristic information of the character string, and executes the processing of the character string with the selected process,
wherein the characteristic is a characteristic of at least one character included in the character string, namely that the character is an uppercase letter, a lowercase letter, a number, a two-byte character, an alphabetic character, a character of a particular character code, a special character in URL encoding, or none of the characters listed above,
the characteristic information includes at least one piece of first information indicating that the character string includes, or does not include, a character belonging to a character set that is a collection of characters sharing one of the above characteristics, and second information indicating that the character string includes, or does not include, a character that does not belong to the character set, and
the arithmetic processing related to the at least one first character string is arithmetic processing on a character string executed in the program, and is a copy of the first character string, a search for an arbitrary character included in the first character string, conversion of a character included in the first character string into another character, extraction of an arbitrary character from the first character string, concatenation of a plurality of the first character strings, or comparison between a plurality of the first character strings.
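
The three ways of adding characteristic information to a newly generated character string object, items (1) to (3) above, can be sketched in Java as three factory methods. The names and the mapping of each method to an item are assumptions made for illustration, as is the convention that a set bit means "the string may contain a character of this set".

```java
// Illustrative sketch only: all names here are assumptions for demonstration.
final class TagOnCreate {
    static final int UPPER = 1, LOWER = 2, DIGIT = 4, OTHER = 8;

    /** A string object that carries its characteristic bits (hypothetical type). */
    static final class Tagged {
        final String value;
        final int bits;
        Tagged(String value, int bits) {
            this.value = value;
            this.bits = bits;
        }
    }

    static int scan(String s) {
        int bits = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') bits |= UPPER;
            else if (c >= 'a' && c <= 'z') bits |= LOWER;
            else if (c >= '0' && c <= '9') bits |= DIGIT;
            else bits |= OTHER;
        }
        return bits;
    }

    /** Item (1): the bits are determined directly from the generation-source object. */
    static Tagged copy(Tagged src) {
        return new Tagged(src.value, src.bits);
    }

    /** Item (2): the source carries no bits, so the characters are inspected on creation. */
    static Tagged fromRaw(String raw) {
        return new Tagged(raw, scan(raw));
    }

    /** Item (3): the bits follow from the source object and the kind of operation. */
    static Tagged substring(Tagged src, int begin, int end) {
        // Extraction cannot introduce a set the source lacked, so src.bits stays valid.
        return new Tagged(src.value.substring(begin, end), src.bits);
    }

    public static void main(String[] args) {
        Tagged t = fromRaw("Hello42");
        Tagged head = substring(t, 0, 5);
        System.out.println(head.value + " bits=" + head.bits);
    }
}
```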