JP2008146162A

JP2008146162A - Regular expression generation device, method, and program

Info

Publication number: JP2008146162A
Application number: JP2006329679A
Authority: JP
Inventors: Takaaki Nakamura; Mitsunori Kori
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2006-12-06
Filing date: 2006-12-06
Publication date: 2008-06-26
Anticipated expiration: 2026-12-06
Also published as: JP4897454B2

Abstract

PROBLEM TO BE SOLVED: To efficiently generate a regular expression for retrieving attribute values within a given range.

SOLUTION: An arithmetic part 102 works from the format indicated by attribute range condition data input through an attribute range condition input part 101. From that format, it calculates two values:
- a first value, whose most significant digit equals that of the lower limit indicated by the attribute range condition data and whose other digits each take their maximum value;
- a second value, whose most significant digit equals that of the upper limit and whose other digits each take their minimum value.

A regular expression generation part 104 generates low-order region data, which expresses as a regular expression the attribute values from the lower limit to the first value. It also generates high-order region data, which expresses the attribute values from the second value to the upper limit. When attribute values exist between the first value and the second value, it further generates middle-order region data expressing those values as a regular expression. A regular expression combining part 105 combines the low-order, high-order, and middle-order region data.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a regular expression generation device, a regular expression generation method, and a regular expression generation program. In particular, it relates to a search condition generation device, a search condition generation scheme, and a search condition generation program that use regular expressions.

A known technique converts the search conditions (keywords) entered for a search into regular expressions (see, for example, Patent Document 1). The converted expressions find text written in different or similar notations. They also find it when the search target contains typographical errors or line feed codes.

One example of a search target is a database that stores logs. In recent years, logs have become more diverse and larger in scale. As a result, there is a growing need for a log-dedicated database management system that manages logs efficiently (see, for example, Non-Patent Document 1).
Patent Document 1: Japanese Patent Laid-Open No. 7-121547
Non-Patent Document 1: Takaaki Nakamura and 3 others, "Realization of a Large-Scale Log Database", Proceedings of the 68th (2006) National Convention of IPSJ, Vol. 3, pp. 29-30, March 2006

Collecting, storing, and analyzing logs is becoming increasingly common, mainly in the field of information security. The logs in question include:
- operation and access histories of information devices such as servers;
- histories of events recorded by security devices and software;
- communication histories on networks;
- histories of e-mail transmission and reception, including message contents.

Most of these logs are recorded as text and have been managed separately for each log source. In information security, there is a growing movement to collect the logs from these diverse sources and manage them centrally. Central management helps preserve evidence and investigate causes when incidents such as information leaks occur.

Logs have the following characteristics.

Because logs are generated continuously, their volume keeps growing over time. Logs collected for the purposes above must be retained for several months to several years, so the total volume becomes enormous. Log content also differs by output source, and many formats exist. Many formats include date/time and numeric information. Formats vary in how flexible they are, but the layout is usually fixed. However, each log source defines its own layout; there is no common format.

For this reason, conventional electronic document search devices take a fixed approach when searching attribute information contained in documents. They extract attribute values from the documents in advance. (An attribute value is the value of an attribute; hereinafter it is sometimes simply called an "attribute".) The extracted values are then managed either in dedicated files or with a relational database management system or the like.

However, logs vary widely in format, as described above. When a new log type is added, a scheme that extracts attributes in advance cannot extract and store its attribute information unless an extraction method is defined for that type. A second problem arises when a new kind of attribute is to be extracted. The logs keep growing and are retained for long periods, so their volume is so large that re-extracting the attribute from all stored logs is difficult.

A log-dedicated database such as the one shown in Non-Patent Document 1 is effective for storing and searching such logs. This log-dedicated database is designed on the following principles:
(1) Store collected logs on the storage medium as-is, without regard to their format.
(2) Search using search conditions that take the log format into account.

This approach has the following advantages:
(1) Because no attributes are extracted from the logs, logs of any format can be stored.
(2) Because format-aware search conditions are written as regular expressions, the attributes to be searched can be specified flexibly.

Here, a regular expression is a notation for writing a search condition as a character string. Part of a search string can specify a choice among several characters or strings, and can specify repetitions of them. This lets a regular expression express a search condition in a more general form.

In the log-dedicated database above, however, some search conditions produce regular expressions that are complicated and hard to write. For example, finding dates in a log in the range "2006/7/1 to 2006/10/15" requires a regular expression such as
“[^0-9](2006/([7-8]/([1-9]|[1-2][0-9]|3[0-1])|9/([1-9]|[1-2][0-9]|30)|10/([1-9]|1[0-5]))[^0-9]”
Such a regular expression is difficult to write for anyone with little knowledge of regular expressions. Even people who know regular expressions well need trial and error to write it correctly.

Likewise, some search conditions that take the log format into account are hard to write as regular expressions. One example is a search restricted to a specific comma-delimited field of a log in CSV (Comma-Separated Values) format. Consider the condition "the field between the third and fourth commas from the start of the line contains the string 'file'". It can be written as
“(^|\n)([^,]*,){3}[^,]*file”
The CSV format itself has variants; for example, some logs additionally enclose fields in double quotation marks ("). Covering all such patterns makes the regular expression even more complex and the search condition harder to write. A regular expression that searches a specific field for dates within a given range, like the one above, is also complex.
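The field-restricted pattern above can be tried directly with Python's re module, as in the following sketch. Note that [^,] also matches a newline, so a match can run across line boundaries. The stricter variant below, which excludes "\n" from each field, is an illustrative assumption and not part of the original text. The helper name is hypothetical.

```python
import re

# Pattern from the text; the full-width yen sign in the original is a backslash, so \n is a newline.
# ([^,]*,){3} skips three comma-terminated fields, then [^,]*file searches the fourth field.
FOURTH_FIELD_FILE = re.compile(r"(^|\n)([^,]*,){3}[^,]*file")

# Hypothetical stricter variant: excluding \n keeps every field on a single line.
FOURTH_FIELD_FILE_STRICT = re.compile(r"(^|\n)([^,\n]*,){3}[^,\n]*file")

def fourth_field_contains_file(text: str, pattern: re.Pattern = FOURTH_FIELD_FILE) -> bool:
    return pattern.search(text) is not None

hit = "2006-07-01,alice,open,config file,ok\n"
miss = "2006-07-01,file,open,config,ok\n"
spanning = "a,b\nc,d,file\n"  # "file" is only the third field of line 2

print(fourth_field_contains_file(hit))                                 # True
print(fourth_field_contains_file(miss))                                # False
print(fourth_field_contains_file(spanning))                            # True (match spans lines)
print(fourth_field_contains_file(spanning, FOURTH_FIELD_FILE_STRICT))  # False
```

The third case shows the kind of subtlety that makes such expressions error-prone to write by hand.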

Text search with regular expressions is widely used, for example in the grep command of UNIX (registered trademark) operating systems and in the scripting language Perl. The problems above are therefore not limited to log search. They are common to every processing system that searches with regular expressions. They also remain in the prior-art approach, which converts a keyword into a regular expression so that it can be found in different or similar notations.

An object of the present invention is to efficiently generate a regular expression for searching for attribute values within a given range.

A regular expression generation device according to one aspect of the present invention includes the following units.

An attribute range condition input unit inputs, from an input device, attribute range condition data indicating a lower limit, an upper limit, and a format of attribute values.

A calculation unit uses a processing device to calculate two values from the format indicated by the input attribute range condition data:
- a first value: an attribute value greater than or equal to the lower limit indicated by that data, in which one or more trailing digits (counted from the least significant digit) take the maximum value for that digit;
- a second value: an attribute value less than or equal to the upper limit indicated by that data, in which one or more trailing digits (counted from the least significant digit) take the minimum value for that digit.

An attribute value storage unit stores, in a storage device, the lower limit and upper limit indicated by the input attribute range condition data, together with the first and second values calculated by the calculation unit.

A regular expression generation unit uses a processing device to generate:
- lower region data representing, as a regular expression, the stored attribute values from the lower limit to the first value;
- upper region data representing, as a regular expression, the stored attribute values from the second value to the upper limit;
- middle region data representing, as a regular expression, the attribute values between the stored first and second values, when any such values exist.

A regular expression combining unit uses a processing device to combine the lower, upper, and middle region data. The result is regular expression data representing the attribute values from the stored lower limit to the stored upper limit.

According to this aspect of the present invention, the regular expression generation device works as follows.
- The calculation unit calculates two values from the format of the attribute values. The first value is at least the lower limit and has one or more trailing digits at their maximum value. The second value is at most the upper limit and has one or more trailing digits at their minimum value.
- The regular expression generation unit generates lower region data for the attribute values from the lower limit to the first value and upper region data for the values from the second value to the upper limit. When attribute values exist between the first and second values, it also generates middle region data for those values. All three are regular expressions.
- The regular expression combining unit combines the lower, upper, and middle region data.

As a result, a regular expression for searching for attribute values within a given range can be generated efficiently.

Embodiments of the present invention are described below with reference to the drawings.

Unless otherwise stated, the following description assumes the widely used common regular expression syntax. In common regular expressions, for example:
(1) <ordinary character> matches that ordinary character.
(2) \<special character> matches a special character ("|", "?", "*", "+", "\", "^", etc.).
(3) [abc...] matches any one of the characters a, b, c, ...
(4) [a-z] matches any one character whose character code is in the range a to z.
(5) [^abc...] matches any one character other than a, b, c, ...
(6) [^a-z] matches any one character whose character code is not in the range a to z.
(7) ^ matches the beginning of a line.
(8) $ matches the end of a line.
(9) <regular expression>? matches <regular expression> appearing zero times or once.
(10) <regular expression>* matches <regular expression> repeated zero or more times.
(11) <regular expression>+ matches <regular expression> repeated one or more times.
(12) <regular expression>{n} matches <regular expression> repeated exactly n times.
(13) <regular expression>{n,} matches <regular expression> repeated n or more times.
(14) <regular expression>{,m} matches <regular expression> repeated from 0 to m times.
(15) <regular expression>{n,m} matches <regular expression> repeated from n to m times.
(16) <regular expression 1>|<regular expression 2> matches <regular expression 1> or <regular expression 2>.
(17) <regular expression 1><regular expression 2> matches a string whose first part matches <regular expression 1> and whose remaining part matches <regular expression 2>.
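The constructs listed above behave the same way in Python's re module, which the examples in this description use for illustration. The following sketch checks one matching and one non-matching string per construct. One difference: in Python, ^ and $ anchor at every line only when re.MULTILINE is set; without it they anchor at the start and end of the whole string.

```python
import re

# (pattern, string that should match in full, string that should not) per construct.
CASES = [
    (r"\*", "*", "a"),           # (2) escaped special character
    (r"[abc]", "b", "d"),        # (3) character set
    (r"[a-z]", "q", "Q"),        # (4) character range
    (r"[^abc]", "d", "a"),       # (5) negated set
    (r"[^a-z]", "7", "q"),       # (6) negated range
    (r"ab?", "a", "abb"),        # (9) zero or one
    (r"ab*", "abbb", "b"),       # (10) zero or more
    (r"ab+", "ab", "a"),         # (11) one or more
    (r"a{3}", "aaa", "aa"),      # (12) exactly n
    (r"a{2,}", "aaaa", "a"),     # (13) n or more
    (r"a{,2}", "", "aaa"),       # (14) zero to m
    (r"a{1,2}", "aa", "aaa"),    # (15) n to m
    (r"cat|dog", "dog", "cow"),  # (16) alternation
    (r"ab", "ab", "ba"),         # (17) concatenation of ordinary characters (1)
]

for pattern, yes, no in CASES:
    assert re.fullmatch(pattern, yes), pattern
    assert not re.fullmatch(pattern, no), pattern

# (7)/(8): in Python, ^ and $ anchor each line only with re.MULTILINE.
assert re.findall(r"^\d+$", "12\nab\n34", re.MULTILINE) == ["12", "34"]
print("all constructs behave as listed")
```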

In the following, as a rule, regular expressions are written enclosed in double quotation marks, in the form “<regular expression>”.

Embodiment 1.
FIG. 1 is a block diagram showing the configuration of a regular expression generation device 100 according to this embodiment.

In FIG. 1, the regular expression generation device 100 includes:
- an attribute range condition input unit 101;
- a calculation unit 102;
- an attribute value storage unit 103;
- a regular expression generation unit 104;
- a regular expression combining unit 105;
- an output unit 106.

The device also includes hardware devices such as a storage device 151, a processing device 152, an input device 153, and an output device 154. Alternatively, these hardware devices may be connected to the device. Each unit of the regular expression generation device 100 uses these hardware devices:
- the processing device 152 calculates, processes, reads, and writes data and information;
- the storage device 151 stores that data and information;
- the input device 153 inputs it;
- the output device 154 outputs it.

The attribute range condition input unit 101 inputs attribute range condition data from the input device 153. This data indicates the lower limit, the upper limit, and the format of the attribute values. For example, if the attribute values are integers from 123 to 7654, then:
- the lower limit is 123;
- the upper limit is 7654;
- the format is an integer of 3 to 4 digits (at most 4 digits).

In this example, the format in the input data indicates that the attribute values are numeric. If the attribute values were character strings, the format would instead indicate that they are character strings.

The calculation unit 102 uses the processing device 152 to calculate a first value and a second value. It bases both on the format indicated by the attribute range condition data input by the attribute range condition input unit 101.

The first value is an attribute value greater than or equal to the lower limit indicated by the attribute range condition data. One or more of its trailing digits (counted from the least significant digit) take the maximum value for that digit. With a lower limit of 123, as in the example above, candidates for the first value include 129, 199, 999, 6999, 7599, and 7649.

The second value is an attribute value less than or equal to the upper limit indicated by the attribute range condition data. One or more of its trailing digits take the minimum value for that digit. With an upper limit of 7654, as in the example above, candidates for the second value include 130, 200, 1000, 7000, 7600, and 7650.

In particular, this embodiment restricts both values to the digit count of the limit they are derived from:
- The first value has the same number of digits as the lower limit, and at least every digit other than the most significant takes its maximum value. With a lower limit of 123, the first value is, for example, 199 or 999.
- The second value has the same number of digits as the upper limit, and at least every digit other than the most significant takes its minimum value. With an upper limit of 7654, the second value is, for example, 1000 or 7000.

More specifically, this embodiment keeps the most significant digit of each limit:
- The first value has the same most significant digit as the lower limit, and all its other digits take their maximum values. With a lower limit of 123, the first value is 199.
- The second value has the same most significant digit as the upper limit, and all its other digits take their minimum values. With an upper limit of 7654, the second value is 7000.
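The candidate values above can be reproduced with simple digit arithmetic, as the following Python sketch shows. It assumes non-negative decimal integers, and all function names are hypothetical.
- Filling the k least significant digits of 123 with 9s gives 129, 199, and 999.
- Zeroing the k least significant digits of 7654 gives 7650, 7600, and 7000.
- The other candidates are these numbers plus one (130, 200, 1000) or minus one (7649, 7599, 6999).
- The values this embodiment uses are the cases where every digit except the most significant is filled or zeroed.

```python
def fill_trailing_max(value: int, k: int) -> int:
    """Set the k least significant decimal digits of value to 9."""
    base = 10 ** k
    return (value // base) * base + (base - 1)

def fill_trailing_min(value: int, k: int) -> int:
    """Set the k least significant decimal digits of value to 0."""
    base = 10 ** k
    return (value // base) * base

def first_value(lower: int) -> int:
    """Keep the most significant digit of lower; set all other digits to 9."""
    return fill_trailing_max(lower, len(str(lower)) - 1)

def second_value(upper: int) -> int:
    """Keep the most significant digit of upper; set all other digits to 0."""
    return fill_trailing_min(upper, len(str(upper)) - 1)

print([fill_trailing_max(123, k) for k in (1, 2, 3)])  # [129, 199, 999]
print([fill_trailing_min(7654, k) for k in (1, 2, 3)])  # [7650, 7600, 7000]
print(first_value(123), second_value(7654))             # 199 7000
```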

The attribute value storage unit 103 stores the following in the storage device 151:
- the lower limit and upper limit indicated by the attribute range condition data input by the attribute range condition input unit 101;
- the first and second values calculated by the calculation unit 102.

The regular expression generation unit 104 uses the processing device 152 to generate two kinds of region data from the values stored by the attribute value storage unit 103:
- lower region data, a regular expression for the attribute values from the lower limit to the first value (hereinafter sometimes simply called the "regular expression" or the "lower-region regular expression");
- upper region data, a regular expression for the attribute values from the second value to the upper limit (hereinafter sometimes simply called the "regular expression" or the "upper-region regular expression").

In the example above, the lower limit is 123, the upper limit is 7654, the first value is 199, and the second value is 7000. The lower region data is then, for example,
“12[3-9]|1[3-9][0-9]” (regular expression for 123 to 199),
and the upper region data is, for example,
“7[0-5][0-9][0-9]|76[0-4][0-9]|765[0-4]” (regular expression for 7000 to 7654).

In addition, when attribute values exist between the stored first and second values, the regular expression generation unit 104 uses the processing device 152 to generate middle region data. This is a regular expression for those attribute values (hereinafter sometimes simply called the "regular expression" or the "middle-region regular expression"). In the example above, where the first value is 199 and the second value is 7000, the middle region data is, for example,
“[2-9][0-9][0-9]|[1-6][0-9][0-9][0-9]” (regular expression for 200 to 6999).
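The following Python sketch shows one way to produce the three region expressions. It is an illustration written to reproduce the example strings above, not the device's actual implementation, and the function names are hypothetical. It assumes:
- non-negative decimal integers without leading zeros;
- a lower limit smaller than the upper limit;
- limits that differ in digit count or in leading digit.

If the limits share both digit count and leading digit, the first value would exceed the upper limit, and this sketch does not handle that case.

```python
# Assumes non-negative integers without leading zeros, lower < upper, and that the
# limits differ in digit count or leading digit (otherwise first value > upper).

def digit_class(lo: int, hi: int) -> str:
    """Render a single-digit range lo..hi as a literal digit or a bracket class."""
    return str(lo) if lo == hi else f"[{lo}-{hi}]"

def lower_region(lower: int) -> str:
    """Regex for lower .. first value (leading digit of lower, then all 9s)."""
    d = [int(c) for c in str(lower)]
    n = len(d)
    if n == 1:
        return str(lower)
    parts = []
    for i in range(n - 1, 0, -1):  # from the least significant digit upward
        start = d[i] if i == n - 1 else d[i] + 1
        if start <= 9:
            prefix = "".join(map(str, d[:i]))
            parts.append(prefix + digit_class(start, 9) + "[0-9]" * (n - 1 - i))
    return "|".join(parts)

def upper_region(upper: int) -> str:
    """Regex for second value (leading digit of upper, then all 0s) .. upper."""
    e = [int(c) for c in str(upper)]
    n = len(e)
    if n == 1:
        return str(upper)
    parts = []
    for i in range(1, n):  # from the most significant side downward
        end = e[i] if i == n - 1 else e[i] - 1
        if end >= 0:
            prefix = "".join(map(str, e[:i]))
            parts.append(prefix + digit_class(0, end) + "[0-9]" * (n - 1 - i))
    return "|".join(parts)

def middle_region(lower: int, upper: int) -> str:
    """Regex for the values strictly between the first and second values ('' if none)."""
    n_lo, n_hi = len(str(lower)), len(str(upper))
    d0, e0 = int(str(lower)[0]), int(str(upper)[0])
    if n_lo == n_hi:
        if d0 + 1 > e0 - 1:
            return ""
        return digit_class(d0 + 1, e0 - 1) + "[0-9]" * (n_lo - 1)
    parts = []
    if d0 + 1 <= 9:  # rest of the lower limit's digit count
        parts.append(digit_class(d0 + 1, 9) + "[0-9]" * (n_lo - 1))
    for width in range(n_lo + 1, n_hi):  # every digit count strictly in between
        parts.append("[1-9]" + "[0-9]" * (width - 1))
    if e0 - 1 >= 1:  # start of the upper limit's digit count
        parts.append(digit_class(1, e0 - 1) + "[0-9]" * (n_hi - 1))
    return "|".join(parts)

print(lower_region(123))         # 12[3-9]|1[3-9][0-9]
print(middle_region(123, 7654))  # [2-9][0-9][0-9]|[1-6][0-9][0-9][0-9]
print(upper_region(7654))        # 7[0-5][0-9][0-9]|76[0-4][0-9]|765[0-4]
```

For the range 0 to 255, the same functions produce “0”, “[1-9]|[1-9][0-9]|1[0-9][0-9]”, and “2[0-4][0-9]|25[0-5]”. Together these cover the same values as the expression given later for that range, though the form differs slightly.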

The regular expression combining unit 105 uses the processing device 152 to combine the lower, upper, and middle region data generated by the regular expression generation unit 104. The result is regular expression data (hereinafter sometimes simply called the "regular expression" or the "attribute-value regular expression"). It represents, as a regular expression, the attribute values from the lower limit to the upper limit stored by the attribute value storage unit 103.

In the example above, the region data are:
- lower region data: “12[3-9]|1[3-9][0-9]”;
- upper region data: “7[0-5][0-9][0-9]|76[0-4][0-9]|765[0-4]”;
- middle region data: “[2-9][0-9][0-9]|[1-6][0-9][0-9][0-9]”.

The combined regular expression data is then, for example,
“(12[3-9]|1[3-9][0-9])|([2-9][0-9][0-9]|[1-6][0-9][0-9][0-9])|(7[0-5][0-9][0-9]|76[0-4][0-9]|765[0-4])”.
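The combined expression can be verified exhaustively, because each alternative has a fixed number of digits. The following Python sketch copies the expression from the text. It checks every integer from 0 to 99999, written without leading zeros, against the intended range 123 to 7654. The helper name is hypothetical.

```python
import re

# Combined regular expression data for 123..7654, copied from the text above.
COMBINED = (
    r"(12[3-9]|1[3-9][0-9])"
    r"|([2-9][0-9][0-9]|[1-6][0-9][0-9][0-9])"
    r"|(7[0-5][0-9][0-9]|76[0-4][0-9]|765[0-4])"
)

def matches_exactly(pattern: str, value: int) -> bool:
    """True if the decimal form of value (no leading zeros) matches the whole pattern."""
    return re.fullmatch(pattern, str(value)) is not None

# Every 1- to 5-digit value should match if and only if it lies in 123..7654.
mismatches = [v for v in range(100000) if matches_exactly(COMBINED, v) != (123 <= v <= 7654)]
print(mismatches)  # []
```

re.fullmatch is used here because an unanchored search would also find, for example, "123" inside "81234". When the expression is used to search free text, some boundary condition is needed around it, such as the [^0-9] used in the examples in this description.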

The output unit 106 outputs the regular expression data generated by the regular expression combining unit 105 to the output device 154.

As described above, in this embodiment the regular expression generation device 100 implements a search condition generation scheme, or a search condition generation program that executes the scheme on a computer. Its input is an attribute range condition, which specifies as a search condition the lower and upper limits of an attribute value's range. The scheme converts that condition into a regular expression that matches character strings representing attribute values within the specified range.

For example, this search condition generation scheme converts an attribute range condition that selects a range of numeric values into a regular expression. It can likewise convert an attribute range condition that selects a range of character strings.

In this search condition generation scheme, an attribute range condition consists of the attribute's lower limit, upper limit, and format (including its data type). For example, it may be entered in the notation
Attribute range condition = (0,255,%3d)
Here, as an example:
Lower limit of the attribute = 0
Upper limit of the attribute = 255
Attribute format (number of digits and data type) = "integer of (at most) 3 digits"
This example expresses the attribute format with the format-specification notation of formatted output functions such as printf in the C language. Any other notation that can specify equivalent conditions may be used instead. Likewise, any input method may be used, as long as equivalent conditions can be entered.
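The following Python sketch parses the example notation "(0,255,%3d)" into its three components. It is only one possible reading of that notation: the class name, field names, the accepted conversions (%d and %s), and the choice to leave omitted items as None are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class AttributeRangeCondition:
    """Hypothetical container for one attribute range condition."""
    lower: Optional[int]       # None when the lower limit is omitted
    upper: Optional[int]       # None when the upper limit is omitted
    max_digits: Optional[int]  # printf-style field width, e.g. 3 in "%3d"
    data_type: Optional[str]   # "integer" for %d, "string" for %s, None if omitted

# "(lower,upper,%<width><conversion>)"; each of the three items may be left empty.
_CONDITION = re.compile(r"\(\s*(\d*)\s*,\s*(\d*)\s*,\s*(?:%(\d*)([ds]))?\s*\)")

def parse_condition(text: str) -> AttributeRangeCondition:
    m = _CONDITION.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized attribute range condition: {text!r}")
    lower, upper, width, conv = m.groups()
    return AttributeRangeCondition(
        lower=int(lower) if lower else None,
        upper=int(upper) if upper else None,
        max_digits=int(width) if width else None,
        data_type={"d": "integer", "s": "string"}.get(conv),
    )

print(parse_condition("(0,255,%3d)"))
# AttributeRangeCondition(lower=0, upper=255, max_digits=3, data_type='integer')
print(parse_condition("(,255,)"))
# AttributeRangeCondition(lower=None, upper=255, max_digits=None, data_type=None)
```

The sketch leaves omitted items as None, so defaults can be chosen afterwards as the text describes.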

When the above attribute range condition is entered, the regular expression generation device 100 outputs, for example, the following regular expression for matching character strings that represent attribute values in the range from the lower limit to the upper limit:
Regular expression = “[^0-9]([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])[^0-9]”
Not all of the items that make up the attribute range condition need to be present. For example, if no lower limit is given, the lower limit may be treated as the minimum value the attribute can take (0 in the example above). Likewise, if no upper limit is given, the upper limit may be treated as the maximum value the attribute can take (∞ in the example above). Also, if the format does not indicate that the attribute is an integer, the attribute data type may be assumed to be a character string, or, because the lower and upper limits are integers, the data type may be inferred to be an integer.
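
As a quick check of the example above, the following Python sketch (an illustration only; the helper name `matches` is hypothetical) confirms that the output pattern accepts exactly the values 0 to 255 when the number is surrounded by non-digit characters.

```python
import re

# The pattern output for the condition (0, 255, %3d), as given above.
PATTERN = re.compile(
    r"[^0-9]([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])[^0-9]"
)

def matches(value_text: str) -> bool:
    """True if PATTERN finds value_text as a standalone number.

    The text is padded with spaces because the leading and trailing
    [^0-9] each consume one non-digit character.
    """
    return PATTERN.search(f" {value_text} ") is not None

# Brute-force check over 0..999 written without leading zeros.
assert all(matches(str(n)) == (n <= 255) for n in range(1000))
print(matches("200"), matches("256"))  # True False
```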

As described below, the search condition generation method may be implemented as a program that runs on a computer such as a PC (personal computer) or PC server, or as hardware having functional units that implement the method.

FIG. 2 is a diagram illustrating an example of the external appearance of the regular expression generation device 100.

In FIG. 2, the regular expression generation device 100 includes hardware resources such as a system unit 910, a display device 901 having a CRT (cathode ray tube) or LCD (liquid crystal display) screen, a keyboard 902 (K/B), a mouse 903, an FDD 904 (flexible disk drive), a CDD 905 (compact disc drive), and a printer device 906, which are connected by cables and signal lines. The system unit 910 is a computer and is connected to the Internet 940 via a LAN 942 (local area network) and a gateway 941.

FIG. 3 is a diagram illustrating an example of the hardware resources of the regular expression generation device 100.

In FIG. 3, the regular expression generation device 100 includes a CPU 911 (central processing unit; also called an arithmetic unit, microprocessor, microcomputer, or processor) that executes programs. The CPU 911 is an example of the processing device 152. The CPU 911 is connected via a bus 912 to a ROM 913 (read-only memory), a RAM 914 (random access memory), a communication board 915, the display device 901, the keyboard 902, the mouse 903, the FDD 904, the CDD 905, the printer device 906, and a magnetic disk device 920, and controls these hardware devices. Instead of the magnetic disk device 920, a storage medium such as an optical disk device or a memory card reader/writer may be used.

The RAM 914 is an example of volatile memory. The storage media of the ROM 913, FDD 904, CDD 905, and magnetic disk device 920 are examples of nonvolatile memory. These are examples of the storage device 151. The communication board 915, keyboard 902, mouse 903, FDD 904, CDD 905, and the like are examples of the input device 153, and the communication board 915, display device 901, printer device 906, and the like are examples of the output device 154.

The communication board 915 is connected to the LAN 942 or the like. It is not limited to the LAN 942 and may instead be connected to the Internet 940 or to a WAN (wide area network) such as an IP-VPN (Internet Protocol Virtual Private Network), a wide-area LAN, or an ATM (Asynchronous Transfer Mode) network. When it is connected to the Internet 940 or a WAN, the gateway 941 is unnecessary.

The magnetic disk device 920 stores an operating system 921, a window system 922, a program group 923, and a file group 924. The programs in the program group 923 are executed by the CPU 911, the operating system 921, and the window system 922. The program group 923 stores programs that execute the functions described as "... unit" and "... means" in the description of this embodiment; these programs are read and executed by the CPU 911. The file group 924 stores the data, information, signal values, variable values, and parameters described as "... data", "... information", "... ID (identifier)", "... flag", and "... result" in the description of this embodiment, as items of "... files", "... databases", and "... tables". These files, databases, and tables are stored on storage media such as disks and memory. The data, information, signal values, variable values, and parameters stored on such media are read by the CPU 911 into main memory or cache memory via a read/write circuit and are used in CPU 911 operations such as extraction, search, reference, comparison, arithmetic, calculation, control, output, printing, and display. During these operations, the data, information, signal values, variable values, and parameters are temporarily stored in main memory, cache memory, or buffer memory.

In the block diagrams and flowcharts described in this embodiment, the arrows mainly indicate the input and output of data and signals. The data and signals are recorded on recording media such as memory (for example, the RAM 914), a flexible disk (FD) in the FDD 904, a compact disc (CD) in the CDD 905, a magnetic disk in the magnetic disk device 920, or other media such as optical disks, MiniDiscs (MD), and DVDs (digital versatile discs). The data and signals are transmitted over the bus 912, signal lines, cables, or other transmission media.

In the description of this embodiment, what is described as a "... unit" or "... means" may instead be a "... circuit", "... device", or "... apparatus", or a "... step", "... process", "... procedure", or "... operation". That is, what is described as a "... unit" or "... means" may be implemented by firmware stored in the ROM 913, by software alone, by hardware alone (elements, devices, boards, wiring, and so on), by a combination of software and hardware, or further in combination with firmware. The firmware and software are stored as programs on recording media such as magnetic disks, flexible disks, optical disks, compact discs, MiniDiscs, and DVDs. These programs are read and executed by the CPU 911. In other words, the programs cause a computer to function as the "... units" and "... means" described in this embodiment, or cause a computer to execute the procedures and methods of those "... units" and "... means".

In the following, to make the description concrete, the regular expression generation device 100 is assumed to be implemented by the computer and hardware resources illustrated in FIGS. 2 and 3.

FIG. 4 is a flowchart showing the regular expression generation method according to this embodiment. The flow in FIG. 4 corresponds to the processing procedure of a program (the regular expression generation program) executed on a computer that implements the regular expression generation device 100. In this procedure, the regular expression generation program causes the computer to execute each of the following processes.

When a user of the regular expression generation device 100 specifies attribute range condition data with the keyboard 902 or mouse 903, the attribute range condition input unit 101 receives that data from the keyboard 902 or mouse 903 (step S101: attribute range condition input process).

Based on the format indicated by the attribute range condition data received by the attribute range condition input unit 101, the arithmetic unit 102 uses the CPU 911 to compute a first value and a second value (part of step S102: arithmetic process). The attribute value storage unit 103 stores in the RAM 914 the lower and upper limits indicated by the attribute range condition data and the first and second values computed by the arithmetic unit 102 (part of step S102: attribute value storage process).

In step S102, by computing the first and second values, the arithmetic unit 102 divides the attribute values in the range from the lower limit to the upper limit of the attribute range condition into three regions: lower, middle, and upper. Of the three, the middle region is the range of values whose attribute values can be converted into a regular expression in a largely fixed (semi-fixed) pattern. The lower region is the range of values from the lower limit of the attribute range condition (that is, the lower limit of the attribute value) to the value one less than the lower limit of the middle region (that is, the first value). The upper region is the range of values from the value one greater than the upper limit of the middle region (that is, the second value) to the upper limit of the attribute range condition (that is, the upper limit of the attribute value). How each region is calculated is explained later with a concrete example.

When an attribute value exists between the first and second values stored by the attribute value storage unit 103, the regular expression generation unit 104 uses the CPU 911 to generate middle-region data that expresses those attribute values as a regular expression (step S103: part of the regular expression generation process). If no attribute value exists between the first and second values, no middle-region data is generated. The regular expression generation unit 104 also uses the CPU 911 to generate lower-region data that expresses, as a regular expression, the attribute values from the stored lower limit to the first value (step S104: part of the regular expression generation process), and upper-region data that expresses the attribute values from the second value to the stored upper limit (step S105: part of the regular expression generation process).

In steps S103, S104, and S105, the regular expression generation unit 104 generates a separate regular expression for each of the three regions (middle, lower, and upper). Because these steps do not depend on one another, they may be executed in any order.

The regular expression combining unit 105 uses the CPU 911 to combine the lower-region, upper-region, and middle-region data generated by the regular expression generation unit 104 (if no middle-region data was generated in step S103, only the lower-region and upper-region data are combined), producing regular expression data that expresses the attribute values from the stored lower limit to the stored upper limit (step S106: regular expression combining process).

The output unit 106 outputs the regular expression data generated by the regular expression combining unit 105, for example to the screen of the display device 901 (step S107: output process). A user of the regular expression generation device 100 can then search logs and similar data easily by entering this regular expression data as a search condition into, for example, the log-dedicated database mentioned earlier.

The flow of regular expression generation is described below with examples. Specifically, this is the processing that, for the lower limit A and upper limit B of an attribute range condition, generates a regular expression that matches character strings representing any value X satisfying $A \le X \le B$ (that is, any attribute value X from the lower limit A to the upper limit B).

First, consider only the case in which the lower and upper limits of the attribute value have the same number of digits, written as follows:
Lower limit A: $A_n A_{n-1} \ldots A_2 A_1$
Upper limit B: $B_n B_{n-1} \ldots B_2 B_1$
Here, a larger subscript denotes a more significant digit.
Each attribute element $A_i$, $B_i$ ($1 \le i \le n$) that makes up an attribute value is a single digit or character whose range is $v_1 \le A_i, B_i \le v_q$; for the most significant digit only, the range is $v_2 \le A_n, B_n \le v_q$. For simplicity, all digits other than the most significant digit are assumed to have the same range. The notation $(v_j)_k$ means "the value of the k-th digit is $v_j$". The ordering of attribute values is determined by the following rule:
If $A_n = B_n, A_{n-1} = B_{n-1}, \ldots, A_{k+1} = B_{k+1}$ ($1 \le k \le n$) and $B_k > A_k$, then $B > A$.
$A_i + 1$ denotes increasing the value of $A_i$ by one: if $A_i = v_j$, then $A_i + 1 = v_{j+1}$. If $A_i = v_q$, however, $A_i + 1 = v_1$, and $A_{i+1} + 1$ is applied as a carry. Similarly, $A_i - 1$ denotes decreasing the value of $A_i$ by one: if $A_i = v_j$, then $A_i - 1 = v_{j-1}$. If $A_i = v_1$, however, $A_i - 1 = v_q$, and $A_{i+1} - 1$ is applied as a borrow. Increasing the attribute value A by one is written $A + 1$ and means $A_1 + 1$; likewise, decreasing A by one is written $A - 1$ and means $A_1 - 1$.

A typical attribute value with these properties is an integer (absolute value).
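
The ordering rule and the carry and borrow rules above can be sketched in Python as follows. The sketch assumes, hypothetically, that every digit position uses the same ordered alphabet (here the decimal digits, so v_1 = "0" and v_q = "9"); the function names are illustrative, and the special range v_2..v_q of the most significant digit is not enforced.

```python
ALPHABET = "0123456789"  # hypothetical ordered digit values v_1 ... v_q

def is_greater(b: str, a: str, alphabet: str = ALPHABET) -> bool:
    """B > A: at the most significant digit k where A and B differ
    (all higher digits being equal), B_k > A_k."""
    for a_i, b_i in zip(a, b):          # most significant digit first
        if a_i != b_i:
            return alphabet.index(b_i) > alphabet.index(a_i)
    return False                        # A and B are equal

def increment(value: str, alphabet: str = ALPHABET) -> str:
    """A + 1: add one to A_1; when A_i = v_q it wraps to v_1 and carries."""
    digits = list(value)
    for i in range(len(digits) - 1, -1, -1):   # A_1 is the last character
        pos = alphabet.index(digits[i])
        if pos < len(alphabet) - 1:
            digits[i] = alphabet[pos + 1]
            return "".join(digits)
        digits[i] = alphabet[0]                # v_q + 1 = v_1, carry on
    raise OverflowError("no larger value with the same number of digits")

def decrement(value: str, alphabet: str = ALPHABET) -> str:
    """A - 1: subtract one from A_1; when A_i = v_1 it wraps to v_q and borrows."""
    digits = list(value)
    for i in range(len(digits) - 1, -1, -1):
        pos = alphabet.index(digits[i])
        if pos > 0:
            digits[i] = alphabet[pos - 1]
            return "".join(digits)
        digits[i] = alphabet[-1]               # v_1 - 1 = v_q, borrow on
    raise OverflowError("no smaller value with the same number of digits")

print(increment("199"), decrement("200"), is_greater("255", "250"))  # 200 199 True
```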

If the lower limit A and the upper limit B are equal, step S102 need not be executed; it suffices to output the following regular expression:
Regular expression: “A_{n}A_{n-1}...A_{2}A_{1}”

The following describes the case in which the lower and upper limits differ.

In step S102, the arithmetic unit 102 divides the attribute values into three regions (lower, middle, and upper) and determines the middle region first. The range of the middle region can be obtained as follows, according to the processing flow shown in FIG. 5. In FIG. 5, L denotes the lower limit of the middle region (that is, one greater than the first value) and U its upper limit (that is, one less than the second value).

In FIG. 5, starting from the most significant (n-th) digit (step S201), the arithmetic unit 102 uses the CPU 911 to compare the corresponding digits of the lower limit A and the upper limit B of the attribute value, one digit at a time (step S202). If A and B have the same value at a digit (the i-th digit), the arithmetic unit 102 sets that digit of both L and U ($L_i$, $U_i$) to the same value ($A_i$) (step S203) and moves on to compare the next digit of A and B (step S204). If A and B differ at the compared digit, the arithmetic unit 102 sets that digit of L ($L_i$) to one greater than A's value ($A_i + 1$) and that digit of U ($U_i$) to one less than B's value ($B_i - 1$) (step S205). Then, for each digit from the next one down to the least significant digit (steps S206 and S207), it sets L to that digit's minimum value ($v_1$) and U to that digit's maximum value ($v_q$) (step S208). The resulting L and U are as follows; at this point the attribute value storage unit 103 holds L and U in the RAM 914.
Middle region (lower limit L): $A_n \ldots A_{k+1} (A_k + 1) (v_1)_{k-1} \ldots (v_1)_1$
Middle region (upper limit U): $B_n \ldots B_{k+1} (B_k - 1) (v_q)_{k-1} \ldots (v_q)_1$
where $A_n = B_n, A_{n-1} = B_{n-1}, \ldots, A_{k+1} = B_{k+1}$ and $A_k \ne B_k$ ($1 \le k \le n$).
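
The digit-by-digit flow of FIG. 5 can be sketched in Python as below. This is a minimal illustration, assuming decimal digits as the alphabet v_1..v_q and limits of equal length with A < B; the function name `middle_bounds` is hypothetical.

```python
ALPHABET = "0123456789"  # hypothetical v_1 ... v_q (decimal digits)

def middle_bounds(a: str, b: str, alphabet: str = ALPHABET):
    """Compute (k, L, U) for same-length limits A < B, following FIG. 5.

    k is the position, counted from the least significant digit, of the
    most significant digit where A and B differ; L and U are the lower and
    upper limits of the middle region. If B_k - A_k = 1, L > U (empty region).
    """
    assert len(a) == len(b) and a < b
    n = len(a)
    low, up = [], []
    for p in range(n):                      # p = 0 is the n-th digit
        if a[p] == b[p]:                    # steps S202-S203: copy equal digit
            low.append(a[p])
            up.append(b[p])
            continue
        # Step S205: L_k = A_k + 1, U_k = B_k - 1
        low.append(alphabet[alphabet.index(a[p]) + 1])
        up.append(alphabet[alphabet.index(b[p]) - 1])
        # Steps S206-S208: the remaining digits get v_1 in L and v_q in U
        rest = n - p - 1
        low.append(alphabet[0] * rest)
        up.append(alphabet[-1] * rest)
        return n - p, "".join(low), "".join(up)
    raise ValueError("A and B must differ")

print(middle_bounds("123", "187"))  # (2, '130', '179')
```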

Next, the value one less than the lower limit of the middle region is taken as the upper limit of the lower region (that is, the first value), and the lower region is obtained as follows. At this point the attribute value storage unit 103 holds the lower limit A of the attribute value and the first value (L - 1) in the RAM 914.
Lower region (lower limit A): $A_n A_{n-1} \ldots A_2 A_1$
Lower region (upper limit): $A_n \ldots A_k (v_q)_{k-1} \ldots (v_q)_1$

Similarly, the value one greater than the upper limit of the middle region is taken as the lower limit of the upper region (that is, the second value), and the upper region is obtained as follows. At this point the attribute value storage unit 103 holds the second value (U + 1) and the upper limit B of the attribute value in the RAM 914.
Upper region (lower limit): $B_n \ldots B_k (v_1)_{k-1} \ldots (v_1)_1$
Upper region (upper limit B): $B_n B_{n-1} \ldots B_2 B_1$
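
Because L - 1 and U + 1 have the closed forms shown above, the first and second values can also be written down directly. A minimal Python sketch, assuming decimal digits as the alphabet and same-length limits A < B (the name `split_regions` is hypothetical; the `a < b` check relies on the alphabet order matching character order):

```python
ALPHABET = "0123456789"  # hypothetical v_1 ... v_q (decimal digits)

def split_regions(a: str, b: str, alphabet: str = ALPHABET):
    """Return ((A, first), (second, B)) for same-length limits A < B.

    first  = A_n...A_k (v_q)_{k-1}...(v_q)_1  (upper limit of the lower region)
    second = B_n...B_k (v_1)_{k-1}...(v_1)_1  (lower limit of the upper region)
    where k is the most significant digit at which A and B differ.
    """
    assert len(a) == len(b) and a < b
    p = next(i for i, (x, y) in enumerate(zip(a, b)) if x != y)
    k = len(a) - p                     # digit position counted from the right
    first = a[: p + 1] + alphabet[-1] * (k - 1)
    second = b[: p + 1] + alphabet[0] * (k - 1)
    return (a, first), (second, b)

print(split_regions("123", "187"))  # (('123', '129'), ('180', '187'))
```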

In step S103, the regular expression generation unit 104 generates the regular expression for the middle region. The regular expression corresponding to the middle region depends only on the number of digits k of the non-common part and on the values of $A_k$ and $B_k$, and can be generated as follows. The values of the (k+1)-th through n-th digits are omitted here because they can be handled as a common part in step S106.
Middle-region regular expression: “[(A_{k}+1)-(B_{k}-1)][v_{1}-v_{q}]{k-1}”
If $B_k - A_k = 1$, then $(A_k + 1) > (B_k - 1)$; in this case no regular expression is output for the middle region. All other cases are handled in the same way.

In the regular expression above, the trailing “{1}” may be omitted if k - 1 = 1, and the trailing “[v_{1}-v_{q}]{0}” may be omitted if k - 1 = 0. Likewise, in the rest of this description, “[v_{i}-v_{j}]{1}” in a regular expression may be written as “[v_{i}-v_{j}]”, “[v_{i}-v_{j}]{0}” may be omitted, and “[v_{i}-v_{i}]” may be written simply as “v_{i}”.
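
The middle-region expression and the simplification rules just described can be sketched in Python as follows. This assumes, hypothetically, decimal digits (v_1 = "0", v_q = "9") whose character codes are contiguous, so that `chr(ord(d) + 1)` gives the next digit; the helper names are illustrative.

```python
V1, VQ = "0", "9"  # hypothetical v_1 and v_q (decimal digits)

def char_class(lo: str, hi: str) -> str:
    """[lo-hi], written simply as lo when lo == hi ("[v_i-v_i]" -> "v_i")."""
    return lo if lo == hi else f"[{lo}-{hi}]"

def any_digits(count: int) -> str:
    """[v_1-v_q]{count}: omitted for count 0, "{1}" dropped for count 1."""
    if count == 0:
        return ""
    cls = char_class(V1, VQ)
    return cls if count == 1 else f"{cls}{{{count}}}"

def middle_regex(a_k: str, b_k: str, k: int):
    """Middle-region pattern "[(A_k+1)-(B_k-1)][v_1-v_q]{k-1}", without the
    common prefix A_n...A_{k+1}; None when B_k - A_k = 1 (empty region)."""
    lo, hi = chr(ord(a_k) + 1), chr(ord(b_k) - 1)
    if lo > hi:
        return None
    return char_class(lo, hi) + any_digits(k - 1)

print(middle_regex("2", "8", 2))  # [3-7][0-9]
```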

FIG. 6 shows the flow of the processing that generates the regular expression corresponding to the following range of attribute values:
Lower limit: $A_k A_{k-1} \ldots A_1$
Upper limit: $(v_j)_k (v_q)_{k-1} \ldots (v_q)_1$ (where $v_j \ge A_k$)

In the processing flow of FIG. 6, the regular expression generation unit 104 proceeds from the least significant digit toward the more significant digits. The inputs are the lower limit $A_k \ldots A_1$, the upper limit $(v_j)_k (v_q)_{k-1} \ldots (v_q)_1$, and the number of digits k. In step S301, the regular expression generation unit 104 sets “A_{k}...A_{2}[A_{1}-v_{q}]” as the initial value of the regular expression storage area (E) in the RAM 914. From step S302 on, it repeats steps S304 to S306 for each digit from the second least significant digit up to the (k-1)-th digit. If, in step S303, the digit being processed is below the k-th digit (YES), step S304 branches on whether that digit's value is $v_q$. If it is $v_q$ (YES), the regular expression generation unit 104 outputs nothing and moves on to the next digit. If it is not $v_q$ (NO), the unit appends “|A_{k}...A_{i+1}[(A_{i}+1)-v_{q}][v_{1}-v_{q}]{i-1}” to the end of the storage area (E) and moves on to the next digit. If, in step S303, the digits up to the (k-1)-th have all been processed (NO), the flow proceeds to step S307. In step S307, if $A_k$ and $v_j$ are equal (YES), the unit ends the processing without doing anything further. If they are not equal (NO), the unit proceeds to step S308, appends “|[(A_{k}+1)-v_{j}][v_{1}-v_{q}]{k-1}” to the end of the storage area (E), and ends the processing. The content of the storage area (E) when the processing ends is the regular expression corresponding to the range above.
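
The FIG. 6 flow can be sketched in Python as follows, again assuming decimal digits with contiguous character codes; `fig6_regex` and its helpers are hypothetical names. One assumption is added: for a single digit (k = 1) the initial term “[A_{1}-v_{q}]” appears to be wider than the range whenever v_j < v_q, so the sketch returns “[A_{1}-v_{j}]” in that case.

```python
V1, VQ = "0", "9"  # hypothetical v_1 and v_q (decimal digits)

def char_class(lo: str, hi: str) -> str:
    return lo if lo == hi else f"[{lo}-{hi}]"

def any_digits(count: int) -> str:
    if count == 0:
        return ""
    return char_class(V1, VQ) + ("" if count == 1 else f"{{{count}}}")

def fig6_regex(lower: str, v_j: str) -> str:
    """Pattern for A_k...A_1 (= lower) up to (v_j)_k (v_q)_{k-1}...(v_q)_1,
    following FIG. 6; requires v_j >= A_k. lower[0] is A_k."""
    k = len(lower)

    def digit(i: int) -> str:          # A_i, with i = 1 the rightmost digit
        return lower[k - i]

    if k == 1:
        return char_class(digit(1), v_j)   # assumed guard for a single digit
    parts = [lower[:-1] + char_class(digit(1), VQ)]          # step S301
    for i in range(2, k):                                    # steps S302-S306
        if digit(i) != VQ:
            succ = chr(ord(digit(i)) + 1)
            parts.append(lower[: k - i] + char_class(succ, VQ) + any_digits(i - 1))
    if digit(k) != v_j:                                      # steps S307-S308
        parts.append(char_class(chr(ord(digit(k)) + 1), v_j) + any_digits(k - 1))
    return "|".join(parts)

print(fig6_regex("123", "9"))  # 12[3-9]|1[3-9][0-9]|[2-9][0-9]{2}
```

Here "123" up to "999" is covered as 123-129, 130-199, and 200-999.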

The regular expression generation for the lower region in step S104 corresponds to the flow of FIG. 6 with the following lower and upper limits, for the k that satisfies $A_n = B_n, \ldots, A_{k+1} = B_{k+1}$ ($1 \le k \le n$):
Lower limit: $A_k A_{k-1} \ldots A_1$
Upper limit: $A_k (v_q)_{k-1} \ldots (v_q)_1$
The regular expression for the lower region generated by the regular expression generation unit 104 in step S104 is therefore as follows (terms for digits with $A_i = v_q$ are omitted, as in step S304). The values of the (k+1)-th through n-th digits are again omitted, because they can be handled as a common part in step S106.
Lower-region regular expression: “A_{k}...A_{2}[A_{1}-v_{q}]|A_{k}...A_{3}[(A_{2}+1)-v_{q}][v_{1}-v_{q}]|...|A_{k}[(A_{k-1}+1)-v_{q}][v_{1}-v_{q}]{k-2}”
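
The closed form of the lower-region expression can be sketched directly in Python. This is an illustration under the same assumptions (decimal digits, same-length limits A < B); `lower_region_regex` is a hypothetical name, and for k = 1 it assumes the lower region is just A itself.

```python
V1, VQ = "0", "9"  # hypothetical v_1 and v_q (decimal digits)

def char_class(lo: str, hi: str) -> str:
    return lo if lo == hi else f"[{lo}-{hi}]"   # "[v_i-v_i]" is written "v_i"

def any_digits(count: int) -> str:
    if count == 0:
        return ""                               # "[v_1-v_q]{0}" is omitted
    return char_class(V1, VQ) + ("" if count == 1 else f"{{{count}}}")

def lower_region_regex(a: str, b: str) -> str:
    """Lower-region pattern (A up to A_n...A_k v_q...v_q), without the common
    prefix A_n...A_{k+1}; terms for digits with A_i = v_q are dropped."""
    p = next(i for i in range(len(a)) if a[i] != b[i])
    tail = a[p:]                                # A_k ... A_1
    k = len(tail)
    if k == 1:
        return tail                             # assumed guard: region is A alone
    terms = [tail[:-1] + char_class(tail[-1], VQ)]   # A_k...A_2[A_1-v_q]
    for i in range(2, k):                       # i = 2 .. k-1
        a_i = tail[k - i]
        if a_i != VQ:
            terms.append(tail[: k - i] + char_class(chr(ord(a_i) + 1), VQ) + any_digits(i - 1))
    return "|".join(terms)

print(lower_region_regex("1234", "1900"))  # 23[4-9]|2[4-9][0-9]
```

With the common prefix "1" restored, this covers 1234-1239 and 1240-1299.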

FIG. 7 shows the flow of the processing that generates the regular expression corresponding to the following range of attribute values:
Lower limit: $(v_j)_k (v_1)_{k-1} \ldots (v_1)_1$ (where $v_j \le B_k$)
Upper limit: $B_k B_{k-1} \ldots B_1$

As in the flow of FIG. 6, in the processing flow of FIG. 7 the regular expression generation unit 104 proceeds from the least significant digit toward the more significant digits. The inputs are the lower limit $(v_j)_k (v_1)_{k-1} \ldots (v_1)_1$, the upper limit $B_k \ldots B_1$, and the number of digits k. In step S401, the regular expression generation unit 104 sets “B_{k}...B_{2}[v_{1}-B_{1}]” as the initial value of the regular expression storage area (E) in the RAM 914. From step S402 on, it repeats steps S404 to S406 for each digit from the second least significant digit up to the (k-1)-th digit. If, in step S403, the digit being processed is below the k-th digit (YES), step S404 branches on whether that digit's value is $v_1$. If it is $v_1$ (YES), the regular expression generation unit 104 outputs nothing and moves on to the next digit. If it is not $v_1$ (NO), the unit appends “|B_{k}...B_{i+1}[v_{1}-(B_{i}-1)][v_{1}-v_{q}]{i-1}” to the end of the storage area (E) and moves on to the next digit. If, in step S403, the digits up to the (k-1)-th have all been processed (NO), the flow proceeds to step S407. In step S407, if $B_k$ and $v_j$ are equal (YES), the unit ends the processing without doing anything further. If they are not equal (NO), the unit proceeds to step S408, appends “|[v_{j}-(B_{k}-1)][v_{1}-v_{q}]{k-1}” to the end of the storage area (E), and ends the processing. The content of the storage area (E) when the processing ends is the regular expression corresponding to the range above.
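
The mirror-image flow of FIG. 7 can be sketched the same way. As before, this assumes decimal digits with contiguous character codes, uses hypothetical names, and for k = 1 assumes the single term “[v_{j}-B_{1}]”.

```python
V1, VQ = "0", "9"  # hypothetical v_1 and v_q (decimal digits)

def char_class(lo: str, hi: str) -> str:
    return lo if lo == hi else f"[{lo}-{hi}]"

def any_digits(count: int) -> str:
    if count == 0:
        return ""
    return char_class(V1, VQ) + ("" if count == 1 else f"{{{count}}}")

def fig7_regex(upper: str, v_j: str) -> str:
    """Pattern for (v_j)_k (v_1)_{k-1}...(v_1)_1 up to B_k...B_1 (= upper),
    following FIG. 7; requires v_j <= B_k. upper[0] is B_k."""
    k = len(upper)

    def digit(i: int) -> str:          # B_i, with i = 1 the rightmost digit
        return upper[k - i]

    if k == 1:
        return char_class(v_j, digit(1))   # assumed guard for a single digit
    parts = [upper[:-1] + char_class(V1, digit(1))]          # step S401
    for i in range(2, k):                                    # steps S402-S406
        if digit(i) != V1:
            pred = chr(ord(digit(i)) - 1)
            parts.append(upper[: k - i] + char_class(V1, pred) + any_digits(i - 1))
    if digit(k) != v_j:                                      # steps S407-S408
        parts.append(char_class(v_j, chr(ord(digit(k)) - 1)) + any_digits(k - 1))
    return "|".join(parts)

print(fig7_regex("255", "1"))  # 25[0-5]|2[0-4][0-9]|1[0-9]{2}
```

The result covers 100 to 255, the same range as the three-digit alternatives in the introductory example.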

The regular expression generation for the upper region in step S105 corresponds to the flow of FIG. 7 with the following lower and upper limits, for the k that satisfies $A_n = B_n, \ldots, A_{k+1} = B_{k+1}$ ($1 \le k \le n$):
Lower limit: $B_k (v_1)_{k-1} \ldots (v_1)_1$
Upper limit: $B_k B_{k-1} \ldots B_1$
The regular expression for the upper region generated by the regular expression generation unit 104 in step S105 is therefore as follows (terms for digits with $B_i = v_1$ are omitted, as in step S404). The values of the (k+1)-th through n-th digits are again omitted, because they can be handled as a common part in step S106.
Upper-region regular expression: “B_{k}...B_{2}[v_{1}-B_{1}]|B_{k}...B_{3}[v_{1}-(B_{2}-1)][v_{1}-v_{q}]|...|B_{k}[v_{1}-(B_{k-1}-1)][v_{1}-v_{q}]{k-2}”

This completes the description of dividing the attribute value range into the lower, middle, and upper regions and generating a regular expression for each. Next, the regular expression that the regular expression combining unit 105 produces in step S106 by combining those regular expressions is described.

The regular expression combining unit 105 combines the regular expressions of the lower, middle, and upper regions and prepends the common part (the (k+1)th through nth digits) that was omitted in steps S103 to S105. The regular expression finally generated therefore has the following form:
"A_n...A_{k+1}(<regular expression of lower region>|<regular expression of middle region>|<regular expression of upper region>)"
When there is no middle region, the generated regular expression has the following form:
"A_n...A_{k+1}(<regular expression of lower region>|<regular expression of upper region>)"
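For equal digit counts, the whole combination can be illustrated with the following Python sketch. It is not the device's implementation: function names are hypothetical, values are assumed to be non-negative decimal integers without leading zeros, and the lower-region helper is reconstructed from the worked examples as the mirror image of the FIG. 7 flow rather than taken from FIG. 6.

```python
import re


def _cls(lo: int, hi: int) -> str:
    """Decimal digit class [lo-hi]."""
    return f"[{lo}-{hi}]"


def lower_region(a: str) -> list[str]:
    """Branches for a .. a_k 9...9 (leading digit fixed); assumed mirror of FIG. 7."""
    k = len(a)
    out = [a[:-1] + _cls(int(a[-1]), 9)]
    for i in range(2, k):  # digit i, counted from the right
        ai = int(a[k - i])
        if ai < 9:  # a digit already at v_q contributes nothing
            out.append(f"{a[:k - i]}{_cls(ai + 1, 9)}[0-9]{{{i - 1}}}")
    return out


def upper_region(b: str) -> list[str]:
    """Branches for b_k 0...0 .. b (FIG. 7 with v_j = b_k, so S408 never fires)."""
    k = len(b)
    out = [b[:-1] + _cls(0, int(b[-1]))]
    for i in range(2, k):
        bi = int(b[k - i])
        if bi > 0:  # S404: skip digits equal to v_1
            out.append(f"{b[:k - i]}{_cls(0, bi - 1)}[0-9]{{{i - 1}}}")
    return out


def range_regex(lo: int, hi: int) -> str:
    """A_n..A_{k+1}(<lower>|<middle>|<upper>) for lo <= hi with equal digit counts."""
    a, b = str(lo), str(hi)
    assert len(a) == len(b) and 0 <= lo <= hi
    if a == b:
        return a
    p = next(j for j in range(len(a)) if a[j] != b[j])  # length of common part
    prefix, a, b = a[:p], a[p:], b[p:]
    k = len(a)
    if k == 1:  # only the last digit differs
        return f"{prefix}({_cls(int(a), int(b))})"
    body = ["(" + "|".join(lower_region(a)) + ")"]
    if int(b[0]) - int(a[0]) >= 2:  # middle region is non-empty
        body.append(f"{_cls(int(a[0]) + 1, int(b[0]) - 1)}[0-9]{{{k - 1}}}")
    body.append("(" + "|".join(upper_region(b)) + ")")
    return f"{prefix}({'|'.join(body)})"


def agrees(lo: int, hi: int) -> bool:
    """Brute-force check over every integer with the same digit count."""
    w = len(str(lo))
    rx = re.compile(range_regex(lo, hi))
    start = 10 ** (w - 1) if w > 1 else 0
    return all((lo <= x <= hi) == bool(rx.fullmatch(str(x))) for x in range(start, 10 ** w))


if __name__ == "__main__":
    print(range_regex(123, 246))  # ((12[3-9]|1[3-9][0-9]{1})|(24[0-6]|2[0-3][0-9]{1}))
```

The common part is detected as the longest shared leading substring, which is why the prefix is emitted only once, outside the parentheses.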

The above regular expression matches not only strings representing values X that satisfy A ≤ X ≤ B, but also strings in which such an X is preceded or followed by further digits, as in v_iX, Xv_j, and v_iXv_j. To avoid such matches, add the exclusion character class "[^v_1-v_q]" before and after the generated regular expression.
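The effect of the exclusion class can be checked with a short Python sketch. The region expression for 123 to 246 is written by hand here, and the helper names are hypothetical. The lookaround variant is an alternative added as an assumption, not part of the described method: a plain "[^0-9]" must consume a neighbouring character, so it cannot match a value at the very start or end of the text.

```python
import re

# Region expression for 123 .. 246 (Example 1-1), written by hand
CORE = "(12[3-9]|1[3-9][0-9]|24[0-6]|2[0-3][0-9])"


def with_exclusion(core: str, digits: str = "0-9") -> str:
    """Surround the expression with "[^v_1-v_q]" on both sides."""
    return f"[^{digits}]{core}[^{digits}]"


def with_lookarounds(core: str, digits: str = "0-9") -> str:
    """Assumed alternative: zero-width guards that also work at text edges."""
    return f"(?<![{digits}]){core}(?![{digits}])"


if __name__ == "__main__":
    print(bool(re.search(CORE, "id 5200 ok")))  # True: "200" inside "5200"
    print(bool(re.search(with_exclusion(CORE), "id 5200 ok")))  # False
```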

The following describes the case where the upper and lower limit values of the attribute range condition have different numbers of digits, that is:
Lower limit value A: A_n A_{n-1} ... A_2 A_1
Upper limit value B: B_m B_{m-1} ... B_2 B_1 (m > n)
When the numbers of digits differ, m > n implies B > A. Alternatively, the lower limit value A can be regarded as having the implicit value v_1 in its (n+1)th through mth digits.

First, when m - n = 1, the lower, middle, and upper regions can be obtained as follows:
Lower region: A_n A_{n-1} ... A_2 A_1 to (v_q)_n (v_q)_{n-1} ... (v_q)_2 (v_q)_1
Middle region: none
Upper region: (v_2)_m (v_1)_{m-1} ... (v_1)_2 (v_1)_1 to B_m B_{m-1} ... B_2 B_1
The upper region starts at the smallest m-digit value, whose leading digit is v_2. The subsequent processing flow is the same as that shown in FIGS. 4, 6, and 7.

Alternatively, even when m - n = 1, the middle region may be obtained as follows, in accordance with the processing flow of FIG. 5:
Middle region: (A_n+1)(v_1)_{n-1} ... (v_1)_2 (v_1)_1 to (B_m-1)(v_q)_{m-1} ... (v_q)_2 (v_q)_1
In this case, the lower and upper regions are as follows:
Lower region: A_n A_{n-1} ... A_2 A_1 to A_n (v_q)_{n-1} ... (v_q)_2 (v_q)_1
Upper region: B_m (v_1)_{m-1} ... (v_1)_2 (v_1)_1 to B_m B_{m-1} ... B_2 B_1
The regular expression of the middle region is then:
Regular expression of the middle region: "[(A_n+1)-v_q][v_1-v_q]{n-1}|[v_2-(B_m-1)][v_1-v_q]{m-1}"

When m - n ≥ 2, the lower, middle, and upper regions can be set as follows and processed in the same manner:
Lower region: A_n A_{n-1} ... A_2 A_1 to (v_q)_n (v_q)_{n-1} ... (v_q)_2 (v_q)_1
Middle region: (v_2)_{n+1} (v_1)_n ... (v_1)_2 (v_1)_1 to (v_q)_{m-1} (v_q)_{m-2} ... (v_q)_2 (v_q)_1
Upper region: (v_2)_m (v_1)_{m-1} ... (v_1)_2 (v_1)_1 to B_m B_{m-1} ... B_2 B_1
The middle region thus consists of all values with n+1 to m-1 digits, and its regular expression is:
Regular expression of the middle region: "[v_2-v_q][v_1-v_q]{n,m-2}"
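The following Python sketch illustrates the different-digit-count case under stated assumptions: decimal digits, non-negative values without leading zeros, and m > n, with a middle region "[1-9][0-9]{n,m-2}" added whenever m - n is at least 2 (for m - n = 2 it holds exactly the (n+1)-digit values). Helper names are hypothetical; the lower helper extends the FIG. 7 idea to cover every n-digit value from A upward.

```python
import re


def _cls(lo: int, hi: int) -> str:
    return f"[{lo}-{hi}]"


def up_to_all_nines(a: str) -> list[str]:
    """Branches for a .. 9...9 (same number of digits as a)."""
    n = len(a)
    out = [a[:-1] + _cls(int(a[-1]), 9)]
    for i in range(2, n + 1):  # i == n frees the leading digit
        ai = int(a[n - i])
        if ai < 9:
            out.append(f"{a[:n - i]}{_cls(ai + 1, 9)}[0-9]{{{i - 1}}}")
    return out


def from_one_zeros_to(b: str) -> list[str]:
    """Branches for 10...0 .. b: FIG. 7 with v_j = v_2 = 1."""
    m = len(b)
    if m == 1:
        return [_cls(1, int(b))]
    out = [b[:-1] + _cls(0, int(b[-1]))]
    for i in range(2, m + 1):
        bi = int(b[m - i])
        floor = 1 if i == m else 0  # the leading digit may not be 0
        if bi > floor:
            out.append(f"{b[:m - i]}{_cls(floor, bi - 1)}[0-9]{{{i - 1}}}")
    return out


def range_regex_mixed(lo: int, hi: int) -> str:
    a, b = str(lo), str(hi)
    n, m = len(a), len(b)
    assert 0 <= lo and m > n
    parts = ["(" + "|".join(up_to_all_nines(a)) + ")"]
    if m - n >= 2:  # values with n+1 .. m-1 digits
        parts.append(f"[1-9][0-9]{{{n},{m - 2}}}")
    parts.append("(" + "|".join(from_one_zeros_to(b)) + ")")
    return "|".join(parts)


def agrees_mixed(lo: int, hi: int) -> bool:
    """Brute-force check over 0 .. 10**m - 1."""
    rx = re.compile(range_regex_mixed(lo, hi))
    return all((lo <= x <= hi) == bool(rx.fullmatch(str(x))) for x in range(10 ** len(str(hi))))


if __name__ == "__main__":
    print(bool(re.fullmatch(range_regex_mixed(304, 765432), "99999")))  # True
```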

A specific example of applying the above method to integer values of 0 or more follows. For decimal numbers, v_1 = 0 and v_q = 9.

(Example 1-1) Attribute range condition: 123 to 246
Lower region: 123 to 199
Middle region: 200 to 199 (empty, so no regular expression is generated)
Upper region: 200 to 246
Regular expression: "(12[3-9]|1[3-9][0-9])|(24[0-6]|2[0-3][0-9])"

(Example 1-2) Attribute range condition: 2345 to 7654
Lower region: 2345 to 2999
Middle region: 3000 to 6999
Upper region: 7000 to 7654
Regular expression: "(234[5-9]|23[5-9][0-9]|2[4-9][0-9]{2})|[3-6][0-9]{3}|(765[0-4]|76[0-4][0-9]|7[0-5][0-9]{2})"

(Example 1-3) Attribute range condition: 7531 to 18939
Lower region: 7531 to 9999
Middle region: 10000 to 17999
Upper region: 18000 to 18939
Regular expression: "(753[1-9]|75[4-9][0-9]|7[6-9][0-9]{2}|[89][0-9]{3})|1[0-7][0-9]{3}|(1893[0-9]|189[0-2][0-9]|18[0-8][0-9]{2})"
Or, alternatively:
Lower region: 7531 to 7999
Middle region: 8000 to 17999
Upper region: 18000 to 18939
Regular expression: "(753[1-9]|75[4-9][0-9]|7[6-9][0-9]{2})|([8-9][0-9]{3}|1[0-7][0-9]{3})|(1893[0-9]|189[0-2][0-9]|18[0-8][0-9]{2})"

(Example 1-4) Attribute range condition: 304 to 765432
Lower region: 304 to 999
Middle region: 1000 to 99999
Upper region: 100000 to 765432
Regular expression: "(30[4-9]|3[1-9][0-9]|[4-9][0-9]{2})|[1-9][0-9]{3,4}|(76543[0-2]|7654[0-2][0-9]|765[0-3][0-9]{2}|76[0-4][0-9]{3}|7[0-5][0-9]{4}|[1-6][0-9]{5})"
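The worked examples can be checked mechanically. The Python sketch below, an illustration only, compares each example expression (spaces removed, parentheses balanced so that it compiles) with its numeric range for every integer below 10 to the power of the upper limit's digit count; `mismatches` is a hypothetical helper name.

```python
import re

# Example expressions with spaces removed: (lower limit, upper limit, pattern)
EXAMPLES = [
    (123, 246, "(12[3-9]|1[3-9][0-9])|(24[0-6]|2[0-3][0-9])"),
    (2345, 7654, "(234[5-9]|23[5-9][0-9]|2[4-9][0-9]{2})|[3-6][0-9]{3}"
                 "|(765[0-4]|76[0-4][0-9]|7[0-5][0-9]{2})"),
    (7531, 18939, "(753[1-9]|75[4-9][0-9]|7[6-9][0-9]{2}|[89][0-9]{3})|1[0-7][0-9]{3}"
                  "|(1893[0-9]|189[0-2][0-9]|18[0-8][0-9]{2})"),
    (7531, 18939, "(753[1-9]|75[4-9][0-9]|7[6-9][0-9]{2})|([8-9][0-9]{3}|1[0-7][0-9]{3})"
                  "|(1893[0-9]|189[0-2][0-9]|18[0-8][0-9]{2})"),
    (304, 765432, "(30[4-9]|3[1-9][0-9]|[4-9][0-9]{2})|[1-9][0-9]{3,4}"
                  "|(76543[0-2]|7654[0-2][0-9]|765[0-3][0-9]{2}|76[0-4][0-9]{3}"
                  "|7[0-5][0-9]{4}|[1-6][0-9]{5})"),
]


def mismatches(lo: int, hi: int, pattern: str) -> list[int]:
    """Integers below 10**len(str(hi)) where the pattern and the range disagree."""
    rx = re.compile(pattern)
    limit = 10 ** len(str(hi))
    return [x for x in range(limit) if (lo <= x <= hi) != bool(rx.fullmatch(str(x)))]
```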

For numerical values, a run of leading zeros may be permitted before the number. When the attribute range condition specifies the format "the number may begin with any number of consecutive zeros", the regular expression combining unit 105 generates the regular expression as follows in step S107 of FIG. 4:
Regular expression: "[^1-9]0*(<regular expression of lower region>|<regular expression of middle region>|<regular expression of upper region>)[^0-9]"
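A minimal Python check of the leading-zero wrapper, assuming the Example 1-1 region expressions (123 to 246); the helper name `allow_leading_zeros` is hypothetical.

```python
import re

# Lower and upper region expressions of Example 1-1 (123 .. 246)
CORE = "(12[3-9]|1[3-9][0-9])|(24[0-6]|2[0-3][0-9])"


def allow_leading_zeros(regions: str) -> str:
    """Wrap as "[^1-9]0*(<regions>)[^0-9]": any run of leading zeros is accepted."""
    return f"[^1-9]0*({regions})[^0-9]"


if __name__ == "__main__":
    print(bool(re.search(allow_leading_zeros(CORE), " 000123 ")))  # True
```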

When the attribute range condition specifies the format "the total number of digits is t, padded with leading zeros when shorter" (written "%0td" in the format specification of the C printf function), the regular expression combining unit 105 can generate the regular expression as follows. Every value then has exactly t digits: an m-digit value carries t-m leading zeros, which form the common prefix, and an n-digit value of the lower region needs m-n further zeros:
Regular expression: "[^1-9]0{t-m}(0{m-n}(<regular expression of lower region>)|<regular expression of middle region>|<regular expression of upper region>)[^0-9]"
Here, the middle region (values with n+1 to m-1 digits) is expanded by digit count, each branch carrying the zeros needed beyond the common t-m:
Regular expression of the middle region: "0{m-n-1}[1-9][0-9]{n}|0{m-n-2}[1-9][0-9]{n+1}|...|0[1-9][0-9]{m-2}"
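One self-consistent reading of the fixed-width ("%0td") formula can be tested in Python. This sketch assumes the common prefix is exactly t - m zeros, that the lower branch needs m - n further zeros, and that each middle branch for k-digit values (n < k < m) carries m - k further zeros; the region expressions for 304 to 76543 are written by hand, and the helper name is hypothetical.

```python
import re


def fixed_width_regex(lower: str, upper: str, n: int, m: int, t: int) -> str:
    """Zero-padded values of width t: common t-m zeros, then one branch per region."""
    middle = [f"0{{{m - k}}}[1-9][0-9]{{{k - 1}}}" for k in range(n + 1, m)]
    branches = [f"0{{{m - n}}}({lower})", *middle, f"({upper})"]
    return f"[^1-9]0{{{t - m}}}({'|'.join(branches)})[^0-9]"


# Hand-written region expressions (assumed) for 304 .. 76543, so n = 3 and m = 5
LOWER = "30[4-9]|3[1-9][0-9]|[4-9][0-9]{2}"
UPPER = "7654[0-3]|765[0-3][0-9]|76[0-4][0-9]{2}|7[0-5][0-9]{3}|[1-6][0-9]{4}"
PATTERN = fixed_width_regex(LOWER, UPPER, n=3, m=5, t=7)

if __name__ == "__main__":
    print(bool(re.fullmatch(PATTERN, " 0001000 ")))  # True (middle region)
```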

For negative integers, the regular expression generation unit 104 first generates a regular expression for the absolute values in the same manner and then prefixes it with a minus sign. When the range of attribute values straddles 0, the processing is split into a positive (0 or more) part and a negative (less than 0) part. For example, given the attribute range condition
Lower limit value A: -A_n A_{n-1} ... A_2 A_1
Upper limit value B: B_m B_{m-1} ... B_2 B_1
regular expressions are generated separately for
Positive lower limit value: 0
Positive upper limit value: B_m B_{m-1} ... B_2 B_1
and
Negative lower limit value: -A_n A_{n-1} ... A_2 A_1
Negative upper limit value: -1
and the regular expression combining unit 105 finally combines them as follows:
Regular expression: "<regular expression for the range 0 to B>|(-<regular expression for the absolute values 1 to |A|>)"
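Combining the two halves of a range that straddles zero is a simple string operation. The Python sketch below assumes hand-written region expressions for 0 to 123 and for the absolute values 1 to 246 (that is, the range -246 to 123); the function name is hypothetical.

```python
import re


def signed_range_regex(nonneg: str, negative_abs: str) -> str:
    """Combine as <0..B>|(-<1..|A|>)."""
    return f"({nonneg})|(-({negative_abs}))"


# Hand-written region expressions (assumed) for the range -246 .. 123
NONNEG_0_TO_123 = "[0-9]|[1-9][0-9]|1[0-1][0-9]|12[0-3]"
ABS_1_TO_246 = "[1-9]|[1-9][0-9]|1[0-9]{2}|2[0-3][0-9]|24[0-6]"

PATTERN = signed_range_regex(NONNEG_0_TO_123, ABS_1_TO_246)

if __name__ == "__main__":
    print(bool(re.fullmatch(PATTERN, "-246")))  # True
```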

So far, the description has assumed that both the lower limit value A and the upper limit value B of the attribute value are specified. The following covers the case where the upper limit value B is not specified or is specified as "infinity". If the attribute's data type has a maximum value B′, the regular expression combining unit 105 can simply generate a regular expression for the range from the lower limit value A to B′. If the data type has no maximum value, the processing is as follows.

First, the regular expression combining unit 105 generates a regular expression up to the largest attribute value that has the same number of digits as the lower limit value A, that is, for the following n-digit lower and upper limit values:
Lower limit value A: A_n ... A_1
Upper limit value B′: (v_q)_n ... (v_q)_1
It then adds "[v_2-v_q][v_1-v_q]{n,}", the regular expression for attribute values with more than n digits. The generated regular expression has the following form:
Regular expression: "<regular expression for the range A to B′>|[v_2-v_q][v_1-v_q]{n,}"
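A Python sketch of the open-ended upper limit, assuming decimal values without leading zeros; `up_to_all_nines` (a hypothetical name) covers A up to the largest n-digit value, and the appended branch covers every longer value.

```python
import re


def up_to_all_nines(a: str) -> list[str]:
    """Branches covering a .. 99...9 with the same number of digits as a."""
    n = len(a)
    out = [f"{a[:-1]}[{a[-1]}-9]"]
    for i in range(2, n + 1):  # i == n frees the leading digit
        ai = int(a[n - i])
        if ai < 9:
            out.append(f"{a[:n - i]}[{ai + 1}-9][0-9]{{{i - 1}}}")
    return out


def at_least(lo: int) -> str:
    """<A..B'>|[1-9][0-9]{n,}, where B' is the largest n-digit value."""
    a = str(lo)
    return "|".join(up_to_all_nines(a)) + f"|[1-9][0-9]{{{len(a)},}}"


if __name__ == "__main__":
    print(at_least(304))  # 30[4-9]|3[1-9][0-9]{1}|[4-9][0-9]{2}|[1-9][0-9]{3,}
    print(bool(re.fullmatch(at_least(304), "1000000")))  # True
```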

Next, consider the case where the lower limit value A is not specified or is specified as "minus infinity". First, if the attribute's data type has a minimum value A′, the regular expression combining unit 105 can generate a regular expression for the lower limit value A′ and the upper limit value B. For example, the minimum value is 1 if the data type of the searched attribute is the positive integers, and 0 if it is the natural numbers (here taken to include 0). If negative integer values are also included, the regular expression combining unit 105 can generate the following regular expression by the procedure shown so far, treating the range as the numerical range 0 to B together with all negative numbers of one or more digits. In the following description example, zero or more repetitions are written as "*" instead of "{0,}".
Regular expression: "<regular expression for the range 0 to B>|(-[1-9][0-9]*)"
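For the open-ended lower limit with negative integers, a Python sketch under the assumption that the 0-to-B expression is written by hand (here B = 42); the function name `at_most` is hypothetical.

```python
import re


def at_most(nonneg_upto_b: str) -> str:
    """Every integer <= B: the 0..B expression plus any negative integer."""
    return f"({nonneg_upto_b})|(-[1-9][0-9]*)"


ZERO_TO_42 = "[0-9]|[1-3][0-9]|4[0-2]"  # hand-written, assumed

if __name__ == "__main__":
    print(bool(re.fullmatch(at_most(ZERO_TO_42), "-12345")))  # True
```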

Just as numerals come in full-width and half-width forms and letters of the alphabet in uppercase and lowercase, each digit of an attribute may have two or more value ranges (v_1 ≤ A_i ≤ v_q, w_1 ≤ A_i ≤ w_q, ...). Even in that case, the regular expression combining unit 105 can generate the corresponding regular expression with only a small change to the processing flow shown above. That is, when it is specified that two or more notations of attribute values are not to be distinguished, the regular expression can be generated by listing them side by side using the selection "|" or character classes. Examples follow.
(1) Using selection
Before modification: "v_i v_j[v_k-v_l]"
After modification: "(v_i v_j[v_k-v_l]|w_i w_j[w_k-w_l]|...)"
(2) Using character classes
Before modification: "v_i v_j[v_k-v_l]"
After modification: "[v_i w_i ...][v_j w_j ...][v_k-v_l w_k-w_l ...]"
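The two styles behave differently on mixed notations, as this Python sketch shows, assuming ASCII digits as the v values and full-width digits U+FF10 to U+FF19 as the w values. Selection accepts a value only when all of its digits use one notation, while per-digit character classes also accept mixtures such as "1２5". Helper names are hypothetical, and `fw` assumes the fragment contains no {n} quantifiers, whose digits it would also convert.

```python
import re


def fw(s: str) -> str:
    """Map ASCII digits 0-9 to full-width digits U+FF10-U+FF19."""
    return "".join(chr(ord(c) - ord("0") + 0xFF10) if "0" <= c <= "9" else c for c in s)


def by_selection(fragment: str) -> str:
    """(1) Selection: the whole fragment once per notation."""
    return f"({fragment}|{fw(fragment)})"


def lit(c: str) -> str:
    """(2) Character class for one fixed digit in both notations."""
    return f"[{c}{fw(c)}]"


def rng(lo: str, hi: str) -> str:
    """(2) Character class for a digit range in both notations."""
    return f"[{lo}-{hi}{fw(lo)}-{fw(hi)}]"


# The fragment "12[3-9]" in both styles
SELECTION = by_selection("12[3-9]")
CHAR_CLASS = lit("1") + lit("2") + rng("3", "9")

if __name__ == "__main__":
    print(bool(re.fullmatch(CHAR_CLASS, "1２5")))  # True
```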

Numerical values may have a comma "," inserted every three digits, counting from the lowest digit. The regular expression is then written as follows. Under the processing flow shown above, the regular expressions of the lower and upper regions have a fixed number of digits. The example below shows only a partial regular expression delimited by the selection symbol "|".
Before modification: "A_m A_{m-1} ... A_{n+1}[v_1-v_q]{n}"
After modification: "A_m A_{m-1} ... A_{n-t+4},A_{n-t+3} ... A_{n+1}[v_1-v_q]{t}(,[v_1-v_q]{3}){s}"
Here, s is the quotient and t the remainder of n/3.
In this way, for the (n+1)th and higher digits, the regular expression combining unit 105 inserts a comma between the (n-t+3i)th digit and the (n-t+3i+1)th digit (i a natural number).
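The comma rule can be sketched in Python, assuming the fixed digits A_m ... A_{n+1} are passed as a non-empty string; the function name `with_commas` is hypothetical. A comma is placed after each fixed digit at position pos (counted from the right) when pos - 1 is a multiple of three greater than n, and the free n digits become "{t}" followed by s comma groups.

```python
import re


def with_commas(prefix: str, n: int, digit: str = "[0-9]") -> str:
    """Rewrite prefix + digit{n} for numbers grouped by thousands commas.

    prefix holds the fixed digits A_m ... A_{n+1}, most significant first.
    """
    assert prefix, "at least one fixed digit (A_{n+1}) is assumed"
    s, t = divmod(n, 3)  # s: quotient, t: remainder of n / 3
    m = n + len(prefix)
    out = []
    for pos, ch in zip(range(m, n, -1), prefix):  # pos counts from the right
        out.append(ch)
        if (pos - 1) % 3 == 0 and pos - 1 > n:  # boundary n-t+3i, i >= 1
            out.append(",")
    return "".join(out) + f"{digit}{{{t}}}(,{digit}{{3}}){{{s}}}"


if __name__ == "__main__":
    print(with_commas("12", 4))  # 12[0-9]{1}(,[0-9]{3}){1}
    print(bool(re.fullmatch(with_commas("12", 4), "123,456")))  # True
```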

The above examples show decimal numbers, but hexadecimal numbers can be processed in the same way by setting v_1 = 0 and v_q = F. Any other attribute with the same form as in this embodiment can be processed in the same way.

Furthermore, although the description so far has mainly used integer values as examples, the value of each digit need not be a numeral; it may be a number of one or more digits or a member of an ordered set of words. For example, the month of a date may be written with the English month names (January, February, March, ..., December) or their abbreviations (Jan, Feb, Mar, ..., Dec). In this case, the month may be regarded as a single attribute element taking 12 values. Defining v_1 = "Jan", v_2 = "Feb", v_3 = "Mar", ..., v_12 = "Dec", the month can then be processed by the procedure shown in this embodiment. This approach is particularly effective when the attribute value has the form shown in Embodiment 3, described later.
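When digit values are hexadecimal symbols or words, the notation "[v_a-v_b]" has to be rendered with care. The Python sketch below is one possible rendering under our own assumptions (the helper name `value_class` is hypothetical): single-character values become an explicit character class, because a code-point range such as "[0-F]" would also admit ":" through "@", and multi-character values such as month abbreviations become an alternation.

```python
import re


def value_class(lo: str, hi: str, values: list[str]) -> str:
    """Render "[lo-hi]" for an ordered list of digit values."""
    chosen = values[values.index(lo): values.index(hi) + 1]
    if all(len(v) == 1 for v in chosen):
        return "[" + "".join(re.escape(v) for v in chosen) + "]"
    return "(?:" + "|".join(re.escape(v) for v in chosen) + ")"


HEX = list("0123456789ABCDEF")
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

if __name__ == "__main__":
    print(value_class("8", "B", HEX))  # [89AB]
    print(value_class("Mar", "May", MONTHS))  # (?:Mar|Apr|May)
```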

As described above, the regular expression generation device 100 according to this embodiment can automatically generate, from an attribute range condition that specifies the lower limit value, upper limit value, format, and so on of an attribute with integer-like characteristics, a regular expression that matches character strings representing attribute values in the range from the lower limit value to the upper limit value. As a result, a regular expression matching a complex attribute range condition, which was previously difficult to write correctly, can be obtained easily and quickly, without special knowledge or trial and error.

Embodiment 2.
This embodiment is described mainly in terms of its differences from Embodiment 1.

As described in Embodiment 1, the data type and format of integer values can be specified in many ways: with or without a sign, the maximum number of digits, the radix of the notation, the range of integer values, whether leading zeros are present, whether commas are used as separators, the kind of characters used, and so on. Specifying all of these formats in detail in the attribute range condition is tedious. In this embodiment, therefore, the regular expression generation device 100 stores preset attribute range conditions in the storage device 151 so that they can be selected and used.

FIG. 8 is a block diagram showing the configuration of the regular expression generation device 100 according to this embodiment.

In FIG. 8, the regular expression generation device 100 includes a condition storage unit 107 and an identifier input unit 108 in addition to the components shown in FIG. 1 and described in Embodiment 1.

The condition storage unit 107 stores a plurality of pieces of attribute range condition data in the storage device 151 in advance, each associated with a unique identifier.

The identifier input unit 108 receives an arbitrary identifier from the input device 153.

The attribute range condition input unit 101 inputs the attribute range condition data that the condition storage unit 107 has stored in association with the identifier received by the identifier input unit 108.

FIG. 9 is a flowchart showing the regular expression generation method according to this embodiment. The flow shown in FIG. 9 corresponds to the processing procedure of a program (regular expression generation program) executed on a computer that implements the regular expression generation device 100. In this procedure, the regular expression generation program causes the computer to execute each of the processes described below.

When the user of the regular expression generation device 100 specifies an identifier with the keyboard 902 or the mouse 903, the identifier input unit 108 receives that identifier from the keyboard 902 or the mouse 903 (step S501: identifier input process). The attribute range condition input unit 101 reads and inputs, from among the pieces of attribute range condition data stored in advance on the magnetic disk device 920 by the condition storage unit 107, the data stored in association with the identifier received by the identifier input unit 108 (step S502: attribute range condition input process). After step S502, steps S103 to S107 are executed as in the flowchart of FIG. 4 described in Embodiment 1.

Thus, in this embodiment, the regular expression generation device 100 sets aside, in part of the storage area of the storage device 151 that can be referenced during regular expression generation, an area for storing regular expression generation rules (that is, attribute range conditions), and stores in it pairs of an identification number or identification name (both examples of an identifier) and an attribute value format or regular expression generation procedure (both examples of information contained in an attribute range condition). As the attribute range condition, the lower and upper limit values of the attribute may be specified each time, as in Embodiment 1, with only the attribute format specified by an identification number or identification name. In that case, the attribute range condition input unit 101 may read the format information and regular expression generation procedure corresponding to the specified identification number or name from the storage device 151, and the regular expression generation unit 104 and the regular expression combining unit 105 may generate the regular expression according to that procedure.
Alternatively, the attribute range condition input unit 101 may accept the lower and upper limit values of the attribute as character strings, and the regular expression generation unit 104 and the regular expression combining unit 105 may analyze the input strings to determine automatically, for example, whether commas are used as separators.

Further, the area for storing the regular expression generation rules may be defined as a rewritable area, so that the condition storage unit 107 can add to it information such as an identification number or identification name representing an attribute format, together with the attribute's data type, format, and regular expression generation procedure.
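The identifier-keyed storage of Embodiment 2 can be sketched with a Python dictionary. Everything here is hypothetical: the field names of `RangeFormat`, the class `ConditionStore` standing in for the condition storage unit 107, and the heuristic `looks_comma_separated`, which illustrates the automatic comma detection mentioned above for a limit entered as text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeFormat:
    """Hypothetical stored format; fields mirror the options listed above."""
    digits: str = "0123456789"
    signed: bool = False
    leading_zeros: bool = False
    thousands_comma: bool = False


class ConditionStore:
    """Identifier -> stored format, standing in for condition storage unit 107."""

    def __init__(self) -> None:
        self._by_id: dict[str, RangeFormat] = {}

    def register(self, ident: str, fmt: RangeFormat) -> None:
        """Add or replace an entry (the rewritable area)."""
        self._by_id[ident] = fmt

    def lookup(self, ident: str) -> RangeFormat:
        """Steps S501/S502: fetch the stored condition for an identifier."""
        return self._by_id[ident]


def looks_comma_separated(value: str) -> bool:
    """Guess the comma option from a limit entered as text, e.g. "12,345"."""
    return re.fullmatch(r"-?\d{1,3}(,\d{3})+", value) is not None
```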

The area for storing the regular expression generation rules may be held in the storage device 151, such as a disk device or non-volatile memory, so that it can be read by the processing device 152 that executes the regular expression generation procedure; alternatively, it may be placed in high-speed non-volatile memory at run time.

Embodiment 3.
This embodiment is described mainly in terms of its differences from Embodiment 1.

This embodiment describes the flow of processing for generating a regular expression for attributes whose format differs from that in Embodiment 1. The configuration of the regular expression generation device 100 in this embodiment is the same as that shown in FIG. 1 and described in Embodiment 1. The operation of the regular expression generation device 100 (the regular expression generation method and the processing procedure of the regular expression generation program) is also the same as that shown in FIG. 4 and described in Embodiment 1.

As in Embodiment 1, the flow of processing for generating a regular expression is described below with examples. The aim is a regular expression that matches character strings representing values X satisfying A ≤ X ≤ B for the lower limit value A and the upper limit value B of the attribute range condition (that is, attribute values X from the lower limit value A to the upper limit value B).

Unlike Embodiment 1, attribute values here are written as follows:
Lower limit value A: A_1 A_2 ... A_{n-1} A_n
Upper limit value B: B_1 B_2 ... B_{m-1} B_m
Each A_i and B_j (1 ≤ i ≤ n, 1 ≤ j ≤ m) is a single numeral or character whose value range is v_1 ≤ A_i, B_j ≤ v_q. The order of attribute values in this embodiment is determined as follows:
(1) If A_1 = B_1, A_2 = B_2, ..., A_{k-1} = B_{k-1} (1 ≤ k ≤ min(n, m)) and B_k > A_k, then B > A.
(2) If A_1 = B_1, A_2 = B_2, ..., A_n = B_n and m > n, then B > A.
A_i + 1 denotes increasing the value of A_i by one: if A_i = v_j, then A_i + 1 = v_{j+1}. However, if A_i = v_q, then A_i + 1 = v_1, and A_{i-1}, the next more significant digit, is in turn increased by one. Similarly, A_i - 1 denotes decreasing the value of A_i by one: if A_i = v_j, then A_i - 1 = v_{j-1}. However, if A_i = v_1, then A_i - 1 = v_q, and A_{i-1} is in turn decreased by one. Increasing the attribute value A by one is written A + 1 and means A_n + 1. Similarly, decreasing the attribute value A by one is written A - 1 and means A_n - 1.
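The ordering rules and the +1/-1 operations can be illustrated in Python. This is a sketch under our own assumptions: values are strings over an ordered alphabet, carries and borrows move toward A_1 (the most significant digit in this notation), and the helper names are hypothetical. Python's tuple comparison already implements rules (1) and (2), including "a proper prefix sorts first".

```python
AZ = "abcdefghijklmnopqrstuvwxyz"


def order_key(s: str, values: str) -> tuple[int, ...]:
    """Rules (1) and (2): compare from A_1; a proper prefix sorts first."""
    return tuple(values.index(c) for c in s)


def plus_one(s: str, values: str) -> str:
    """A + 1: add one at A_n; v_q wraps to v_1 and carries into A_{i-1}."""
    chars = list(s)
    for i in range(len(chars) - 1, -1, -1):
        j = values.index(chars[i])
        if j < len(values) - 1:
            chars[i] = values[j + 1]
            return "".join(chars)
        chars[i] = values[0]
    raise OverflowError("carry past A_1")


def minus_one(s: str, values: str) -> str:
    """A - 1: subtract one at A_n; v_1 wraps to v_q and borrows from A_{i-1}."""
    chars = list(s)
    for i in range(len(chars) - 1, -1, -1):
        j = values.index(chars[i])
        if j > 0:
            chars[i] = values[j - 1]
            return "".join(chars)
        chars[i] = values[-1]
    raise OverflowError("borrow past A_1")
```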

Attribute values with these characteristics include character strings (in lexicographic order) and the fractional parts of decimal numbers (the digits after the decimal point).

If the lower limit value A and the upper limit value B are equal, it suffices to output the following regular expression without executing step S102:
Regular expression: "A_1A_2...A_{n-1}A_n"

In step S102, the calculation unit 102 divides the attribute value range into three regions: lower, middle, and upper. At first it is convenient to consider only as many digits as the shorter of the lower and upper limit values has.

First, the middle region is obtained for the following r-digit lower limit value A′ and upper limit value B′:
Lower limit value A′: A_1 A_2 ... A_{r-1} A_r
Upper limit value B′: B_1 B_2 ... B_{r-1} B_r
Here, r is the smaller of n and m. FIG. 10 shows the flow of processing for obtaining the middle region. FIG. 10 is FIG. 5 with step S201 changed to "i = 1", steps S204 and S206 changed to "i = i + 1", and step S207 changed to "i ≤ r?".

In FIG. 10, the calculation unit 102 uses the CPU 911 to compare the values of the same digit of the lower limit value A′ and the upper limit value B′ (step S602), starting from the first digit, which is the most significant digit in this notation (step S601). If A′ and B′ have the same value at a digit (the ith digit), the calculation unit 102 sets the same digit of L and U (L_i, U_i) to that value (A_i) (step S603) and compares A′ and B′ at the next digit (step S604). If A′ and B′ differ at the compared digit, the calculation unit 102 sets that digit of L (L_i) to one more than the value of A′ (A_i + 1) and that digit of U (U_i) to one less than the value of B′ (B_i - 1) (step S605). Then, for each digit from the next one up to the rth digit (steps S606 and S607), it sets L to the minimum value of that digit (v_1) and U to the maximum value of that digit (v_q) (step S608). The result is the lower limit value L and the upper limit value U of the middle region. At this point, the attribute value storage unit 103 holds L and U in the RAM 914.

Following the above procedure, the calculation unit 102 obtains three regions: the lower region (A′ to L − 1), the middle region (L to U), and the upper region (U + 1 to B′).
Lower region: $A_1 A_2 \ldots A_{r-1} A_r$ to $A_1 \ldots A_k (v_q)_{k+1} \ldots (v_q)_r$
Middle region: $A_1 \ldots A_{k-1} (A_k + 1) (v_1)_{k+1} \ldots (v_1)_r$ to $B_1 \ldots B_{k-1} (B_k - 1) (v_q)_{k+1} \ldots (v_q)_r$
Upper region: $B_1 \ldots B_k (v_1)_{k+1} \ldots (v_1)_r$ to $B_1 B_2 \ldots B_{r-1} B_r$
where $A_1 = B_1, \ldots, A_{k-1} = B_{k-1}$ $(1 \le k \le r)$.
At this point, the attribute value storage unit 103 holds the following in the RAM 914: the lower limit value A′, the upper limit value B′, the first value (L − 1), and the second value (U + 1).

Next, when n > m, the calculation unit 102 extends the lower region with the (r + 1)-th through n-th digits.
Lower region: $A_1 \ldots A_r A_{r+1} \ldots A_n$ to $A_1 \ldots A_k (v_q)_{k+1} \ldots (v_q)_r (v_q)_{r+1} \ldots (v_q)_n$
Conversely, when n < m, the calculation unit 102 extends the upper region with the (r + 1)-th through m-th digits.
Upper region: $B_1 \ldots B_k (v_1)_{k+1} \ldots (v_1)_r (v_1)_{r+1} \ldots (v_1)_m$ to $B_1 \ldots B_r B_{r+1} \ldots B_m$
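Continuing the same sketch, the three regions can be computed directly from the formulas above. This includes extending the lower region when n > m and the upper region when n < m. The same ordered-alphabet assumption applies, and `split_regions` is a hypothetical name. The middle region is left out when it is empty.

```python
import string


def split_regions(a: str, b: str, alphabet: str) -> dict[str, tuple[str, str]]:
    """Split A..B into lower, middle, and upper regions.

    k is the first differing digit within the first r = min(n, m) digits.
    The lower region keeps all n digits of A and the upper region all m
    digits of B, which covers both the n > m and the n < m extensions.
    """
    n, m = len(a), len(b)
    r = min(n, m)
    k = next((i for i in range(r) if a[i] != b[i]), None)  # 0-based k - 1
    if k is None:
        raise ValueError("A and B must differ within the first r digits")
    v1, vq = alphabet[0], alphabet[-1]
    regions = {"lower": (a, a[: k + 1] + vq * (n - k - 1))}
    lo, hi = alphabet.index(a[k]) + 1, alphabet.index(b[k]) - 1  # A_k+1, B_k-1
    if lo <= hi:  # the middle region is empty when B_k - A_k = 1
        regions["middle"] = (a[:k] + alphabet[lo] + v1 * (r - k - 1),
                             b[:k] + alphabet[hi] + vq * (r - k - 1))
    regions["upper"] = (b[: k + 1] + v1 * (m - k - 1), b)
    return regions


regions = split_regions("end", "start", string.ascii_lowercase)
print(regions["lower"], regions["middle"], regions["upper"])
# ('end', 'ezz') ('faa', 'rzz') ('saaaa', 'start')
```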

In step S103, the regular expression generation unit 104 generates the regular expression for the middle region. This expression depends only on the values $A_k$ and $B_k$ and is generated as follows. The first through (k − 1)-th digits are omitted here because step S106 handles them as the common part.
Regular expression of the middle region: “[(A_k+1)-(B_k-1)]”
When $B_k - A_k = 1$, $(A_k + 1) > (B_k - 1)$ and the middle region is empty. In that case, no regular expression is output for the middle region. All other cases are processed in the same way.
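The middle-region expression reduces to a single character class. Below is a minimal sketch under the same alphabet assumption; `middle_regex` is a hypothetical name. It returns `None` for the empty case.

```python
import string
from typing import Optional


def middle_regex(a_k: str, b_k: str, alphabet: str) -> Optional[str]:
    """Character class "[(A_k+1)-(B_k-1)]" for the middle region, or None.

    Returns None when B_k - A_k = 1, i.e. when the middle region is empty.
    """
    lo = alphabet.index(a_k) + 1
    hi = alphabet.index(b_k) - 1
    if lo > hi:
        return None
    return f"[{alphabet[lo]}-{alphabet[hi]}]"


print(middle_regex("e", "s", string.ascii_lowercase))  # [f-r]
print(middle_regex("0", "8", string.digits))           # [1-7]
print(middle_regex("3", "4", string.digits))           # None
```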

FIG. 11 shows the flow of processing for generating the regular expression that corresponds to the following range of attribute values.
Lower limit value: $A_k A_{k+1} \ldots A_n$
Upper limit value: $(v_j)_k (v_q)_{k+1} \ldots (v_q)_n$ (where $v_j \ge A_k$)

The processing flow in FIG. 11 is almost the same as that in FIG. 6: the regular expression generation unit 104 proceeds from the least significant digit toward the higher digits.

In step S701, it sets “A_k...A_{n-1}[A_n-v_q]” as the initial value of the regular expression storage area (E) in the RAM 914. From step S702 onward, it repeats steps S704 to S706 from the second least significant digit up to the (k + 1)-th digit.

If step S703 finds that the digit being processed is below the k-th digit (YES), step S704 branches on whether that digit's value is $v_q$:
- If it is $v_q$ (YES), the regular expression generation unit 104 outputs nothing and moves on to the next digit.
- If it is not $v_q$ (NO), it appends “|A_k...A_{i-1}[(A_i+1)-v_q]” to the end of the storage area (E) and moves on to the next digit.

When step S703 finds that the digits through the (k + 1)-th have been processed (NO), the flow proceeds to step S707:
- If $A_k$ equals $v_j$ (YES), the regular expression generation unit 104 ends the processing without further output.
- Otherwise (NO), it proceeds to step S708, appends “|[(A_k+1)-v_j]” to the end of the storage area (E), and ends the processing.

The content of the storage area (E) at the end of the processing is the regular expression for the above range.
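A minimal Python sketch of the FIG. 11 loop follows. Assumptions:

- Digit values come from an ordered alphabet string.
- `lower_regex` is a hypothetical name.
- The single-digit branch is a guard added here, because the flow above always starts from a two-digit initial value.

The argument `a` holds $A_k \ldots A_n$, and `v_j` is the k-th digit of the upper limit.

```python
import string


def lower_regex(a: str, v_j: str, alphabet: str) -> str:
    """Regex for A_k..A_n  to  (v_j)_k (v_q)_{k+1}..(v_q)_n, as in FIG. 11.

    `a` holds the digits A_k..A_n; `v_j` must satisfy v_j >= A_k.
    """
    vq = alphabet[-1]

    def succ(c: str) -> str:
        return alphabet[alphabet.index(c) + 1]

    if len(a) == 1:
        # Single-digit case: a guard added here, not part of FIG. 11.
        return f"[{a}-{v_j}]"
    # Step S701: initial value "A_k ... A_{n-1}[A_n-v_q]".
    parts = [f"{a[:-1]}[{a[-1]}-{vq}]"]
    # Steps S702-S706: second least significant digit up to digit k+1.
    for i in range(len(a) - 2, 0, -1):
        if a[i] != vq:
            parts.append(f"{a[:i]}[{succ(a[i])}-{vq}]")
    # Steps S707-S708: the k-th digit itself.
    if a[0] != v_j:
        parts.append(f"[{succ(a[0])}-{v_j}]")
    return "|".join(parts)


print(lower_regex("end", "e", string.ascii_lowercase))  # en[d-z]|e[o-z]
print(lower_regex("00321", "0", string.digits))  # 0032[1-9]|003[3-9]|00[4-9]|0[1-9]
```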

The lower-region regular expression generation in step S104 uses the flow in FIG. 11. Take the k that satisfies $A_1 = B_1, \ldots, A_{k-1} = B_{k-1}$ $(1 \le k \le n)$, and set the lower and upper limit values as follows.
Lower limit value: $A_k A_{k+1} \ldots A_n$
Upper limit value: $A_k (v_q)_{k+1} \ldots (v_q)_n$
In step S104, the regular expression generation unit 104 therefore generates the following regular expression for the lower region. As in FIG. 11, alternatives for digits equal to $v_q$ are omitted. The first through (k − 1)-th digits are again omitted because step S106 handles them as the common part.
Regular expression of the lower region: “A_k A_{k+1}...A_{n-1}[A_n-v_q]|A_k A_{k+1}...A_{n-2}[(A_{n-1}+1)-v_q]|...|A_k[(A_{k+1}+1)-v_q]”

FIG. 12 shows the flow of processing for generating the regular expression that corresponds to the following range of attribute values.
Lower limit value: $(v_j)_k (v_1)_{k+1} \ldots (v_1)_m$ (where $v_j \le B_k$)
Upper limit value: $B_k B_{k+1} \ldots B_m$

The processing flow in FIG. 12 is almost the same as that in FIG. 7: the regular expression generation unit 104 proceeds from the least significant digit toward the higher digits.

In step S801, it sets “B_k...B_{m-1}[v_1-(B_m-1)]” as the initial value of the regular expression storage area (E) in the RAM 914. From step S802 onward, it repeats steps S804 to S806 from the second least significant digit up to the (k + 1)-th digit.

If step S803 finds that the digit being processed is below the k-th digit (YES), step S804 branches on whether that digit's value is $v_1$:
- If it is $v_1$ (YES), the regular expression generation unit 104 outputs nothing and moves on to the next digit.
- If it is not $v_1$ (NO), it appends “|B_k...B_{i-1}[v_1-(B_i-1)]” to the end of the storage area (E) and moves on to the next digit.

When step S803 finds that the digits through the (k + 1)-th have been processed (NO), the flow proceeds to step S807:
- If $B_k$ equals $v_j$ (YES), the regular expression generation unit 104 ends the processing without further output.
- Otherwise (NO), it proceeds to step S808, appends “|[v_j-(B_k-1)]” to the end of the storage area (E), and ends the processing.

The content of the storage area (E) at the end of the processing is the regular expression for the above range.
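The FIG. 12 loop mirrors FIG. 11, and can be sketched under the same assumptions. `upper_regex` is a hypothetical name. The guard that skips the initial class when $B_m = v_1$ is an assumption added here: in that case the class “[v_1-(B_m-1)]” would be empty. The argument `b` holds $B_k \ldots B_m$ and should have at least two digits.

```python
import string


def upper_regex(b: str, v_j: str, alphabet: str) -> str:
    """Regex for (v_j)_k (v_1)_{k+1}..(v_1)_m  to  B_k..B_m, as in FIG. 12.

    `b` holds the digits B_k..B_m (at least two); `v_j` must satisfy
    v_j <= B_k. B itself is not matched: the last class stops at B_m - 1.
    """
    v1 = alphabet[0]

    def pred(c: str) -> str:
        return alphabet[alphabet.index(c) - 1]

    parts = []
    # Step S801: initial value "B_k ... B_{m-1}[v_1-(B_m-1)]",
    # skipped when B_m = v_1 because that class would be empty (added guard).
    if b[-1] != v1:
        parts.append(f"{b[:-1]}[{v1}-{pred(b[-1])}]")
    # Steps S802-S806: second least significant digit up to digit k+1.
    for i in range(len(b) - 2, 0, -1):
        if b[i] != v1:
            parts.append(f"{b[:i]}[{v1}-{pred(b[i])}]")
    # Steps S807-S808: the k-th digit itself.
    if b[0] != v_j:
        parts.append(f"[{v_j}-{pred(b[0])}]")
    return "|".join(parts)


print(upper_regex("start", "s", string.ascii_lowercase))  # star[a-s]|sta[a-q]|s[a-s]
print(upper_regex("876", "8", string.digits))             # 87[0-5]|8[0-6]
```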

The upper-region regular expression generation in step S105 uses the flow in FIG. 12. Take the k that satisfies $A_1 = B_1, \ldots, A_{k-1} = B_{k-1}$ $(1 \le k \le m)$, and set the lower and upper limit values as follows.
Lower limit value: $B_k (v_1)_{k+1} \ldots (v_1)_m$
Upper limit value: $B_k B_{k+1} \ldots B_m$
In step S105, the regular expression generation unit 104 therefore generates the following regular expression for the upper region. As in FIG. 12, alternatives for digits equal to $v_1$ are omitted. The first through (k − 1)-th digits are again omitted because step S106 handles them as the common part.
Regular expression of the upper region: “B_k B_{k+1}...B_{m-1}[v_1-(B_m-1)]|B_k B_{k+1}...B_{m-2}[v_1-(B_{m-1}-1)]|...|B_k[v_1-(B_{k+1}-1)]”

In step S106, the regular expression combining unit 105 combines the lower, middle, and upper region expressions generated separately in steps S103 to S105. It then prepends the common part that those steps omitted (the first through (k − 1)-th digits) to obtain the regular expression for the attribute range condition. The result has the following form.
“A_1...A_{k-1}(<regular expression of the lower region>|<regular expression of the middle region>|<regular expression of the upper region>)”
When there is no middle region, the generated regular expression has the following form.
“A_1...A_{k-1}(<regular expression of the lower region>|<regular expression of the upper region>)”
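Steps S102 to S106 can be put together in one self-contained sketch that reproduces the two worked examples in this embodiment. The sketch uses the same ordered-alphabet assumption, and `range_regex` is a hypothetical name. It differs from the printed examples in two ways:

- It passes the common part through `re.escape`, so the “.” of Example 3-2 becomes `\.` and matches only a literal decimal point.
- It always wraps the combined regions in parentheses, as in the general form above.

```python
import re
import string


def range_regex(a: str, b: str, alphabet: str) -> str:
    """Regex for strings X with A <= X < B, following steps S102 to S106.

    `alphabet` lists one digit's value range in order (v_1 first, v_q last).
    """
    v1, vq = alphabet[0], alphabet[-1]
    idx = alphabet.index
    r = min(len(a), len(b))
    k = next((i for i in range(r) if a[i] != b[i]), None)  # 0-based k - 1
    if k is None:
        raise ValueError("A and B must differ within the first r digits")
    prefix, a_tail, b_tail = a[:k], a[k:], b[k:]

    # Lower region (FIG. 11 with v_j = A_k).
    if len(a_tail) == 1:
        lower = [a_tail]  # A ends at digit k: a guard added here
    else:
        lower = [f"{a_tail[:-1]}[{a_tail[-1]}-{vq}]"]
        for i in range(len(a_tail) - 2, 0, -1):
            if a_tail[i] != vq:
                lower.append(f"{a_tail[:i]}[{alphabet[idx(a_tail[i]) + 1]}-{vq}]")

    # Middle region (step S103): "[(A_k+1)-(B_k-1)]", omitted when empty.
    lo, hi = idx(a_tail[0]) + 1, idx(b_tail[0]) - 1
    middle = [f"[{alphabet[lo]}-{alphabet[hi]}]"] if lo <= hi else []

    # Upper region (FIG. 12 with v_j = B_k); B itself is not matched.
    upper = []
    if len(b_tail) > 1 and b_tail[-1] != v1:
        upper.append(f"{b_tail[:-1]}[{v1}-{alphabet[idx(b_tail[-1]) - 1]}]")
    for i in range(len(b_tail) - 2, 0, -1):
        if b_tail[i] != v1:
            upper.append(f"{b_tail[:i]}[{v1}-{alphabet[idx(b_tail[i]) - 1]}]")

    # Step S106: join the regions and prepend the common part A_1..A_{k-1}.
    groups = ["(" + "|".join(p) + ")" if len(p) > 1 else p[0]
              for p in (lower, middle, upper) if p]
    return re.escape(prefix) + "(" + "|".join(groups) + ")"


print(range_regex("end", "start", string.ascii_lowercase))
# ((en[d-z]|e[o-z])|[f-r]|(star[a-s]|sta[a-q]|s[a-s]))
print(range_regex("0.00321", "0.876", string.digits))
# 0\.((0032[1-9]|003[3-9]|00[4-9]|0[1-9])|[1-7]|(87[0-5]|8[0-6]))
```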

A regular expression of this kind stops matching once it has consumed the 1, n, or m characters written in it. A longer string therefore still produces a hit, but the match may not reveal where that string ends. If the end position of the matching string is needed, append “[v_1-v_q]*” to the end of the regular expression.

In addition, the above regular expression hits not only a value X satisfying A ≤ X < B but also a value $v_i X$, that is, X preceded by another character from the value range. To avoid such hits, add the excluded-character class “[^v_1-v_q]” to the beginning of the regular expression.
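Both refinements can be tried with Python's `re` module on the regular expression of Example 3-1. This sketch makes one deliberate change. Instead of prepending a bare “[^a-z]”, which would also require some character before the match, it uses `(?:^|[^a-z])`, so a match at the very start of the text is still allowed. It also wraps the result in a capture group so the whole matched word can be read back.

```python
import re

# Regular expression of Example 3-1 ("end" to "start"), written out literally.
R = r"(en[d-z]|e[o-z])|[f-r]|(star[a-s]|sta[a-q]|s[a-s])"

# Plain search: the match stops after the characters R describes, and a hit
# can start in the middle of a longer word (the "v_i X" case).
print(re.search(R, "fox").group(0))    # f
print(re.search(R, "afox").group(0))   # f  (false hit inside "afox")

# Refined version: "[a-z]*" extends the match to the end of the word, and a
# preceding non-letter (or the start of the text) stands in for "[^a-z]".
GUARDED = r"(?:^|[^a-z])((?:" + R + r")[a-z]*)"
print(re.search(GUARDED, "1 fox").group(1))   # fox
print(re.search(GUARDED, "afox"))             # None
```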

The following example applies the above method to strings of lowercase letters. The strings use either full-width or half-width lowercase letters, but only one of the two. Here, $v_1$ = a and $v_q$ = z.

(Example 3-1) Attribute range condition: “end” to “start”
Lower region: “end” to “ezz”
Middle region: “faa” to “rzz”
Upper region: “saaaa” to “start”
Regular expression: “(en[d-z]|e[o-z])|[f-r]|(star[a-s]|sta[a-q]|s[a-s])”

The following example applies the above method to a lower limit value A and an upper limit value B with 0 < A, B ≤ 1.

(Example 3-2) Attribute range condition: 0.00321 to 0.876
Lower region: 0.00321 to 0.09999
Middle region: 0.100 to 0.799
Upper region: 0.800 to 0.876
Regular expression: “0.((0032[1-9]|003[3-9]|00[4-9]|0[1-9])|[1-7]|(87[0-5]|8[0-6]))”
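Example 3-2 can be checked exhaustively over all strings with exactly five decimal places. The sketch below escapes the “.”, which is an assumption about how the pattern would be used with Python's `re`. It uses prefix matching with `re.match`, which mirrors the behaviour described above: matching stops after the characters the expression describes.

```python
import re

# Regular expression of Example 3-2, with "." escaped so that it only matches
# a literal decimal point (the example writes it unescaped).
PATTERN = re.compile(r"0\.((0032[1-9]|003[3-9]|00[4-9]|0[1-9])|[1-7]|(87[0-5]|8[0-6]))")

# Check every five-decimal-place value from 0.00000 to 0.99999.
hits = [i for i in range(100_000) if PATTERN.match(f"0.{i:05d}")]
print(min(hits), max(hits), len(hits))  # 321 87599 87279
```

Among these strings, the pattern accepts exactly 0.00321 through 0.87599. This is consistent with the half-open condition A ≤ X < B: the upper limit 0.876 itself (“0.87600”) is not matched. Strings with a different number of decimal places, such as “0.8”, fall outside this check.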

First, the regular expression combining unit 105 generates a regular expression for the range that ends at the largest attribute value with the same number of digits as the lower limit value A. That is, it generates a regular expression for the following n-digit lower and upper limit values.
Lower limit value A: $A_1 \ldots A_n$
Upper limit value B′: $(v_q)_1 \ldots (v_q)_n$
It then appends “v_q{n}[v_1-v_q]*[v_2-v_q]”, the regular expression for $(v_q)_1 \ldots (v_q)_n \ldots$ (values that continue past the n-th digit). The generated regular expression has the following form.
Regular expression: “<regular expression for the range A to B′>|v_q{n}[v_1-v_q]*[v_2-v_q]”
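The appended term can be built and probed as follows. The helper `beyond_max_regex` is hypothetical, digits are used as the value range, and n = 3. The term accepts strings made of n copies of $v_q$ followed by at least one more character, the last of which is not $v_1$. `re.fullmatch` is used here only to make that set explicit.

```python
import re
import string


def beyond_max_regex(n: int, alphabet: str) -> str:
    """The term "v_q{n}[v_1-v_q]*[v_2-v_q]" for values past n copies of v_q."""
    v1, v2, vq = alphabet[0], alphabet[1], alphabet[-1]
    return f"{vq}{{{n}}}[{v1}-{vq}]*[{v2}-{vq}]"


term = beyond_max_regex(3, string.digits)
print(term)  # 9{3}[0-9]*[1-9]
for s in ["999", "9990", "9991", "99905"]:
    print(s, bool(re.fullmatch(term, s)))
# 999 False
# 9990 False
# 9991 True
# 99905 True
```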

Next, consider the case where no lower limit value A is specified, or the lower limit value is specified as “infinitesimal”. If the attribute's data type has a minimum value A′, the regular expression combining unit 105 generates the regular expression for the lower limit value A′ and the upper limit value B. Otherwise, it generates the regular expression for the following lower limit value A′ and upper limit value B.
Lower limit value A′: $(v_1)_1$
Upper limit value B: $B_1 \ldots B_m$

Each digit of an attribute may have two or more value ranges, as with uppercase and lowercase letters, or full-width and half-width characters. In that case, as with the attributes of Embodiment 1, the ranges are listed side by side using alternation “|” or character classes, which yields the corresponding regular expression.

As described above, the regular expression generation device 100 according to this embodiment takes an attribute range condition for attribute values with string-like characteristics. The condition specifies the lower limit value, the upper limit value, the format, and so on. From it, the device automatically generates a regular expression that matches the character strings representing attribute values from the lower limit value to the upper limit value. Regular expressions for complex attribute range conditions were previously difficult to write correctly. With this device, they can be obtained easily and quickly, without special knowledge or trial and error.

Embodiment 4.
This embodiment is described mainly in terms of its differences from Embodiment 1.

This embodiment describes how to generate a regular expression for an attribute with a hierarchical structure, made up of multiple attribute values and the delimiters that separate them. The configuration of the regular expression generation device 100 in this embodiment is the same as that shown in FIG. 1 for Embodiment 1. The operation of the device is the same as that shown in FIG. 4 for Embodiment 1; this covers both the regular expression generation method and the processing procedure of the regular expression generation program. For an attribute with such a hierarchical structure, the regular expression generation device 100 builds the regular expression level by level, from the highest level of the hierarchy down to the lowest, applying the same procedure as in Embodiments 1 and 3.

In this embodiment, the user of the regular expression generation device 100 specifies the following as attribute range condition data:
- the upper and lower limit values of the attribute;
- the delimiters;
- the order of the levels;
- the format and value range of the attribute at each level, and so on.

In step S101, the attribute range condition input unit 101 inputs this attribute range condition data. The data specifies that the attribute values have a hierarchical structure built with delimiters: splitting a whole attribute value at its delimiters gives its parts, and each part is one level of the hierarchy.

Attribute values with such a hierarchical structure include dates, times, IP (Internet Protocol) addresses, and decimal numbers. A date, for example, has three levels: year, month, and day, from the top. A slash “/” is a commonly used delimiter. When the attribute value is a date, every level is a positive integer; the month level ranges from 1 to 12 and the day level from 1 to 31 (or 28, 29, or 30).

For example, if the attribute value is a date or time, the attribute range condition input unit 101 inputs attribute range condition data whose format indicates that the attribute values are dates or times. Likewise, if the attribute value is an IP address, it inputs attribute range condition data whose format indicates that the attribute values are IP addresses.

In step S102, the calculation unit 102 calculates the first value and the second value for the attribute value as a whole, treating each level as one digit.

In steps S103 to S105, the regular expression generation unit 104 treats each level as one digit of the attribute value as a whole:
- In step S103, if attribute values exist between the first value and the second value, it generates the middle region data.
- In step S104, it generates the lower region data.
- In step S105, it generates the upper region data.

In each step, it then uses the CPU 911 to convert the region data into a form in which each level is expressed as a regular expression.

In this way, the regular expression generation device 100 in this embodiment implements a search condition generation method, or a search condition generation program that executes this method on a computer. The method converts an attribute range condition into a regular expression. The condition specifies the upper and lower limit values of an attribute value whose hierarchy is made up of multiple values and delimiters. The values are converted in order from the highest level to the lowest, following the same procedure as in Embodiments 1 and 3.

With this search condition generation method, a date, time, or IP address attribute range condition, for example, can be converted into a regular expression.

As in Embodiments 1 and 3, the flow of processing for generating the regular expression is described below with examples. The processing described here generates a regular expression for the lower limit value A and the upper limit value B of the attribute range condition. The expression accepts the string representation of any value X satisfying A ≤ X ≤ B, that is, any attribute value X from the lower limit value A to the upper limit value B. The delimiters are written as $\langle d_1 \rangle, \langle d_2 \rangle, \ldots$, and the attribute values are expressed as follows.
Lower limit value A: $A_1 \langle d_1 \rangle A_2 \langle d_2 \rangle \ldots \langle d_{n-1} \rangle A_n$
Upper limit value B: $B_1 \langle d_1 \rangle B_2 \langle d_2 \rangle \ldots \langle d_{n-1} \rangle B_n$
Here, $A_i$ and $B_i$ $(1 \le i \le n)$ denote the attribute values at each level, each of one or more digits, with value range $V_i^{\min} \le A_i, B_i \le V_i^{\max}$.

In step S102, the calculation unit 102 treats each level as a single-digit value and divides the whole range into lower, middle, and upper regions, using the same procedure as in FIG. 5 or FIG. 10. That is, the attribute range condition A to B is divided as follows.

For an attribute with a hierarchical structure, $V_i^{\min}$ and $V_i^{\max}$ can be considered defined when the attribute values at the second and lower levels (all levels except the top) are in the format of Embodiment 1. Otherwise, take the minimum and maximum among the values with the same number of digits as $A_i$.
Lower region: $A_1 \langle d_1 \rangle A_2 \langle d_2 \rangle \ldots \langle d_{n-1} \rangle A_n$ to $A_1 \langle d_1 \rangle V_2^{\max} \langle d_2 \rangle \ldots \langle d_{n-1} \rangle V_n^{\max}$
Middle region: $(A_1+1) \langle d_1 \rangle V_2^{\min} \langle d_2 \rangle \ldots \langle d_{n-1} \rangle V_n^{\min}$ to $(B_1-1) \langle d_1 \rangle V_2^{\max} \langle d_2 \rangle \ldots \langle d_{n-1} \rangle V_n^{\max}$
Upper region: $B_1 \langle d_1 \rangle V_2^{\min} \langle d_2 \rangle \ldots \langle d_{n-1} \rangle V_n^{\min}$ to $B_1 \langle d_1 \rangle B_2 \langle d_2 \rangle \ldots \langle d_{n-1} \rangle B_n$
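Treating each level as one digit, the split can be sketched with integer tuples. The sketch is an illustration, not the patent's code:

- `split_hierarchical` is a hypothetical name, and levels are plain integers.
- Like FIG. 5 and FIG. 10, it first skips any common upper levels, then splits at the first level where A and B differ.
- The date bounds are made up for illustration, and every month is given 31 days for simplicity.

```python
def split_hierarchical(a, b, vmin, vmax):
    """Split A..B into regions, treating each level as one digit.

    a, b: the values of A and B per level; vmin, vmax: V_i^min and V_i^max
    per level. Returns {region: (low end, high end)} as tuples per level.
    """
    n = len(a)
    k = next((i for i in range(n) if a[i] != b[i]), None)  # first differing level
    if k is None:
        raise ValueError("A and B must differ in at least one level")
    head = tuple(a[:k])                                     # common upper levels
    low_rest, high_rest = tuple(vmin[k + 1:]), tuple(vmax[k + 1:])
    regions = {"lower": (tuple(a), head + (a[k],) + high_rest)}
    if b[k] - a[k] > 1:
        regions["middle"] = (head + (a[k] + 1,) + low_rest,
                             head + (b[k] - 1,) + high_rest)
    regions["upper"] = (head + (b[k],) + low_rest, tuple(b))
    return regions


# Dates year/month/day; every month is given 31 days here for simplicity.
VMIN, VMAX = (0, 1, 1), (9999, 12, 31)
for name, (lo, hi) in split_hierarchical((2006, 12, 6), (2008, 3, 15), VMIN, VMAX).items():
    print(name, lo, hi)
# lower (2006, 12, 6) (2006, 12, 31)
# middle (2007, 1, 1) (2007, 12, 31)
# upper (2008, 1, 1) (2008, 3, 15)
```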

In step S103, the regular expression generation unit 104 can generate the middle-region regular expression directly, by generating a regular expression for each level.
Regular expression of the middle region: “<regular expression for (A_1+1) to (B_1-1)><d_1><regular expression for V_2^min to V_2^max><d_2>...<d_{n-1}><regular expression for V_n^min to V_n^max>”
The regular expression for each level is generated by the procedure of Embodiment 1 or 3, chosen according to the format of that level's attribute values. For each level, the regular expression generation unit 104 does the following:
- It divides the value range into lower, middle, and upper regions, as the calculation unit 102 does.
- It generates a regular expression for each region.
- It writes out the result in the following form, as the regular expression combining unit 105 does.

Regular expression: “(<regular expression of the lower region of level i>|<regular expression of the middle region of level i>|<regular expression of the upper region of level i>)”
In this way, the regular expression generation unit 104 converts the middle-region regular expression, which treated each level as one digit, into a regular expression that contains the corresponding regular expression for each level.
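The middle-region expression can be assembled level by level. In this sketch, `level_regex` simply lists every integer of a level's range as an alternation. It is a stand-in for the Embodiment 1 and 3 procedure, which this section does not reproduce, and is only practical for small ranges. Integers are written without leading zeros, and the names are hypothetical.

```python
import re


def level_regex(lo: int, hi: int) -> str:
    """Stand-in regex for the integers lo..hi at one level (plain alternation)."""
    return "(?:" + "|".join(str(v) for v in range(lo, hi + 1)) + ")"


def middle_region_regex(a1: int, b1: int, vmin: list[int], vmax: list[int],
                        delims: list[str]) -> str:
    """Middle-region regex: (A_1+1)..(B_1-1), then V_i^min..V_i^max per level."""
    parts = [level_regex(a1 + 1, b1 - 1)]
    for lo, hi, d in zip(vmin[1:], vmax[1:], delims):
        parts.append(re.escape(d) + level_regex(lo, hi))
    return "".join(parts)


# Times "H:M" from 9:30 to 17:15: the middle region is 10:00 to 16:59.
p = re.compile(middle_region_regex(9, 17, [0, 0], [23, 59], [":"]))
print(bool(p.fullmatch("10:0")), bool(p.fullmatch("16:59")), bool(p.fullmatch("17:0")))
# True True False
```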

For the lower region, in step S104, the regular expression generation unit 104 divides the second and lower levels in the same way as above, into lower and middle regions (which can also be regarded as lower and upper regions). There is no upper region (equivalently, no middle region) because the second-level value of the original lower region's upper limit is $V_2^{\max}$, the maximum of that attribute's value range. The lower region can therefore be divided further into a lower region and a middle region.
Lower region: $A_1 \langle d_1 \rangle A_2 \langle d_2 \rangle \ldots \langle d_{n-1} \rangle A_n$ to $A_1 \langle d_1 \rangle A_2 \langle d_2 \rangle V_3^{\max} \langle d_3 \rangle \ldots \langle d_{n-1} \rangle V_n^{\max}$
Middle region: $A_1 \langle d_1 \rangle (A_2+1) \langle d_2 \rangle V_3^{\min} \langle d_3 \rangle \ldots \langle d_{n-1} \rangle V_n^{\min}$ to $A_1 \langle d_1 \rangle V_2^{\max} \langle d_2 \rangle \ldots \langle d_{n-1} \rangle V_n^{\max}$
Next, the regular expression generation unit 104 generates the regular expression for this middle region by the same procedure as in step S103. It then further divides the next level and below of the lower region. It repeats the division into lower and middle regions, and the generation of the middle-region regular expression, down to the lowest level. This converts the lower-region regular expression, which treated each level as one digit, into a regular expression that contains the corresponding regular expression for each level.

For the upper region, in step S105, the regular expression generation unit 104 likewise divides the second and lower levels into middle and upper regions (which can also be regarded as lower and upper regions). There is no lower region (equivalently, no middle region) because the second-level value of the original upper region's lower limit is $V_2^{\min}$, the minimum of that attribute's value range. The upper region can therefore be divided further into a middle region and an upper region.
Middle region: $B_1 \langle d_1 \rangle V_2^{\min} \langle d_2 \rangle \ldots \langle d_{n-1} \rangle V_n^{\min}$ to $B_1 \langle d_1 \rangle (B_2-1) \langle d_2 \rangle V_3^{\max} \langle d_3 \rangle \ldots \langle d_{n-1} \rangle V_n^{\max}$
Upper region: $B_1 \langle d_1 \rangle B_2 \langle d_2 \rangle V_3^{\min} \langle d_3 \rangle \ldots \langle d_{n-1} \rangle V_n^{\min}$ to $B_1 \langle d_1 \rangle B_2 \langle d_2 \rangle \ldots \langle d_{n-1} \rangle B_n$
Next, the regular expression generation unit 104 generates the regular expression for this middle region by the same procedure as in step S103. It then divides the next level and below of the upper region in the same way. It repeats the division into middle and upper regions, and the generation of the middle-region regular expression, down to the lowest level. This converts the upper-region regular expression, which treated each level as one digit, into a regular expression that contains the corresponding regular expression for each level.

In the regular expression generation of steps S104 and S105, the value range does not need to be divided when the values of all layers from the k-th layer down ($2 \le k \le n$) span their entire range, that is, in the following case:
Lower limit: $A_1\langle d_1\rangle \cdots \langle d_{k-2}\rangle A_{k-1}\langle d_{k-1}\rangle V_k^{\min}\langle d_k\rangle \cdots \langle d_{n-1}\rangle V_n^{\min}$
Upper limit: $B_1\langle d_1\rangle \cdots \langle d_{k-2}\rangle B_{k-1}\langle d_{k-1}\rangle V_k^{\max}\langle d_k\rangle \cdots \langle d_{n-1}\rangle V_n^{\max}$
In this case, the regular expression generation unit 104 can generate the regular expression for the (k−1)-th layer and below directly, as follows:
Regular expression: $[A_{k-1}\text{-}B_{k-1}]\,\langle d_{k-1}\rangle\,\langle\text{regular expression of } V_k^{\min} \text{ to } V_k^{\max}\rangle\,\langle d_k\rangle \cdots \langle d_{n-1}\rangle\,\langle\text{regular expression of } V_n^{\min} \text{ to } V_n^{\max}\rangle$
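The recursive division of steps S103 to S105, together with the full-range shortcut just described, can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the device's implementation: each layer holds a non-negative decimal integer written without leading zeros, the hypothetical tuples `vmin`/`vmax` give each layer's value range, and `int_range_regex` is a common digit-splitting construction assumed here to stand in for the per-layer procedure of Embodiment 1. All names are hypothetical.

```python
import re


def _same_len(a: str, b: str) -> str:
    """Regex for equal-length digit strings a..b (a <= b), split into
    lower / middle / upper parts on the leading digit."""
    if a == b:
        return a
    if len(a) == 1:
        return f"[{a}-{b}]"
    if a[0] == b[0]:
        return a[0] + "(?:" + _same_len(a[1:], b[1:]) + ")"
    rest = len(a) - 1
    parts = [a[0] + "(?:" + _same_len(a[1:], "9" * rest) + ")"]  # lower part
    if int(b[0]) - int(a[0]) >= 2:                                # middle part
        lo, hi = int(a[0]) + 1, int(b[0]) - 1
        head = str(lo) if lo == hi else f"[{lo}-{hi}]"
        parts.append(head + "[0-9]" * rest)
    parts.append(b[0] + "(?:" + _same_len("0" * rest, b[1:]) + ")")  # upper part
    return "|".join(parts)


def int_range_regex(lo: int, hi: int) -> str:
    """Regex for decimal integers lo..hi written without leading zeros."""
    parts = []
    for n in range(len(str(lo)), len(str(hi)) + 1):
        start = lo if n == len(str(lo)) else 10 ** (n - 1)
        end = hi if n == len(str(hi)) else 10 ** n - 1
        parts.append(_same_len(str(start), str(end)))
    return "(?:" + "|".join(parts) + ")"


def hier_regex(lo, hi, vmin, vmax, sep):
    """Regex for hierarchical values lo..hi (tuples, top layer first).
    vmin/vmax hold each layer's value range; index 0 (top layer) is unused."""
    d = re.escape(sep)
    if len(lo) == 1:
        return int_range_regex(lo[0], hi[0])
    if lo[1:] == vmin[1:] and hi[1:] == vmax[1:]:
        # Shortcut: all lower layers span their full range, so no split is needed.
        tail = d.join(int_range_regex(a, b) for a, b in zip(vmin[1:], vmax[1:]))
        return int_range_regex(lo[0], hi[0]) + d + tail

    def sub(l, h):
        # Recurse one layer down with the lower layers' value ranges.
        return hier_regex(l, h, vmin[1:], vmax[1:], sep)

    if lo[0] == hi[0]:
        return f"{lo[0]}{d}(?:{sub(lo[1:], hi[1:])})"
    parts = [f"{lo[0]}{d}(?:{sub(lo[1:], vmax[1:])})"]              # lower region
    if hi[0] - lo[0] >= 2:                                              # middle region
        parts.append(hier_regex((lo[0] + 1,) + vmin[1:], (hi[0] - 1,) + vmax[1:],
                                vmin, vmax, sep))
    parts.append(f"{hi[0]}{d}(?:{sub(vmin[1:], hi[1:])})")             # upper region
    return "(?:" + "|".join(parts) + ")"


date_re = hier_regex((1996, 11, 15), (2006, 9, 20), (None, 1, 1), (None, 12, 31), "/")
ip_re = hier_regex((10, 0, 1, 1), (10, 2, 100, 254), (None, 0, 0, 0), (None, 255, 255, 255), ".")
```

The middle region is passed back into `hier_regex`, where it always satisfies the full-range condition, so the shortcut produces it without further splitting. The generated expressions have no anchors, so the sketch is meant to be used with `re.fullmatch` or with explicit boundaries.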

In step S106, the regular expression combining unit 105 combines the lower-, middle-, and upper-region regular expressions generated individually in steps S103 to S105, producing the regular expression that corresponds to the attribute range condition:
"<regular expression of lower region>|<regular expression of middle region>|<regular expression of upper region>"
When there is no middle region, the generated regular expression has the following form:
"<regular expression of lower region>|<regular expression of upper region>"

A concrete example of applying the above method to dates follows. As a typical date format, the year, month, and day are assumed here to be separated by slashes "/".

(Example 4-1) Attribute range condition: 1996/11/15 to 2006/9/20

As input for the attribute range condition, the attribute range condition input unit 101 specifies, in addition to the lower and upper limits, the delimiter "/", the type of attribute value (date), the order of year, month, and day, and so on. As in Embodiment 2, the condition storage unit 107 may store several date formats in part of the storage area of the storage device 151 that the regular expression generation processing can reference, so that specifying an identifier through the identifier input unit 108 automatically selects the corresponding date format. In this case, the attribute range condition input unit 101 reads and inputs, from among the pieces of attribute range condition data stored in advance in the storage device 151 by the condition storage unit 107, the attribute range condition data stored in association with the identifier entered through the identifier input unit 108, and a regular expression is generated on the basis of that attribute range condition data. Alternatively, the calculation unit 102, the regular expression generation unit 104, or the like may identify the date format automatically from the lower and upper limits specified through the attribute range condition input unit 101.
For example, if the lower limit is given as "$A_3$年$A_2$月$A_1$日" (where $A_3$, $A_2$, and $A_1$ are numbers), it is easy to recognize that the delimiters are "年" (year), "月" (month), and "日" (day) and that the layers run from highest to lowest, left to right. The date can then be treated as if it were a three-digit attribute $A_3A_2A_1$ whose attribute elements have the value ranges $0 \le A_3$, $1 \le A_2 \le 12$, and $1 \le A_1 \le 31$.

Following the above procedure, in step S102 the calculation unit 102 divides the attribute range of this example into lower, middle, and upper regions as follows:
Lower region: 1996/11/15 to 1996/12/31
Middle region: 1997/1/1 to 2005/12/31
Upper region: 2006/1/1 to 2006/9/20
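The division in step S102 follows the rule summarized in the abstract: the "first value" keeps the lower limit's top layer and sets every other layer to its maximum, and the "second value" keeps the upper limit's top layer and sets every other layer to its minimum. A minimal Python sketch, assuming values are represented as tuples of layer values and, as in this example, that every month is treated as having days 1 to 31; `split_regions` is a hypothetical name.

```python
def split_regions(lo, hi, vmin, vmax):
    """Split [lo, hi] on the top layer into lower / middle / upper regions.

    lo, hi: tuples of layer values, top layer first.
    vmin, vmax: per-layer minima / maxima (index 0, the top layer, is unused).
    """
    if lo[0] == hi[0]:
        raise ValueError("top layers are equal; split on the next layer instead")
    first = (lo[0],) + tuple(vmax[1:])    # top layer of lo, other layers at their maxima
    second = (hi[0],) + tuple(vmin[1:])   # top layer of hi, other layers at their minima
    regions = {"lower": (tuple(lo), first), "upper": (second, tuple(hi))}
    if hi[0] - lo[0] >= 2:                # top-layer values strictly between lo and hi
        regions["middle"] = ((lo[0] + 1,) + tuple(vmin[1:]),
                             (hi[0] - 1,) + tuple(vmax[1:]))
    return regions


dates = split_regions((1996, 11, 15), (2006, 9, 20), (None, 1, 1), (None, 12, 31))
times = split_regions((8, 45, 0), (17, 15, 0), (None, 0, 0), (None, 59, 59))
```

The same function reproduces the time division of Example 4-3; for the IP address example, whose first layer (10) is shared by both limits, the split would be applied to the remaining layers instead.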

In step S103, the regular expression generation unit 104 generates the regular expression of the middle region as follows:
Regular expression of middle region: "<regular expression of 1997 to 2005>/<regular expression of 1 to 12>/<regular expression of 1 to 31>"

In step S104, the regular expression generation unit 104 further divides the lower region (1996/11/15 to 1996/12/31) into a lower region and a middle region:
Lower region: 1996/11/15 to 1996/11/31
Middle region: 1996/12/1 to 1996/12/31
From this, the regular expression of the lower region of the original attribute range is generated as follows:
Regular expression of lower region (1996/11/15 to 1996/12/31): "1996/(11/<regular expression of 15 to 31>|12/<regular expression of 1 to 31>)"

In step S105, the regular expression generation unit 104 further divides the upper region (2006/1/1 to 2006/9/20) into a middle region and an upper region:
Middle region: 2006/1/1 to 2006/8/31
Upper region: 2006/9/1 to 2006/9/20
From this, the regular expression of the upper region of the original attribute range is generated as follows:
Regular expression of upper region (2006/1/1 to 2006/9/20): "2006/(<regular expression of 1 to 8>/<regular expression of 1 to 31>|9/<regular expression of 1 to 20>)"

In step S106, the regular expression combining unit 105 assembles the regular expression for the attribute range condition as follows:
Regular expression: "(1996/(11/<regular expression of 15 to 31>|12/<regular expression of 1 to 31>))|(<regular expression of 1997 to 2005>/<regular expression of 1 to 12>/<regular expression of 1 to 31>)|(2006/(<regular expression of 1 to 8>/<regular expression of 1 to 31>|9/<regular expression of 1 to 20>))"
Expanding the placeholders gives the following regular expression:
Regular expression: "(1996/(11/(1[5-9]|2[0-9]|3[0-1])|12/([1-9]|[1-2][0-9]|3[0-1])))|((199[7-9]|200[0-5])/([1-9]|1[0-2])/([1-9]|[1-2][0-9]|3[0-1]))|(2006/([1-8]/([1-9]|[1-2][0-9]|3[0-1])|9/([1-9]|1[0-9]|20)))"
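A brute-force comparison against ordinary tuple ordering is a cheap way to check such an expanded expression. In this sketch each 1-to-31 day pattern is written as `[1-9]|[1-2][0-9]|3[0-1]`; a `[0-9]` first alternative would also accept a day of 0. The helper names are illustrative only.

```python
import re

# Expanded regular expression for 1996/11/15 to 2006/9/20 (day patterns start at 1).
DATE_RE = (r"(1996/(11/(1[5-9]|2[0-9]|3[0-1])|12/([1-9]|[1-2][0-9]|3[0-1])))"
           r"|((199[7-9]|200[0-5])/([1-9]|1[0-2])/([1-9]|[1-2][0-9]|3[0-1]))"
           r"|(2006/([1-8]/([1-9]|[1-2][0-9]|3[0-1])|9/([1-9]|1[0-9]|20)))")


def matches(text: str) -> bool:
    """True if the whole string is matched (anchored via re.fullmatch)."""
    return re.fullmatch(DATE_RE, text) is not None


def in_range(y: int, m: int, d: int) -> bool:
    """Reference check: compare (year, month, day) tuples lexicographically."""
    return (1996, 11, 15) <= (y, m, d) <= (2006, 9, 20)


# Days 1-31 are allowed in every month, mirroring the example's simplification.
mismatches = [(y, m, d)
              for y in range(1990, 2012) for m in range(1, 13) for d in range(1, 32)
              if matches(f"{y}/{m}/{d}") != in_range(y, m, d)]
```

Because the expression carries no anchors, the sketch uses `re.fullmatch`; with `re.search`, "1996/11/15" would also be found inside a longer string such as "11996/11/150".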

In the date example above, the search also covers "11/31", a date that does not actually exist. If the day's value range is to be handled strictly, a separate regular expression may be generated for each group of months that share the same day range, as in the following example:
Before modification: "<regular expression of 1 to 12>/<regular expression of 1 to 31>"
After modification: "(1|3|5|7|8|10|12)/<regular expression of 1 to 31>|(4|6|9|11)/<regular expression of 1 to 30>|2/<regular expression of 1 to 28>"
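The month-group variant can be checked against a calendar. A minimal sketch, assuming a non-leap reference year (2006) for February, as the "1 to 28" group implies; the pattern and function names are illustrative.

```python
import calendar
import re

# Day patterns for the three month groups (leap years are not handled,
# matching the "1 to 28" February group in the text).
DAY_1_31 = r"([1-9]|[12][0-9]|3[01])"
DAY_1_30 = r"([1-9]|[12][0-9]|30)"
DAY_1_28 = r"([1-9]|1[0-9]|2[0-8])"

MONTH_DAY_RE = (rf"(1|3|5|7|8|10|12)/{DAY_1_31}"
                rf"|(4|6|9|11)/{DAY_1_30}"
                rf"|2/{DAY_1_28}")


def valid_month_day(text: str) -> bool:
    """True if 'month/day' names a real day in a non-leap year."""
    return re.fullmatch(MONTH_DAY_RE, text) is not None
```

Supporting February 29 would need the year layer as well, since leap years depend on it.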

The description so far has assumed that both the lower limit A and the upper limit B of the attribute value are specified. This paragraph covers the case in which no upper limit B is specified, or the upper limit is specified as "infinity". If the attribute's data type has a maximum value B′, the regular expression combining unit 105 may simply generate a regular expression for the range from the lower limit A to B′. If the data type has no maximum value, the processing should focus on the value of the first layer.

First, the regular expression combining unit 105 generates a regular expression whose upper limit is the largest attribute value whose first layer holds the largest value having the same number of digits m as the first-layer value of the lower limit A; that is, the first layer holds that m-digit maximum and all values of the second and lower layers are $V_i^{\max}$ ($2 \le i \le n$). For example, a regular expression for searching for dates on or after "2006/9/20" is generated for the attribute range condition "2006/9/20" to "9999/12/31". A regular expression that matches any attribute value whose first-layer value has m + 1 or more digits is then added. In the above example, years of five or more digits can be expressed by the regular expression "[1-9][0-9]{4,}".
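The added "m + 1 or more digits" pattern follows mechanically from m. A small sketch; `more_digits_regex` is a hypothetical helper, and it assumes first-layer values are decimal integers without leading zeros.

```python
import re


def more_digits_regex(m: int) -> str:
    """Regex for integers with at least m + 1 digits and no leading zero.

    m is the number of digits in the lower limit's first-layer value
    (for years such as 2006, m = 4).
    """
    return f"[1-9][0-9]{{{m},}}"


YEAR_5_PLUS = more_digits_regex(4)
```

The doubled braces in the f-string produce the literal `{4,}` quantifier; the result would then be followed by the full-range patterns of the lower layers and joined to the bounded expression with "|".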

Next, consider the case in which no lower limit A is specified, or the lower limit is specified as "negative infinity". First, if the attribute's data type has a minimum value A′, the regular expression combining unit 105 may generate a regular expression for the range from A′ to the upper limit B. Otherwise, a regular expression may be generated with the lower limit set to the attribute value whose first and all lower layers take their minimum values $V_i^{\min}$ ($1 \le i \le n$).

Searches using attribute range conditions on attributes with a hierarchical structure also admit applications such as the following.

(Example 4-2) Attribute range condition: weekdays from September 2006 to December 2006
The following shows how to generate a regular expression that matches character strings representing dates that satisfy this condition. The condition differs from the attribute range conditions described so far in the following respects:
(1) Ranges are set on two layers, "month" and "day of the week". In the date example shown earlier, ranges were set on "year" and "month", but the value they describe is in substance a single continuous range.
(2) No explicit range condition is set on the third layer, "day".
(3) "Day" and "day of the week" change in conjunction with each other; strictly speaking, they are not in an upper/lower layer relationship.
For simplicity, assume that the date attribute values here consist of "year", "month", "day", and "day of the week" in order from the left, that the delimiter between "year", "month", and "day" is a slash "/", and that the delimiter between "day" and "day of the week" is a blank (strictly, either a full-width or a half-width space). The "day of the week" is assumed to take values from $V_4^{\min}$ = "SUN" to $V_4^{\max}$ = "SAT". Although the third layer of this condition is not set explicitly, it may be regarded as taking the values $V_3^{\min}$ to $V_3^{\max}$. Such an attribute range condition can be regarded as follows:
Attribute range condition 1: 2006/9/1 to 2006/12/31
Attribute range condition 2: MON to FRI

The regular expression combining unit 105 then generates the regular expression for attribute range condition 1 by the procedure described above for date attribute range conditions and appends, after the delimiter, the regular expression "(MON|TUE|WED|THU|FRI)" for attribute range condition 2. The final regular expression is as follows:
Regular expression: "(2006/(9|1[0-2])/([1-9]|[12][0-9]|3[01])) (MON|TUE|WED|THU|FRI)"
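The weekday example can be exercised against real calendar dates. A sketch, assuming a half-width space between the date and the weekday (the text allows either width) and English three-letter weekday labels; `label` and `is_hit` are illustrative helpers, not part of the described device.

```python
import datetime
import re

# Regex from the example; a half-width space is assumed as the day/weekday delimiter.
WEEKDAY_RE = r"(2006/(9|1[0-2])/([1-9]|[12][0-9]|3[01])) (MON|TUE|WED|THU|FRI)"
NAMES = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]  # index = date.weekday()


def label(day: datetime.date) -> str:
    """Render a date as 'year/month/day WEEKDAY' without leading zeros."""
    return f"{day.year}/{day.month}/{day.day} {NAMES[day.weekday()]}"


def is_hit(day: datetime.date) -> bool:
    """True if the correctly labeled date matches the weekday regex."""
    return re.fullmatch(WEEKDAY_RE, label(day)) is not None


start = datetime.date(2006, 8, 1)
days = [start + datetime.timedelta(days=i) for i in range(184)]  # 2006/8/1 .. 2007/1/31
```

Because "day" and "day of the week" are matched independently, a string whose weekday label is wrong for its date would still match; the check only confirms that correctly labeled dates are selected as intended.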

Next, a concrete example of applying the above method to times is shown. Here, the hour, minute, and second layers are assumed to be separated by colons ":", and times are written in 24-hour notation.

(Example 4-3) Attribute range condition: 8:45:00 to 17:15:00

In step S102, the calculation unit 102 divides this attribute range condition into lower, middle, and upper regions as follows:
Lower region: 8:45:00 to 8:59:59
Middle region: 9:00:00 to 16:59:59
Upper region: 17:00:00 to 17:15:00

From here, the regular expression generation unit 104 and the regular expression combining unit 105 process the regions in the same way and generate the following regular expression:
Regular expression: "(8:(4[5-9]|5[0-9]):(0[0-9]|[1-4][0-9]|5[0-9])|(9|1[0-6]):([0-5][0-9]):(0[0-9]|[1-4][0-9]|5[0-9])|17:((0[0-9]|1[0-4]):(0[0-9]|[1-4][0-9]|5[0-9])|15:00))"
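Because a day has only 86,400 seconds, the time expression can be checked exhaustively. A sketch, assuming hours are written without a leading zero and minutes and seconds always with two digits, as in the example; `check_all_times` is an illustrative name.

```python
import re

# Regular expression from Example 4-3.
TIME_RE = (r"(8:(4[5-9]|5[0-9]):(0[0-9]|[1-4][0-9]|5[0-9])"
           r"|(9|1[0-6]):([0-5][0-9]):(0[0-9]|[1-4][0-9]|5[0-9])"
           r"|17:((0[0-9]|1[0-4]):(0[0-9]|[1-4][0-9]|5[0-9])|15:00))")


def check_all_times():
    """Return every (h, m, s) where the regex disagrees with a tuple comparison."""
    bad = []
    for h in range(24):
        for m in range(60):
            for s in range(60):
                hit = re.fullmatch(TIME_RE, f"{h}:{m:02d}:{s:02d}") is not None
                if hit != ((8, 45, 0) <= (h, m, s) <= (17, 15, 0)):
                    bad.append((h, m, s))
    return bad
```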

In the above example, the regular expression for "00 to 59", the range of the seconds value, was generated by the procedure of Embodiment 1 as "0[0-9]|[1-4][0-9]|5[0-9]", but it can also be written more simply as "[0-5][0-9]". For patterns that occur frequently in time attributes, such as "00 to 59" for minutes and seconds, the regular expression generation process can be simplified further by storing fixed regular expressions on the magnetic disk device 920 and reading them out as needed. Examples of such frequently occurring patterns are those spanning a layer's full range, $V_i^{\min}$ to $V_i^{\max}$: "(0)1 to 12" and "(0)1 to 31" for dates, "0 to 12", "0 to 24", and "00 to 59" for times, "0 to 255" for IP addresses, and so on.
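A stored table of full-range patterns could look like the sketch below. The dictionary keys and the `same_language` helper are hypothetical; the check compares the canned "00 to 59" pattern with the longer generated form over a finite sample of strings, which illustrates, but does not prove, their equivalence.

```python
import re

# Hypothetical table of stored full-range patterns (V_i^min to V_i^max).
CANNED = {
    "00-59": r"[0-5][0-9]",
    "1-12": r"([1-9]|1[0-2])",
    "1-31": r"([1-9]|[12][0-9]|3[01])",
    "0-24": r"([0-9]|1[0-9]|2[0-4])",
    "0-255": r"([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])",
}

# The longer form produced by the digit-by-digit procedure for 00 to 59.
GENERATED_00_59 = r"0[0-9]|[1-4][0-9]|5[0-9]"


def same_language(p: str, q: str, samples) -> bool:
    """True if p and q fully match exactly the same strings among samples."""
    return all((re.fullmatch(p, s) is None) == (re.fullmatch(q, s) is None)
               for s in samples)
```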

When times are written in 12-hour notation, the range may be split into morning (AM) and afternoon (PM), as follows.

(Example 4-4) Attribute range condition: 8:45:00AM to 5:15:00PM
Such an attribute range condition can be regarded as follows:
Attribute range condition 1: 8:45:00AM to 11:59:59AM
Attribute range condition 2: 0:00:00PM to 5:15:00PM
After regular expressions have been generated separately for the morning and the afternoon in this way, the regular expression combining unit 105 combines them as follows:
Regular expression: "<regular expression of attribute range condition 1>|<regular expression of attribute range condition 2>"
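A concrete instance of the AM/PM split, with both halves written by hand for this example. The sketch assumes hours run from 0 to 11 in each half (so noon is "0:00:00PM", as in attribute range condition 2) and that "AM"/"PM" follow the seconds directly; the names are illustrative.

```python
import re

# Attribute range condition 1: 8:45:00AM to 11:59:59AM
AM_RE = r"(8:(4[5-9]|5[0-9])|(9|1[01]):[0-5][0-9]):[0-5][0-9]AM"
# Attribute range condition 2: 0:00:00PM to 5:15:00PM
PM_RE = r"([0-4]:[0-5][0-9]:[0-5][0-9]|5:((0[0-9]|1[0-4]):[0-5][0-9]|15:00))PM"
# Combined as "<condition 1>|<condition 2>"
TIME12_RE = f"({AM_RE})|({PM_RE})"


def to_seconds_24h(h12: int, m: int, s: int, suffix: str) -> int:
    """Convert the 0-11 hour notation used here (0:00:00PM = noon) to seconds since midnight."""
    return ((h12 + (12 if suffix == "PM" else 0)) * 60 + m) * 60 + s
```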

An IPv4 address consists of four non-negative integers separated by the delimiter period ".", with the higher layers to the left; each layer takes values from 0 to 255. An IPv6 address consists of eight hexadecimal values separated by the delimiter colon ":". In either case, a regular expression can be generated in the same way by applying the procedure that divides regions from the upper layers toward the lower layers, and applying the procedure of Embodiment 1 to the value of each layer. Each layer can be processed as an integer of 0 or more, that is, as an attribute value in the same format as in Embodiment 1.

(Example 4-5) Attribute range condition: 10.0.1.1 to 10.2.100.254

In step S102, the calculation unit 102 divides this attribute range condition into lower, middle, and upper regions as follows (both limits share the first-layer value 10, so the division is made on the second layer):
Lower region: 10.0.1.1 to 10.0.255.255
Middle region: 10.1.0.0 to 10.1.255.255
Upper region: 10.2.0.0 to 10.2.100.254

The processing then proceeds in the same way toward the lower layers. That is, in step S104 the regular expression generation unit 104 further divides the lower region as follows:
Lower region: 10.0.1.1 to 10.0.1.255
Middle region: 10.0.2.0 to 10.0.255.255

In step S105, the regular expression generation unit 104 further divides the upper region as follows:
Middle region: 10.2.0.0 to 10.2.99.255
Upper region: 10.2.100.0 to 10.2.100.254

Finally, in step S106, the regular expression combining unit 105 generates the following regular expression:
Regular expression: "10\.((0\.(1\.([1-9]|[1-9][0-9]|1[0-9][0-9]|(25[0-5]|2[0-4][0-9]))|([2-9]|[1-9][0-9]|1[0-9][0-9]|(25[0-5]|2[0-4][0-9]))\.([0-9]|[1-9][0-9]|1[0-9][0-9]|(25[0-5]|2[0-4][0-9]))))|(1\.([0-9]|[1-9][0-9]|1[0-9][0-9]|(25[0-5]|2[0-4][0-9]))\.([0-9]|[1-9][0-9]|1[0-9][0-9]|(25[0-5]|2[0-4][0-9])))|(2\.(([0-9]|[1-9][0-9])\.([0-9]|[1-9][0-9]|1[0-9][0-9]|(25[0-5]|2[0-4][0-9]))|100\.([0-9]|[1-9][0-9]|1[0-9][0-9]|(25[0-4]|2[0-4][0-9])))))"
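The IP address expression can be checked against Python's standard `ipaddress` module, which orders IPv4 addresses numerically. In this sketch the escaped period is written with a backslash (the "¥" form is how the same character appears in Japanese environments); `agrees` is an illustrative name, and the check samples the fourth octet at boundary values rather than enumerating all of them.

```python
import ipaddress
import re

# Expanded regular expression for 10.0.1.1 to 10.2.100.254 ("\." is the escaped period).
IP_RE = (r"10\.((0\.(1\.([1-9]|[1-9][0-9]|1[0-9][0-9]|(25[0-5]|2[0-4][0-9]))"
         r"|([2-9]|[1-9][0-9]|1[0-9][0-9]|(25[0-5]|2[0-4][0-9]))"
         r"\.([0-9]|[1-9][0-9]|1[0-9][0-9]|(25[0-5]|2[0-4][0-9]))))"
         r"|(1\.([0-9]|[1-9][0-9]|1[0-9][0-9]|(25[0-5]|2[0-4][0-9]))"
         r"\.([0-9]|[1-9][0-9]|1[0-9][0-9]|(25[0-5]|2[0-4][0-9])))"
         r"|(2\.(([0-9]|[1-9][0-9])\.([0-9]|[1-9][0-9]|1[0-9][0-9]|(25[0-5]|2[0-4][0-9]))"
         r"|100\.([0-9]|[1-9][0-9]|1[0-9][0-9]|(25[0-4]|2[0-4][0-9])))))")

LOW = ipaddress.IPv4Address("10.0.1.1")
HIGH = ipaddress.IPv4Address("10.2.100.254")


def agrees(addr: str) -> bool:
    """True if the regex and the numeric comparison give the same verdict."""
    hit = re.fullmatch(IP_RE, addr) is not None
    return hit == (LOW <= ipaddress.IPv4Address(addr) <= HIGH)
```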

A decimal number is a numerical value whose integer part and fractional part are separated by a period ".". A regular expression can be generated in the same way by dividing the range first on the integer part (the upper layer) and then on the fractional part (the lower layer), applying the procedure of Embodiment 1 to the integer part and the procedure of Embodiment 3 to the fractional part.

(Example 4-6) Attribute range condition: 1.4142 to 6.63

In step S102, the calculation unit 102 divides this attribute range condition into lower, middle, and upper regions as follows. Because the second layer is an attribute value in the format of Embodiment 3, its minimum and maximum are taken among values with the same number of digits. That is, since the second-layer value of the lower limit has four digits, the calculation unit 102 selects the largest four-digit value as the upper limit of the lower region; and since the second-layer value of the upper limit has two digits, it selects the smallest two-digit value as the lower limit of the upper region.
Lower region: 1.4142 to 1.9999
Middle region: 2.00 to 5.99
Upper region: 6.00 to 6.63

Finally, in step S106, the regular expression combining unit 105 generates the following regular expression:
Regular expression: "(1\.((414[2-9]|41[5-9]|4[2-9])|([5-9])))|([2-5]\.[0-9])|(6\.(([0-5])|(6[0-3])))"

When a delimiter is a regular expression metacharacter (a character with a special meaning in regular expressions), an escape character must be inserted immediately before the delimiter: a backslash, which appears as "¥" in Japanese Windows (registered trademark) environments. Such metacharacters include "|", "?", "*", "+", ".", "(", ")", "{", "}", "[", "]", "\" ("¥"), "^", "$", "<", and ">". The set of metacharacters may, however, differ between regular expression engines.
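In Python, for example, the standard `re.escape` function inserts the backslash for any delimiter that is a metacharacter. A small sketch; `join_layers` is a hypothetical helper that joins per-layer patterns with an escaped delimiter.

```python
import re


def join_layers(layer_patterns, delimiter: str) -> str:
    """Join per-layer patterns with a literal delimiter, escaping it if needed."""
    return re.escape(delimiter).join(layer_patterns)


escaped = join_layers(["10", "0", "1", "1"], ".")   # r"10\.0\.1\.1"
unescaped = ".".join(["10", "0", "1", "1"])         # "10.0.1.1": "." matches any character
```

Without the escape, the unescaped pattern also accepts strings such as "10x0x1x1", because an unescaped "." matches any character.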

As described above, the regular expression generation device 100 according to this embodiment can automatically generate, from an attribute range condition that specifies the lower limit, upper limit, format, and so on of an attribute with a hierarchical structure composed of several attribute values and delimiters, a regular expression that matches character strings representing attribute values within the range from the lower limit to the upper limit. As a result, a regular expression that matches a complex attribute range condition, which was previously difficult to write accurately, can be obtained easily and quickly, without special knowledge or trial and error.

Embodiment 5.
The description of this embodiment focuses mainly on its differences from Embodiment 1.

This embodiment describes the flow of processing that generates a regular expression for searching text data, written according to predetermined syntax rules or formats, for character strings contained in a specific region of that text. Here the search targets are not merely specific character strings designated as search targets, as in the prior art, but character strings representing attribute values within a specific range from a lower limit to an upper limit.

FIG. 13 is a block diagram showing the configuration of the regular expression generation device 100 according to this embodiment.

In FIG. 13, the regular expression generation device 100 includes a text range condition input unit 109 in addition to the components shown in FIG. 1 and described in Embodiment 1. As in Embodiment 2, the regular expression generation device 100 may further include the condition storage unit 107 and the identifier input unit 108.

The text range condition input unit 109 inputs text range condition data from the input device 153. Text range condition data represents a text range condition, which indicates a specific region of text written according to predetermined syntax rules; the text range condition designates a specific range within the text. For example, if the text is an e-mail message, the text range condition input unit 109 can input, as the specific region of the text, text range condition data indicating a specific header field of the e-mail. Likewise, if the text is divided into several fields by a delimiter, as in CSV, the text range condition input unit 109 can input text range condition data indicating one of those fields as the specific region.

The attribute range condition input unit 101 inputs attribute range condition data whose attribute value format indicates that the attribute value is contained in the region indicated by the text range condition data input by the text range condition input unit 109.

The regular expression generation unit 104 additionally uses the processing device 152 to generate text region data (hereinafter sometimes simply called a "regular expression" or a "range") that represents, as a regular expression, the region indicated by the text range condition data input by the text range condition input unit 109.

On the basis of the format indicated by the attribute range condition data input by the attribute range condition input unit 101, the regular expression combining unit 105 combines the lower region data, upper region data, middle region data, and text region data generated by the regular expression generation unit 104, and generates regular expression data that represents, as a regular expression, the specific region of the text that contains the attribute value.

When the regular expression generation device 100 includes the condition storage unit 107 and the identifier input unit 108, the condition storage unit 107 stores several pieces of attribute range condition data and several pieces of text range condition data in the storage device 151 in advance, and also stores each combination of attribute range condition data and text range condition data in the storage device 151 in advance in association with a unique identifier. The identifier input unit 108 inputs an arbitrary identifier from the input device 153. The attribute range condition input unit 101 inputs the attribute range condition data of the combination stored by the condition storage unit 107 in association with the identifier input through the identifier input unit 108; likewise, the text range condition input unit 109 inputs the text range condition data of that combination.

FIG. 14 is a flowchart showing the regular expression generation method according to this embodiment. The flow shown in FIG. 14 corresponds to the processing procedure of a program (the regular expression generation program) executed on a computer that implements the regular expression generation device 100. In this procedure, the regular expression generation program causes the computer to execute each of the processes described below.

When a user of the regular expression generation device 100 specifies text range condition data with the keyboard 902 or the mouse 903, the text range condition input unit 109 inputs that data from the keyboard 902 or the mouse 903 (step S901: text range condition input process). When the user then specifies, with the keyboard 902 or the mouse 903, attribute range condition data indicating that an attribute value is contained in the text region indicated by that text range condition data, the attribute range condition input unit 101 inputs the attribute range condition data from the keyboard 902 or the mouse 903 (step S902: attribute range condition input process). After step S902, the process of step S102 is executed as in the flowchart of FIG. 4 described in Embodiment 1.

Using the CPU 911, the regular expression generation unit 104 generates text area data that represents, as a regular expression, the text area indicated by the text range condition data input by the text range condition input unit 109 in step S901 (step S903: part of the regular expression generation process). After step S903, the processes of steps S103 to S105 are executed, as in the flowchart of FIG. 4 described in Embodiment 1.

The regular expression combining unit 105 works from the format indicated by the attribute range condition data input by the attribute range condition input unit 101 in step S902. Using the CPU 911, it combines the text area data, lower region data, upper region data, and middle region data generated by the regular expression generation unit 104 in steps S903 and S103 to S105. If no middle region data was generated in step S103, only the text area data, lower region data, and upper region data are combined. The result is regular expression data that represents, as a regular expression, the attribute values from the lower limit value to the upper limit value stored by the attribute value storage unit 103 and also the text area that contains those attribute values (step S904: regular expression combining process). After step S904, the process of step S107 is executed, as in the flowchart of FIG. 4 described in Embodiment 1.
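The combining step (S904) can be sketched in Python as follows. This is a minimal illustration and not the patent's implementation. The function `combine_regions`, the `<ATTR>` placeholder, the `Size` header, and the example range 123 to 456 are all hypothetical.

For that range, the first value is 199 and the second value is 400. The lower, middle, and upper region expressions are therefore written out by hand for 123-199, 200-399, and 400-456. The alternatives are wrapped in a group so that `|` cannot split the surrounding text area expression.

```python
import re
from typing import Optional


def combine_regions(text_area: str, lower: str, upper: str,
                    middle: Optional[str] = None) -> str:
    """Insert the attribute-range alternatives into a text area regex.

    `text_area` must contain the hypothetical placeholder <ATTR>. When no
    middle region exists, only the lower and upper regions are combined.
    """
    regions = [lower] + ([middle] if middle is not None else []) + [upper]
    # Group the whole alternation so that '|' binds only inside it.
    attr = "(" + "|".join(f"(?:{r})" for r in regions) + ")"
    return text_area.replace("<ATTR>", attr)


# Hypothetical header "Size" with an attribute range of 123 to 456.
pattern = combine_regions(
    text_area=r"(^|\n)Size:[^\n]*<ATTR>",
    lower=r"1(2[3-9]|[3-9][0-9])",   # 123-199 (lower limit to first value)
    middle=r"[23][0-9][0-9]",        # 200-399 (between first and second value)
    upper=r"4([0-4][0-9]|5[0-6])",   # 400-456 (second value to upper limit)
)
for header in ["Size: 150", "Size: 300", "Size: 456", "Size: 100", "Size: 457"]:
    print(header, bool(re.search(pattern, header)))
```

The pattern is not bounded on the left of the value. As a result, a line such as "Size: 1234" would also match through its trailing "234". The excluded-character prefix of Embodiments 1 and 3 addresses that case.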

As described above, in this embodiment the regular expression generation device 100 implements a search condition generation method, or a search condition generation program that executes the method on a computer. The method takes two inputs. The first is a text range condition that selects a specific area of text data written according to a predetermined syntax rule. The second is an attribute range condition that selects a specific range within the domain of attribute values. The method converts them into a regular expression that detects, within that area, attribute values satisfying the attribute range condition. To do so, it generates a regular expression equivalent to the text range condition and a regular expression equivalent to the attribute range condition, and then combines the two.

In this search condition generation method, for example, a text range condition that selects a specific header field of an e-mail is input together with an attribute range condition, and the two are converted into a regular expression. As another example, the text range condition may select a specific field of text data in which each line is divided into multiple fields by a predetermined delimiter. That condition and an attribute range condition are input and converted into a regular expression.

The regular expression generation device 100 may also provide an area in the storage device 151 for storing rules that convert search conditions into regular expressions. In this case, the attribute range condition input unit 101, the text range condition input unit 109, and the identifier input unit 108 accept an identifier together with an attribute range condition, a text range condition, or both.

In this area, the condition storage unit 107 stores one or both of the following:
(1) pairs of an identifier and a set of conversion rules that turn attribute range conditions into regular expressions (these rules vary with the attribute value format);
(2) pairs of an identifier and a set of conversion rules that turn text range conditions into regular expressions (these rules vary with the syntax of the text data).

The regular expression generation unit 104 and the regular expression combining unit 105 then retrieve from this area the conversion rules associated with the input identifier. They convert the already-input attribute range condition or text range condition into a regular expression according to those rules.

The input units can also be used to register rules. The attribute range condition input unit 101, the text range condition input unit 109, and the identifier input unit 108 may accept pairs of an identifier and a conversion rule for turning an attribute range condition or text range condition into a regular expression. The condition storage unit 107 may then store the input pairs in this area.
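One way to picture the area that maps identifiers to conversion rules is a registry of pattern templates. In this Python sketch, the class `RuleRegistry`, the identifiers, and the placeholder names are hypothetical. The standard-library `string.Template` is chosen only because its `$name` syntax does not clash with the `{i}` quantifier braces in the regular expressions. The two templates follow the e-mail header and CSV forms given later in this description.

```python
import re
from string import Template
from typing import Dict


class RuleRegistry:
    """Hypothetical conversion-rule area: identifier -> regex template."""

    def __init__(self) -> None:
        self._rules: Dict[str, Template] = {}

    def register(self, ident: str, template: str) -> None:
        # A literal '$' in a stored regex would have to be written '$$'.
        self._rules[ident] = Template(template)

    def convert(self, ident: str, **params: object) -> str:
        # substitute() raises KeyError if a placeholder is left unfilled.
        return self._rules[ident].substitute(**params)


rules = RuleRegistry()
rules.register("email-header", r"(^|\n)${field}:[^\n]*(${value})")
rules.register("csv", r"(^|\n)([^,]*,){${index}}[^,]*(${value})")

subject_re = rules.convert("email-header", field=re.escape("Subject"), value="mail")
csv_re = rules.convert("csv", index=2, value="42")
print(subject_re)  # (^|\n)Subject:[^\n]*(mail)
print(csv_re)      # (^|\n)([^,]*,){2}[^,]*(42)
```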

The flow of processing for generating a regular expression is described below with examples.

First, examples of text with predefined syntax rules and formats are given.

The first example is e-mail. The format of its header fields is defined by RFC (Request for Comments) 2822 and related standards. Each header field basically consists of a field name, a field value, and a line break (CRLF).

The second example is a CSV (Comma-Separated Values) file. This is a text file in which the values of multiple fields are separated by commas, and it is often used to represent tabular data. Similar formats include TSV (Tab-Separated Values), which uses a tab as the delimiter, and formats that use a space as the delimiter. In every case the text is divided into fields by a delimiter.

A regular expression that searches for a character string within a specific range of such text is built from two parts. The first part identifies the range within the text, and it differs for each text format. The second part is a fixed keyword or regular expression that searches for the string within that range. For this purpose, in this embodiment it is desirable to store in advance, for each format, regular expressions that identify ranges. They are kept in a referenceable storage area of the storage device 151 of the regular expression generation device 100.

In this case, the regular expression generation device 100 takes three inputs:
(1) a format specification given by the attribute range condition;
(2) a specification of the range to be searched, given by the text range condition;
(3) a regular expression that searches within the specified range.

Given these inputs, the device uses the format and range specifications to generate a regular expression that identifies the corresponding range. It then combines that regular expression with the one that searches within the range and outputs the result.

Likewise, a regular expression can be built that searches a specific range of text under an attribute range condition such as those in Embodiments 1 to 4. Three steps are involved: generate a regular expression that identifies the text range, generate a regular expression for the attribute range condition according to the procedures of Embodiments 1 to 4, and combine the two.

An e-mail example follows. The syntax of an e-mail header field is:
<Header field name>: <Attribute value><Line break (CRLF)>

An attribute value may span multiple lines. In an e-mail header field, a line that begins with a single-byte space or a tab is a continuation of the preceding line. The syntax in that case is:
<Header field name>: <Attribute value><Line break (CRLF)>
<Single-byte space or tab><Attribute value><Line break (CRLF)>
...

The regular expression for the text range condition of an e-mail header field then takes one of the following forms.
(1) When the header field occupies one line
Regular expression: "(^|\n)<Header field name>:[^\n]*<Regular expression of attribute value>"
(2) When the header field spans multiple lines
Regular expression: "(^|\n)<Header field name>:[^\n]*(<Regular expression of attribute value>|(\n(\s|\t)+[^\n]*)+<Regular expression of attribute value>)"
Here, "\n" denotes a line break, "\s" a single-byte space, and "\t" a tab.
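The two templates can be generated and tried out with Python's standard `re` module, as in the following sketch. The function name `header_field_regex` and the sample headers are hypothetical.

One adaptation is deliberate. In Python, `\s` matches any whitespace, including a line break. The single-byte space that this description writes as "\s" is therefore written here as a literal space.

```python
import re


def header_field_regex(field_name: str, value_regex: str,
                       multiline: bool = False) -> str:
    """Build template (1) or (2) for an e-mail header field."""
    name = re.escape(field_name)
    value = f"({value_regex})"
    if not multiline:
        # (1) value on the same line as the field name
        return rf"(^|\n){name}:[^\n]*{value}"
    # (2) value on the first line or on a continuation line,
    #     i.e. a line that starts with a space or a tab
    return rf"(^|\n){name}:[^\n]*({value}|(\n( |\t)+[^\n]*)+{value})"


headers = "From: a@example.com\nSubject: hello\n world mail\nTo: b@example.com\n"
print(bool(re.search(header_field_regex("Subject", "mail", multiline=True), headers)))  # True
print(bool(re.search(header_field_regex("Subject", "mail"), headers)))                  # False
```

The continuation branch stops at the first line that does not start with a space or tab. A value in a following, unrelated header field is therefore not attributed to "Subject".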

For example, to search for an attribute value containing the string "mail" in the title (Subject) of an e-mail, the input to the regular expression generation device 100 need only contain the following information.
(a) Document format = "e-mail header field"
(b) Range to search (header field name) = "Subject"
(c) Search condition within the range (regular expression) = "mail"
Any method of specifying these conditions may be used. For example, the text range condition input unit 109 may specify (a) the document format and (b) the range to search as the text range condition. The attribute range condition input unit 101 may then specify (c) the search condition within the range as the attribute range condition. By specifying this search condition, the attribute range condition input unit 101 also indicates that the attribute value is contained in the range indicated by the text range condition data.

Given this input, the regular expression generation unit 104 and the regular expression combining unit 105 output the following regular expression. It is the text range regular expression for e-mail header fields, with <Header field name> replaced by "Subject" and <Regular expression of attribute value> replaced by "mail":
"(^|\n)Subject:[^\n]*((mail)|(\n(\s|\t)+[^\n]*)+(mail))"
If the header field to be searched is known never to span multiple lines, the following simpler regular expression may be output instead:
"(^|\n)Subject:[^\n]*(mail)"
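The two output patterns can be checked directly with Python's `re` module. The sample header text below is invented for illustration. As a Python-specific assumption, the single-byte space written "\s" above is spelled as a literal space.

Real messages end header lines with CRLF. The "\r" is absorbed by "[^\n]*", so no extra handling is needed here.

```python
import re

# The two patterns from the text ("\s" written as a literal space).
SUBJECT_FOLDED = r"(^|\n)Subject:[^\n]*((mail)|(\n( |\t)+[^\n]*)+(mail))"
SUBJECT_ONE_LINE = r"(^|\n)Subject:[^\n]*(mail)"

folded = "Date: Sat, 1 Oct 2005 09:00:00 +0900\r\nSubject: Re: notice\r\n\tmail archive\r\n"
one_line = "Subject: new mail arrived\r\n"

print(bool(re.search(SUBJECT_FOLDED, folded)))      # True: "mail" is on the continuation line
print(bool(re.search(SUBJECT_ONE_LINE, folded)))    # False: the first line has no "mail"
print(bool(re.search(SUBJECT_ONE_LINE, one_line)))  # True
```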

The next example generates a regular expression that searches a target such as the header field giving the date and time an e-mail was sent (Date), with a lower limit value and an upper limit value specified. The input to the regular expression generation device 100 in this case is as follows.
(a) Document format = "e-mail header field"
(b) Range to search (header field name) = "Date"
(c) Search condition (lower and upper limit values) = "2005/10/1 to 2006/9/31"
For example, the text range condition input unit 109 specifies (a) the document format and (b) the range to search as the text range condition. The attribute range condition input unit 101 specifies (c) the search condition as the attribute range condition.

The attribute range condition input unit 101 also specifies the following conditions as part of the attribute range condition.
(d) Attribute type = "date"
(e) Delimiter and level order = "day month year" (the delimiter is a space character, strictly a single-byte space, and the rightmost level is the most significant)
(f) Value range of the "month" level = "Jan to Dec"

From these inputs, the regular expression generation unit 104 can generate the regular expression for the text range condition and the regular expression for the attribute range condition as follows.
Range: "(^|\n)<Header field name>:[^\n]*<Regular expression of attribute value>"
Attribute range (regular expression of attribute value): "(([1-9]|[12][0-9]|3[0-1]) (Oct|Nov|Dec) 2005)|(([1-9]|[12][0-9]|3[0-1]) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep) 2006)"
Finally, the regular expression combining unit 105 combines the two to obtain the desired regular expression. It substitutes "Date" for <Header field name>, and the attribute range expression, enclosed in parentheses, for <Regular expression of attribute value>.
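Put together, the Date example might look like the Python sketch below. It rests on three assumptions. First, the attribute-range alternatives are enclosed in parentheses before being substituted into the range template. Second, the day, month, and year levels are separated by single spaces, as condition (e) specifies. Third, the sample Date values are invented for illustration.

Because the day level is written as 1 to 31 for every month, the upper limit 2006/9/31 simply covers all of September 2006.

```python
import re

DAY = r"([1-9]|[12][0-9]|3[0-1])"  # day level: 1 to 31
ATTR = (rf"({DAY} (Oct|Nov|Dec) 2005)"
        rf"|({DAY} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep) 2006)")
RANGE = r"(^|\n)<Header field name>:[^\n]*<Regular expression of attribute value>"

# Combine: substitute the field name and the parenthesized attribute regex.
DATE_RANGE = (RANGE.replace("<Header field name>", "Date")
                   .replace("<Regular expression of attribute value>", f"({ATTR})"))

samples = [
    "Date: Sat, 1 Oct 2005 09:00:00 +0900",   # lower limit
    "Date: Mon, 11 Sep 2006 18:30:00 +0900",  # inside the range
    "Date: Fri, 30 Sep 2005 23:59:59 +0900",  # one day before the lower limit
    "Date: Sun, 1 Oct 2006 00:00:00 +0900",   # after the upper limit
]
for s in samples:
    print(bool(re.search(DATE_RANGE, s)), s)
```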

The procedure for CSV text is the same as for e-mail. Suppose a regular expression is to search the range between the i-th and (i+1)-th commas on each line of CSV text. The regular expression that identifies that range has the following form:
"(^|\n)([^,]*,){i}[^,]*<Regular expression of attribute value>"
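A Python sketch of the CSV range expression follows. The helper `csv_field_regex` is hypothetical, and `i` counts the fields skipped, so `i = 0` selects the first field.

One caution, flagged as a deviation from the form above: a bare "[^,]*" can also match a line break. On multi-line input, the field count could then run across lines. The sketch therefore excludes "\n" from both character classes by default.

```python
import re


def csv_field_regex(i: int, value_regex: str, line_safe: bool = True) -> str:
    """Regex for the field between the i-th and (i+1)-th commas of a line."""
    cls = r"[^,\n]" if line_safe else "[^,]"
    return rf"(^|\n)({cls}*,){{{i}}}{cls}*({value_regex})"


text = "a,b\nc,42\n"
print(bool(re.search(csv_field_regex(2, "42", line_safe=False), text)))  # True: count ran across the line break
print(bool(re.search(csv_field_regex(2, "42"), text)))                   # False
print(bool(re.search(csv_field_regex(1, "42"), text)))                   # True: 42 is field 2 of line 2
```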

In this case, the input to the regular expression generation device 100 need only contain the following information.
(a) Document format = "CSV"
(b) Range to search (field number)
(c) Search condition within the range (regular expression)
For example, the text range condition input unit 109 specifies (a) the document format and (b) the range to search as the text range condition. The attribute range condition input unit 101 specifies (c) the search condition within the range as the attribute range condition.

In some CSV text, the value of each field is enclosed in double quotation marks ("). A quoted value may contain commas, or a pair of consecutive double quotation marks representing one literal quotation mark. The regular expression for the text range then takes the following form. As elsewhere in this description, the outermost double quotation marks only delimit the regular expression and are not part of it.
"(^|\n)("([^"]|"")*",){i}"([^"]|"")*<Regular expression of attribute value>"
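The quoted-field form can be exercised the same way. In this Python sketch, the helper name `quoted_csv_field_regex` and the sample rows are hypothetical, and the pattern is the one above with `i` filled in. The test rows show that a comma or a doubled quotation mark inside a quoted value does not disturb the field count.

```python
import re


def quoted_csv_field_regex(i: int, value_regex: str) -> str:
    """Skip i quoted fields ("..." followed by a comma), then search the next one."""
    field = r'"([^"]|"")*"'  # a quoted value; "" stands for one literal quote
    return rf'(^|\n)({field},){{{i}}}"([^"]|"")*({value_regex})'


row = '"x, y","say ""hi""","42"'
print(bool(re.search(quoted_csv_field_regex(2, "42"), row)))  # True: third field
print(bool(re.search(quoted_csv_field_regex(1, "42"), row)))  # False: second field is say "hi"
```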

For the TSV format, simply replace the comma "," delimiter in the above text range regular expression with a tab (written "\t"):
"(^|\n)("([^"]|"")*"\t){i}"([^"]|"")*<Regular expression of attribute value>"

For space-delimited text, replace the comma or tab delimiter with a single-byte space (written "\s"):
"(^|\n)("([^"]|"")*"\s){i}"([^"]|"")*<Regular expression of attribute value>"
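Only the delimiter changes between the CSV, TSV, and space-delimited forms, so a single generator parameterized by the delimiter suffices. This Python sketch assumes a single-character delimiter, and the helper name is hypothetical. `re.escape` is applied so that a delimiter with a special meaning in regular expressions is matched literally. A literal space is passed instead of "\s", because Python's `\s` also matches tabs and line breaks.

```python
import re


def delimited_field_regex(i: int, value_regex: str, delimiter: str = ",") -> str:
    """Quoted-field range regex for an arbitrary single-character delimiter."""
    if len(delimiter) != 1:
        raise ValueError("this sketch assumes a single-character delimiter")
    d = re.escape(delimiter)
    field = r'"([^"]|"")*"'
    return rf'(^|\n)({field}{d}){{{i}}}"([^"]|"")*({value_regex})'


tsv_row = '"a"\t"b"\t"7"'
space_row = '"a" "b" "7"'
print(bool(re.search(delimited_field_regex(2, "7", "\t"), tsv_row)))   # True
print(bool(re.search(delimited_field_regex(2, "7", " "), space_row)))  # True
print(bool(re.search(delimited_field_regex(2, "7", ","), tsv_row)))    # False
```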

When a run of consecutive delimiters is to be treated as a single delimiter, simply change each single delimiter to the form "<delimiter>+".
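The effect of the "+" shows up in unquoted, space-aligned text such as a column-formatted log. In this Python sketch, the helper name and sample line are hypothetical, and a single-character delimiter is assumed. "\n" is kept out of the field class so that each match stays within one line.

```python
import re


def unquoted_field_regex(i: int, value_regex: str, delimiter: str = " ",
                         collapse: bool = False) -> str:
    """Range regex for field i+1 of an unquoted, delimiter-separated line."""
    d = re.escape(delimiter)
    sep = f"(?:{d})+" if collapse else d  # '<delimiter>+' treats a run as one
    return rf"(^|\n)([^{d}\n]*{sep}){{{i}}}[^{d}\n]*({value_regex})"


line = "a   b  42"
print(bool(re.search(unquoted_field_regex(2, "42", collapse=True), line)))  # True
print(bool(re.search(unquoted_field_regex(2, "42"), line)))                 # False
```

Without the "+", every extra space starts a new, empty field, so "42" is no longer the third field.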

Embodiments 1 and 3 showed how to match an attribute range condition strictly by placing the excluded-character class "[^v_1-v_q]" in front of its regular expression. Such an attribute range expression may be combined with a text range expression for text whose fields are separated by a delimiter, as in the CSV or TSV format. In that case, the delimiter is added to the excluded-character class that precedes the attribute range expression. The regular expression is constructed as follows:
Regular expression: "(^|\n)([^<delimiter>]*<delimiter>){i}[^v_1-v_q<delimiter>]*<Regular expression of attribute value>"
This prevents the class "[^v_1-v_q]" from matching the delimiter itself.
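The Python sketch below contrasts the two forms for CSV. For illustration, it assumes an attribute range of 20 to 39 written as "[23][0-9]", so the excluded characters v_1-v_q are the digits "0-9". The helper name and sample rows are hypothetical.

Without the delimiter in the class, "[^0-9]*" can step over a comma into the next field. With the delimiter included, the match stays in the selected field. Only the left side of the value is guarded here; a value such as "234" would still match through its leading "23".

```python
import re


def strict_field_regex(i: int, value_regex: str, excluded: str = "0-9",
                       delimiter: str = ",") -> str:
    """Field-range regex whose excluded-character class also contains the delimiter."""
    d = re.escape(delimiter)
    return (rf"(^|\n)([^{d}\n]*{d}){{{i}}}"
            rf"[^{excluded}{d}\n]*({value_regex})")


# Excluded class without the delimiter, for comparison.
LOOSE = r"(^|\n)([^,]*,){1}[^0-9]*([23][0-9])"
STRICT = strict_field_regex(1, "[23][0-9]")

for row in ["a,25,c", "a,b,25", "a,123,c"]:
    print(row, bool(re.search(LOOSE, row)), bool(re.search(STRICT, row)))
```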

As described above, the regular expression generation device 100 according to this embodiment can automatically generate a regular expression that matches character strings within a specific range of text. It does so from a text range condition that designates that range in text written according to a predetermined syntax or format, together with either a regular expression or an attribute range condition. As a result, a regular expression for a complex attribute range condition, which has conventionally been difficult to write correctly, can be obtained easily and quickly, without special knowledge or trial and error.

Embodiments of the present invention have been described above. Two or more of these embodiments may be implemented in combination. Alternatively, any one of them may be implemented in part, or parts of two or more of them may be combined.

Brief Description of Drawings

FIG. 1 is a block diagram showing the configuration of the regular expression generation device according to Embodiments 1, 3, and 4.
FIG. 2 is a diagram showing an example of the external appearance of the regular expression generation device in each embodiment.
FIG. 3 is a diagram showing an example of the hardware resources of the regular expression generation device in each embodiment.
FIG. 4 is a flowchart showing the regular expression generation method according to Embodiments 1, 3, and 4.
FIG. 5 is a flowchart showing details of step S102 of FIG. 4 in Embodiment 1.
FIG. 6 is a flowchart showing details of step S104 of FIG. 4 in Embodiment 1.
FIG. 7 is a flowchart showing details of step S105 of FIG. 4 in Embodiment 1.
FIG. 8 is a block diagram showing the configuration of the regular expression generation device according to Embodiment 2.
FIG. 9 is a flowchart showing the regular expression generation method according to Embodiment 2.
FIG. 10 is a flowchart showing details of step S102 of FIG. 4 in Embodiment 3.
FIG. 11 is a flowchart showing details of step S104 of FIG. 4 in Embodiment 3.
FIG. 12 is a flowchart showing details of step S105 of FIG. 4 in Embodiment 3.
FIG. 13 is a block diagram showing the configuration of the regular expression generation device according to Embodiment 5.
FIG. 14 is a flowchart showing the regular expression generation method according to Embodiment 5.

Explanation of symbols

100 regular expression generation device, 101 attribute range condition input unit, 102 arithmetic unit, 103 attribute value storage unit, 104 regular expression generation unit, 105 regular expression combining unit, 106 output unit, 107 condition storage unit, 108 identifier input unit, 109 text range condition input unit, 151 storage device, 152 processing device, 153 input device, 154 output device, 901 display device, 902 keyboard, 903 mouse, 904 FDD, 905 CDD, 906 printer device, 910 system unit, 911 CPU, 912 bus, 913 ROM, 914 RAM, 915 communication board, 920 magnetic disk device, 921 operating system, 922 window system, 923 program group, 924 file group, 940 Internet, 941 gateway, 942 LAN.

Claims

A regular expression generation device comprising: an attribute range condition input unit that inputs, from an input device, attribute range condition data indicating a lower limit value, an upper limit value, and a format of an attribute value;
An arithmetic unit that calculates, with a processing device and based on the format indicated by the attribute range condition data input by the attribute range condition input unit, a first value and a second value. The first value is an attribute value equal to or greater than the lower limit value indicated by that attribute range condition data, in which at least one digit, counting from the least significant digit, is the maximum value of that digit. The second value is an attribute value equal to or less than the upper limit value indicated by that attribute range condition data, in which at least one digit, counting from the least significant digit, is the minimum value of that digit;
An attribute value storage unit that stores, in a storage device, the lower limit value and the upper limit value indicated by the attribute range condition data input by the attribute range condition input unit, and the first value and the second value calculated by the arithmetic unit;
A regular expression generation unit that generates, with a processing device, lower region data representing as a regular expression the attribute values from the lower limit value to the first value stored by the attribute value storage unit, and upper region data representing as a regular expression the attribute values from the second value to the upper limit value stored by the attribute value storage unit. If an attribute value exists between the first value and the second value stored by the attribute value storage unit, the regular expression generation unit also generates, with a processing device, middle region data representing that attribute value as a regular expression; and
A regular expression combining unit that combines, with a processing device, the lower region data, the upper region data, and the middle region data generated by the regular expression generation unit, to generate regular expression data representing as a regular expression the attribute values from the lower limit value to the upper limit value stored by the attribute value storage unit.

The regular expression generation device according to claim 1, wherein, based on the format indicated by the attribute range condition data input by the attribute range condition input unit, the arithmetic unit calculates a first value and a second value. The first value has the same number of digits as the lower limit value indicated by that attribute range condition data, and at least every digit other than the most significant digit is the maximum value of that digit. The second value has the same number of digits as the upper limit value indicated by that attribute range condition data, and at least every digit other than the most significant digit is the minimum value of that digit.

The regular expression generation device according to claim 2, wherein the arithmetic unit calculates a first value and a second value. The first value has the same most significant digit as the lower limit value indicated by the attribute range condition data input by the attribute range condition input unit, and each of its other digits is the maximum value of that digit. The second value has the same most significant digit as the upper limit value indicated by that attribute range condition data, and each of its other digits is the minimum value of that digit.

The regular expression generation device according to claim 1, wherein the attribute range condition input unit inputs, as the format of the attribute value, attribute range condition data indicating that the attribute value is either a numerical value or a character string.

The regular expression generation device according to claim 1, wherein the attribute range condition input unit inputs, as the format of the attribute value, attribute range condition data indicating that the attribute value has a hierarchical structure using a delimiter, each part into which the whole attribute value is divided by the delimiter forming one layer of the hierarchical structure;
The arithmetic unit calculates the first value and the second value, treating each layer as one digit of the whole attribute value; and
The regular expression generation unit generates the lower region data and the upper region data, treating each layer as one digit of the whole attribute value. If an attribute value exists between the first value and the second value, it also generates the middle region data. It then converts each layer into its regular expression representation.

The regular expression generation device according to claim 5, wherein the attribute range condition input unit inputs, as the format of the attribute value, attribute range condition data indicating that the attribute value is one of a date, a time, and an IP (Internet Protocol) address.

The regular expression generation device according to claim 1, further comprising:
A text range condition input unit that inputs, from an input device, text range condition data indicating a specific area of text written according to a predetermined syntax rule,
wherein the attribute range condition input unit inputs, as the format of the attribute value, attribute range condition data indicating that the attribute value is contained in the area indicated by the text range condition data input by the text range condition input unit,
the regular expression generation unit further generates, with a processing device, text area data representing as a regular expression the area indicated by the text range condition data input by the text range condition input unit, and
the regular expression combining unit, based on the format indicated by the attribute range condition data input by the attribute range condition input unit, combines the lower region data, upper region data, and middle region data generated by the regular expression generation unit with the text area data. This generates regular expression data representing as a regular expression the specific area of the text that contains the attribute value.

The regular expression generation device according to claim 7, wherein the text range condition input unit inputs text range condition data indicating a specific header field of an e-mail as the specific area of the text.

The regular expression generation device according to claim 7, wherein the text range condition input unit inputs, as the specific area of the text, text range condition data indicating any one of the fields into which the text is divided by a delimiter.

The regular expression generation device according to claim 7, further comprising:
A condition storage unit that stores in advance, in a storage device, a plurality of attribute range condition data and a plurality of text range condition data, and also stores in advance each combination of attribute range condition data and text range condition data in association with a unique identifier; and
An identifier input unit that inputs an arbitrary identifier from an input device,
wherein the attribute range condition input unit inputs the attribute range condition data of the combination stored by the condition storage unit in association with the identifier input by the identifier input unit, and
the text range condition input unit inputs the text range condition data of the combination stored by the condition storage unit in association with the identifier input by the identifier input unit.

The attribute range condition input unit of the regular expression generation device inputs attribute range condition data indicating the lower limit value, upper limit value and format of the attribute value from the input device,
Based on the format indicated by the attribute range condition data input by the attribute range condition input unit, the arithmetic unit of the regular expression generation device calculates, by a processing device, a first value that is an attribute value equal to or greater than the lower limit value indicated by that attribute range condition data and in which at least one digit, counting from the least significant digit, takes the maximum value of that digit, and a second value that is an attribute value equal to or less than the upper limit value indicated by that attribute range condition data and in which at least one digit, counting from the least significant digit, takes the minimum value of that digit,
The attribute value storage unit of the regular expression generation device stores, in a storage device, the lower limit value and the upper limit value indicated by the attribute range condition data input by the attribute range condition input unit, and the first value and the second value calculated by the arithmetic unit,
The regular expression generation unit of the regular expression generation device generates, by the processing device, lower region data that represents, as a regular expression, the attribute values from the lower limit value to the first value stored by the attribute value storage unit, and upper region data that represents, as a regular expression, the attribute values from the second value to the upper limit value stored by the attribute value storage unit,
When an attribute value exists between the first value and the second value stored by the attribute value storage unit, the regular expression generation unit of the regular expression generation device generates, by the processing device, middle region data that represents that attribute value as a regular expression,
The regular expression combining unit of the regular expression generation device combines, by a processing device, the lower region data, the upper region data, and the middle region data generated by the regular expression generation unit, thereby generating regular expression data that represents, as a regular expression, the attribute values from the lower limit value to the upper limit value stored by the attribute value storage unit. A regular expression generation method characterized by the foregoing.
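The method can be sketched end to end for one assumed format: fixed-width, zero-padded decimal values. The range is split at the leading digit into a lower region (lower limit up to the first value, whose trailing digits are all 9), an optional middle region (leading digits strictly between), and an upper region (the second value, whose trailing digits are all 0, up to the upper limit), and the alternatives are joined. Applying the same split again inside the lower and upper regions is one way, assumed here, to turn each region into a regular expression; the function names are hypothetical.

```python
import re


def _group(expr: str) -> str:
    # Wrap an alternation so that a preceding digit applies to every branch.
    return f"(?:{expr})" if "|" in expr else expr


def _digit_class(a: int, b: int) -> str:
    return str(a) if a == b else f"[{a}-{b}]"


def _any_digits(n: int) -> str:
    return r"\d" if n == 1 else rf"\d{{{n}}}"


def range_regex(lo: str, hi: str) -> str:
    """Unanchored regex for every digit string s, len(s) == len(lo), lo <= s <= hi."""
    assert len(lo) == len(hi) and lo <= hi
    n = len(lo)
    if lo == hi:
        return lo
    if lo == "0" * n and hi == "9" * n:
        return _any_digits(n)
    if n == 1:
        return _digit_class(int(lo), int(hi))
    if lo[0] == hi[0]:
        # Shared leading digit: fix it and recurse on the remaining digits.
        return lo[0] + _group(range_regex(lo[1:], hi[1:]))
    w = n - 1
    # Lower region: lower limit .. first value (lo's leading digit, then all 9s).
    parts = [lo[0] + _group(range_regex(lo[1:], "9" * w))]
    # Middle region: only if some leading digit lies strictly between.
    a, b = int(lo[0]) + 1, int(hi[0]) - 1
    if a <= b:
        parts.append(_digit_class(a, b) + _any_digits(w))
    # Upper region: second value (hi's leading digit, then all 0s) .. upper limit.
    parts.append(hi[0] + _group(range_regex("0" * w, hi[1:])))
    return "|".join(parts)


def attribute_range_regex(lower: int, upper: int, width: int) -> str:
    """Anchored regex for zero-padded decimal values lower..upper of fixed width."""
    assert 0 <= lower <= upper < 10 ** width
    lo, hi = f"{lower:0{width}d}", f"{upper:0{width}d}"
    return rf"\A(?:{range_regex(lo, hi)})\Z"


print(range_regex("123", "456"))
# 1(?:2[3-9]|[3-8]\d|9\d)|[2-3]\d{2}|4(?:0\d|[1-4]\d|5[0-6])
```

For 123 to 456 the three top-level alternatives are exactly the lower region (123 to 199), the middle region (200 to 399), and the upper region (400 to 456).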

Attribute range condition input processing for inputting attribute range condition data indicating the lower limit value, upper limit value and format of the attribute value from the input device;
Arithmetic processing in which, based on the format indicated by the attribute range condition data input by the attribute range condition input processing, a processing device calculates a first value that is an attribute value equal to or greater than the lower limit value indicated by that attribute range condition data and in which at least one digit, counting from the least significant digit, takes the maximum value of that digit, and a second value that is an attribute value equal to or less than the upper limit value indicated by that attribute range condition data and in which at least one digit, counting from the least significant digit, takes the minimum value of that digit;
Attribute value storage processing that stores, in a storage device, the lower limit value and the upper limit value indicated by the attribute range condition data input by the attribute range condition input processing, and the first value and the second value calculated by the arithmetic processing;
Regular expression generation processing in which a processing device generates lower region data that represents, as a regular expression, the attribute values from the lower limit value to the first value stored by the attribute value storage processing, and upper region data that represents, as a regular expression, the attribute values from the second value to the upper limit value stored by the attribute value storage processing, and, when an attribute value exists between the first value and the second value stored by the attribute value storage processing, generates middle region data that represents that attribute value as a regular expression;
Regular expression combining processing in which a processing device combines the lower region data, the upper region data, and the middle region data generated by the regular expression generation processing, thereby generating regular expression data that represents, as a regular expression, the attribute values from the lower limit value to the upper limit value stored by the attribute value storage processing. A regular expression generation program that causes a computer to execute the above processing.
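The arithmetic processing alone can be sketched for the case given in the abstract: the first value keeps the leading digit of the lower limit and sets every other digit to its maximum, and the second value keeps the leading digit of the upper limit and sets every other digit to its minimum. The claim also allows the more general "at least one digit from the least significant digit", which this sketch does not cover. It assumes a fixed-width zero-padded decimal format and limits whose leading digits differ; the function name is hypothetical.

```python
from typing import Tuple


def first_and_second_values(lower: int, upper: int, width: int) -> Tuple[int, int]:
    """First and second values for a fixed-width decimal range (sketch)."""
    lo, hi = f"{lower:0{width}d}", f"{upper:0{width}d}"
    # Assumed precondition: both limits fit the width and their leading digits differ.
    assert len(lo) == len(hi) == width and lo[0] < hi[0]
    # First value: leading digit of the lower limit, every other digit at its maximum (9).
    first = int(lo[0] + "9" * (width - 1))
    # Second value: leading digit of the upper limit, every other digit at its minimum (0).
    second = int(hi[0] + "0" * (width - 1))
    return first, second


first, second = first_and_second_values(123, 456, 3)
print(first, second)       # 199 400
# Middle region data is needed only if some value lies strictly between them.
print(second - first > 1)  # True (200..399)
```

For 123 to 234 the same calculation yields 199 and 200, so no middle region data is generated.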