JP2008217170A

JP2008217170A - Information processor and program

Info

Publication number: JP2008217170A
Application number: JP2007050755A
Authority: JP
Inventors: Hiromoto Kino; 浩誠木野; Kentaro Akashi; 健太郎明石; Tomoo Yoshida; 智生吉田
Original assignee: Toshiba Corp; Toshiba Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2007-02-28
Filing date: 2007-02-28
Publication date: 2008-09-18

Abstract

PROBLEM TO BE SOLVED: To allow conversion rules for converting text into XML to be set flexibly.

SOLUTION: An analysis definition creation unit 21 generates a setting screen for setting a conversion rule that analyzes text and converts it into XML (Extensible Markup Language). On this screen, the user can specify at least a condition for determining a block, which is the unit of a range of the text to be analyzed, and a condition for extracting individual data within the block. Based on the conditions specified through the setting screen, the unit creates analysis definition data as the conversion rule. This data contains character strings in which the information representing each condition is enclosed in tags.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to an information processing apparatus and a program capable of setting conversion rules used for data conversion.

In recent years, news agencies have been progressively migrating the articles they distribute to newspaper companies and similar customers as wire messages. The articles are moving from plain text to XML (Extensible Markup Language), including XML-based languages such as NewsML. Newspaper companies are likewise building environments for managing articles in XML format. However, articles that newspaper companies produce themselves are still often written as plain text. It is therefore desirable to convert such text into XML as well, manage it as XML files, and make those files available for secondary uses.

Various techniques exist for analyzing text and converting it into XML. For example, Patent Document 1 discloses a method that extracts portions suitable for reading aloud, through processing such as dividing text into sentences. Patent Document 2 discloses a technique that creates an XML document through processing such as segmenting natural-language text into phrases by morphological analysis.

Patent Document 1: JP 2002-334070 A
Patent Document 2: JP 2003-288332 A

However, conventional methods, including those of Patent Documents 1 and 2, cannot flexibly handle some cases. These include changing or setting the targets to be extracted from the text, and changing or setting the form of the conversion result for those targets. In particular, a technique that lets the operator make such changes and settings on a visually intuitive setting screen would be highly promising, yet no such technique has been proposed.

The present invention has been made in view of the above circumstances. Its object is to provide an information processing apparatus and a program capable of flexibly setting conversion rules for converting text into XML.

An information processing apparatus according to the present invention comprises two means:

- means for generating a setting screen for setting a conversion rule that analyzes text and converts it into XML (Extensible Markup Language). The setting screen allows at least the specification of a condition for determining a block, which is the unit of a range of the text to be analyzed, and the specification of a condition for extracting individual data within the block;
- means for creating analysis definition data as the conversion rule, based on the individual conditions specified through the setting screen. The analysis definition data includes character strings in which the information indicating each condition is tagged.

A program according to the present invention causes a computer to implement two functions:

- a function of generating a setting screen for setting a conversion rule that analyzes text and converts it into XML (Extensible Markup Language). The setting screen allows at least the specification of a condition for determining a block, which is the unit of a range of the text to be analyzed, and the specification of a condition for extracting individual data within the block;
- a function of creating analysis definition data as the conversion rule, based on the individual conditions specified through the setting screen. The analysis definition data includes character strings in which the information indicating each condition is tagged.

According to the present invention, conversion rules for converting text into XML can be set flexibly.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a diagram showing an example configuration of an information processing system according to an embodiment of the present invention.

The information processing system is installed, for example, at a newspaper company. It comprises a server system 1, a communication unit 2, and a plurality of terminals 11, 12, 13, and so on.

- The communication unit 2 receives articles distributed as wire messages from a news agency and passes them to the server system 1.
- The server system 1 includes various servers, such as a material management server that manages article materials and an article processing server that edits articles using those materials. In response to requests from the terminals, each server executes the requested processing using its corresponding database (DB).
- Each of the terminals 11, 12, 13, and so on can request the server system 1 to execute various kinds of processing.

FIG. 2 is a diagram showing an example configuration of the information processing apparatus according to this embodiment.

The information processing apparatus (computer) 20 corresponds, for example, to one of the servers in the server system 1 described above, such as the article processing server. It includes an analysis definition creation unit 21 and an analysis unit 22, implemented as computer programs executed by a processor such as a CPU (central processing unit).

- The storage device 23 contains an analysis definition storage section that holds the analysis definition files created by the analysis definition creation unit 21.
- The storage device 24 stores the text files input to the analysis unit 22.
- The storage device 25 stores the XML files output from the analysis unit 22.

Although the storage devices 24 and 25 are shown as physically separate, they may instead be a single physical storage device that is logically divided internally.

The input text and the output XML are described here as files. They are not limited to files, however, and may instead be held as database tables.

In response to a request from a terminal, the analysis definition creation unit 21 generates a setting screen for setting a conversion rule (hereinafter also called an “analysis definition”). The conversion rule analyzes the text of an article and converts it into XML. On this setting screen, the user can specify at least a condition for determining a block, which is the unit of a range of the text to be analyzed, and a condition for extracting individual data within the block. The information for the generated setting screen is sent to the requesting terminal and shown on its display.

The analysis definition creation unit 21 then creates analysis definition data (a file) as the conversion rule, based on the individual conditions specified through the generated setting screen. This data contains character strings in which the information indicating each condition is tagged. The unit saves the data in the analysis definition storage section of the storage device 23. Although the analysis definition data is described as a file, it is not limited to that form and may instead be held as a database table.
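The text says only that each condition is stored as a tagged string (an example file appears in FIG. 11, not reproduced here). The Python sketch below shows one way the conditions entered on a setting screen could be serialized like that with the standard `xml.etree.ElementTree` module. It is a minimal illustration under assumptions. The element and attribute names (`block`, `item`, `tagName`, `repeat`, `start`, `end`, `method`) are hypothetical and are not taken from the patent's file format.

```python
import xml.etree.ElementTree as ET

def condition_to_element(cond: dict) -> ET.Element:
    """Serialize one block or item condition (hypothetical schema) as an
    element whose children hold the individual settings, each in its own tag."""
    kind = "block" if "children" in cond else "item"
    elem = ET.Element(kind, tagName=cond["tag"])
    for key in ("repeat", "start", "end"):
        if key in cond:
            method, value = cond[key]
            ET.SubElement(elem, key, method=method).text = str(value)
    for child in cond.get("children", []):
        elem.append(condition_to_element(child))
    return elem

# Hypothetical conditions as they might be entered on the setting screen:
# a root block whose child block repeats every two lines and holds two items.
definition = {
    "tag": "root",
    "children": [
        {"tag": "employee", "repeat": ("rows", 2), "children": [
            {"tag": "name", "start": ("absolute", 0), "end": ("relative", 10)},
            {"tag": "age", "start": ("absolute", 10), "end": ("relative", 3)},
        ]},
    ],
}

xml_text = ET.tostring(condition_to_element(definition), encoding="unicode")
print(xml_text)
```

Each setting becomes its own tagged element. The result can therefore be parsed back with the same library when the analysis unit loads it.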

The analysis unit 22 follows the analysis definition data saved in the analysis definition storage section of the storage device 23. Using that data, it determines the analysis ranges in the text stored in the storage device 24 and extracts the individual data, thereby converting the text into XML. It then stores the resulting XML data in the storage device 25 or sends it to the requesting terminal.

FIG. 3 is a diagram for explaining an example of an analysis definition.

Suppose, for example, that the input text is a list giving the name, age, sex, and place of residence of three employees of a shop, Tokyo Shoten. The analysis definition data defines two kinds of conditions:

- conditions for determining the “range to be analyzed (block)”;
- conditions for extracting the “individual data” within that range.

For example, as shown in FIG. 3, a parent block 31 and a plurality of child blocks 41 are defined as conditions for determining the “range to be analyzed”.

- The parent block 31 covers lines 1 through 8 of the text and is assigned the XML tag root.
- Each child block 41 covers two lines within the parent block 31 and is assigned the XML tag employee.

As conditions for extracting the “individual data”, the definition specifies two things. First, the name, age, sex, and place of residence in each child block 41 are extracted at the positions given by the conditions in FIG. 4. Second, they are assigned the XML tags name, age, sex, and place, respectively.
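The following minimal Python sketch applies a definition of this kind. Everything concrete in it is invented for illustration, because FIG. 4's actual positions are not reproduced here. That includes the sample text, the two header lines, the two-line record layout, and the column ranges. The sketch takes the parent block as lines 1 to 8 and tags it `root`. It cuts every pair of lines after the header into an `employee` child block, then slices fixed column ranges and tags them `name`, `age`, `sex` and `place`.

```python
import xml.etree.ElementTree as ET

# Invented sample text: a title line, a header line, then two lines per employee.
TEXT = """TOKYO SHOTEN STAFF
NAME      AGE
Sato Taro  34
M Tokyo
Suzuki Ai  28
F Chiba
Kato Ken   45
M Saitama"""

PARENT_LINES = (0, 8)   # parent block: lines 1-8 (0-based slice)
HEADER_LINES = 2        # assumed header lines before the first record
CHILD_LINES = 2         # each child block spans two lines
# (tag, line within child block, start column, end column or None = end of line)
ITEMS = [("name", 0, 0, 10), ("age", 0, 10, 13),
         ("sex", 1, 0, 1), ("place", 1, 2, None)]

def convert(text: str) -> ET.Element:
    lines = text.splitlines()[PARENT_LINES[0]:PARENT_LINES[1]]
    root = ET.Element("root")
    body = lines[HEADER_LINES:]
    for i in range(0, len(body), CHILD_LINES):
        block = body[i:i + CHILD_LINES]
        employee = ET.SubElement(root, "employee")
        for tag, row, start, end in ITEMS:
            ET.SubElement(employee, tag).text = block[row][start:end].strip()
    return root

root = convert(TEXT)
ET.indent(root)  # pretty-printing helper, Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```

The nesting of `root` and `employee` mirrors the parent/child block relationship. Each extracted value ends up in its own tagged element.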

The right side of FIG. 3 shows the XML data obtained by analysis and conversion based on this analysis definition data. As the result shows, a hierarchical structure reflecting the relationship between the parent block 31 and the child blocks 41 is formed, and each extracted item of individual data carries its own tag. The result is therefore a general-purpose data structure that is easy to use for many purposes, for example:

- calculating the average age of the employees;
- calculating the ratio of men to women;
- calculating the distribution of residences by prefecture.

The tags can also be used to access the data when the XML is consumed.
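Because each value carries its own tag, downstream code can look fields up by name instead of by position. The short Python sketch below uses a made-up XML document in the FIG. 3 shape, with the sample values invented. It computes the average age, the sex ratio and the per-prefecture distribution mentioned above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up output in the shape of FIG. 3 (root > employee > name/age/sex/place).
XML = """<root>
  <employee><name>Sato Taro</name><age>34</age><sex>M</sex><place>Tokyo</place></employee>
  <employee><name>Suzuki Ai</name><age>28</age><sex>F</sex><place>Chiba</place></employee>
  <employee><name>Kato Ken</name><age>45</age><sex>M</sex><place>Saitama</place></employee>
</root>"""

employees = ET.fromstring(XML).findall("employee")

# Fields are located by tag name, so adding or removing other tags
# (for example a new <phone> item) does not break these lookups.
ages = [int(e.findtext("age")) for e in employees]
average_age = sum(ages) / len(ages)
sex_ratio = Counter(e.findtext("sex") for e in employees)
by_place = Counter(e.findtext("place") for e in employees)

print(f"average age: {average_age:.1f}")
print(dict(sex_ratio), dict(by_place))
```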

Consequently, even when data items to be extracted are added or removed, the systems that consume the XML data need no major changes to their processing procedures.

When each of several parent blocks contains child blocks, the individual data extraction conditions for the child blocks can be defined differently for each parent block. For example, as shown in FIG. 5, suppose the text contains blocks 51, 52, and 53, each of which contains child blocks. The extraction conditions for the child blocks in block 51, those in block 52, and those in block 53 can then all differ from one another.
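One way to picture per-parent extraction conditions is a lookup table keyed by parent block. In this Python sketch, several details are hypothetical: the parent tags (`block51` and so on, named after the FIG. 5 reference numerals), the comma-separated child lines, and the item names. The only point it makes is that child blocks in different parents are cut with different item lists.

```python
import xml.etree.ElementTree as ET

# Hypothetical extraction conditions, one item list per parent block.
# Each child line is split on commas and its fields are tagged with
# the item names belonging to that line's parent.
ITEMS_BY_PARENT = {
    "block51": ["name", "age"],
    "block52": ["team", "score"],
    "block53": ["city", "temp", "weather"],
}

def convert(parents: dict[str, list[str]]) -> ET.Element:
    root = ET.Element("root")
    for parent_tag, child_lines in parents.items():
        parent = ET.SubElement(root, parent_tag)
        item_tags = ITEMS_BY_PARENT[parent_tag]
        for line in child_lines:
            child = ET.SubElement(parent, "row")
            for tag, value in zip(item_tags, line.split(",")):
                ET.SubElement(child, tag).text = value.strip()
    return root

root = convert({
    "block51": ["Sato, 34"],
    "block52": ["Giants, 3", "Tigers, 2"],
    "block53": ["Tokyo, 18, cloudy"],
})
print(ET.tostring(root, encoding="unicode"))
```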

As shown in FIG. 6, four kinds of conditions can be specified when creating an analysis definition: “repetition definition”, “block conversion”, “block extraction condition”, and “individual extraction condition”. Each is described in detail below.

1. Repetition definition
A repetition definition specifies the range over which block extraction conditions and individual extraction conditions are applied repeatedly. Applying a repetition definition to the analysis target converts (reshapes) it into “line units”. The block extraction conditions and individual extraction conditions are then applied to every repetition (line).

2. Block conversion
Block conversion reshapes a block so that block extraction conditions and individual extraction conditions can be set on it. Applying block conversion to the analysis target reshapes it into “line units”. Repetition, block extraction conditions, and individual extraction conditions are then applied to the reshaped block as appropriate.

3. Block extraction condition
A block extraction condition divides the analysis target into units called blocks. This determines the scope of the individual extraction conditions.

- Applying one or more block extraction conditions to the analysis target yields “line units”. Consequently, when several blocks are defined within one block (a nested definition), each resulting block is treated as a separate line (block).
- The block conditions are specified so that, once all of them have been applied, the result is the “minimum block unit” to which the individual extraction conditions apply.
- When a repetition is specified in a higher-level block definition, the same block extraction conditions are applied to every block being repeated.

4. Individual extraction condition
An individual extraction condition defines an individual data item to be finally extracted from the input text, together with its tag name.

- Individual extraction conditions (one or more) apply within a single block only. A block must therefore always be defined at a higher level to establish their scope.
- When a repetition is specified in a higher-level block definition, the same individual extraction conditions are applied to every block being repeated.
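The four condition types nest naturally into a tree. The Python dataclasses below are a hypothetical model; none of the class names, field names or mode strings come from the patent. The model encodes two constraints stated in this document. First, individual (item) conditions exist only under a block. Second, a block holds either child blocks or items, never both. It also shows how the “minimum block units” are the leaves of the block tree.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Repetition:              # 1. repetition definition
    mode: str                  # illustrative values: "rows", "columns", "csv"
    value: str = ""

@dataclass
class BlockConversion:         # 2. block conversion
    mode: str                  # illustrative values: "transpose", "split_on"
    parameter: str = ""

@dataclass
class ItemCondition:           # 4. individual extraction condition
    tag: str
    start: tuple[str, str]     # (method, value), e.g. ("absolute", "0")
    end: tuple[str, str]

@dataclass
class BlockCondition:          # 3. block extraction condition
    tag: str
    start: tuple[str, str] = ("absolute", "0")
    end: tuple[str, str] = ("to_end", "")
    repetition: Optional[Repetition] = None
    conversion: Optional[BlockConversion] = None
    blocks: list["BlockCondition"] = field(default_factory=list)
    items: list[ItemCondition] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A block contains child blocks or items, but not both.
        if self.blocks and self.items:
            raise ValueError(f"block {self.tag!r} mixes child blocks and items")

    def leaf_blocks(self):
        """Yield the minimum block units, to which item conditions apply."""
        if not self.blocks:
            yield self
        for child in self.blocks:
            yield from child.leaf_blocks()

definition = BlockCondition(
    tag="root",
    blocks=[BlockCondition(
        tag="employee",
        repetition=Repetition("rows", "2"),
        items=[ItemCondition("name", ("absolute", "0"), ("relative", "10")),
               ItemCondition("age", ("absolute", "10"), ("relative", "3"))],
    )],
)
print([b.tag for b in definition.leaf_blocks()])  # ['employee']
```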

FIG. 7 is a diagram showing an example of processing that converts text into XML data with ruby (reading annotations).

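For example, consider the text 「？本＊日（ほん＊じつ）は？晴＊天（せい＊てん）なり。」 (“Today the weather is fine”). In this notation, each annotated word is written as follows:

- a leading “？”;
- the base characters, separated by “＊”;
- the readings, also separated by “＊”, in parentheses.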

To extract the ruby-annotated characters from this text, the following processing is performed.

(1) Set the range over which the analysis is repeated.

(2) Set the processing performed in each repetition of (1).

(2-1) Extract only the parts of the form 「？▲＊■（△＊□）」, separating them from the rest of the text.

(Here, ▲, ■, △, and □ each stand for arbitrary characters.)
(2-2) Extract the individual data as the pairs “▲ and △” and “■ and □”.

Performing this extraction and tagging the results yields the following:

<root>
<Rb><R1>本</R1><R2>ほん</R2></Rb><Rb><R1>日</R1><R2>じつ</R2></Rb>
<Rb><R1>晴</R1><R2>せい</R2></Rb><Rb><R1>天</R1><R2>てん</R2></Rb>
</root>
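The two extraction steps above can be approximated with a regular expression. This Python sketch is an illustration, not the patent's own rule syntax. It assumes that each ruby group has exactly two base parts and two readings, each separated by `＊`, as in the example. It reproduces the tagged result shown above.

```python
import re
from xml.sax.saxutils import escape

TEXT = "？本＊日（ほん＊じつ）は？晴＊天（せい＊てん）なり。"

# Step (2-1): isolate each "？▲＊■（△＊□）" group from the surrounding text.
# Step (2-2): pair the base part ▲ with reading △, and ■ with □.
RUBY = re.compile(r"？([^＊（]+)＊([^（]+)（([^＊）]+)＊([^）]+)）")

def to_ruby_xml(text: str) -> str:
    lines = []
    for base1, base2, read1, read2 in (m.groups() for m in RUBY.finditer(text)):
        pairs = [(base1, read1), (base2, read2)]
        lines.append("".join(
            f"<Rb><R1>{escape(b)}</R1><R2>{escape(r)}</R2></Rb>" for b, r in pairs))
    return "<root>\n" + "\n".join(lines) + "\n</root>"

print(to_ruby_xml(TEXT))
```

Text outside the groups, such as は and なり。, is simply skipped here. A fuller rule set would decide whether to keep it as untagged content.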
FIG. 8 is a diagram showing an example of the analysis definition setting screen. It uses the ruby analysis example described with reference to FIG. 7.

This setting screen is divided into an area 70A and an area 70B. In the area 70A, the user can specify the file name of the analysis definition to be created, the file name of the test message (the text used for testing), the display method, and so on.

The area 70B contains fields for specifying the conditions described with reference to FIG. 6: an item 71 for the “repetition definition”, an item 72 for “block conversion”, and an item 73 for the “block extraction condition”. The hierarchical structure shown to the left of these items indicates which block or item is currently being set. Clicking a level in that hierarchy selects the block or item whose conditions are to be specified. The area 70B also contains two display areas:

- an area 74 that displays the relevant test data from the test message;
- an area 75 that displays, in real time, the result of converting the test data with the analysis definition being created.

The fields in the area 70B are described in more detail below.

• “Repetition”: choose one of the following when the same analysis conditions are to be applied repeatedly to the currently selected block.

“None”: nothing is done.

“Row direction”: child blocks, each consisting of the number of lines specified in “Repetition value”, are taken out until the current block is empty.

“Row direction (automatic)”: block extraction is repeated until the current block is empty. Use this when the repeated blocks do not all have the same number of lines, for example when “Specified string +” is selected for “End position”.

“Column direction”: child blocks, each consisting of the number of columns (characters) specified in “Repetition value”, are taken out until the current block is empty.

“Column direction (CSV column)”: the string specified in “Repetition value” is treated as the CSV delimiter, and the block is divided into blocks in the column direction.

• “Repetition value”: enter the value appropriate to the option chosen in “Repetition”.

• “Block conversion”: choose the conversion to be applied to the block from the following.

“Vertical/horizontal conversion”: the block is converted by swapping its rows and columns.

“Specified line length”: the block is converted so that each line holds the number of characters specified in “Conversion parameter”.

“Split at specified string”: the string specified in “Conversion parameter” is treated as a line break, and the block is converted accordingly.

“Split at specified string (leading spaces)”: the string specified in “Conversion parameter” is treated as a line break. If the resulting lines differ in length, full-width spaces are added to the beginning of the shorter lines so that all lines have the same number of columns.

• “Conversion parameter”: enter the value appropriate to the option chosen in “Block conversion”.

• “Add”: adds a child block or child item to the current block. A block can contain either child blocks or child items, but not both; several of either kind can be added.

• “Delete”: deletes a child block or child item from the current block.

• “Tag name”: enter the tag name of the block or item.

• “Start position”: choose how the start position of the block or item is specified.

“Relative position”: the start position lies “Start character count” characters after the end of the preceding item or block within the block.

“Absolute position”: the start position lies “Start character count” characters from the beginning of the block.

“Specified string”: the line or column containing the string specified in “Start string” is located. The start position is then moved from there by the number of lines or columns specified in “Start character count”.

• “Start character count”: enter the value appropriate to the option chosen in “Start position”.

• “End position”: choose how the end position of the block or item is specified.

“Relative position”: the end position lies “End character count” characters after the start position.

“Absolute position”: the end position lies “End character count” characters from the beginning of the block.

“Specified string”: the line or column containing the string specified in “End string” is located. The end position is then moved from there by the number of lines or columns specified in “End character count”.

“To the end”: the last line of the current block is selected.

“Specified string +”: starting from the current line, the next line is examined. If it contains the string specified in “End string”, the following line is examined in the same way. Examination stops at the first line that does not contain the string.

• “End character count”: enter the value appropriate to the option chosen in “End position”.

• “End string”: enter the value appropriate to the option chosen in “End position”.
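The repetition and block-conversion options listed above amount to line and column manipulations. The Python sketch below gives one possible reading of them. All function names are hypothetical, and blocks are modeled as lists of strings.

Two points are assumptions:

- The column-direction split is taken to cut the same column range out of every line; the text does not say how multi-line blocks are handled.
- The “specified line length” conversion joins lines before re-wrapping them.

Padding with the full-width space U+3000 follows the leading-spaces description.

```python
FULLWIDTH_SPACE = "\u3000"

# --- Repetition: split the current block into child blocks --------------
def repeat_rows(block: list[str], n: int) -> list[list[str]]:
    """'Row direction': child blocks of n lines until the block is empty."""
    return [block[i:i + n] for i in range(0, len(block), n)]

def repeat_columns(block: list[str], n: int) -> list[list[str]]:
    """'Column direction': child blocks n characters wide (assumed to cut
    the same column range from every line)."""
    width = max((len(line) for line in block), default=0)
    return [[line[i:i + n] for line in block] for i in range(0, width, n)]

def repeat_csv(block: list[str], delimiter: str) -> list[list[str]]:
    """'Column direction (CSV column)': field k of every line forms block k."""
    rows = [line.split(delimiter) for line in block]
    n_cols = max((len(r) for r in rows), default=0)
    return [[r[k] if k < len(r) else "" for r in rows] for k in range(n_cols)]

# --- Block conversion: reshape a block before extraction ----------------
def transpose(block: list[str]) -> list[str]:
    """'Vertical/horizontal conversion': swap rows and columns, padding
    short lines with full-width spaces first."""
    width = max((len(line) for line in block), default=0)
    padded = [line.ljust(width, FULLWIDTH_SPACE) for line in block]
    return ["".join(chars) for chars in zip(*padded)]

def rewrap(block: list[str], width: int) -> list[str]:
    """'Specified line length': re-flow the joined text to `width` characters."""
    text = "".join(block)
    return [text[i:i + width] for i in range(0, len(text), width)]

def split_on(block: list[str], delimiter: str, pad_front: bool = False) -> list[str]:
    """'Split at specified string': treat `delimiter` as a line break. With
    pad_front=True, prepend full-width spaces so all lines are equally long."""
    lines = "".join(block).split(delimiter)
    if pad_front:
        width = max(len(line) for line in lines)
        lines = [line.rjust(width, FULLWIDTH_SPACE) for line in lines]
    return lines

print(repeat_rows(["a1", "a2", "b1", "b2"], 2))  # [['a1', 'a2'], ['b1', 'b2']]
print(repeat_csv(["a,b", "c,d"], ","))           # [['a', 'c'], ['b', 'd']]
```

Keeping each option a pure function from block to block(s) makes the operations composable. A block conversion can run first, and a repetition can then split its output.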

FIG. 9 is a diagram showing another example of the analysis definition setting screen. This screen is used to edit an item within the block being edited on the setting screen of FIG. 8.

- Like the screen of FIG. 8, it provides an item 81 for specifying the “repetition definition” and an item 82 for specifying “block conversion”.
- Unlike FIG. 8, it provides an item 83 for specifying the “item extraction condition”.
- It also contains an area 84 that displays the relevant test data from the test message, and an area 85 that displays, in real time, the result of converting the test data with the analysis definition being created.

Two display methods for the execution result are available in the area 70A, “XML display” and “table display”, and the desired one can be selected. In FIGS. 8 and 9, the execution result is shown in “XML display” mode. In “table display” mode, the result is shown, for example, as in the area 75 of FIG. 10. That figure uses a baseball innings example, in which the information, classified into three categories, is laid out horizontally as a table.
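A table view like this can be approximated by flattening the XML result. This hypothetical Python sketch turns each repeated element into a row and each child tag into a column. It uses made-up employee data, since the baseball example of FIG. 10 is not reproduced here. It assumes a flat root > record > field shape.

```python
import xml.etree.ElementTree as ET

# Made-up conversion result (root > record > field).
XML = """<root>
<employee><name>Sato Taro</name><age>34</age><sex>M</sex></employee>
<employee><name>Suzuki Ai</name><age>28</age><sex>F</sex></employee>
</root>"""

def xml_to_table(xml_text: str) -> list[list[str]]:
    """Return a header row of field tags plus one row per record."""
    records = list(ET.fromstring(xml_text))
    header: list[str] = []
    for rec in records:
        for fld in rec:
            if fld.tag not in header:   # keep first-seen column order
                header.append(fld.tag)
    rows = [[rec.findtext(tag, default="") for tag in header] for rec in records]
    return [header] + rows

for row in xml_to_table(XML):
    print(" | ".join(cell.ljust(10) for cell in row))
```

Records that lack a field get an empty cell. The table therefore stays rectangular even when items differ between blocks.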

FIG. 11 shows an example of an analysis definition file created through the setting screens described with reference to FIGS. 8 and 9. As FIG. 11 shows, the blocks and items to be extracted are defined in XML together with their tags, so conversion into XML data can be performed reliably.

Next, an example of the operation of the analysis definition creation unit 21 is described with reference to FIG. 12.

The analysis definition creation unit 21 receives a test message (step S11). It then executes a first loop (steps S12 to S21) for each block, which includes:

- setting the block conversion conditions for block extraction (step S13);
- setting the block extraction conditions (step S14);
- setting the conditions for extracting individual data within that block (step S20).

Within the first loop, the block extraction conditions are set first (step S14). If the block contains child blocks, a second loop (steps S16 to S19) is then executed for each child block. This loop sets the child block extraction conditions (step S17) and the individual data extraction conditions (step S18). If there are no child blocks, the second loop is skipped.

After the first loop (steps S12 to S21), the operator performs an operation such as pressing the execute button shown in FIGS. 8 and 9. A conversion test is then run on the test message (step S22). If the test result shows no problems, the operator performs an operation such as pressing the save button shown in FIGS. 8 and 9. An analysis definition file is then created (step S23), and the operation of the analysis definition creation unit 21 ends.

Next, an example of the operation of the analysis unit 22 is described with reference to FIGS. 13 and 14.

As shown in FIG. 13, the analysis unit 22 reads the analysis definition created by the analysis definition creation unit 21 (step S31). Following that definition, it converts the target text into XML (step S32). When this conversion is complete, the operation of the analysis unit 22 ends. FIG. 14 shows the conversion of step S32 in detail.

As shown in FIG. 14, the analysis unit 22 first extracts the processing range to be converted from the given text (step S41) and outputs a tag indicating the start (step S42). Then, from the beginning to the end of the processing range, it executes a loop process (steps S43 to S47) that performs the conversion process for each block (determination of the analysis range and extraction of individual data).

In this loop process, it is determined whether a condition for a child conversion process is defined in the block currently being converted (step S44). A child conversion process is the conversion process (determination of the analysis range and extraction of individual data) performed for each child block when a parent block contains one or more child blocks. If no child conversion condition is defined, the individual data in the current block is extracted and output together with tags (step S45). If a child conversion condition is defined, the child conversion process is executed (step S46).

After the loop process (steps S43 to S47) is completed, a tag indicating the end is output, and the operation of the analysis unit 22 ends.
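The conversion flow of FIGS. 13 and 14 can be sketched as a recursive function. This is a minimal illustration, not the patented implementation: it assumes that the processing range is delimited by start and end marker strings, that block and individual extraction conditions are regular expressions whose first group captures the body or value, that "repetition" means applying a block condition to every match in the range, and that "shaping" is simple whitespace trimming. All names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class ElementDef:
    name: str
    pattern: str  # individual extraction condition; group 1 is the value


@dataclass
class BlockDef:
    name: str
    pattern: str  # block extraction condition; group 1 is the block body
    elements: list = field(default_factory=list)  # ElementDef items
    children: list = field(default_factory=list)  # child BlockDef items


def convert_block(text: str, block: BlockDef, parent: ET.Element) -> None:
    # Repetition: apply the block condition to every match in the range.
    for m in re.finditer(block.pattern, text, re.S | re.M):
        body = m.group(1).strip()  # shaping step (here just trimming whitespace)
        node = ET.SubElement(parent, block.name)
        if block.children:
            # S44 -> S46: a child conversion is defined, so recurse into it.
            for child in block.children:
                convert_block(body, child, node)
        else:
            # S44 -> S45: extract the individual data and tag each value.
            for el in block.elements:
                found = re.search(el.pattern, body, re.M)
                if found:
                    ET.SubElement(node, el.name).text = found.group(1)


def convert(text: str, start: str, end: str, root_tag: str, blocks: list) -> ET.Element:
    # S41: cut out the processing range between the start and end markers.
    begin = text.index(start) + len(start)
    target = text[begin:text.index(end, begin)]
    # S42 / final step: the root element supplies the opening and closing tags.
    root = ET.Element(root_tag)
    # S43-S47: convert the range block by block.
    for block in blocks:
        convert_block(target, block, root)
    return root
```

For example, with the block condition `^== ARTICLE\n(.*?)(?=^== |\Z)` and the individual conditions `^title: (.+)$` and `^date: (.+)$`, each `== ARTICLE` section between `BEGIN` and `END` becomes an `<article>` element containing `<title>` and `<date>`.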

As described above, according to this embodiment, analysis definition data is created on a setting screen where conditions such as "repetition definition", "block conversion", "block extraction condition", and "individual extraction condition" can be specified, and the operator can make the settings while checking the execution result on test data. A conversion rule for converting text into XML can therefore be created easily. The analysis definition can also be changed easily when the items to be extracted from the text, or the form of the conversion result, need to change. Furthermore, because the conversion result based on the analysis definition is produced as an XML file, the meaning, relationships, and format of the data can be handled in a unified way in addition to the element values, which improves the versatility of the data. Specifically, the following effects are obtained.

- Several standard methods for extracting elements from XML are implemented on many platforms, so the output can be used universally.

- When element extraction from the text and processing of the extracted results are performed separately, a change to the input text format only requires changing the analysis settings for the affected part, so changes to the analysis process stay localized.

- Even if the set of extracted elements changes (additions or deletions), the downstream processing that reuses the analysis results obtains values by XML tag name and structure, so only the code handling the changed tags is affected and changes to the downstream processing stay localized.

- Even when different downstream processing is applied to the same text format, the text analysis part can be shared, which increases versatility.
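The localization effect for downstream processing can be illustrated with a consumer that reads values only by tag name. In this sketch the tag names (`article`, `title`, `date`, `byline`) are hypothetical; it shows that adding a new extracted element to the output leaves existing downstream code untouched.

```python
import xml.etree.ElementTree as ET


def article_index(xml_text: str) -> list:
    """Downstream step: read values by tag name only, ignoring unknown tags."""
    root = ET.fromstring(xml_text)
    # findtext returns None when a tag is absent, so a removed element
    # also degrades gracefully instead of raising an error.
    return [(a.findtext("title"), a.findtext("date")) for a in root.iter("article")]
```

If the analysis definition is later extended so that each `<article>` also carries a `<byline>`, `article_index` returns the same result as before, because it never depends on element positions or on tags it does not use.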

(Modification 1)
As an additional function of the analysis definition creation unit 21 in the above embodiment, a function may be provided that displays information, such as a message indicating that the setting is inappropriate or invalid, when no condition has been specified for a predetermined item among the items for entering the conditions of the various analysis definitions (block definitions, element definitions) shown on the setting screen. In this case, the predetermined items can be distinguished from the other items by assigning them attribute information in advance, for example attribute information indicating "input required". This technique makes it easy to verify the format in advance, before the input text is analyzed.
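A minimal sketch of the "input required" check, assuming each item on the setting screen is represented by a label, the entered condition, and a boolean flag standing in for the attribute information. The class and function names are hypothetical, and the message text is illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SettingItem:
    label: str                   # item shown on the setting screen
    value: Optional[str] = None  # condition entered by the operator
    required: bool = False       # hypothetical "input required" attribute


def check_required(items: List[SettingItem]) -> List[str]:
    """Return one message per required item whose condition was left blank."""
    return [
        f"{item.label}: a condition must be specified"
        for item in items
        if item.required and not (item.value or "").strip()
    ]
```

Treating a whitespace-only entry as blank is a design choice of this sketch; the setting screen would display the returned messages next to the offending items.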

In addition, as an additional function of the analysis definition creation unit 21 in the above embodiment, a function may be provided that performs a complementing process: when no condition has been specified for a predetermined item among the items for entering the conditions of the various analysis definitions (block definitions, element definitions) shown on the setting screen, a predetermined condition is specified for that item. In this case, attribute information may be assigned to the predetermined item in advance indicating that an "implicit value" (a provisional value used to fill in a value that was not entered) is to be entered automatically. This technique prevents problems such as the conversion failing to run when the input text is analyzed.
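The complementing process can be sketched in the same hypothetical item model, with an optional implicit value attached to an item in place of the attribute information. Items are treated as immutable here, so the function returns completed copies rather than modifying the operator's input.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class SettingItem:
    label: str
    value: Optional[str] = None     # condition entered by the operator
    implicit: Optional[str] = None  # hypothetical "implicit value" attribute


def complete(items: List[SettingItem]) -> List[SettingItem]:
    """Fill blank items that carry an implicit value; leave the rest as-is."""
    return [
        replace(item, value=item.implicit)
        if not item.value and item.implicit is not None
        else item
        for item in items
    ]
```

An item left blank with no implicit value stays blank, which is where a required-item check like the one in the preceding modification would still apply.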

(Modification 2)
As an additional function of the analysis definition creation unit 21 in the above embodiment, a function may be provided that automatically generates a DTD (document type definition) from the analysis definitions (block definitions, element definitions) used when analyzing the input text. This allows the side that uses the extraction result (the output XML file) to verify its format.
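One way to derive a DTD from the definitions is sketched below. It assumes the output structure of a block-by-block conversion in which a block with child conversion contains only repeated child blocks, a leaf block contains its extracted values in definition order, and each value may be missing if its condition does not match. These content-model rules and all names are assumptions of this sketch, not rules stated in the patent.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class BlockDef:
    name: str
    elements: List[str] = field(default_factory=list)       # leaf element tags
    children: List["BlockDef"] = field(default_factory=list)


def to_dtd(root_tag: str, blocks: List[BlockDef]) -> str:
    decls: List[str] = []
    seen = set()

    def declare(name: str, model: str) -> None:
        if name not in seen:  # emit one declaration per element name
            seen.add(name)
            decls.append(f"<!ELEMENT {name} {model}>")

    def seq(parts: Iterable[str]) -> str:
        parts = list(parts)
        return "(" + ", ".join(parts) + ")" if parts else "EMPTY"

    def visit(block: BlockDef) -> None:
        if block.children:
            # Child conversion defined: the block holds repeated child blocks.
            declare(block.name, seq(c.name + "*" for c in block.children))
            for child in block.children:
                visit(child)
        else:
            # Leaf block: each extracted value is optional (its condition may not match).
            declare(block.name, seq(e + "?" for e in block.elements))
            for e in block.elements:
                declare(e, "(#PCDATA)")

    # Each top-level block condition is applied repeatedly over the range.
    declare(root_tag, seq(b.name + "*" for b in blocks))
    for block in blocks:
        visit(block)
    return "\n".join(decls)
```

For a root `articles` with one leaf block `article` holding `title` and `date`, this yields `<!ELEMENT articles (article*)>`, `<!ELEMENT article (title?, date?)>`, and a `(#PCDATA)` declaration for each value.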

The present invention is not limited to the above embodiment as it is; at the implementation stage, the constituent elements can be modified without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiment. For example, some constituent elements may be omitted from those shown in the embodiment, and constituent elements from different embodiments may be combined as appropriate.

FIG. 1 is a diagram showing an example of the configuration of an information processing system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the configuration of the information processing apparatus according to the embodiment.
FIG. 3 is a diagram for explaining an example of an analysis definition.
FIG. 4 is a diagram for explaining the conditions of extraction items.
FIG. 5 is a diagram for explaining that the individual data extraction conditions in the child blocks of each block can be made to differ.
FIG. 6 is a diagram for explaining the four types of conditions that can be specified when creating an analysis definition.
FIG. 7 is a diagram showing an example of a process for converting text into XML data with ruby (reading annotations).
FIG. 8 is a diagram showing an example of the analysis definition setting screen.
FIG. 9 is a diagram showing another example of the analysis definition setting screen.
FIG. 10 is a diagram showing an example of a setting screen including a table-format display.
FIG. 11 is a diagram showing an example of an analysis definition file created through the setting screen.
FIG. 12 is a flowchart showing an example of the operation of the analysis definition creation unit.
FIG. 13 is a flowchart showing an example of the operation of the analysis unit.
FIG. 14 is a flowchart showing an example of the details of the conversion process in FIG. 13.

Explanation of symbols

1: server system; 2: communication unit; 11, 12, 13: terminals; 20: information processing apparatus; 21: analysis definition creation unit; 22: analysis unit; 23, 24, 25: storage devices.

Claims

An information processing apparatus comprising: means for generating, as a setting screen for setting a conversion rule for analyzing text and converting it into XML (extensible markup language), a setting screen on which at least a condition for determining a block, which is the unit of a range to be analyzed in the text, and a condition for extracting individual data in the block can be specified; and
means for creating, as the conversion rule, analysis definition data based on the individual conditions specified through the setting screen, the analysis definition data including character strings in which information indicating each of the individual conditions is tagged.

The information processing apparatus according to claim 1, further comprising means for converting the text into XML according to the analysis definition data.

The information processing apparatus according to claim 1, wherein the setting screen displays:
a first item for designating a condition for extracting the block;
a second item for designating a condition for extracting the individual data included in the block;
a third item for designating a range over which the condition designated in the first item, or the condition designated in the second item, is repeatedly applied; and
a fourth item for designating a condition for shaping the block to be extracted.

The information processing apparatus according to any one of claims 1 to 3, further comprising means for displaying information indicating that the setting is inappropriate when no condition is specified for a predetermined item displayed in the setting screen.

The information processing apparatus according to claim 1, further comprising means for performing, when no condition is specified for a predetermined item displayed in the setting screen, a complementing process that specifies a predetermined condition for that item.

A program for causing a computer to realize: a function of generating, as a setting screen for setting a conversion rule for analyzing text and converting it into XML (extensible markup language), a setting screen on which at least a condition for determining a block, which is the unit of a range to be analyzed in the text, and a condition for extracting individual data in the block can be specified; and
a function of creating, as the conversion rule, analysis definition data based on the individual conditions specified through the setting screen, the analysis definition data including character strings in which information indicating each of the individual conditions is tagged.