
JP2007310617A - Structured document processing system, structured document processing method and program

Info

Publication number: JP2007310617A
Application number: JP2006138527A
Authority: JP
Inventors: Kazuya Koyama (小山 和也); Keiichi Iguchi (井口 圭一)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2006-05-18
Filing date: 2006-05-18
Publication date: 2007-11-29

Abstract

PROBLEM TO BE SOLVED: To evaluate search expressions at high speed by using given difference information, in information extraction from a structured document by search expression evaluation.

SOLUTION: Search expressions 111 registered in advance are classified as simple structural expressions, predicated expressions, complex expressions, and the like, and are registered in a search expression table 108. Each search expression is applied to an original document 112 registered in advance, and the application result and the process information are computed and registered in an original document table 109. When difference information 113 is input, a difference analysis unit 104 analyzes it and classifies it as either a value difference or a structure difference. Depending on the combination of the kind of difference information 113 and the kind of search expression, when differential calculation is possible, a differential search expression evaluation unit 105 evaluates the search expression using the difference information 113 and the process information; otherwise, a full-text search expression evaluation unit 106 restores the structured document and evaluates the search expression against it.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates mainly to search processing for structured documents. In particular, it relates to a structured document processing system, a structured document processing method, and a structured document processing program that evaluate a search expression against a structured document expressed as difference information relative to another structured document, based on that difference information and on the result of evaluating the search expression against the other structured document.

(XML and XPath)
Structured documents such as XML and SGML are used for many purposes, including communication standards, because of advantages such as flexibility and extensibility. SOAP (Simple Object Access Protocol) is a well-known communication standard that adopts structured documents.

A structured document forms a tree rooted at a single element. The trunk and leaf parts of the tree are called nodes, and each node generally consists of an element name, an element value, attribute names, attribute values, child elements, and so on. Relationships between connected nodes are described as follows: the side nearer the root is the "parent" or "ancestor", the side nearer the leaves is the "child" or "descendant", and child nodes that share a parent are "siblings". A node that holds a value, such as an element value or an attribute value, and has no descendants is called a "value node" in this specification.

XPath (XML Path Language) is used as a search expression for extracting specific elements from such structured documents, XML in particular. XPath is standardized by the W3C (World Wide Web Consortium), and its specification is given in Non-Patent Document 1.

In XPath, a search expression lists XML elements separated by "/" to express the structural relationship between them, and thereby designates the locations in an XML document that satisfy that relationship. A commonly used form, such as "/A/B/C", describes a path traced from the document root toward its children; the matching "C" elements in the structured document, together with their descendants, are the result of the search expression. A single search expression can match multiple locations in one document.
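As an illustration of the path expression described above, the following minimal Python sketch evaluates the equivalent of "/A/B/C" against a small document. The sample document and the choice of Python's standard `xml.etree.ElementTree` module are assumptions made for illustration and are not part of the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: three <C> elements sit under /A/B,
# and one <C> sits under /A/D and must not match.
doc = ET.fromstring(
    "<A><B><C>1</C><C>2</C></B><B><C>3</C></B><D><C>4</C></D></A>"
)

# ElementTree paths are relative to the element they are called on, so the
# absolute XPath "/A/B/C" becomes "./B/C" when evaluated from the root <A>.
matches = doc.findall("./B/C")
texts = [c.text for c in matches]
print(texts)  # ['1', '2', '3']
```

One expression yields three matching locations here, which is the "multiple locations in one document" case mentioned above.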

An expression called a predicate can impose additional conditions on the structural relationship. A predicate is written in "[]" after an element, and the brackets contain a further search expression or a condition on its result. Common uses of predicates are to state a condition on the content of a value node near an element, as in "/A/B[C=1]", or to specify a position among sibling nodes, as in "/A/B[2]".
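The two predicate forms mentioned above can be tried with the limited XPath subset of Python's standard `xml.etree.ElementTree`. This is only a sketch: the sample document and the `id` attributes are hypothetical, and ElementTree requires the compared value to be written as a quoted string (`[C='1']`), unlike the numeric comparison `[C=1]` of full XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with three <B> siblings.
doc = ET.fromstring(
    "<A>"
    "<B id='b1'><C>1</C></B>"
    "<B id='b2'><C>2</C></B>"
    "<B id='b3'><C>1</C></B>"
    "</A>"
)

# Value predicate: <B> elements that have a child <C> whose text is "1".
by_value = [b.get("id") for b in doc.findall("./B[C='1']")]

# Positional predicate: the second <B> among its siblings (1-based).
by_position = [b.get("id") for b in doc.findall("./B[2]")]

print(by_value)     # ['b1', 'b3']
print(by_position)  # ['b2']
```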

Furthermore, XPath can describe very complex search conditions through relationships other than parent-child, function processing applied to the locations designated by a search expression, and various combinations of these.

(Existing XPath search technology)
However, XPath search is a computationally heavy process, and techniques have been proposed to speed it up through improved methods. For example, Patent Document 1 discloses a technique in which the XPath expressions to be searched are registered in advance and converted into an automaton, and the automaton is driven while the XML document is parsed sequentially with a SAX parser, so that the XPath search is performed at high speed.
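The general idea of matching a pre-registered path while a SAX parser streams through the document can be sketched as follows. This is not the method of Patent Document 1: instead of a compiled automaton, the sketch simply compares the stack of open element names with the registered path, and the class name `PathMatcher` and the sample input are hypothetical.

```python
import xml.sax


class PathMatcher(xml.sax.ContentHandler):
    """Counts elements whose root-to-node path equals a registered path."""

    def __init__(self, path):
        super().__init__()
        self.steps = path.strip("/").split("/")  # "/A/B/C" -> ["A", "B", "C"]
        self.stack = []                          # names of currently open elements
        self.hits = 0

    def startElement(self, name, attrs):
        self.stack.append(name)
        # The stack plays the role of the matcher state: a hit occurs when
        # the open-element path equals the registered path exactly.
        if self.stack == self.steps:
            self.hits += 1

    def endElement(self, name):
        self.stack.pop()


matcher = PathMatcher("/A/B/C")
xml.sax.parseString(b"<A><B><C/><C/></B><D><C/></D></A>", matcher)
print(matcher.hits)  # 2
```

Even in this streaming form, every element of every document is visited, which is the limitation the specification goes on to discuss.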

Separately, although not for XPath search, several techniques have been proposed for systems that handle many similar structured documents, such as a system that processes communication messages exchanged between the same systems. In these techniques, once a document has been processed its result is retained, and when another similar document is processed, the difference from the earlier document is obtained and applied to the earlier document's processing result, which speeds up processing.

For example, Patent Document 2 and Patent Document 3 use such a method to speed up the rendering of HTML documents as a form of structured document processing. FIG. 23 shows an example of a conventional structured document processing apparatus that uses difference information. It comprises an input unit 2301, a difference calculation unit 2302, a difference reflection unit 2303, an output unit 2304, an original document storage unit 2305, and an original document processing result storage unit 2306.

A conventional structured document processing apparatus using difference information with this configuration operates as follows.

The original document is registered in advance in the original document storage unit 2305, and the rendering result, which is its processing result, is registered in the original document processing result storage unit 2306. When a new document to be processed is given to the input unit 2301, the difference calculation unit 2302 first calculates the difference between the original document and the new document. The difference reflection unit 2303 then retrieves the processing result stored in the original document processing result storage unit 2306, determines from the difference information and the original document's processing result which points need correction, corrects only those parts, and creates the processing result of the new document by reusing the original document's processing result for everything else. The output unit 2304 outputs this result.

Patent Document 4 uses a similar technique for XML document parsing.

In these techniques, when difference information indicates a change to the original document, the correspondence between the difference information and the processing result is always obtained so that the change can be reflected in the processing result. In other words, a case where this correspondence cannot be obtained is not contemplated.

For example, in Patent Document 2 and Patent Document 3, the rendering result forms a tree structure based on the nesting of screen components, and this tree corresponds almost directly to the tree structure of the original structured document. When some location in the original document's tree has changed, this correspondence identifies which screen component of the rendering result to correct.

In Patent Document 4, on the other hand, the structured document is given as a binary sequence and the difference information is likewise a difference between binary sequences. A state transition machine is used to obtain the correspondence needed to reflect that difference in the parsed document data that is the processing result.

Patent Document 1: JP 2003-323429 A
Patent Document 2: JP 2005-108101 A
Patent Document 3: JP 2002-091730 A
Patent Document 4: JP 2006-024179 A
Non-Patent Document 1: "XML Path Language (XPath)", [online], [searched on December 22, 2004], Internet <URL: http://www.w3.org/TR/xpath>

Speed-up methods such as that of Patent Document 1 are relatively fast in themselves, but because they cannot make effective use of difference information, the entire document must be processed every time. As a result, even when many similar documents must be handled, the whole document has to be searched each time, which limits the achievable speed-up.

Conventional techniques such as Patent Documents 2, 3, and 4, on the other hand, disclose methods for speeding up rendering and document parsing by using difference information and earlier processing results, but no case is found in which difference information is used effectively for information extraction by search expression evaluation such as XPath. The reason is that in such information extraction the relationship between the original document structure and the processing result is extremely varied, so the correspondence between the difference information and the processing result cannot always be obtained accurately, as it can in Patent Documents 2, 3, and 4.

There is therefore a strong need for a new method for information extraction by search expression evaluation in systems that handle many similar structured documents: when difference information relative to a known structured document (the original document) is given, the method should use that difference information to extract information at high speed from the original document and its extraction result.

[Object of the invention]
The present invention has been proposed in view of these circumstances, and its object is to provide a novel structured document processing system and method that make effective use of difference information in information extraction by search expression evaluation such as XPath.

The present inventors found that, in information extraction by search expression evaluation such as XPath, the extreme variety in the relationship between the original document structure and the processing result stems from both the search expression and the difference information. They further found that if search expressions and difference information are each classified by their properties, the cases in which a search expression can be evaluated from difference information can be identified by the combination of search expression type and difference type. The present invention was completed on the basis of these findings.

The first structured document processing system of the present invention comprises: a search expression management unit that classifies search expressions by their properties and holds them; an original document management unit that stores the original document together with the results of applying the held search expressions to it and the process information from that application; a difference analysis unit that classifies difference information of a structured document by its properties and determines the search expression evaluation method from the difference type and the search expression type; a differential search expression evaluation unit that, when evaluation from difference information is possible, evaluates the search expression using the difference information, the search expression and its classification result, the original document, and the result of evaluating the search expression against the original document; a full-text search expression evaluation unit that, when evaluation from difference information is not possible, generates a new document by applying the difference information to the original document and applies the search expression to it; and an output unit that combines the results of the differential search expression evaluation unit and the full-text search expression evaluation unit and outputs them as the result of evaluating the search expressions against the document obtained by applying the difference information to the original document.

In the second structured document processing system of the present invention, in the first structured document processing system, the search expression management unit classifies a search expression as a simple structural expression when it contains only parent-child relationships between nodes of the structured document, as a predicated expression when it is a simple structural expression to which a predicate containing a simple structural expression has been added, and as a complex expression otherwise. The difference analysis unit classifies difference information according to whether the changed locations are values or document structure, classifying it as a value difference when only values are changed, and determines that evaluation from difference information is possible when the difference information is a value difference and the search expression is a simple structural expression or a predicated expression.
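The classification and decision rule of the second system can be sketched as below. The specification states the categories but not how an expression is parsed, so the regular expressions, the function names, and the change-record format (`{"node": ..., "kind": ...}`) are all assumptions made for illustration. In particular, this sketch treats positional predicates such as "[2]" and anything using "//" or "*" as complex.

```python
import re

STEP = r"[A-Za-z_][\w.-]*"                      # one element name
REL = rf"{STEP}(/{STEP})*"                      # relative child path in a predicate
PRED = rf"\[\s*{REL}\s*(=\s*[^\]\[]+)?\]"       # e.g. [C] or [C=1]
SIMPLE = re.compile(rf"(/{STEP})+")             # e.g. /A/B/C
PREDICATED = re.compile(rf"(/{STEP}({PRED})?)+")  # e.g. /A/B[C=1]


def classify_expression(expr):
    """Return 'simple', 'predicated', or 'complex' (heuristic sketch)."""
    if SIMPLE.fullmatch(expr):
        return "simple"
    if PREDICATED.fullmatch(expr):
        return "predicated"
    return "complex"


def classify_difference(changes):
    """'value' when every change touches only a value, else 'structure'."""
    if all(c["kind"] == "value" for c in changes):
        return "value"
    return "structure"


def can_evaluate_differentially(expr, changes):
    """Decision rule: value difference AND simple or predicated expression."""
    return (classify_difference(changes) == "value"
            and classify_expression(expr) in ("simple", "predicated"))


value_only = [{"node": "n4", "kind": "value"}]
with_insert = value_only + [{"node": "n9", "kind": "structure"}]
print(can_evaluate_differentially("/A/B[C=1]", value_only))   # True
print(can_evaluate_differentially("/A/B[C=1]", with_insert))  # False
print(can_evaluate_differentially("//C", value_only))         # False
```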

In the third structured document processing system of the present invention, in the second structured document processing system, the original document management unit stores the process information as a result node set, which is a list of the nodes included in the result of applying the search expression, and a condition node set, which is a list of the nodes that the search expression directly evaluated in order to derive that result.

In the fourth structured document processing system of the present invention, in the third structured document processing system, the original document management unit further stores, as process information, a candidate node set. This is a list of nodes for which the search expression contains a predicate, the part of the expression other than the predicate matches, and the node was judged not to match because of the predicate's evaluation result.

In the fifth structured document processing system of the present invention, in the fourth structured document processing system, the original document management unit stores the original document with a unique ID assigned to each node and designates the node sets of the process information by those IDs, and the difference analysis unit designates the changed locations in the difference information by those IDs.
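The three node sets and the use of per-node IDs might be recorded as in the following sketch, which evaluates an expression of the fixed shape "/A/B[C=1]" over a document whose nodes carry hypothetical `id` attributes. The `ProcessInfo` class, the `evaluate_with_trace` function, and the exact rule for which nodes go into each set are assumptions; the claims above define the sets only in general terms.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class ProcessInfo:
    result: set = field(default_factory=set)     # IDs of nodes in the output
    condition: set = field(default_factory=set)  # IDs of nodes the predicate tested
    candidate: set = field(default_factory=set)  # IDs of nodes rejected by the predicate


def evaluate_with_trace(root, child_tag, pred_tag, pred_value):
    """Evaluate /<root>/<child_tag>[<pred_tag>=<pred_value>], recording node IDs."""
    info = ProcessInfo()
    for child in root.findall(child_tag):
        tested = child.findall(pred_tag)
        if any(t.text == pred_value for t in tested):
            # Matched: the element and all its descendants form the result.
            info.result.update(n.get("id") for n in child.iter())
            info.condition.update(t.get("id") for t in tested)
        else:
            # Path matched but the predicate rejected it.
            info.candidate.add(child.get("id"))
    return info


doc = ET.fromstring(
    '<A id="n1">'
    '<B id="n2"><C id="n3">1</C><D id="n4">x</D></B>'
    '<B id="n5"><C id="n6">2</C><D id="n7">y</D></B>'
    '</A>'
)
info = evaluate_with_trace(doc, "B", "C", "1")
print(sorted(info.result), sorted(info.condition), sorted(info.candidate))
# ['n2', 'n3', 'n4'] ['n3'] ['n5']
```

Because both the node sets and the difference information refer to the same IDs, checking whether a change affects a cached result reduces to a set membership test.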

In the sixth structured document processing system of the present invention, in the fourth or fifth structured document processing system, when the difference is a value difference and the search expression is a simple structural expression, the differential search expression evaluation unit uses the result node set to check whether the difference changes the output of the processing result for the original document. If it does not, the processing result is used as is as the evaluation result of the search expression; if it does, the change is applied, and the original document's processing result with the difference applied is used as the evaluation result.
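The check described for the sixth system can be sketched as follows, assuming (hypothetically) that the cached result and the value difference are both represented as dictionaries from node ID to value. The function name and data layout are illustrative only.

```python
def evaluate_simple_with_value_diff(cached_result, result_node_ids, value_diff):
    """Reuse or patch a cached result for a simple structural expression.

    cached_result:   dict of node ID -> value for the original document's result
    result_node_ids: the result node set (IDs contained in that result)
    value_diff:      dict of node ID -> new value (a value difference)
    """
    # Keep only the changes that touch nodes in the result node set.
    touched = {nid: v for nid, v in value_diff.items() if nid in result_node_ids}
    if not touched:
        return cached_result          # no overlap: reuse the result as is
    updated = dict(cached_result)     # overlap: patch a copy of the cached result
    updated.update(touched)
    return updated


cached = {"n3": "1", "n4": "x"}
ids = {"n2", "n3", "n4"}

unchanged = evaluate_simple_with_value_diff(cached, ids, {"n7": "z"})
patched = evaluate_simple_with_value_diff(cached, ids, {"n4": "w", "n7": "z"})
print(unchanged is cached)  # True
print(patched)              # {'n3': '1', 'n4': 'w'}
```

In neither case is the document itself traversed; the structure cannot have changed under a value difference, so the set of matching nodes is known to be the same.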

In the seventh structured document processing system of the present invention, in the sixth structured document processing system, the differential search expression evaluation unit further operates as follows when the difference is a value difference and the search expression is a predicated expression. It uses the condition node set to check whether a change made by the difference could invalidate the output of the search expression evaluation for the original document. If it could, only the differences related to the condition node set are applied and the search expression is re-evaluated for that part. When the condition node set check finds no such possibility, or the re-evaluation determines that the search expression still matches, the unit uses the result node set to check whether there is a change to the output of the processing result for the original document; if there is none, the processing result is used as is as the evaluation result of the search expression, and if there is, the change is applied and the original document's processing result with the difference applied is used as the evaluation result. At the same time, the unit uses the candidate node set to check whether a change made by the difference could cause a node that was judged not to match in the evaluation against the original document to match. If it could, only the differences related to the candidate node set are applied and the search expression is re-evaluated for that part. When the re-evaluation determines that the node matches, the unit checks whether there is a change to the output for that location; if there is none, that output is used as is as the evaluation result of the search expression, and if there is, the output with the change applied is used as the evaluation result.
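The flow for predicated expressions can be sketched under strong simplifying assumptions: the expression has the shape "/A/B[C=wanted]", each node that matches the non-predicate part has exactly one value node tested by the predicate, and cached data are plain dictionaries. The entry format and the function name are hypothetical and are not taken from the specification.

```python
def evaluate_predicated_with_value_diff(entries, wanted, value_diff):
    """Re-evaluate only the predicates whose tested value node was changed.

    entries: one dict per node matching the non-predicate part, with keys
             'node' (ID), 'pred_node' (ID of the value node the predicate
             tested), 'matched' (cached predicate outcome), and
             'values' (node ID -> value for the node's output).
    Returns (results, number_of_predicates_re_evaluated).
    """
    results, reevaluated = [], 0
    for e in entries:
        matched = e["matched"]
        if e["pred_node"] in value_diff:
            # A condition node (previously matched entry) or a node tied to a
            # candidate (previously rejected entry) changed: re-evaluate the
            # predicate for this entry only.
            matched = value_diff[e["pred_node"]] == wanted
            reevaluated += 1
        if matched:
            # Patch the output with any changes that touch its value nodes.
            values = {nid: value_diff.get(nid, v) for nid, v in e["values"].items()}
            results.append((e["node"], values))
    return results, reevaluated


entries = [
    {"node": "n2", "pred_node": "n3", "matched": True,
     "values": {"n3": "1", "n4": "x"}},
    {"node": "n5", "pred_node": "n6", "matched": False,
     "values": {"n6": "2", "n7": "y"}},
]

# n4 changes inside an existing result; n6 changes so that n5 now matches.
results, count = evaluate_predicated_with_value_diff(
    entries, "1", {"n4": "w", "n6": "1"})
print(results)
# [('n2', {'n3': '1', 'n4': 'w'}), ('n5', {'n6': '1', 'n7': 'y'})]
print(count)  # 1
```

Entries whose predicate-tested node is untouched keep their cached outcome, so the cost grows with the size of the difference rather than the size of the document.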

The eighth structured document processing system of the present invention, in any of the first to seventh structured document processing systems, further comprises a difference calculation unit that compares an input structured document with the original document and calculates the difference, and an input document storage unit that stores the input structured document. The differential search expression evaluation unit and the full-text search expression evaluation unit refer to the input document stored in the input document storage unit instead of applying the difference to the original document.

The ninth structured document processing system of the present invention, in any of the first to seventh structured document processing systems, further comprises a new document management unit that manages a new document obtained by applying some or all of the differences to the original document. When applying a difference to the original document, the differential search expression evaluation unit and the full-text search expression evaluation unit ask the new document management unit to apply it and return the result. For locations where the difference has already been applied, the new document management unit does not apply it again but returns the result from the stored new document.
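The caching behavior of the new document management unit can be sketched as a lazily patched view of the original document. The class name, the dictionary representation of documents, and the `applications` counter are assumptions added for illustration.

```python
class NewDocumentManager:
    """Applies a value difference per node on demand and remembers the result."""

    def __init__(self, original, value_diff):
        self._original = original   # node ID -> value in the original document
        self._diff = value_diff     # node ID -> new value
        self._new = {}              # node ID -> value, already-applied locations
        self.applications = 0       # how many times a difference was applied

    def get(self, node_id):
        if node_id not in self._new:
            # First request for this location: apply the difference once.
            self._new[node_id] = self._diff.get(node_id, self._original[node_id])
            self.applications += 1
        # Later requests are answered from the stored new document.
        return self._new[node_id]


manager = NewDocumentManager({"n3": "1", "n4": "x"}, {"n4": "w"})
first = manager.get("n4")    # applied now
second = manager.get("n4")   # served from the stored new document
print(first, second, manager.applications)  # w w 1
```

Sharing one such manager between the differential and full-text evaluators avoids applying the same part of the difference twice when several search expressions touch the same location.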

The tenth structured document processing system of the present invention comprises: a storage unit that stores one or more search expressions and their types, together with an original document, the results of applying the search expressions to the original document, and process information; a difference analysis unit that classifies given difference information of a structured document by its properties and decides, for each search expression, whether to evaluate it using the difference information, according to the type of the difference information and the type of the search expression stored in the storage unit; a differential search expression evaluation unit that, for each search expression decided to be evaluated using the difference information, performs the evaluation using the stored result of applying the search expression to the original document and the process information; and a full-text search expression evaluation unit that, for each search expression decided to be evaluated without using the difference information, performs the evaluation using a document obtained by applying the difference information to the original document stored in the storage unit.
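The per-expression decision of the tenth system amounts to routing each stored search expression to one of two evaluators. In the sketch below, the table format and the function name are hypothetical, and the routing rule (value difference with a simple or predicated expression) is borrowed from the second system rather than stated in the tenth.

```python
def route_expressions(expression_table, diff_kind):
    """Split stored expressions between differential and full evaluation.

    expression_table: dict of expression -> stored type
                      ('simple', 'predicated', or 'complex')
    diff_kind:        'value' or 'structure'
    """
    differential, full = [], []
    for expr, kind in expression_table.items():
        if diff_kind == "value" and kind in ("simple", "predicated"):
            differential.append(expr)   # evaluate from cached result + difference
        else:
            full.append(expr)           # evaluate on the restored new document
    return differential, full


table = {"/A/B/C": "simple", "/A/B[C=1]": "predicated", "//C": "complex"}
print(route_expressions(table, "value"))
# (['/A/B/C', '/A/B[C=1]'], ['//C'])
print(route_expressions(table, "structure"))
# ([], ['/A/B/C', '/A/B[C=1]', '//C'])
```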

In the eleventh structured document processing system of the present invention, in the tenth structured document processing system, the storage unit stores a search expression that is a simple structural expression containing only parent-child relationships between nodes of the structured document, together with the result of applying that search expression to the original document and the process information. When the given difference information of the structured document is a value difference, in which only values are changed, the difference analysis unit decides that the simple structural search expression is to be evaluated using the difference information. The differential search expression evaluation unit evaluates the simple structural search expression using the stored result of applying it to the original document and the process information.

In the twelfth structured document processing system of the present invention, in the tenth structured document processing system, the storage unit stores a search expression that is a predicated expression, that is, a simple structural expression containing only parent-child relationships between nodes of the structured document to which a predicate containing a simple structural expression has been added, together with the result of applying that search expression to the original document and the process information. When the given difference information of the structured document is a value difference, in which only values are changed, the difference analysis unit decides that the predicated search expression is to be evaluated using the difference information. The differential search expression evaluation unit evaluates the predicated search expression using the stored result of applying it to the original document and the process information.

The first structured document processing method of the present invention is a method of evaluating search expressions against a structured document using a computer, and comprises: a first step in which the computer stores one or more search expressions in a storage unit with their types attached, and stores an original document in the storage unit with the results of applying the search expressions to it and the process information attached; a second step in which the computer classifies given difference information of a structured document by its properties and decides, for each search expression, whether to evaluate it using the difference information, according to the type of the difference information and the type of the search expression stored in the storage unit; and a third step in which the computer evaluates each search expression decided to be evaluated using the difference information by using the stored result of applying the search expression to the original document and the process information, and evaluates each search expression decided to be evaluated without using the difference information by using a document obtained by applying the difference information to the original document stored in the storage unit.

The second structured document processing method of the present invention is a method of evaluating search expressions against a structured document using a computer, and comprises: a first step in which the computer stores in a storage unit a search expression that is a simple structural expression containing only parent-child relationships between nodes of the structured document, and an original document with the result of applying the search expression to it and the process information attached; a second step in which the computer classifies given difference information of a structured document by its properties and, when it is a value difference in which only values are changed, decides that the simple structural search expression is to be evaluated using the difference information; and a third step in which the computer evaluates the simple structural search expression using the stored result of applying it to the original document and the process information.

A third structured document processing method of the present invention is a method of evaluating search expressions on a structured document using a computer, and comprises: a first step in which the computer stores, in a storage unit, a search expression that is a predicated expression, formed by adding a predicate containing a simple structural expression to a simple structural expression containing only parent-child relationships between nodes of a structured document, and an original document together with the result of applying the search expression to that original document and the process information; a second step in which the computer classifies given difference information of a structured document based on its properties and, when it is a value difference in which only values are changed, decides to evaluate the predicated search expression using the difference information; and a third step in which the computer evaluates the predicated search expression using the result of applying it to the original document stored in the storage unit and the process information.

[Operation]
In the structured document processing system of the present invention, the search expressions to be evaluated and the original document that serves as the base of the difference are registered in advance. The search expression management unit holds the registered search expressions and also classifies them according to their properties. The original document management unit holds the registered original document together with the result of applying the registered search expressions to it and the corresponding process information.

Next, when difference information between a new document and the original document is given, the difference analysis unit analyzes the difference information and determines an evaluation method for each search expression based on the type of the difference and the type of the search expression. When a search expression can be evaluated from the difference information, the differential search expression evaluation unit evaluates it using the difference information, the search expression and its classification, the original document, and the result of evaluating the search expression on the original document. When it cannot, the full-text search expression evaluation unit generates the new document by applying the difference information to the original document and applies the search expression to that new document. The output unit combines the results of the two evaluation units and outputs them as the result of evaluating the search expressions on the document obtained by applying the difference information to the original document.

By classifying search expressions and difference information according to their properties, choosing an appropriate evaluation method from the combination of the two types, and, for combinations where the difference information can be applied to the stored evaluation result of the original document, evaluating the search expression from the difference information and that stored result, the system avoids applying all of the difference information to the original document and avoids evaluating every search expression against the whole document. The object of the present invention is thereby achieved.

According to the present invention, when search expressions and difference information for a structured document are given, information extraction can be performed at high speed, depending on the properties of the search expression and the difference information, by reusing the original document on which the difference is based and the information extraction result already obtained for it.

This is because the properties of the search expression and of the difference information are analyzed and an evaluation method suited to each combination is used. In particular, when the difference information can be applied to the stored evaluation result of the original document, doing so makes it unnecessary to evaluate the search expression against the whole document.

Next, the best mode for carrying out the present invention will be described in detail with reference to the drawings.

Referring to FIG. 1, the first embodiment of the structured document processing apparatus 101 of the present invention comprises a search expression management unit 102, an original document management unit 103, a difference analysis unit 104, a differential search expression evaluation unit 105, a full-text search expression evaluation unit 106, and an output unit 107.

The search expression management unit 102 contains a classification unit 102a and a storage unit 102b, and the storage unit 102b holds a search expression table 108. FIG. 2 shows the structure of the search expression table 108. For each search expression, the table holds the search expression itself and its type.

The original document management unit 103 contains a search expression evaluation unit 103a and a storage unit 103b, and the storage unit 103b holds an original document table 109. FIG. 3 shows the structure of the original document table 109. The table holds original documents, search expressions, processing results, result node sets, condition node sets, and candidate node sets. Each original document can have multiple search expressions, and each search expression can have multiple processing results. Each processing result has one result node set and one condition node set, and each search expression has one candidate node set. In this embodiment, for a given original document and search expression in the original document table 109, the set of IDs of the nodes that appear in the application result is stored as the result node set, separately from the application result itself, so that whether a difference affects the result can be checked easily by comparing IDs. In an alternative embodiment, the result node set may be omitted and generated dynamically from the application result each time it is needed.
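The layout of the original document table 109 described above can be sketched as a small set of record types. This is only an illustrative model: the class names, field names, and the helper `touches_result` are hypothetical and do not appear in the specification, and results are assumed to be stored as serialized strings.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class ProcessingResult:
    """One result of applying a search expression to an original document."""
    value: str                          # serialized fragment returned by the expression (assumed form)
    result_nodes: frozenset[int]        # IDs of nodes that appear in the output
    condition_nodes: frozenset[int]     # IDs of nodes the expression inspected to derive it


@dataclass
class ExpressionEntry:
    """Per-expression record: many results, exactly one candidate node set."""
    expression: str
    results: list[ProcessingResult] = field(default_factory=list)
    candidate_nodes: frozenset[int] = frozenset()


@dataclass
class OriginalDocumentEntry:
    """Per-document record: one document, many search expressions."""
    document: str
    expressions: dict[str, ExpressionEntry] = field(default_factory=dict)


def touches_result(entry: ExpressionEntry, changed_ids: set[int]) -> bool:
    """ID comparison: does any changed node appear in any result node set?"""
    return any(r.result_nodes & changed_ids for r in entry.results)
```

Keeping the result node set next to each result is what makes the check a plain set intersection rather than a re-parse of the stored result.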

The difference analysis unit 104 contains a classification unit 104a and an evaluation unit determination unit 104b.

These units operate roughly as follows.

The search expression management unit 102 analyzes and classifies an externally registered search expression 111 in the classification unit 102a and stores the classification together with the search expression 111 in the search expression table 108 via the storage unit 102b. Search expressions 111 are classified according to how easily they can be evaluated from a difference.

The original document management unit 103 applies the search expressions held in the search expression table 108 by the search expression management unit 102 to an externally registered original document 112, and stores the original document 112, the results of applying the search expressions, and information about the application process in the original document table 109 via the storage unit 103b.

The difference analysis unit 104 analyzes and classifies the input difference information 113 in the classification unit 104a. For each search expression registered in the search expression management unit 102, the evaluation unit determination unit 104b decides, from the classification of the difference information and the classification of the search expression, whether the expression can be evaluated using the difference information. Expressions that can are passed to the differential search expression evaluation unit 105, and those that cannot are passed to the full-text search expression evaluation unit 106.

The differential search expression evaluation unit 105 decides how to use the difference information from the types of the difference information 113 and the search expression, the nodes changed by the difference information 113, and the processing result and process information stored in the original document management unit 103, and then evaluates the search expression using the difference information.

The full-text search expression evaluation unit 106 creates a new document by applying the difference information 113 to the original document and evaluates the search expression on the new document.

The output unit 107 outputs the results of the differential search expression evaluation unit 105 and the full-text search expression evaluation unit 106 together.

FIG. 4 is a flowchart showing the operation of the search expression management unit 102. The search expression management unit 102 receives an external request to register a search expression 111 in S401, examines the contents of the search expression 111 and classifies it in S402, adds the search expression 111 and its classification to the search expression table 108 in S403, and in S404 notifies the original document management unit 103 that an expression has been added. A search expression 111 is classified as a simple structural expression if it contains only simple parent-to-child references between elements, which are the structural information of a structured document; as a predicated expression if, in addition to a simple structural expression, it has a predicate, that is, a condition on a value that is itself written using only simple references; and as a complex expression otherwise.
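As a rough illustration of the three-way classification in S402, the sketch below classifies expressions written in a small XPath-like syntax. The specification does not fix a query language, so the syntax accepted here (absolute child steps, an optional trailing attribute step, and predicates of the form `[path op literal]`) is an assumption, as are the names `classify_expression`, `SIMPLE_RE`, and `PREDICATED_RE`.

```python
import re

# Assumed XPath-like subset; the specification does not prescribe a syntax.
NAME = r"[A-Za-z_][\w.-]*"
STEP = rf"(?:{NAME}|@{NAME})"
REL_PATH = rf"{STEP}(?:/{STEP})*"                  # simple references only
LITERAL = r"(?:'[^']*'|-?\d+(?:\.\d+)?)"
PREDICATE = rf"\[\s*{REL_PATH}\s*(?:=|!=|<=|>=|<|>)\s*{LITERAL}\s*\]"

# Only parent-to-child steps, e.g. /order/item/name
SIMPLE_RE = re.compile(rf"(?:/{NAME})+(?:/@{NAME})?")
# The same steps, each optionally followed by value predicates
PREDICATED_RE = re.compile(rf"(?:/{NAME}(?:{PREDICATE})*)+(?:/@{NAME})?")


def classify_expression(expr: str) -> str:
    """Return 'simple', 'predicated', or 'complex' for a search expression."""
    expr = expr.strip()
    if SIMPLE_RE.fullmatch(expr):
        return "simple"
    if PREDICATED_RE.fullmatch(expr):
        return "predicated"
    return "complex"   # descendant axes, functions, and anything else
```

For example, `/order/item/name` would be classified as simple, `/order/item[price>100]/name` as predicated, and `//item[position()=1]` as complex. Anything the two patterns do not recognize falls through to complex, which is the safe direction because complex expressions are always evaluated against the full document.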

FIG. 5 is a flowchart showing the operation of the original document management unit 103. The original document management unit 103 receives an external request in S501. If S502 finds that it is a request to register an original document 112, the unit extracts the original document 112 from the request in S503, stores it in the original document table 109 in S504, obtains the list of search expressions from the search expression management unit 102 in S505, computes the processing result and process information for each search expression in S506, and stores them in S507.

When an original document 112 is stored, each of its nodes is assigned an ID that is unique within the document, and the document is stored with these IDs. The processing result and process information are computed as follows. The result of applying the search expression to the original document 112 is stored in the original document table 109 as the processing result. The set of IDs of the nodes that appear in the application result is stored as the result node set. The set of IDs of the nodes that the search expression directly evaluated in order to derive the application result is stored as the condition node set. If the search expression is a predicated expression, the nodes that would have matched the simple structural part alone (the expression without its predicates) but were rejected by the predicate part are treated as provisional results, and the set of node IDs that would form their condition node sets is stored as the candidate node set.
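To make the three node sets concrete, the following sketch computes them for one predicated expression of the shape `unit[cond_tag satisfies predicate]/out_tag`. It is a simplified model under several assumptions: IDs are given to elements only (in document order), the predicate is passed as a Python callable, and the per-result sets are merged into one set each. The function name `register_predicated` and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET


def register_predicated(xml_text, unit_path, cond_tag, out_tag, predicate):
    """Evaluate one predicated expression and record its process information.

    Returns (results, result_nodes, condition_nodes, candidate_nodes).
    """
    root = ET.fromstring(xml_text)
    # Assign each element an ID that is unique within the document.
    ids = {elem: i for i, elem in enumerate(root.iter())}
    results, result_nodes = [], set()
    condition_nodes, candidate_nodes = set(), set()
    for unit in root.findall(unit_path):
        cond, out = unit.find(cond_tag), unit.find(out_tag)
        if cond is None or out is None:
            continue  # the simple structural part does not match here
        if predicate(cond.text):
            results.append(out.text)
            result_nodes.add(ids[out])        # nodes that appear in the output
            condition_nodes.add(ids[cond])    # nodes read to derive the match
        else:
            # Matched structurally but rejected by the predicate:
            # remember what the predicate read, in case a value changes later.
            candidate_nodes.add(ids[cond])
    return results, result_nodes, condition_nodes, candidate_nodes
```

With a two-item order where only the first item's price exceeds 100, the first item's price lands in the condition node set and the second item's price in the candidate node set.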

If S502 finds that the request is a request from the search expression management unit 102 to add a new search expression, the unit extracts the search expression from the request in S511, retrieves all registered original documents in S512, computes the application result and process information of the search expression for each of them in S513, and adds these to the original document table 109 in S514.

FIG. 6A is a flowchart showing the operation of the difference analysis unit 104. In S601 the difference analysis unit 104 receives difference information 113, which consists of a reference to an original document stored in the original document management unit 103 and a set of instructions, each of which identifies a node of that document by the ID assigned by the original document management unit 103 and specifies a change to that node. In S602 it extracts the list of changed nodes described in the difference information 113, and in S603 it classifies the difference by checking whether the changed nodes include any node other than a value node such as an element value or an attribute value. A difference whose changed nodes are all value nodes is a value difference; a difference whose changed nodes include at least one node that is not a value node is a structure difference.
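The classification in S603 reduces to a single pass over the changed nodes. The sketch below assumes a hypothetical `Change` record in which each changed node carries a kind label; the labels in `VALUE_KINDS` are illustrative and are not defined by the specification.

```python
from dataclasses import dataclass

# Assumed labels for value nodes (element values and attribute values).
VALUE_KINDS = {"text", "attribute_value"}


@dataclass(frozen=True)
class Change:
    node_id: int      # ID assigned by the original document management unit
    node_kind: str    # e.g. "text", "attribute_value", "element"
    new_value: str


def classify_difference(changes):
    """Return 'value' if every changed node is a value node, else 'structure'."""
    if all(c.node_kind in VALUE_KINDS for c in changes):
        return "value"
    return "structure"
```

A single non-value node anywhere in the difference is enough to make the whole difference a structure difference, which matches the all-or-nothing rule in the text. Note that an empty difference is treated as a value difference here, a choice the specification does not address.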

Next, in S604 the unit obtains the list of registered search expressions and their types from the search expression management unit 102, and in S605 and S606 it checks, for each search expression, whether the expression can be evaluated using the difference information, based on the type of the difference and the type of the search expression. If the difference information is a value difference (S607) and the search expression is a simple structural expression or a predicated expression (S608), the unit judges that the expression can be processed from the difference information and assigns it to the differential search expression evaluation unit 105 in S610. If the difference information is a structure difference (S607) or the search expression is a complex expression (S608), it judges that this is not possible and assigns the expression to the full-text search expression evaluation unit 106 in S609. In S611 it passes each search expression, the difference information, and the type of the difference information to the assigned evaluation unit 105 or 106, which evaluates the search expression.

That is, in this embodiment, as shown in FIG. 6B, difference information is classified by its properties into two types, value difference and structure difference, and search expressions are classified by their properties into three types: simple structural expressions, predicated expressions, and all other (complex) expressions. For the combinations marked with a circle, where the difference is a value difference and the search expression is a simple structural or predicated expression, the search is performed from the difference information. For the remaining combinations, marked with a cross, a full-text search is performed.
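The combination table of FIG. 6B can be written as a two-argument decision function. The sketch below is a direct transcription of that rule; the string labels and the function names `choose_evaluator` and `route` are illustrative assumptions.

```python
def choose_evaluator(diff_kind: str, expr_kind: str) -> str:
    """Circle in FIG. 6B -> 'differential'; cross -> 'full_text'."""
    if diff_kind == "value" and expr_kind in ("simple", "predicated"):
        return "differential"
    return "full_text"


def route(expressions: dict, diff_kind: str) -> dict:
    """Split registered expressions (expression -> type) between the two evaluators."""
    routed = {"differential": [], "full_text": []}
    for expr, expr_kind in expressions.items():
        routed[choose_evaluator(diff_kind, expr_kind)].append(expr)
    return routed
```

Of the six combinations, only two take the differential path, so a structure difference always sends every expression to full-text evaluation.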

FIG. 7 is a flowchart showing the operation of the differential search expression evaluation unit 105. On receiving a set of search expressions, the difference information, and the type of the difference information, the differential search expression evaluation unit 105 obtains, for each search expression, its process information and processing result from the original document management unit 103, evaluates the search expression using the difference information as described below, and sends the result to the output unit 107.

(1) If the search expression is a simple structural expression, the difference is applied to the processing result of the original document, and this becomes the result. Specifically, S710 checks whether any changed node ID is included in the result node set. If none is, the processing result of the original document is used unchanged as the result in S712. If one is, the result in S711 is the processing result of the original document with the corresponding portion rewritten according to the difference information.

(2) If the search expression is a predicated expression, both of the following processes are performed.

(2.1) S720 checks whether any changed node ID is included in the condition node set. If one is, S721 rewrites the portion of the original document indicated by the condition node set according to the difference information, and S722 re-evaluates the search expression on that portion only. If no changed node ID is in the condition node set in S720, or if the search expression is still satisfied in S722, the difference is applied to the processing result of the original document in S723, S724, and S725 as described in (1), and this becomes the result.

(2.2) S731 checks whether any changed node ID is included in the candidate node set. If one is, S732 rewrites the portion of the original document indicated by the candidate node set according to the difference information, and S733 re-evaluates the search expression on that portion only. If the portion now newly matches the search expression, S734 checks whether the difference also changes that result. If it does not, the result is used as is in S736; if it does, the difference is applied to the result again in S735, and this becomes the result.
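The predicated case in (2.1) and (2.2) can be sketched with a simplified in-memory model. Here each structurally matching location is a hypothetical `Unit` that records the values its predicate read and the values it would output, keyed by node ID; units with `matched=False` play the role of the candidate node set. The model, the names, and the representation of the predicate as a Python callable are assumptions made for illustration, not the stored format described in the specification.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    condition: dict[int, str]   # node ID -> value the predicate read
    output: dict[int, str]      # node ID -> value that appears in the result
    matched: bool               # True: in stored result; False: candidate only


def evaluate_predicated(units, changes, predicate):
    """Differential evaluation for a value difference.

    changes: node ID -> new value. predicate: callable on a condition dict.
    """
    results = []
    for u in units:
        if u.condition.keys() & changes.keys():
            # A condition or candidate node changed (S720 / S731):
            # patch only this portion and re-evaluate it (S721-S722 / S732-S733).
            patched = {nid: changes.get(nid, v) for nid, v in u.condition.items()}
            keep = predicate(patched)
        else:
            keep = u.matched           # untouched: the stored outcome still holds
        if keep:
            # Apply the difference to the output values (S723-S725 / S734-S736).
            results.append({nid: changes.get(nid, v) for nid, v in u.output.items()})
    return results
```

In this model a single difference can drop one previously matching unit and admit one former candidate in the same pass, and the newly admitted unit's output is still patched by the same difference.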

FIG. 8A is a flowchart showing the operation of the full-text search expression evaluation unit 106. On receiving a set of search expressions and the difference information 113 in S801, the full-text search expression evaluation unit 106 retrieves the original document from the original document management unit 103 in S802, generates a new document by applying the difference information to the original document in S803, evaluates the specified search expressions on the new document in S804, and outputs the results to the output unit in S805.
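For comparison, the full-text path (S802 to S804) can be sketched with Python's standard `xml.etree.ElementTree`. The sketch makes two simplifying assumptions: element IDs are the positions of elements in document order, and a difference is a mapping from element ID to replacement text. The function name `full_text_evaluate` is hypothetical, and the search expression is limited to the XPath subset that `findall` supports.

```python
import xml.etree.ElementTree as ET


def full_text_evaluate(original_xml, changes, path):
    """Rebuild the new document from the difference, then evaluate the path on all of it."""
    root = ET.fromstring(original_xml)                 # S802: fetch the original document
    # S803: apply the difference. IDs are assumed to be document-order positions.
    for node_id, elem in enumerate(root.iter()):
        if node_id in changes:
            elem.text = changes[node_id]
    # S804: evaluate the search expression against the whole new document.
    return [e.text for e in root.findall(path)]
```

This path always costs a full parse, patch, and traversal, which is the work the differential path avoids when the combination of difference and expression allows it.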

The output unit 107 outputs the evaluation result for each search expression.

(Algorithm explanation)
This section supplements the description of how the type of search expression, the type of difference, the process information, and the operation of the differential search expression evaluation unit 105 relate to one another. The effect that a document change described by a difference has on the processing result of the original document falls into one of the following four categories.

(1) The change is in a location unrelated to the evaluation of the search expression.
(2) The change affects a location that matched the search expression in the original document:
(2-1) A location that matched in the original document no longer matches.
(2-2) A location that matched in the original document still matches, but the content output for it changes.
(3) A location that did not match in the original document now matches.

Note that a single change can have effects (2) and (3) on the result of a single search expression at the same time (S734 to S736 are provided for this reason). A change that falls under neither (2) nor (3) falls under (1).

Because search expressions and difference information are both extremely varied, it is difficult to characterize their relationship in general. The present invention therefore computes from the difference information only for combinations of difference and search expression where it is clear which of the above categories the effect of the difference falls into. In all other cases, that is, when the effect of the difference on the search result for the original document is unknown, it performs ordinary search expression evaluation as in the prior art.

To determine whether (2-1) applies, the first step is to check whether any node that served as a condition for deriving a match in the original document has been changed. The process information used for this is the condition node set. A simple structural expression, however, contains no predicate constraining values, so a change in a value does not affect which locations match; consequently (2-1) never occurs for a simple structural expression with a value difference. For a predicated expression where a condition node has been changed, the result obtained from the original document may no longer match in the new document, so the search expression is re-evaluated on that location only (S721, S722).

Next, if (2-1) does not apply, the check is whether (2-2) applies. The process information used for this is the result node set. This check simply determines whether any node included in the output has been changed. If so, only the result obtained from the original document is rewritten according to the difference and output (S711, S724).
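The result node set check for case (2-2) can be illustrated as follows. The sketch assumes the stored result is kept as a mapping from value-node ID to value, so that rewriting "only the result" is a dictionary update; this representation and the name `apply_value_diff_simple` are assumptions for illustration.

```python
def apply_value_diff_simple(result_values, changes):
    """Apply a value difference to a stored result without touching the document.

    result_values: node ID -> value for each value node in the stored result.
    changes:       node ID -> new value from the difference information.
    """
    touched = result_values.keys() & changes.keys()   # compare IDs (S710)
    if not touched:
        return result_values                          # stored result reused as is (S712)
    updated = dict(result_values)                     # leave the stored copy intact
    for node_id in touched:
        updated[node_id] = changes[node_id]           # rewrite only the affected part (S711)
    return updated
```

Changes to nodes outside the result node set are ignored entirely, which corresponds to category (1) in the list above.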

To determine whether (3) applies, it is necessary to select, from the locations that did not match in the search of the original document, those that might match because of the difference, and to re-evaluate only those. As noted above, a simple structural expression contains no predicate constraining values, so changing a value never changes which locations match, and (3) never occurs for a simple structural expression. For a predicated expression, a value change leaves the matches of its simple structural part (the expression without the predicate) unchanged; the only possibility is that, among those, a location that did not match in the original document comes to match when the predicate is evaluated. The process information used to check this possibility is the candidate node set (S731, S733).

FIG. 8B summarizes the above search algorithm in tabular form.

Next, the effects of this embodiment will be described.

In this embodiment, pre-registered search expressions are classified, and the result of applying them to a pre-registered original document is held together with the process information. Input difference information is classified, and the applicable evaluation method is determined from these classifications and the process information. When the difference information allows the computation to be reduced, the reduced method is used, so search expressions on a structured document can be evaluated at high speed using the difference information.

Furthermore, in this embodiment, search expressions are classified into simple structural expressions, predicated expressions, and other (complex) expressions, and difference information is classified into value differences and structure differences. For the combination of a value difference with a simple structural or predicated expression, the search result can therefore be obtained from the processing result of the search expression on the original document, the process information, and the difference information.

(Second embodiment)
Next, a second embodiment of the present invention will be described in detail with reference to the drawings.

Referring to FIG. 9, the second embodiment of the present invention includes, in addition to the configuration of the first embodiment, a difference calculation unit 901 that calculates difference information between an input structured document 114 and the original document 112, and an input document storage unit 902 that holds the input document.

Unlike the first embodiment, the structured document processing apparatus 101 of this embodiment receives a structured document 114 instead of difference information. The difference calculation unit 901 calculates the difference between the input structured document 114 and the pre-registered original document and outputs it to the difference analysis unit 104 as difference information. It also stores the input document in the input document storage unit 902 together with correspondence information that relates nodes of the original document to the nodes of the input document that were judged to match them during the difference calculation. The correspondence information can be realized, for example, by assigning each such node of the input document the same ID as the corresponding node of the original document.

The full-text search expression evaluation unit 106, instead of generating a new document from the difference information and the original document as in the first embodiment, retrieves the document stored in the input document storage unit 902 and uses it.

In the first embodiment, when the search expression is a predicate expression and the difference is a value difference, the differential search expression evaluation unit 105 rewrites the part of the original document indicated by the condition node set or the candidate node set using the difference information. In this embodiment, it instead uses the correspondence information between the nodes of the input document stored in the input document storage unit 902 and those of the original document to determine the part of the input document that corresponds to the part of the original document to be rewritten, and evaluates the search expression only on that part.

According to this embodiment, even when the input to the structured document processing apparatus 101 is a structured document rather than difference information, search expression processing can be sped up, provided that the cost of the difference calculation is sufficiently smaller than the cost of applying the search expressions to the entire document.

(Third embodiment)
Next, a third embodiment of the present invention will be described in detail with reference to the drawings.

Referring to FIG. 10, the third embodiment of the present invention includes, in addition to the configuration of the first embodiment, a new document management unit 1001 that stores the new document obtained by applying the difference information to the original document.

In the first embodiment, when search expressions are evaluated, part or all of the difference information is applied to the original document separately for each search expression as needed. In this embodiment, the new document management unit 1001 centrally manages the application of difference information to the original document, so that the application results are shared across all search expressions and the same difference is never applied more than once.

Specifically, the difference analysis unit 104 passes the difference information to the full-text search expression evaluation unit 106 and the differential search expression evaluation unit 105 and, at the same time, to the new document management unit 1001.

When the full-text search expression evaluation unit 106 or the differential search expression evaluation unit 105 needs part or all of the new document (the original document with the difference information applied), it requests the required portion from the new document management unit 1001.

The new document management unit 1001 retrieves the original document from the original document table 109. When the full-text search expression evaluation unit 106 or the differential search expression evaluation unit 105 requests part or all of the new document, it checks whether the requested part of the new document already exists. If not, it generates that part by applying to the original document only the difference information relevant to the requested part, stores it, and returns it to the requester.

As a result, when the differences required by individual search expressions touch the same location, each piece of difference information is applied only once, and redundant difference application is eliminated.
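The on-demand, apply-once behaviour of the new document management unit can be sketched as a small cache. This is only an illustration under simplifying assumptions: the document is reduced to a mapping from text node ID to value, a difference is a mapping from node ID to replacement value, and the class and attribute names are hypothetical.

```python
class NewDocumentManager:
    """Applies differences lazily and remembers what has already been applied."""

    def __init__(self, original_text, difference):
        self.original_text = original_text   # node ID -> text in the original document
        self.difference = difference         # node ID -> replacement text
        self.cache = {}                      # node ID -> text in the new document
        self.apply_count = 0                 # how many differences were actually applied

    def get(self, node_id):
        """Return the new-document text of one node, building it on first request."""
        if node_id not in self.cache:
            if node_id in self.difference:
                self.apply_count += 1
                self.cache[node_id] = self.difference[node_id]
            else:
                self.cache[node_id] = self.original_text[node_id]
        return self.cache[node_id]


manager = NewDocumentManager({4: "t1", 9: "t3"}, {4: "t6"})
first = manager.get(4)    # applies the difference for ID=4
second = manager.get(4)   # served from the cache, nothing is re-applied
print(first, second, manager.apply_count)
```

Two search expressions that both need ID=4 therefore trigger a single application, and differences for nodes that no expression asks about are never applied at all.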

Note that, in the first embodiment, all combinations of difference and search expression that the differential search expression evaluation unit 105 cannot evaluate are processed by the full-text search expression evaluation unit 106. This is not meant as a restriction that all such combinations must be processed by the full-text search expression evaluation unit 106.

For example, a person of ordinary skill in the art could readily conceive of further providing a third search expression evaluation unit that uses a method different from that of the present invention, combining it with the differential search expression evaluation unit 105 and the full-text search expression evaluation unit 106, and having the difference analysis unit 104 assign to this third unit some of the combinations of difference and search expression that the first embodiment assigns to units 105 and 106.

Next, the operation of the best mode for carrying out the present invention will be described using a specific example.

FIG. 11 is a configuration diagram of the message filtering device of this example. The message filtering device applies filter conditions given in advance to received messages and outputs the results externally. Filter conditions are given as XPath expressions, and the nodes matched by an XPath expression are output together with their descendants.

As shown in FIG. 11, the message filtering device 1101 of this example comprises the search expression management unit 102, the original document management unit 103, the difference analysis unit 104, the differential search expression evaluation unit 105, the full-text search expression evaluation unit 106, the output unit 107, the new document management unit 1001, a communication unit 1102, and a management unit 1103.

The communication unit 1102 receives a message 115 from outside and passes the difference information contained in it to the difference analysis unit 104. The management unit 1103 lets the administrator of the message filtering device 1101 register filter conditions (search expressions 111) and the original document 112 in the device through a user interface.

Structured documents are XML, and all search expressions are XPath.

(Operation of the example)
(Operation of the example / pre-registration)
First, suppose that two search expressions 111, "/A/B/C" and "/A/B[C=t1]/D", are registered through the management unit 1103. The search expression management unit 102 checks the content of both. "/A/B/C" contains only simple parent-to-child references and is classified as a simple structural expression; "/A/B[C=t1]/D" contains a predicate part in addition to a simple structure and is classified as a predicate expression. The results are stored in the search expression table 108. The search expression table 1201 in FIG. 12 is an example of the search expression table 108 after the search expressions have been registered.

Here, "/A/B/C" in XPath means "the nodes tagged C inside a tag B inside the tag A under the root node", and "/A/B[C=t1]/D" means "the nodes tagged D inside a tag B inside the tag A under the root node, where the value of the tag C inside that B is t1".
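The classification step can be sketched with two regular expressions. This is an assumption-laden illustration rather than the classification procedure of the first embodiment: it treats a path of plain child steps as a simple structural expression, allows at most one `[name=value]` predicate per step for a predicate expression, and calls everything else complex. The pattern names and the function name are hypothetical.

```python
import re

NAME = r"[A-Za-z_][\w.-]*"
# Only child steps such as /A/B/C.
SIMPLE = re.compile(rf"^(/{NAME})+$")
# Child steps where a step may carry one [child=value] predicate.
PREDICATE = re.compile(rf"^(/{NAME}(\[{NAME}=[^\[\]]+\])?)+$")

def classify_expression(xpath):
    """Return 'simple', 'predicate', or 'complex' for an XPath string."""
    if SIMPLE.match(xpath):
        return "simple"
    if PREDICATE.match(xpath):
        return "predicate"
    return "complex"

for expression in ("/A/B/C", "/A/B[C=t1]/D"):
    print(expression, classify_expression(expression))
```

Wildcards, positional predicates, and other axes do not match either pattern and therefore fall through to the complex category.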

Next, when document 1301 of FIG. 13, a structured document, is registered from the management unit 1103 as the original document, serving as a template for frequently used messages, the original document management unit 103 assigns an ID to each node of the document in order of appearance and stores the IDs and the original document in the original document table 109. FIG. 14 shows the original document 1401 represented as a tree with IDs attached; the numbers in parentheses are the IDs.
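The ID assignment can be sketched as follows. The XML literal is reconstructed from the node IDs and values mentioned in this example, not copied from FIG. 13; in particular the value "t4" for the text under the second D is an assumed placeholder, because the text never states it. `assign_ids` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the IDs and values used in this example.
# "t4" is an assumed placeholder for the original value of ID=11.
ORIGINAL_XML = "<A><B><C>t1</C><D>t2</D></B><B><C>t3</C><D>t4</D></B></A>"

def assign_ids(root):
    """Number element nodes and text nodes in order of appearance, starting at 1."""
    nodes = {}

    def visit(elem, parent_id):
        elem_id = len(nodes) + 1
        nodes[elem_id] = {"kind": "element", "value": elem.tag, "parent": parent_id}
        text = (elem.text or "").strip()
        if text:
            nodes[len(nodes) + 1] = {"kind": "text", "value": text, "parent": elem_id}
        for child in elem:
            visit(child, elem_id)

    visit(root, None)
    return nodes

nodes = assign_ids(ET.fromstring(ORIGINAL_XML))
for node_id, node in nodes.items():
    print(node_id, node["kind"], node["value"], node["parent"])
```

With this numbering the two C elements receive ID=3 and ID=8 and their text nodes ID=4 and ID=9, which agrees with the IDs used in the rest of the example.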

Next, the original document management unit 103 applies each of the two registered XPath expressions to the original document and stores the results and process information in the original document table 109. Two nodes satisfy "/A/B/C": ID=3 and ID=8. The nodes evaluated as conditions for deriving these results are those matching 'A', 'B', and 'C' of the search expression: the condition node set for ID=3 is ID=1, ID=2, ID=3, and the condition node set for ID=8 is ID=1, ID=7, ID=8. The nodes output as the result are ID=3 and its descendant ID=4, and ID=8 and its descendant ID=9. Because "/A/B/C" is a simple structural expression, it has no candidate node set.

The node satisfying "/A/B[C=t1]/D", on the other hand, is ID=5. The node set evaluated as its condition consists of the nodes matching 'A', 'B', 'C', and 'D' of the expression plus the text node matching 't1', that is, ID=1, ID=2, ID=3, ID=4, ID=5. The output nodes are ID=5 and its descendant ID=6. Furthermore, "/A/B[C=t1]/D" is a predicate expression. The node that matches the expression without its predicate, "/A/B/D", but is excluded by the predicate [C=t1] lies in the right half of the tree, where ID=9 fails the predicate. The candidate node set is the equivalent of the condition node set for that part, namely ID=1, ID=7, ID=8, ID=9, ID=10.

The original document table 1501 in FIG. 15 is an example of the original document table 109 in which the original document, the results, and the process information have been registered.

FIGS. 16 and 17 illustrate the process information for "/A/B/C" and "/A/B[C=t1]/D", respectively. In other words, process information indicates how each node of the original document relates to the evaluation result of the XPath expression.
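The process information for the two registered expressions can be reproduced with a small sketch. It hard-codes the node table implied by the IDs in this example (the value "t4" for ID=11 is an assumption) and evaluates only these two fixed expressions; the helper names and the dictionary layout are hypothetical, not the format of the original document table.

```python
# node ID -> (kind, value, parent ID); "t4" is an assumed value for ID=11.
NODES = {
    1: ("element", "A", None),
    2: ("element", "B", 1), 3: ("element", "C", 2), 4: ("text", "t1", 3),
    5: ("element", "D", 2), 6: ("text", "t2", 5),
    7: ("element", "B", 1), 8: ("element", "C", 7), 9: ("text", "t3", 8),
    10: ("element", "D", 7), 11: ("text", "t4", 10),
}

def children(parent, tag):
    return [i for i, (kind, value, p) in NODES.items()
            if p == parent and kind == "element" and value == tag]

def text_child(elem):
    return next((i for i, (kind, _, p) in NODES.items()
                 if p == elem and kind == "text"), None)

def subtree(root_id):
    ids = [root_id]
    for i, (_, _, p) in NODES.items():   # IDs are in document order: parents first
        if p in ids:
            ids.append(i)
    return ids

def process_simple():
    """Process information for "/A/B/C"."""
    return [{"match": c, "condition": [a, b, c], "result": subtree(c)}
            for a in children(None, "A")
            for b in children(a, "B")
            for c in children(b, "C")]

def process_predicate(value="t1"):
    """Process information for "/A/B[C=t1]/D": (matches, candidates)."""
    matches, candidates = [], []
    for a in children(None, "A"):
        for b in children(a, "B"):
            visited, ok = [a, b], False
            for c in children(b, "C"):
                t = text_child(c)
                visited += [c] if t is None else [c, t]
                ok = ok or (t is not None and NODES[t][1] == value)
            for d in children(b, "D"):
                ids = sorted(visited + [d])
                if ok:
                    matches.append({"match": d, "condition": ids, "result": subtree(d)})
                else:
                    candidates.append({"match": d, "candidate": ids})
    return matches, candidates

print(process_simple())
print(process_predicate())
```

The D node under the second B ends up in the candidate list because it satisfies the path without the predicate but not the predicate itself.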

(Operation of the example / search processing on difference information)
(Difference example 1)
First, consider the case where a message 115 containing the difference information 'replace the node with ID=11 of document 1 with "t5"' arrives at the communication unit 1102. This is the structured document shown as new document 1801 in FIG. 18, expressed as difference information against "document 1". The communication unit 1102 extracts the difference information from message 115 and passes it to the difference analysis unit 104. The difference analysis unit 104 first retrieves document 1 from the original document table 109 and checks whether the node with ID=11 is a value node. Because the node with ID=11 is a text node, it is a value node, and because the difference information contains only this one difference, it is classified as a value difference.
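The value/structure classification described here can be sketched in a few lines. The sketch assumes that a difference is a mapping from node ID to replacement value and that only text nodes count as value nodes; the node-kind table follows the IDs used in this example, and the function name is hypothetical.

```python
# Kind of each node of the original document, keyed by ID.
NODE_KINDS = {1: "element", 2: "element", 3: "element", 4: "text",
              5: "element", 6: "text", 7: "element", 8: "element",
              9: "text", 10: "element", 11: "text"}

def classify_difference(difference):
    """'value' when every rewritten node is a text node, otherwise 'structure'."""
    if all(NODE_KINDS[node_id] == "text" for node_id in difference):
        return "value"
    return "structure"

print(classify_difference({11: "t5"}))
```

A difference that rewrites an element node, such as ID=3, would be classified as a structure difference by the same check.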

Next, the difference analysis unit 104 obtains the list of registered expressions and their types from the search expression management unit 102. From the classification of the difference information as a value difference, and of the search expressions as a simple structural expression and a predicate expression, it determines that both expressions can be processed using the difference information, and passes the search expressions, the difference information, and the type of the difference information to the differential search expression evaluation unit 105.

On receiving the set of search expressions, the difference information, and its type, the differential search expression evaluation unit 105 first obtains, for the search expression "/A/B/C", the process information for the original document from the original document management unit 103 and compares it with ID=11, the node rewritten by the difference information. Neither the result node set nor the condition node set contains ID=11, so the result of applying "/A/B/C" to the new document 1801 is judged identical to that for the original document. The stored result for the original document, the subtrees rooted at ID=3 and ID=8, is sent to the output unit 107, which outputs the matched parts "t1" and "t3". Likewise, for "/A/B[C=t1]/D" neither the result node set nor the condition node set contains ID=11, so the stored result, the subtree rooted at ID=5, is sent to the output unit 107 and "t2" is output.

In this case, the difference information is unrelated to the evaluation and results of the XPath expressions, so the results are output without applying any difference or re-evaluating any search expression.
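The shortcut taken in difference example 1 amounts to a set-intersection test, sketched below. The node sets and stored outputs are the ones given in this example; the dictionary layout and the function name are hypothetical, and the sketch simply returns `None` when some node set is touched and partial re-evaluation would be needed.

```python
PROCESS_INFO = {
    "/A/B/C": {"result": {3, 4, 8, 9}, "condition": {1, 2, 3, 7, 8},
               "candidate": set(), "output": ["t1", "t3"]},
    "/A/B[C=t1]/D": {"result": {5, 6}, "condition": {1, 2, 3, 4, 5},
                     "candidate": {1, 7, 8, 9, 10}, "output": ["t2"]},
}

def reuse_if_untouched(expression, difference):
    """Return the stored output when the difference touches no node set."""
    info = PROCESS_INFO[expression]
    relevant = info["result"] | info["condition"] | info["candidate"]
    if relevant & set(difference):
        return None                 # some set is touched: re-evaluation needed
    return info["output"]           # unrelated difference: reuse as is

difference = {11: "t5"}
for expression in PROCESS_INFO:
    print(expression, reuse_if_untouched(expression, difference))
```

For the difference on ID=11 both expressions return their stored outputs, so no difference is applied and no expression is re-evaluated.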

(Difference example 2)
Next, consider the case where a message 115 containing the difference information 'in document 1, replace the node with ID=11 with "t5" and replace the node with ID=4 with "t6"' arrives at the communication unit 1102. This is the structured document shown as new document 1901 in FIG. 19, expressed as difference information against "document 1".

When this difference information is passed to the difference analysis unit 104, the unit checks whether ID=4 and ID=11 are both value nodes. Both are, so the difference information is classified as a value difference.

Next, the difference analysis unit 104 obtains the list of registered expressions and their types from the search expression management unit 102, determines from the classifications of the difference information and the search expressions that both expressions can be processed using the difference information, and passes them to the differential search expression evaluation unit 105.

On receiving the set of search expressions, the difference information, and its type, the differential search expression evaluation unit 105 first obtains the process information of "/A/B/C" for the original document from the original document management unit 103 and compares it with ID=4 and ID=11, the nodes rewritten by the difference information. Because the result node set contains ID=4, the differential search expression evaluation unit 105 instructs the new document management unit 1001 to apply the difference for ID=4.

Because the new document management unit 1001 has not yet applied any difference, it retrieves the original document from the original document management unit 103, applies the difference 'replace the node with ID=4 with "t6"' to rewrite the content of ID=4 to "t6", and returns the result. The subtrees rooted at ID=3 and ID=8, the result for the original document, are sent to the output unit 107, which outputs the matched parts "t6" and "t3".

Next, the process information of "/A/B[C=t1]/D" is checked. Because the condition node set contains ID=4, the differential search expression evaluation unit 105 instructs the new document management unit 1001 to apply the difference for ID=4. The new document management unit 1001 has already applied this difference, so it returns the result from its stored content. The differential search expression evaluation unit 105 then re-evaluates "/A/B[C=t1]/D" against ID=1, 2, 3, 4, 5 in the condition node set. The condition "C=t1" no longer holds, so the result ID=5 for the original document no longer satisfies the expression and is removed from the result for the new document. The candidate node set does not contain ID=4, so the expression is not re-evaluated for the nodes ID=1, 7, 8, 9, 10. As a result, "/A/B[C=t1]/D" is output as having no match.

In this case, only the difference 'replace the node with ID=4 with "t6"', which is relevant to the evaluation and results of the XPath expressions, is applied; the irrelevant difference 'replace the node with ID=11 with "t5"' is not. Re-evaluation is likewise limited to evaluating "/A/B[C=t1]/D" against the nodes ID=1, 2, 3, 4, 5. "/A/B/C" is not re-evaluated; only its output is updated.
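Difference example 2 can be traced with a short sketch. The text values and node sets are those of this example ("t4" for ID=11 is an assumed original value); the variable and function names are hypothetical, and the predicate check is written out by hand for this one expression rather than by a general XPath evaluator.

```python
ORIGINAL_TEXT = {4: "t1", 6: "t2", 9: "t3", 11: "t4"}   # "t4" is an assumed value
DIFFERENCE = {11: "t5", 4: "t6"}
applied = {}   # differences applied so far, shared by both expressions

def new_text(node_id):
    """Text of a node in the new document; a difference is applied on first use."""
    if node_id in DIFFERENCE:
        applied.setdefault(node_id, DIFFERENCE[node_id])
        return applied[node_id]
    return ORIGINAL_TEXT[node_id]

# "/A/B/C": the result node set {3, 4, 8, 9} contains ID=4,
# so only the output text is refreshed; the matches stay ID=3 and ID=8.
output_simple = [new_text(4), new_text(9)]

# "/A/B[C=t1]/D": the condition node set {1, 2, 3, 4, 5} contains ID=4,
# so the predicate is re-checked for the stored match (C text ID=4, D text ID=6).
output_predicate = [new_text(6)] if new_text(4) == "t1" else []

print(output_simple, output_predicate, applied)
# prints: ['t6', 't3'] [] {4: 't6'}
```

The difference for ID=11 never appears in `applied`, which mirrors the statement that the irrelevant difference is not applied.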

(Difference example 3)
Next, consider the case where a message 115 containing the difference information 'in document 1, replace the node with ID=9 with "t1"' arrives at the communication unit 1102. This is the structured document shown as new document 2001 in FIG. 20, expressed as difference information against "document 1". The steps from passing this difference information to the difference analysis unit 104 through passing it on to the differential search expression evaluation unit 105 are the same as in difference example 1.

On receiving the set of search expressions, the difference information, and its type, the differential search expression evaluation unit 105 checks the process information for "/A/B/C". Because the result node set contains ID=9, it instructs the new document management unit 1001 to apply the difference for ID=9, as in difference example 2, and outputs the nodes at and below ID=3 and ID=8 as the result.

Next, the process information of "/A/B[C=t1]/D" is checked. Neither the condition node set nor the result node set contains ID=9, so ID=5, the result for the original document, is output unchanged.

The candidate node set, however, contains ID=9. The unit therefore issues a difference application instruction for ID=9 to the new document management unit 1001; the difference has already been applied, so "/A/B[C=t1]/D" is re-evaluated on the applied result for the nodes ID=1, 7, 8, 9, 10 listed in the candidate node set. The replacement of ID=9 makes the condition in the predicate hold, and the match is ID=10, so the nodes at and below ID=10 are output. Together with the result from the original document, the differential search expression evaluation unit 105 thus outputs ID=5 and ID=10 as the result of "/A/B[C=t1]/D".

In this case, the search expression is not re-evaluated for the part covered by the evaluation result on the original document. Only the search expressions that the change might newly satisfy are re-evaluated, and only on the part of the new document where that might happen.
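The role of the candidate node set in difference example 3 can be sketched as follows. The data comes from this example ("t4" for ID=11 is an assumed original value); the record layout and names are hypothetical, and the predicate test is hand-written for "/A/B[C=t1]/D" only.

```python
ORIGINAL_TEXT = {4: "t1", 6: "t2", 9: "t3", 11: "t4"}   # "t4" is an assumed value
DIFFERENCE = {9: "t1"}

def new_text(node_id):
    """Text of a node after applying the difference."""
    return DIFFERENCE.get(node_id, ORIGINAL_TEXT[node_id])

stored_result = [5]                      # match of "/A/B[C=t1]/D" on the original
condition_nodes = {1, 2, 3, 4, 5}
# One candidate: D (ID=10) would match if the text of its sibling C (ID=9) were "t1".
candidates = [{"match": 10, "c_text": 9, "nodes": {1, 7, 8, 9, 10}}]

touched = set(DIFFERENCE)
# The condition node set is untouched, so the stored match is kept without re-evaluation.
result = list(stored_result) if not (condition_nodes & touched) else []
for candidate in candidates:
    # Re-check the predicate only for candidates whose node set is touched.
    if candidate["nodes"] & touched and new_text(candidate["c_text"]) == "t1":
        result.append(candidate["match"])

print(result)   # prints: [5, 10]
```

Only the candidate subtree is looked at again; the part of the document that produced the original match is left alone.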

(Difference example 4)
Next, consider the case where a message 115 containing the difference information 'in document 1, replace the node with ID=3 with "t1"' arrives at the communication unit 1102. This is the structured document shown as new document 2101 in FIG. 21, expressed as difference information against "document 1".

When this difference information is passed to the difference analysis unit 104, the node with ID=3 is an element node rather than a value node, so the difference is classified as a structure difference. Differential calculation is therefore judged impossible, and the difference information and search expressions are passed to the full-text search expression evaluation unit 106.

The full-text search expression evaluation unit 106 instructs the new document management unit 1001 to apply all differences, evaluates each of the two registered search expressions against the entire resulting new document, and outputs ID=8 only for "/A/B/C" and no match for "/A/B[C=t1]/D".

In this case, where the difference information is complex and differential calculation from the original document's results, process information, and the difference information is not possible, processing efficiency cannot be improved, but the evaluation results of the search expressions can still be output.
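The full evaluation in difference example 4 can be checked with Python's `xml.etree.ElementTree`, used here purely as a stand-in evaluator. The XML literal is an assumed serialization of new document 2101 (the element C with ID=3 replaced by the text "t1"; "t4" is an assumed value), and the two expressions are rewritten relative to the root element A in ElementTree's limited XPath dialect.

```python
import xml.etree.ElementTree as ET

# Assumed form of the new document after replacing element C (ID=3) with text "t1".
NEW_XML = "<A><B>t1<D>t2</D></B><B><C>t3</C><D>t4</D></B></A>"
root = ET.fromstring(NEW_XML)            # root is the A element

# "/A/B/C" and "/A/B[C=t1]/D", written relative to A.
c_matches = root.findall("B/C")
d_matches = root.findall("B[C='t1']/D")

print([c.text for c in c_matches])       # the only remaining C holds "t3"
print(len(d_matches))                    # no B has a child C equal to "t1"
```

The single remaining C corresponds to ID=8 of the original numbering, and the predicate expression has no match, in line with the output described for this example.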

(Difference example 5)
Next, consider the case where the XPath expression "/A/B/*[2]" is additionally registered as a search expression. The search expression table 2201 in FIG. 22 is an example of the search expression table 108 with "/A/B/*[2]" added. This expression means "the node that appears second, regardless of tag, among the tags inside a tag B inside the tag A under the root node". "Second regardless of tag" is a relationship between nodes that is not a simple parent-child relationship, so the search expression management unit 102 classifies this search expression as a complex expression and stores the result in the search expression table 2201.

Now consider the case where, as in difference example 1, a message 115 containing the difference information 'replace the node with ID=11 of document 1 with "t5"' arrives at the communication unit 1102.

The difference analysis unit 104 classifies the difference information as a value difference. For the search expressions "/A/B/C" and "/A/B[C=t1]/D", it judges, as in difference example 1, that a search using the difference information is possible, and they are passed to the differential search expression evaluation unit 105 and evaluated in the same way. The output for these two expressions is also the same as in difference example 1.

"/A/B/*[2]", on the other hand, is a complex expression, so it is always passed to the full-text search expression evaluation unit 106 regardless of the type of difference information. The full-text search expression evaluation unit 106 instructs the new document management unit 1001 to apply all differences, evaluates only "/A/B/*[2]" against the entire resulting new document, and outputs the result. Here the nodes ID=5 and ID=10 are output.

The same holds for other difference information: because "/A/B/*[2]" is a complex expression, it is always evaluated by the full-text search expression evaluation unit 106, independently of how the other search expressions are evaluated.

In this case, even when a search expression is too complex for differential calculation from the original document's results, process information, and the difference information, only that expression needs full evaluation; the other, simpler search expressions can still be computed using the difference information whenever possible.
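The routing used across difference examples 1 to 5 reduces to one rule: differential evaluation is chosen only for a value difference combined with a simple structural expression or a predicate expression, and everything else goes to full evaluation. A minimal sketch, with hypothetical function names and string labels:

```python
EXPRESSIONS = {
    "/A/B/C": "simple",
    "/A/B[C=t1]/D": "predicate",
    "/A/B/*[2]": "complex",
}

def choose_evaluator(expression_kind, difference_kind):
    """Pick the evaluator for one search expression and one piece of difference information."""
    if difference_kind == "value" and expression_kind in ("simple", "predicate"):
        return "differential"
    return "full"

def dispatch(difference_kind):
    """Route every registered expression independently of the others."""
    return {xpath: choose_evaluator(kind, difference_kind)
            for xpath, kind in EXPRESSIONS.items()}

print(dispatch("value"))
print(dispatch("structure"))
```

Because each expression is routed on its own, registering a complex expression does not force the simpler ones onto the full-evaluation path.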

The embodiments and the example of the present invention have been described above, but the present invention is not limited to them, and various other additions and modifications are possible. The structured document processing apparatus 101 of each embodiment and the message filtering device 1101 of the example can be realized not only in hardware but also by a computer and a program. The program is provided recorded on a computer-readable recording medium such as a magnetic disk and is read by the computer at startup or a similar time. By controlling the operation of the computer, it realizes on that computer the search expression management unit 102, original document management unit 103, difference analysis unit 104, differential search expression evaluation unit 105, full-text search expression evaluation unit 106, output unit 107, difference calculation unit 901, input document storage unit 902, new document management unit 1001, communication unit 1102, and management unit 1103 of the embodiments described above, and causes the computer to execute the processing of those embodiments.

The present invention can be applied to uses such as a communication device that filters or monitors communication messages according to rules specified by search expressions.

A block diagram showing the configuration of the first embodiment of the present invention.
A diagram showing an example of the structure of the search expression table in the first embodiment of the present invention.
A diagram showing an example of the structure of the original document table in the first embodiment of the present invention.
A flowchart showing the operation of the search expression management unit in the first embodiment of the present invention.
A flowchart showing the operation of the original document management unit in the first embodiment of the present invention.
A flowchart showing the operation of the difference analysis unit in the first embodiment of the present invention.
A diagram showing the correspondence between combinations of search expression type and difference information type and the search method in the first embodiment of the present invention.
A flowchart showing the operation of the differential search expression evaluation unit in the first embodiment of the present invention.
A flowchart showing the operation of the full-text search expression evaluation unit in the first embodiment of the present invention.
An explanatory diagram of the search algorithm in the first embodiment of the present invention.
A block diagram showing the configuration of the second embodiment of the present invention.
A block diagram showing the configuration of the third embodiment of the present invention.
A block diagram showing the configuration of the example of the present invention.
A diagram showing an example of the structure of the search expression table in the example of the present invention.
An explanatory diagram of the original document in the example of the present invention.
An explanatory diagram of the original document with IDs assigned in the example of the present invention.
A diagram showing an example of the structure of the original document table in the example of the present invention.
An explanatory diagram of process information in the example of the present invention.
An explanatory diagram of process information in the example of the present invention.
An explanatory diagram of a new document in the example of the present invention.
An explanatory diagram of a new document in the example of the present invention.
An explanatory diagram of a new document in the example of the present invention.
An explanatory diagram of a new document in the example of the present invention.
A diagram showing an example of the structure of the search expression table in the example of the present invention.
A block diagram showing the configuration of a prior-art structured document processing apparatus.

Explanation of symbols

101 ... Structured document processing apparatus
102 ... Search expression management unit
102a ... Classification unit
102b ... Storage unit
103 ... Original document management unit
103a ... Search expression evaluation unit
103b ... Storage unit
104 ... Difference analysis unit
104a ... Classification unit
104b ... Evaluation unit determination unit
105 ... Difference search expression evaluation unit
106 ... Full-text search expression evaluation unit
107 ... Output unit
108 ... Search expression table
109 ... Original document table
111 ... Search expression
112 ... Original document
113 ... Difference information
114 ... Structured document
115 ... Message
901 ... Difference calculation unit
902 ... Input document storage unit
1001 ... New document management unit
1101 ... Message filtering device
1102 ... Communication unit
1103 ... Management unit
1201 ... Search expression table
1301, 1401 ... Original document
1501 ... Original document table
1601, 1701 ... Original document
1801 ... New document 1
1901 ... New document 2
2001 ... New document 3
2101 ... New document 4
2201 ... Search expression table
2301 ... Input unit
2302 ... Difference calculation unit
2303 ... Difference reflection unit
2304 ... Output unit
2305 ... Original document storage unit
2306 ... Original document processing result storage unit

Claims

A structured document processing system comprising:
a search expression management unit that classifies search expressions based on their properties and holds them;
an original document management unit that stores an original document together with the result of applying the held search expressions to the original document and the processing process information obtained at the time of application;
a difference analysis unit that classifies difference information of a structured document based on its properties and determines a search expression evaluation method based on the type of difference and the type of search expression;
a difference search expression evaluation unit that evaluates a search expression using the difference information, the search expression and its classification result, the original document, and the search expression evaluation result for the original document;
a full-text search expression evaluation unit that, when search expression evaluation using the difference information is impossible, creates a new document by applying the difference information to the original document and applies the search expression to the new document; and
an output unit that combines the results of the difference search expression evaluation unit and the full-text search expression evaluation unit and outputs them as the search expression evaluation result for the document obtained by applying the difference information to the original document.

The structured document processing system according to claim 1, wherein the search expression management unit classifies a search expression as a simple structural expression when it includes only parent-child relationships between nodes of the structured document, classifies it as a predicate-added expression when it is a simple structural expression to which a predicate including a simple structural expression has been added, and classifies all other expressions as complex expressions; and the difference analysis unit classifies the difference information according to whether the changed part is a value or the document structure, classifies it as a value difference when only values are changed, and determines that search expression evaluation using the difference information is possible when the difference information is a value difference and the search expression is a simple structural expression or a predicate-added expression.
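The classification and decision rule in this claim can be illustrated with a minimal Python sketch. The path grammar (XPath-like steps separated by "/", with at most one equality predicate per step), the change-record format with an "op" key, and all function and enum names are assumptions made for illustration only; they are not taken from the patent.

```python
import re
from enum import Enum


class ExprKind(Enum):
    SIMPLE = "simple structural expression"
    PREDICATE = "predicate-added expression"
    COMPLEX = "complex expression"


class DiffKind(Enum):
    VALUE = "value difference"
    STRUCTURE = "structure difference"


# Hypothetical grammar: element names joined by "/" (parent-child steps only).
_NAME = r"[A-Za-z_][\w.-]*"
_REL_PATH = rf"{_NAME}(?:/{_NAME})*"
_SIMPLE_RE = re.compile(rf"^(?:/{_NAME})+$")
# A simple path whose steps may carry a predicate comparing a relative
# simple path with a quoted literal, e.g. /order/item[status='open']/name
_PRED_RE = re.compile(rf"^(?:/{_NAME}(?:\[{_REL_PATH}\s*=\s*'[^']*'\])?)+$")


def classify_expression(expr: str) -> ExprKind:
    """Classify a search expression by its syntactic form."""
    if _SIMPLE_RE.match(expr):
        return ExprKind.SIMPLE
    if _PRED_RE.match(expr):
        return ExprKind.PREDICATE
    return ExprKind.COMPLEX


def classify_difference(changes) -> DiffKind:
    """A difference is a value difference only if every change sets a value."""
    if all(change["op"] == "set_value" for change in changes):
        return DiffKind.VALUE
    return DiffKind.STRUCTURE


def can_use_difference(expr_kind: ExprKind, diff_kind: DiffKind) -> bool:
    """Differential evaluation is possible only for a value difference
    combined with a simple structural or predicate-added expression."""
    return diff_kind is DiffKind.VALUE and expr_kind in (
        ExprKind.SIMPLE,
        ExprKind.PREDICATE,
    )
```

Under these assumptions, an expression such as `//item` falls through both patterns and is treated as complex, so it would be routed to full-text evaluation.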

The structured document processing system according to claim 2, wherein the original document management unit stores a result node set, which is a list of the nodes included in the application result of the search expression, and a condition node set, which is a list of the nodes directly evaluated by the search expression in deriving that application result.

The structured document processing system according to claim 3, wherein the original document management unit further stores a candidate node set, which is a list of nodes that, for a search expression including a predicate, match the part of the expression other than the predicate but were determined not to match the search expression as a result of evaluating the predicate.

The structured document processing system according to claim 4, wherein the original document management unit stores the original document with a unique ID assigned to each node and designates the node sets of the processing process information by those IDs, and the difference analysis unit identifies the changed parts by those IDs.
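One possible way to realize ID-based addressing is sketched below in Python with the standard-library `xml.etree.ElementTree` module. Numbering the nodes in document order and keeping the IDs in a side table (rather than inside the document) are assumptions of this sketch, as are the function names; the claim only requires that each node has a unique ID.

```python
import xml.etree.ElementTree as ET


def assign_node_ids(xml_text):
    """Parse a document and give every element a unique ID.

    IDs are assigned in document order starting from 1 (an assumption;
    any scheme that yields unique IDs would do).
    """
    root = ET.fromstring(xml_text)
    nodes = {}
    for node_id, element in enumerate(root.iter(), start=1):
        nodes[node_id] = element
    return root, nodes


def apply_value_diff(nodes, value_diff):
    """Apply a value difference given as {node_id: new_text}."""
    for node_id, new_text in value_diff.items():
        nodes[node_id].text = new_text


root, nodes = assign_node_ids(
    "<order><item><name>Widget</name><price>10</price></item></order>"
)
apply_value_diff(nodes, {4: "12"})  # node 4 is <price> in document order
```

Because both the stored node sets and the difference refer to the same IDs, checking whether a change touches a stored node becomes a simple set-membership test.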

The structured document processing system according to claim 4 or 5, wherein, when the difference is a value difference and the search expression is a simple structural expression, the difference search expression evaluation unit uses the result node set to check whether the difference changes the output of the processing result for the original document; if there is no change, it uses that processing result as-is as the evaluation result of the search expression, and if there is a change, it applies the difference to the processing result for the original document and uses the outcome as the evaluation result.
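A minimal Python sketch of this check follows. It assumes, for illustration only, that the stored processing result is a mapping from result node ID to that node's text value and that a value difference is a mapping from node ID to new value; the function name is hypothetical.

```python
def evaluate_simple_with_value_diff(original_result, result_node_ids, value_diff):
    """Differential evaluation of a simple structural expression.

    original_result: {node_id: value} stored for the original document
    result_node_ids: set of node IDs in the result node set
    value_diff:      {node_id: new_value} (value difference only)
    """
    # A value difference cannot change which nodes a parent-child-only
    # path selects, so only the output values can be affected.
    touched = result_node_ids.intersection(value_diff)
    if not touched:
        # No result node changed: reuse the stored result as it is.
        return original_result
    # Otherwise patch only the affected entries of the stored result.
    updated = dict(original_result)
    for node_id in touched:
        updated[node_id] = value_diff[node_id]
    return updated
```

The full document is never restored or re-scanned in this path; the work is proportional to the size of the difference and of the stored result.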

The structured document processing system according to claim 6, wherein, when the difference is a value difference and the search expression is a predicate-added expression, the difference search expression evaluation unit further: uses the condition node set to check whether the change made by the difference could invalidate the output of the search expression evaluation for the original document and, if so, applies only the differences related to the condition node set and re-evaluates the search expression for that part; when the check of the condition node set finds no such possibility, or when the re-evaluation determines that the part still matches the search expression, uses the result node set to check whether the output of the processing result for the original document changes, using the processing result as-is as the evaluation result if there is no change and, if there is a change, applying the difference to the processing result for the original document and using the outcome as the evaluation result; and uses the candidate node set to check whether the change made by the difference could cause a node that was determined not to match in the search expression evaluation for the original document to now match and, if so, applies only the differences related to the candidate node set and re-evaluates the search expression for that part, and, when the part is determined to match the search expression, checks whether the output result of the corresponding part changes, using the output result as-is as the evaluation result if there is no change and applying the change to the evaluation result if there is a change.
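The three checks in this claim can be sketched in Python as follows. The data layout is an assumption made for illustration: the condition node set and candidate node set are each stored as a mapping from the node tested by the predicate to the result node it controls, the predicate is a function of a single node value, and all names are hypothetical.

```python
def evaluate_predicate_with_value_diff(
    values, result, condition_nodes, candidate_nodes, predicate, value_diff
):
    """Differential evaluation of a predicate-added expression.

    values:          {node_id: value} of the original document
    result:          {result_node_id: value} stored for the original document
    condition_nodes: {tested_node_id: result_node_id}, predicate held
    candidate_nodes: {tested_node_id: result_node_id}, predicate failed
    predicate:       function(value) -> bool
    value_diff:      {node_id: new_value}
    """
    new_result = dict(result)

    # 1. Condition nodes: a changed value may invalidate a stored match,
    #    so re-evaluate the predicate for the touched nodes only.
    for tested_id, result_id in condition_nodes.items():
        if tested_id in value_diff and not predicate(value_diff[tested_id]):
            new_result.pop(result_id, None)

    # 2. Result nodes that still match: patch their output values.
    for result_id in list(new_result):
        if result_id in value_diff:
            new_result[result_id] = value_diff[result_id]

    # 3. Candidate nodes: a changed value may turn a non-match into a match.
    for tested_id, result_id in candidate_nodes.items():
        if tested_id in value_diff and predicate(value_diff[tested_id]):
            new_result[result_id] = value_diff.get(result_id, values[result_id])

    return new_result


# Example: /items/item[status='open']/name on a two-item document.
values = {1: "open", 2: "Widget", 3: "closed", 4: "Gadget"}
stored = {2: "Widget"}
reopened = evaluate_predicate_with_value_diff(
    values, stored, {1: 2}, {3: 4}, lambda v: v == "open", {3: "open"}
)
```

In the example, node 3 (the status of the second item) changes to a matching value, so the second item's name joins the result without the whole document being re-evaluated.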

The structured document processing system according to claim 1, further comprising a difference calculation unit that calculates a difference by comparing an input structured document with the original document, and an input document storage unit that stores the input structured document, wherein the difference search expression evaluation unit and the full-text search expression evaluation unit refer to the input document stored in the input document storage unit instead of applying the difference to the original document.

The structured document processing system according to any one of claims 1 to 7, further comprising a new document management unit that manages a new document obtained by applying some or all of the differences to the original document, wherein the difference search expression evaluation unit and the full-text search expression evaluation unit request the new document management unit to apply a difference to the original document and obtain the result, and the new document management unit, for a difference that has already been applied, returns the application result from the saved new document without applying the difference again.
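The caching behavior of the new document management unit might look like the following Python sketch. Modeling documents as mappings from node ID to value, tracking applied changes by node ID, and the class and attribute names are all assumptions for illustration.

```python
class NewDocumentManager:
    """Keeps one partially updated copy of the original document and
    never applies the same change twice (hypothetical sketch)."""

    def __init__(self, original):
        self._new = dict(original)   # working copy: the "new document"
        self._applied = set()        # node IDs whose change is already applied
        self.apply_count = 0         # number of changes actually applied

    def apply(self, value_diff):
        """Apply the requested part of the difference and return the document."""
        for node_id, value in value_diff.items():
            if node_id in self._applied:
                continue  # already reflected in the saved new document
            self._new[node_id] = value
            self._applied.add(node_id)
            self.apply_count += 1
        return dict(self._new)


manager = NewDocumentManager({1: "open", 2: "Widget"})
partial = manager.apply({1: "closed"})                   # differential evaluator
full = manager.apply({1: "closed", 2: "Widget v2"})      # full-text evaluator
```

Here the second request only has to apply the change to node 2, because the change to node 1 was already applied for the first requester.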

A structured document processing system comprising:
a storage unit that stores one or more search expressions and their types, an original document, the result of applying each search expression to the original document, and processing process information;
a difference analysis unit that classifies difference information of a given structured document based on its properties and determines, for each search expression, whether to perform evaluation using the difference information according to the type of the difference information and the type of the search expression stored in the storage unit;
a difference search expression evaluation unit that, for a search expression determined to be evaluated using the difference information, performs the evaluation using the result of applying the search expression to the original document and the processing process information stored in the storage unit; and
a full-text search expression evaluation unit that, for a search expression determined to be evaluated without using the difference information, performs the evaluation using a document obtained by applying the difference information to the original document stored in the storage unit.

The structured document processing system according to claim 10, wherein
the storage unit stores a search expression that is a simple structural expression including only parent-child relationships between nodes of the structured document, the result of applying the search expression to the original document, and processing process information;
the difference analysis unit determines, when the difference information of the given structured document is a value difference in which only values are changed, that the simple structural search expression is to be evaluated using the difference information; and
the difference search expression evaluation unit evaluates the simple structural search expression using the result of applying it to the original document and the processing process information stored in the storage unit.

The structured document processing system according to claim 10, wherein
the storage unit stores a predicate-added search expression, in which a predicate including a simple structural expression is added to a simple structural expression including only parent-child relationships between nodes of the structured document, the result of applying the search expression to the original document, and processing process information;
the difference analysis unit determines, when the difference information of the given structured document is a value difference in which only values are changed, that the predicate-added search expression is to be evaluated using the difference information; and
the difference search expression evaluation unit evaluates the predicate-added search expression using the result of applying it to the original document and the processing process information stored in the storage unit.

A structured document processing method for evaluating a search expression for a structured document using a computer, the method comprising:
a first step in which the computer stores one or more search expressions in a storage unit together with their types, and also stores in the storage unit the result of applying each search expression to an original document and processing process information;
a second step in which the computer classifies difference information of a given structured document based on its properties and determines, for each search expression, whether to perform evaluation using the difference information according to the type of the difference information and the type of the search expression stored in the storage unit; and
a third step in which the computer, for a search expression determined to be evaluated using the difference information, performs the evaluation using the result of applying the search expression to the original document and the processing process information stored in the storage unit, and, for a search expression determined to be evaluated without using the difference information, performs the evaluation using a document obtained by applying the difference information to the original document stored in the storage unit.
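The second and third steps of this method amount to a per-expression dispatch, which can be sketched in Python as below. The string labels for expression and difference types, the callable evaluators, and the function name are assumptions of this sketch; the evaluators themselves are stand-ins passed in by the caller.

```python
def evaluate_all(expressions, diff_kind, diff_eval, full_eval):
    """Route each stored search expression to the appropriate evaluator.

    expressions: {expression: kind}, kind in {"simple", "predicate", "complex"}
                 (stored in the first step)
    diff_kind:   "value" or "structure" (classified in the second step)
    diff_eval:   callable(expression) -> result, uses stored results and the difference
    full_eval:   callable(expression) -> result, uses the restored new document
    """
    results = {}
    for expression, kind in expressions.items():
        # Second step: decide per expression whether the difference can be used.
        use_difference = diff_kind == "value" and kind in ("simple", "predicate")
        # Third step: evaluate with the chosen evaluator.
        evaluator = diff_eval if use_difference else full_eval
        results[expression] = evaluator(expression)
    return results


stored_expressions = {"/a/b": "simple", "/a/b[c='1']": "predicate", "//b": "complex"}
routing = evaluate_all(
    stored_expressions,
    "value",
    diff_eval=lambda e: "differential",
    full_eval=lambda e: "full-text",
)
```

With a structure difference, every expression in this sketch would be routed to the full-text evaluator.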

A structured document processing method for evaluating a search expression for a structured document using a computer, the method comprising:
a first step in which the computer stores, in a storage unit, a search expression that is a simple structural expression including only parent-child relationships between nodes of the structured document, together with the result of applying the search expression to an original document and processing process information;
a second step in which the computer classifies difference information of a given structured document based on its properties and, when the difference information is a value difference in which only values are changed, determines that the simple structural search expression is to be evaluated using the difference information; and
a third step in which the computer evaluates the simple structural search expression using the result of applying it to the original document and the processing process information stored in the storage unit.

A structured document processing method for evaluating a search expression for a structured document using a computer, the method comprising:
a first step in which the computer stores, in a storage unit, a predicate-added search expression, in which a predicate including a simple structural expression is added to a simple structural expression including only parent-child relationships between nodes of the structured document, together with the result of applying the search expression to an original document and processing process information;
a second step in which the computer classifies difference information of a given structured document based on its properties and, when the difference information is a value difference in which only values are changed, determines that the predicate-added search expression is to be evaluated using the difference information; and
a third step in which the computer evaluates the predicate-added search expression using the result of applying it to the original document and the processing process information stored in the storage unit.

A program for causing a computer to execute:
a first process of storing one or more search expressions in a storage unit together with their types, and storing in the storage unit the result of applying each search expression to an original document and processing process information;
a second process of classifying difference information of a given structured document based on its properties and determining, for each search expression, whether to perform evaluation using the difference information according to the type of the difference information and the type of the search expression stored in the storage unit; and
a third process of performing, for a search expression determined to be evaluated using the difference information, the evaluation using the result of applying the search expression to the original document and the processing process information stored in the storage unit, and performing, for a search expression determined to be evaluated without using the difference information, the evaluation using a document obtained by applying the difference information to the original document stored in the storage unit.

A program for causing a computer to execute:
a first process of storing, in a storage unit, a search expression that is a simple structural expression including only parent-child relationships between nodes of a structured document, together with the result of applying the search expression to an original document and processing process information;
a second process of classifying difference information of a given structured document based on its properties and, when the difference information is a value difference in which only values are changed, determining that the simple structural search expression is to be evaluated using the difference information; and
a third process of evaluating the simple structural search expression using the result of applying it to the original document and the processing process information stored in the storage unit.

A program for causing a computer to execute:
a first process of storing, in a storage unit, a predicate-added search expression, in which a predicate including a simple structural expression is added to a simple structural expression including only parent-child relationships between nodes of a structured document, together with the result of applying the search expression to an original document and processing process information;
a second process of classifying difference information of a given structured document based on its properties and, when the difference information is a value difference in which only values are changed, determining that the predicate-added search expression is to be evaluated using the difference information; and
a third process of evaluating the predicate-added search expression using the result of applying it to the original document and the processing process information stored in the storage unit.