JP2008171181A - Structured data search apparatus - Google Patents

Publication number
JP2008171181A
Authority
JP
Japan
Prior art keywords
search
xpath expression
template
structured data
xml data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2007003444A
Other languages
Japanese (ja)
Inventor
Tadamaru Kawasaki
Rei Yano
直丸 川崎
令 矢野
Original Assignee
Toshiba Corp
Toshiba Solutions Corp
東芝ソリューション株式会社
株式会社東芝
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp, Toshiba Solutions Corp, 東芝ソリューション株式会社, 株式会社東芝

Abstract

XML data that does not conform to a schema specified by a user can be searched using an XPath expression that conforms to that schema.
An input unit 11 receives an XPath expression (first search expression) including path components designated by a user. The XPath expression analysis generation unit 12 analyzes the input XPath expression and extracts the path components included in it. Based on the extracted path components, the XPath expression analysis generation unit 12 generates a standard XPath expression (second search expression) for searching the XML data storage device 20 for XML data whose schema differs from that of the input XPath expression. The XML data search unit 15 uses the generated standard XPath expression to search the XML data storage device 20 for structured data that matches it. The result output unit 16 outputs the result of the search by the XML data search unit 15.
[Selection] Figure 1

Description

  The present invention relates to a structured data search apparatus that searches a plurality of types of XML data having different schemas stored in an XML database according to an XPath expression specified by a user.

  In general, data having a logical structure is called structured data. In structured data, the logical structure of the data may be indicated by tags described in the data. Structured data whose logical structure is expressed with such tags is well suited to being interpreted or processed by a computer for various purposes. A representative example of structured data is XML data, which is described in XML (Extensible Markup Language) format.

  In recent years, XML has been used in a large number of applications, and various data have been described in XML format. Accordingly, a technique for retrieving XML data described in the XML format is important.

  The XML data described above has a hierarchical logical structure composed of a plurality of components, and the structure, such as the element names of the components, does not necessarily have to be determined in advance. As a result, data conforming to different schemas (data structures) may be mixed in a collection of XML data. Examples of different schemas include Extensible HyperText Markup Language (XHTML), NewsML, Rich Site Summary (RSS), and Atom.

  Methods by which a user (searcher) can search for desired XML data in an XML database in which XML data conforming to different schemas is mixed include, for example, the following first to fourth methods.

  First, as a first method, there is a method in which the user specifies an XPath expression (query language) for each different schema, for example. According to this, by performing a search using an XPath expression for each specified schema, it is possible to search even XML data that conforms to a different schema. The XPath expression is shown in a path format including a component, for example. According to this XPath expression, the XML data can be searched by designating the structure of the XML data by the component included in the XPath expression.

  As a second method, for example, there is a method of performing a full text search mechanically based on a keyword specified by a user. According to this, even XML data that conforms to a different schema can be searched for XML data including the specified keyword.

  As a third method, there is a method of creating a correspondence table of XPath expressions for different schemas. According to this, even when an XPath expression conforming to the schema designated by the user is input, XML data conforming to a schema different from the designated schema can be searched. As a related technology, a technique is disclosed with which a user can search without being aware of the structure of the XML data and can acquire the desired XML data in a structure designated by the user (see, for example, Patent Document 1).

As a fourth method, there is a method of creating an abstract schema that matches each of the different schemas. According to this, an abstract schema is created as an intermediate format corresponding to each of the different schemas, and XML data conforming to the different schemas can be searched by converting the XPath expression via the abstract schema. As a related technology, a technique is disclosed that, when desired structured data is retrieved from a plurality of structured data having different data structures, makes it easy to search for structured data including a component that has the desired data as its element value, without depending on the data structure (see, for example, Patent Document 2).
Patent Document 1: JP 2003-316783 A
Patent Document 2: JP 2004-164104 A

  According to the first to fourth methods as described above, it is possible to search even XML data conforming to different schemas.

  However, in the first method described above, the user needs to specify (input) a plurality of XPath expressions for different schemas, which is troublesome. Further, the user needs to grasp the structure of all XML data for different schemas.

  In the second method, since it is not possible to perform a search in consideration of the structure of XML data, a lot of noise data (data not required by the user) is included in the search results.

  Further, in the third method, it is possible to perform the search intended by the user, but it takes time and cost to create the correspondence table. Further, when XML data conforming to a new schema is added, the XPath expressions of the new schema must be added to the correspondence table, which may require further labor and cost.

  In the fourth method, the cost of creating an abstract schema as an intermediate format increases. Further, when XML data conforming to a new schema is added, there is a possibility that a change to the abstract schema occurs. This may further increase costs.

  An object of the present invention is to provide a structured data search apparatus that can also search XML data that does not conform to the schema from an XPath expression that conforms to the schema specified by the user.

  According to one aspect of the present invention, there is provided a structured data search apparatus for searching data from structured data storage means that stores a plurality of structured data having different schemas. The structured data search apparatus comprises: input means for inputting a first search expression for searching structured data, the first search expression including components designated by a user; extraction means for analyzing the first search expression and extracting the components included in it; search expression generation means for generating, based on the extracted components, a second search expression for searching structured data that is stored in the structured data storage means and has a schema different from that of the first search expression; search means for searching the structured data storage means, using the second search expression, for structured data that matches the second search expression; and search result output means for outputting a result of the search by the search means.

  According to the present invention, it is possible to retrieve XML data that does not conform to the designated schema from the XPath expression that conforms to the schema designated by the user.

  Embodiments of the present invention will be described below with reference to the drawings.

  FIG. 1 is a block diagram showing a functional configuration of the structured data search apparatus according to the present embodiment. As shown in FIG. 1, the structured data search device 10 includes an input unit 11, an XPath expression analysis generation unit 12, an XPath expression generation template reading unit 13, an XPath expression generation template storage unit 14, an XML data search unit 15, and a result output unit 16.

  An XML data storage device 20 is provided outside the structured data search device 10. The XML data storage device 20 stores, for example, a plurality of structured data (hereinafter referred to as XML data) having different schemas (data structures). The XML data is, for example, data hierarchized by tags described in the XML data. The different schemas include, for example, XHTML (Extensible HyperText Markup Language), NewsML, RSS (Rich Site Summary), Atom, and the like.

  The input unit 11 inputs, for example, a search expression (first search expression) for searching XML data that conforms to a schema specified by the user. This search expression (hereinafter referred to as an XPath expression) is written in a path format. The XPath expression includes, for example, components specified by the user (hereinafter referred to as path components), and the structure of the structured data is represented by the sequence of components in the XPath expression. The path components include, for example, an element (or attribute) corresponding to a tag of the XML data and a predicate indicating a restriction on that element. A predicate includes, for example, a function such as “contains”, which indicates that a certain character string is included in the content of the corresponding element. The XPath expression has, for example, the form “/0th element[0th predicate]/1st element[1st predicate]/.../final element or attribute[final predicate]”. Here, n (n = 0, 1, ...) in the nth element (or nth predicate) represents the hierarchy level of that component in the XPath expression. The [nth predicate] and [final predicate] portions can be omitted.

  The XPath expression analysis generation unit 12 analyzes the XPath expression input to the input unit 11 (hereinafter referred to as the input XPath expression). The XPath expression analysis generation unit 12 extracts the path components included in the input XPath expression according to the analysis result. Further, the XPath expression analysis generation unit 12 generates a standard XPath expression (second search expression) based on the extracted path components and a template stored in the XPath expression generation template storage unit 14. The standard XPath expression is used to search for structured data conforming to the schema specified by the user as well as structured data conforming to a schema different from the specified schema.

  The XPath expression generation template reading unit 13 reads, for example, an XPath expression generation template (hereinafter simply referred to as a template) stored in advance in the XPath expression generation template storage unit 14. The XPath expression generation template reading unit 13 passes the read template to the XPath expression analysis generation unit 12.

  The XPath expression generation template storage unit 14 stores a plurality of templates. The plurality of templates are template files used for generating a standard XPath expression, for example. The plurality of templates are expressed, for example, in a format in which variables are embedded in path components included in an XPath expression. Each of the plurality of templates has a different combination of variables. The variable embedded in the template corresponds to the corresponding part of the path component included in the XPath expression, for example, and has a format that can mechanically generate the XPath expression. The variable is defined in advance so as to represent the hierarchy of components included in the XPath expression.

  Each template is stored in the XPath expression generation template storage unit 14 in association with, for example, a level indicating the ambiguity of an XPath expression generated using the template (hereinafter referred to as an ambiguity level). This ambiguity level is set, for example, at one or more stages.

  The XML data search unit 15 searches the XML data stored in the XML data storage device 20 using the input XPath expression or the standard XPath expression generated by the XPath expression analysis generation unit 12. The XML data search unit 15 includes a storage unit (not shown) that stores (holds) the search results.

  The result output unit 16 outputs the XML data searched by the XML data search unit 15 as a search result, for example for display to the user. The result output unit 16 outputs the XML data searched using the input XPath expression and the XML data searched using the standard XPath expression generated by the XPath expression analysis generation unit 12 so that the two can be distinguished. In addition, the result output unit 16 outputs (displays) the XML data searched using a standard XPath expression separately for each ambiguity level of the template used when that XPath expression was generated.

  FIG. 2 is a block diagram illustrating a functional configuration of the XPath expression analysis generation unit 12. The XPath expression analysis generation unit 12 includes an XPath expression analysis unit 121, a template analysis unit 122, a conversion unit 123, and an XPath expression generation unit 124.

  The XPath expression analysis unit 121 analyzes the input XPath expression. The XPath expression analysis unit 121 extracts path components according to the analysis result.

  The template analysis unit 122 analyzes the template read by the XPath expression generation template reading unit 13. The template analysis unit 122 cuts out a template variable (part) according to the analysis result.

  The conversion unit 123 associates the path components extracted by the XPath expression analysis unit 121 with the variables cut out by the template analysis unit 122, and executes processing for replacing each variable with the corresponding path component.

  The XPath expression generation unit 124 generates a standard XPath expression from the template subjected to the replacement process by the conversion unit 123.

  FIG. 3 shows an example of the data structure of the XPath expression generation template storage unit 14. In the example of FIG. 3, templates A to F are stored in the XPath expression generation template storage unit 14. Templates A to C are stored in the XPath expression generation template storage unit 14 in association with the ambiguity level 1, templates D and E in association with the ambiguity level 2, and template F in association with the ambiguity level 3.

  The ambiguity level indicates the ambiguity of the standard XPath expression generated based on each template. The ambiguity levels can be defined so that, for example, ambiguity level 0 means no ambiguity and ambiguity level 3 means ambiguity close to that of a full-text search. In the present embodiment, the input XPath expression is assumed to have ambiguity level 0 (no ambiguity), and the ambiguity increases as the ambiguity level increases.

Hereinafter, a specific example of the template stored in the XPath expression generation template storage unit 14 will be described.
As a specific example of a template of ambiguity level 1 (for example, template A), “/${Elem.FIRST}//*[${Predicate.LAST}]” may be mentioned. Here, “${Elem.FIRST}” is a variable indicating the 0th element (the element whose hierarchy is 0), that is, the first element among the path components included in the input XPath expression. “${Predicate.LAST}” is a variable indicating the final predicate (the predicate having the lowest hierarchy), that is, the last predicate among the path components included in the input XPath expression. In this template, “/” indicates that the elements before and after it are in a parent-child relationship, and “//” indicates that the elements before and after it are in a descendant relationship (which includes the parent-child relationship). An XPath expression generated based on this template retrieves XML data in which an element in the hierarchy below the 0th element of the input XPath expression (a descendant of that element) satisfies the restriction indicated by the final predicate of the input XPath expression.

  As a specific example of a template of ambiguity level 2 (for example, template D), “//${Elem.LAST}[${Predicate.LAST}]” can be cited. “${Elem.LAST}” is a variable indicating the final element, that is, the last element among the path components included in the input XPath expression. According to the XPath expression generated based on this template, XML data is retrieved in which the content of the final element of the input XPath expression satisfies the final predicate of the input XPath expression.

  As a specific example of a template of ambiguity level 3 (for example, template F), “//*[${Predicate.LAST}]” can be cited. According to the XPath expression generated based on this template, XML data in which any element satisfies the final predicate of the input XPath expression is retrieved.

  In addition to the above, the variables include “${Elem.n}” indicating the nth element, “[${Predicate.FIRST}]” indicating the 0th predicate, and “[${Predicate.n}]” indicating the nth predicate.
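
The association between templates and ambiguity levels described above can be pictured with a small sketch. It assumes a plain in-memory mapping from ambiguity level to template strings; the names `TEMPLATES_BY_LEVEL` and `read_templates` are hypothetical and are not part of the described apparatus. Only templates A, D, and F are listed, because only their contents are given in the text.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the XPath expression generation
# template storage unit 14: ambiguity level -> templates at that level.
# Templates B, C, and E are omitted because the text does not give them.
TEMPLATES_BY_LEVEL: Dict[int, List[str]] = {
    1: ["/${Elem.FIRST}//*[${Predicate.LAST}]"],  # template A
    2: ["//${Elem.LAST}[${Predicate.LAST}]"],     # template D
    3: ["//*[${Predicate.LAST}]"],                # template F
}


def read_templates(level: int) -> List[str]:
    """Return every template stored for one ambiguity level (may be empty)."""
    return list(TEMPLATES_BY_LEVEL.get(level, []))


def ambiguity_levels() -> List[int]:
    """Return the stored ambiguity levels, least ambiguous first."""
    return sorted(TEMPLATES_BY_LEVEL)
```

Keeping the level as the key makes it straightforward to read all templates of one level at once and to process levels in increasing order of ambiguity.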

Next, the processing procedure of the structured data search device 10 will be described with reference to the flowchart of FIG.
First, the input unit 11 receives an XPath expression for searching XML data that conforms to the schema specified by the user (step S1).

  Next, the XML data search unit 15 uses the XPath expression input to the input unit 11 (the input XPath expression) to search the XML data storage device 20 for XML data that matches the input XPath expression (step S2).

  The XML data searched by the XML data search unit 15 is held in the storage unit included in the XML data search unit 15 (step S3). At this time, the XML data is held as the search result of ambiguity level 0.

  The XPath expression analysis generator 12 analyzes the input XPath expression. Thereby, the XPath expression analysis generation unit 12 extracts each of the path components included in the input XPath expression (step S4).

  Next, the XPath expression analysis generation unit 12 outputs, to the XPath expression generation template reading unit 13, a request (read request) to read the templates of ambiguity level n from the XPath expression generation template storage unit 14. The XPath expression analysis generation unit 12 first requests the templates with the lowest ambiguity.

  The XPath expression generation template reading unit 13 reads the templates stored in the XPath expression generation template storage unit 14 in response to the read request from the XPath expression analysis generation unit 12 (step S5). First, the templates of ambiguity level 1 are read from the XPath expression generation template storage unit 14. When a plurality of templates are stored in association with ambiguity level 1, as in the XPath expression generation template storage unit 14 shown in FIG. 3, all of those templates are read. The XPath expression generation template reading unit 13 outputs (passes) the read templates to the XPath expression analysis generation unit 12.

  The XPath expression analysis generation unit 12 generates a standard XPath expression based on the path component extracted in step S4 and the template read by the XPath expression generation template reading unit 13 (step S6). At this time, for example, when a plurality of templates are read by the XPath expression generation template reading unit 13, a standard XPath expression is generated for each of the plurality of templates.

  The XML data search unit 15 searches the XML data storage device 20 for XML data that matches the XPath expression, using the standard XPath expression generated by the XPath expression analysis generation unit 12 (step S7). The XML data searched by the XML data search unit 15 is held in the storage unit of the XML data search unit 15 (step S8). In this case, the searched XML data is held as a search result for each ambiguity level.

  Next, the XPath expression analysis generation unit 12 determines whether the XPath expression generation template storage unit 14 contains a template for which the processing for generating a standard XPath expression has not yet been executed (an unprocessed template) (step S9).

  If it is determined that an unprocessed template exists in the XPath expression generation template storage unit 14 (YES in step S9), the process returns to step S5 and is repeated. In this case, the XPath expression analysis generation unit 12 outputs a read request to the XPath expression generation template reading unit 13 so as to read the templates of ambiguity level 2. In step S5, the XPath expression generation template reading unit 13 then reads the templates of ambiguity level 2 stored in the XPath expression generation template storage unit 14 in response to the read request. That is, the process is repeated with the value of the ambiguity level n in step S5 increased by one. In this way, the processing of steps S5 to S8 is performed on the templates stored in the XPath expression generation template storage unit 14 for each ambiguity level.

  On the other hand, when it is determined that there is no unprocessed template in the XPath expression generation template storage unit 14 (NO in step S9), the result output unit 16 outputs the search results held in the storage unit of the XML data search unit 15, for example for display to the user (step S10). The result output unit 16 outputs the search results for each ambiguity level, for example.

  In the above description, the search process using the input XPath expression (step S2) and the search process using the standard XPath expressions generated by the XPath expression analysis generation unit 12 (step S7) are executed separately. However, these search processes may instead be executed collectively between steps S9 and S10, for example.
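
The flow of steps S1 to S10 can be sketched as a loop over ambiguity levels. This is only an illustration under stated assumptions: `run_search` and its parameters `extract`, `generate`, and `search` are hypothetical callables standing in for the analysis, generation, and search units, and a template for which `generate` returns `None` is simply skipped, which the text does not specify.

```python
from typing import Any, Callable, Dict, List, Optional


def run_search(
    input_xpath: str,
    templates_by_level: Dict[int, List[str]],
    extract: Callable[[str], Any],
    generate: Callable[[str, Any], Optional[str]],
    search: Callable[[str], List[str]],
) -> Dict[int, List[str]]:
    """Return search hits grouped by ambiguity level (level 0 = input expression)."""
    # Steps S2-S3: search with the input XPath expression and hold the
    # result as ambiguity level 0.
    results: Dict[int, List[str]] = {0: list(search(input_xpath))}
    # Step S4: extract the path components once.
    components = extract(input_xpath)
    # Steps S5-S9: process the templates level by level, least ambiguous first.
    for level in sorted(templates_by_level):
        hits: List[str] = []
        for template in templates_by_level[level]:
            expression = generate(template, components)  # step S6
            if expression is None:  # assumption: skip unusable templates
                continue
            for document in search(expression):  # step S7
                if document not in hits:
                    hits.append(document)
        results[level] = hits  # step S8
    # Step S10: the caller outputs the results per ambiguity level.
    return results
```

Passing the three stages in as callables keeps the loop independent of how XPath expressions are parsed or evaluated, which mirrors the separation between the units in FIG. 1.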

Next, a detailed processing procedure of the processing in step S4 shown in FIG. 4 will be described with reference to the flowchart in FIG.
First, the XPath expression analysis unit 121 of the XPath expression analysis generation unit 12 separates character strings included in the input XPath expression by “/”, for example (step S11). At this time, the XPath expression analysis unit 121 separates character strings included in the input XPath expression by “/” not included in “[]”. In other words, this separates each element (or a set of elements and predicates).

  Next, if each of the character strings separated in step S11 includes an element (name) and a predicate, the XPath expression analysis unit 121 separates the element and the predicate included in the character string (step S12). When “[]” is included in the character string separated in step S11, a portion surrounded by “[]” is a predicate, and the other is an element (or attribute).

  The XPath expression analysis unit 121 extracts each of the elements or predicates separated in steps S11 and S12 as a path component (step S13).

Here, with reference to FIG. 6, the process of extracting the path component described above will be specifically described.
As shown in FIG. 6, it is assumed that the input XPath expression is “/html/head/title[contains(./text(),'seismic impersonation')]”. Hereinafter, processing for extracting path components from this input XPath expression will be described.

  In this case, the XPath expression analysis unit 121 separates the character string of the input XPath expression at each “/” that is not inside “[]”. That is, the XPath expression analysis unit 121 separates the input XPath expression into “html”, “head”, and “title[contains(./text(),'seismic impersonation')]”.

  Next, since the separated “title[contains(./text(),'seismic impersonation')]” includes “[]”, it contains both an element and a predicate. For this reason, the XPath expression analysis unit 121 separates “title[contains(./text(),'seismic impersonation')]” into the element “title” and the predicate “[contains(./text(),'seismic impersonation')]”.

  As a result, the XPath expression analysis unit 121 extracts “html”, “head”, “title”, and “[contains(./text(),'seismic impersonation')]” as path components.

  Here, as shown in FIG. 6, “html” extracted by the XPath expression analysis unit 121 is the 0th element of the input XPath expression, “head” is the first element, and “title” is the second element. “[contains(./text(),'seismic impersonation')]” is the second predicate of the input XPath expression. Note that “null” shown in FIG. 6 indicates an empty entry: the 0th and first predicates do not exist in the input XPath expression described here.
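
The extraction in steps S11 to S13 can be sketched as follows. This is a minimal illustration, not the implementation of the XPath expression analysis unit 121: the function names are hypothetical, predicates are returned without their surrounding “[]”, `None` plays the role of “null” in FIG. 6, and inputs containing “//” or brackets inside string literals are not handled.

```python
from typing import List, Optional, Tuple


def split_steps(xpath: str) -> List[str]:
    """Step S11: split on "/" characters that are not inside "[...]"."""
    steps: List[str] = []
    current: List[str] = []
    depth = 0  # nesting depth of "[" ... "]"
    for ch in xpath:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            if current:
                steps.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        steps.append("".join(current))
    return steps


def split_element_and_predicate(step: str) -> Tuple[str, Optional[str]]:
    """Step S12: separate a step into its element name and predicate, if any."""
    start = step.find("[")
    if start == -1:
        return step, None
    return step[:start], step[start + 1 : step.rfind("]")]


def extract_path_components(xpath: str) -> List[Tuple[str, Optional[str]]]:
    """Step S13: return (element, predicate) pairs ordered by hierarchy."""
    return [split_element_and_predicate(step) for step in split_steps(xpath)]
```

Applied to the input XPath expression of FIG. 6, the list index corresponds to the hierarchy n, so the entry at index 2 holds the element “title” together with the second predicate.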

Next, a detailed processing procedure of the processing in step S6 shown in FIG. 4 will be described with reference to the flowchart in FIG.
First, the template analysis unit 122 of the XPath expression analysis generation unit 12 analyzes the template read by the XPath expression generation template reading unit 13. The template analysis unit 122 thereby cuts out the variables (variable parts) contained in the analyzed template (step S21).

  Next, the conversion unit 123 replaces each variable part cut out by the template analysis unit 122 with the corresponding path component among the path components extracted by the XPath expression analysis unit 121 (step S22).

  A specific description will be given using the path components extracted by the XPath expression analysis unit 121 as shown in FIG. 6. Assume that template A of ambiguity level 1 is “/${Elem.FIRST}//*[${Predicate.LAST}]”. The variable part “${Elem.FIRST}” of template A is replaced with “html”, which is the 0th element among the extracted path components. The variable part “${Predicate.LAST}” is replaced with “contains(./text(),'seismic impersonation')”, which is the second predicate (final predicate) among the extracted path components.

  Next, the XPath expression generation unit 124 determines the template after replacement by the conversion unit 123 to be the standard XPath expression (step S23). Specifically, the parts of the template that are not variables, such as “//” and “[]”, are concatenated with the substituted path components. In the above example, “/html//*[contains(./text(),'seismic impersonation')]” is generated as the standard XPath expression. According to this standard XPath expression, XML data that includes the character string “seismic impersonation” in the hierarchy below the “html” element is searched.
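
Steps S21 to S23 amount to variable substitution in the template, which can be sketched with a regular expression. The sketch rests on assumptions that the text does not settle: `fill_template` is a hypothetical name, “LAST” is taken to mean the component at the deepest hierarchy of the input expression, and a template whose variable has no corresponding path component yields `None`.

```python
import re
from typing import List, Optional, Tuple

# (element, predicate) pairs ordered by hierarchy; predicate is None if absent.
PathComponents = List[Tuple[str, Optional[str]]]

# Matches variables such as ${Elem.FIRST}, ${Predicate.LAST}, ${Elem.1}.
_VARIABLE = re.compile(r"\$\{(Elem|Predicate)\.(FIRST|LAST|\d+)\}")


def fill_template(template: str, components: PathComponents) -> Optional[str]:
    """Replace each variable with its path component (steps S21 to S23)."""

    def resolve(match: "re.Match[str]") -> str:
        kind, position = match.group(1), match.group(2)
        if position == "FIRST":
            index = 0
        elif position == "LAST":
            index = len(components) - 1
        else:
            index = int(position)
        element, predicate = components[index]  # IndexError if out of range
        value = element if kind == "Elem" else predicate
        if value is None:
            raise LookupError(match.group(0))
        return value

    try:
        return _VARIABLE.sub(resolve, template)
    except LookupError:  # also covers IndexError
        # Assumption: a template that cannot be filled is not used.
        return None
```

With the path components of FIG. 6, template A yields the standard XPath expression given above, and templates D and F yield the level 2 and level 3 expressions used in the search examples.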

  Here, with reference to FIGS. 8 to 12, the search processing performed when XPath expressions are generated from the templates of each ambiguity level stored in the XPath expression generation template storage unit 14 will be specifically described.

  FIGS. 8 to 12 each show an example of XML data stored in the XML data storage device 20. The XML data 31 shown in FIG. 8 is XML data compliant with XHTML. The XML data 32 shown in FIG. 9 is XML data compliant with NewsML. The XML data 33 shown in FIG. 10 is XML data compliant with RSS. The XML data 34 shown in FIG. 11 is XML data compliant with Atom. The XML data 35 shown in FIG. 12 is XML data compliant with XHTML, like the XML data 31 shown in FIG. 8.

  Here, it is assumed that the input XPath expression is “/html/head/title[contains(./text(),'seismic impersonation')]”. The input XPath expression is an XPath expression that conforms to the schema specified by the user (here, XHTML). In this case, as described with reference to FIG. 6, the extracted path components are “html” (the 0th element of the input XPath expression), “head” (the first element), “title” (the second and final element), and “[contains(./text(),'seismic impersonation')]” (the second and final predicate).

  Further, as examples of templates for each ambiguity level, it is assumed that template A of ambiguity level 1 is “/${Elem.FIRST}//*[${Predicate.LAST}]”, template D of ambiguity level 2 is “//${Elem.LAST}[${Predicate.LAST}]”, and template F of ambiguity level 3 is “//*[${Predicate.LAST}]”.

  First, when search processing is executed using an input XPath expression, according to the input XPath expression, there is an element “head” below the element “html” and below the element “head”. There is an element “title”, and XML data in which the content of the element “title” includes the character string “quake-proof camouflage” is searched. The XML data that matches the input XPath expression corresponds to one case of the XML data 31 among the XML data 31 to 35. That is, XML data that conforms to XHTML is retrieved.

  Next, “/ html // * [contains (./ text (),’ seismic impersonation ’)]” is generated as a standard XPath expression from the template A of the ambiguous level 1 and the extracted path components. When search processing is executed using this standard XPath expression, XML data that includes the character string of “seismic impersonation” in the following hierarchy from the element “html” is searched according to the standard XPath expression. . As XML data that matches the standard XPath expression, two pieces of XML data 31 and 35 among the XML data 31 to 35 correspond. That is, XML data that conforms to XHTML including XML data (XML data 31) that matches the input XPath expression described above is retrieved.

  Also, “//title[contains(./text(),’seismic impersonation’)] ”is generated as a standard XPath expression from the template D of the ambiguous level 2 and the extracted path components. When search processing is executed using this standard XPath expression, XML data in which the character string “seismic impersonation” is included in the content of the element “title” is searched according to the standard XPath expression. As XML data that matches the standard XPath expression, three pieces of XML data 31, 33, and 34 among the XML data 31 to 35 correspond. That is, XML data (XML data 33 and 34) conforming to a schema (RSS and Atom) different from the schema (XHTML) specified by the user is also searched.

  In addition, “//*[contains(./text(),'seismic impersonation')]” is generated as a standard XPath expression from the template F of ambiguity level 3 and the extracted path components. When search processing is executed using this standard XPath expression, XML data that includes the character string “seismic impersonation” is searched. In other words, this corresponds to a full-text search using the character string (keyword) “seismic impersonation”. All of the XML data 31 to 35 match this standard XPath expression. That is, XML data (XML data 32, 33, and 34) conforming to schemas (NewsML, RSS, and Atom) different from the schema (XHTML) specified by the user is also retrieved.

  As described above, when the search process is executed using only the input XPath expression compliant with XHTML, for example, only the XML data 31 compliant with XHTML is retrieved. However, by generating standard XPath expressions based on the templates stored in the XPath expression generation template storage unit 14 and the path components extracted from the input XPath expression, and searching with them, it is possible to retrieve even the XML data 32 to 34, whose schemas are different (that is, which do not conform to XHTML).
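The widening effect of the ambiguity levels can be reproduced with a small toy corpus. The sketch below is an assumption-laden illustration, not the patented implementation: the five documents are hypothetical, namespace-free stand-ins for the XML data 31 to 35 (the actual figures are not reproduced here), and because the standard-library `xml.etree.ElementTree` module does not support `contains()`, each XPath expression is emulated by a hand-written matcher. `elem.text` is used as an approximation of the first text node tested by `contains(./text(), ...)`.

```python
import xml.etree.ElementTree as ET

KEYWORD = "seismic impersonation"

# Hypothetical stand-ins for XML data 31 to 35 (XHTML, NewsML, RSS, Atom, XHTML).
DOCS = {
    31: "<html><head><title>seismic impersonation report</title></head>"
        "<body><p>details</p></body></html>",
    32: "<NewsML><NewsItem><HeadLine>seismic impersonation case</HeadLine>"
        "</NewsItem></NewsML>",
    33: "<rss><channel><title>seismic impersonation news</title></channel></rss>",
    34: "<feed><title>seismic impersonation updates</title></feed>",
    35: "<html><head><title>daily news</title></head>"
        "<body><p>on seismic impersonation</p></body></html>",
}

def has_keyword(elem):
    # Approximates contains(./text(), KEYWORD) using the element's leading text.
    return KEYWORD in (elem.text or "")

# One matcher per ambiguity level; level 0 emulates the input XPath expression.
MATCHERS = {
    0: lambda root: root.tag == "html"
        and any(has_keyword(t) for t in root.findall("./head/title")),
    1: lambda root: root.tag == "html"
        and any(has_keyword(e) for e in root.iter() if e is not root),
    2: lambda root: any(has_keyword(e) for e in root.iter("title")),
    3: lambda root: any(has_keyword(e) for e in root.iter()),
}

def search(level):
    """Return the ids of the toy documents matched at the given ambiguity level."""
    return [doc_id for doc_id, xml in sorted(DOCS.items())
            if MATCHERS[level](ET.fromstring(xml))]

for level in sorted(MATCHERS):
    print(level, search(level))
```

With these stand-in documents, the matched sets grow from the single XHTML document at level 0 to all five documents at level 3, mirroring the progression described above.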

  In the present embodiment, a path component is extracted from an input XPath expression specified by the user, and a standard XPath expression is generated using the path component and the template. As a result, even XML data that conforms to a schema other than the schema specified by the user can be searched.
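As a rough illustration of the extraction step, the sketch below splits a simple absolute location path into steps and picks out the first element name, the last element name, and the last predicate. It is a minimal sketch under stated assumptions: the helper names are hypothetical, only child-axis steps with optional predicates are handled, and the input expression shown is inferred from the description of the example (html, head, title with a `contains` predicate) rather than quoted from it.

```python
import re

def split_steps(xpath):
    """Split a simple location path on '/' while ignoring '/' inside [...]."""
    steps, depth, current = [], 0, ""
    for ch in xpath:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            if current:
                steps.append(current)
            current = ""
        else:
            current += ch
    if current:
        steps.append(current)
    return steps

def extract_components(xpath):
    """Return the path components used by the example templates (hypothetical helper)."""
    names, predicates = [], []
    for step in split_steps(xpath):
        match = re.fullmatch(r"([^\[\]]+)(?:\[(.*)\])?", step)
        names.append(match.group(1))
        if match.group(2) is not None:
            predicates.append(match.group(2))
    return {
        "Elem.FIRST": names[0],
        "Elem.LAST": names[-1],
        "Predicate.LAST": predicates[-1] if predicates else "",
    }

# Assumed form of the input XPath expression in the example.
INPUT_XPATH = "/html/head/title[contains(./text(),'seismic impersonation')]"
print(extract_components(INPUT_XPATH))
```

Tracking bracket depth keeps the `/` inside `./text()` from being treated as a step separator, which a plain `str.split("/")` would get wrong.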

  Further, in the above embodiment, by creating a template that takes into account the hierarchical structure, it is possible to reduce noise data even when search processing is executed using a standard XPath expression.

  In the above embodiment, the templates stored in the XPath expression generation template storage unit 14 do not depend on any schema. Therefore, even when XML data having a new schema becomes a search target, the templates need not be changed. In other words, regardless of which schema the input XPath expression conforms to, XML data of other schemas can be retrieved by generating XPath expressions from the templates. This reduces the labor and cost that would otherwise be required to change the templates when XML data having a new schema becomes a search target.

  In the above-described embodiment, the input XPath expression is described as conforming to a schema specified by the user. However, the input XPath expression need not conform to a schema specified by the user. Any XPath expression from which a standard XPath expression can be generated based on a template stored in the XPath expression generation template storage unit 14 may be used.

  If the XPath expression generation template storage unit 14 contains a template from which the same standard XPath expression as the input XPath expression can be generated, the search process using the input XPath expression may be omitted, and the search process may be performed using only the standard XPath expressions generated from the templates (one of which is identical to the input XPath expression).

[First Modification]
A first modification of the present embodiment will be described. The structured data search device 10 according to the present modification executes search processing until the number of search results reaches the number indicated by the input specified number information.

With reference to the flowchart of FIG. 13, the processing procedure of the structured data search apparatus 10 will be described.
First, the input unit 11 receives an XPath expression for searching XML data that conforms to a schema specified by the user, and specified number information indicating the number of search results specified by the user (step S31). Here, the description assumes that the number of search results specified by the user is x (x is an integer of 1 or more).

  Next, the XML data search unit 15 uses the XPath expression (input XPath expression) input to the input unit 11 to search the XML data storage device 20 for XML data that matches the input XPath expression (step S32).

  The XML data retrieved by the XML data search unit 15 is held in a storage unit included in the XML data search unit 15 (step S33). At this time, the XML data is held as a search result of ambiguity level 0.

  Next, the XPath expression analysis generation unit 12 determines whether or not the number of XML data retrieved by the XML data search unit 15 is less than x (step S34). When it is determined that the number of retrieved XML data is less than x (YES in step S34), the XPath expression analysis generation unit 12 analyzes the input XPath expression and thereby extracts each path component included in the input XPath expression (step S35).

  Next, the XPath expression analysis generation unit 12 requests the XPath expression generation template reading unit 13 to read the template of ambiguity level n from the XPath expression generation template storage unit 14 (read request). Here, the XPath expression analysis generation unit 12 requests reading starting from the template with the lowest ambiguity level.

  The XPath expression generation template reading unit 13 reads a template stored in the XPath expression generation template storage unit 14 in response to the read request from the XPath expression analysis generation unit 12 (step S36). First, the template of ambiguity level 1 is read from the XPath expression generation template storage unit 14. The XPath expression generation template reading unit 13 outputs the read template to the XPath expression analysis generation unit 12.

  The XPath expression analysis generation unit 12 generates a standard XPath expression based on the path component extracted in step S35 and the template read by the XPath expression generation template reading unit 13 (step S37).

  The XML data search unit 15 searches the XML data storage device 20 for XML data that matches the XPath expression using the XPath expression generated by the XPath expression analysis generation unit 12 (step S38). The XML data searched by the XML data search unit 15 is held in the storage unit of the XML data search unit 15 (step S39). This XML data is held as a search result for each ambiguity level.

  Next, the XPath expression analysis generation unit 12 determines whether or not the total number of search results so far is less than x (step S40). Here, the total number of search results is the total number of search results of the ambiguity levels 0 and 1.

  When it is determined that the total number of search results is less than x (YES in step S40), the process returns to step S36 and is repeated. In this case, the processing for the template of ambiguity level 2 is executed in step S36. That is, the processing is repeated in ascending order of ambiguity level until the total number of search results reaches x or more.

  On the other hand, when it is determined that the total number of search results is not less than x (NO in step S40), the result output unit 16 outputs the search results held in the storage unit of the XML data search unit 15, for example, for display to the user (step S41). The result output unit 16 outputs the search results for each ambiguity level, for example.

  On the other hand, if it is determined in step S34 that the number of searched XML data is not less than x, the process of step S41 is executed.

  As described above, in this modification, the search is bounded by the number of search results specified by the user, and the templates are processed in ascending order of ambiguity level. Therefore, compared with the case where search processing is executed using the standard XPath expressions generated from all the templates stored in the XPath expression generation template storage unit 14, noise data can be reduced and the search results desired by the user can be obtained.
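The control flow of steps S31 to S41 can be sketched roughly as follows. This is a hedged sketch, not the device itself: the function name and the `extract`, `fill`, and `search` callables are hypothetical placeholders for the units described above, the stub index is invented for demonstration, results are simply counted per level without de-duplication, and the loop ends once the templates are exhausted (a case the description does not address).

```python
def search_with_result_limit(input_xpath, x, templates, extract, fill, search):
    """Widen the search level by level until at least x results are held."""
    results = {0: list(search(input_xpath))}          # steps S32-S33: ambiguity level 0
    if len(results[0]) >= x:                          # step S34, NO branch
        return results                                # step S41
    components = extract(input_xpath)                 # step S35
    for level in sorted(templates):                   # step S36: lowest ambiguity first
        hits = []
        for template in templates[level]:
            hits.extend(search(fill(template, components)))  # steps S37-S38
        results[level] = hits                         # step S39
        total = sum(len(found) for found in results.values())
        if total >= x:                                # step S40, NO branch
            break
    return results                                    # step S41


# Hypothetical stand-ins so the sketch runs on its own.
FAKE_INDEX = {
    "input": [31],
    "level1": [31, 35],
    "level2": [31, 33, 34],
    "level3": [31, 32, 33, 34, 35],
}
TEMPLATES = {1: ["level1"], 2: ["level2"], 3: ["level3"]}

results = search_with_result_limit(
    "input", 4, TEMPLATES,
    extract=lambda xpath: {},
    fill=lambda template, components: template,
    search=lambda expr: FAKE_INDEX[expr],
)
print(sorted(results))
```

Passing the extraction, generation, and search steps in as callables keeps the loop itself independent of any particular XPath engine.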

[Second Modification]
A second modification of the present embodiment will be described. The structured data search device 10 according to the present modification executes the search process only up to the ambiguity level indicated by the specified level information input to the input unit 11.

Next, the processing procedure of the structured data search device 10 will be described with reference to the flowchart of FIG.
First, the input unit 11 receives an XPath expression for searching XML data that conforms to the schema specified by the user and specified level information indicating the ambiguity level specified by the user (step S51). Here, the ambiguity level designated by the user (hereinafter referred to as the designated ambiguity level) is assumed to be N (N is an integer of 0 or more).

  Next, the XML data search unit 15 searches the XML data storage device 20 for XML data that matches the input XPath expression, using the XPath expression (input XPath expression) input to the input unit 11 (step S52). The XML data retrieved by the XML data search unit 15 is held in a storage unit included in the XML data search unit 15 (step S53). At this time, the XML data is held as a search result of ambiguity level 0.

Next, the XPath expression analysis generation unit 12 determines whether or not the designated ambiguity level (N in this case) indicated by the specified level information input to the input unit 11 is 0 (step S54).
When it is determined that the designated ambiguity level is not 0 (NO in step S54), the XPath expression analysis generation unit 12 analyzes the input XPath expression and thereby extracts each path component included in the input XPath expression (step S55).

  Next, the XPath expression analysis generation unit 12 requests the XPath expression generation template reading unit 13 to read the template of ambiguity level n from the XPath expression generation template storage unit 14 (read request). Here, the XPath expression analysis generation unit 12 requests reading starting from the template with the lowest ambiguity level.

  The XPath expression generation template reading unit 13 reads a template stored in the XPath expression generation template storage unit 14 in response to the read request from the XPath expression analysis generation unit 12 (step S56). First, the template of ambiguity level 1 is read from the XPath expression generation template storage unit 14. The XPath expression generation template reading unit 13 outputs the read template to the XPath expression analysis generation unit 12.

  The XPath expression analysis generation unit 12 generates a standard XPath expression based on the path components extracted in step S55 and the template read by the XPath expression generation template reading unit 13 (step S57).

  The XML data search unit 15 searches the XML data storage device 20 for XML data that matches the XPath expression using the XPath expression generated by the XPath expression analysis generation unit 12 (step S58). The XML data searched by the XML data search unit 15 is held in the storage unit of the XML data search unit 15 (step S59). This XML data is held as a search result for each ambiguity level.

Next, the XPath expression analysis generation unit 12 determines whether or not the designated ambiguity level is n (step S60). At this time, the value of n is 1. That is, it is determined whether or not the designated ambiguity level is 1.
When it is determined that the designated ambiguity level is not n (here, 1) (NO in step S60), the process returns to step S56 and is repeated. In this case, the processing for the template of ambiguity level 2 is executed in step S56. That is, the processing is repeated in ascending order of ambiguity level until the template of ambiguity level N has been processed. In other words, the processing is executed only for templates whose associated ambiguity level is N or lower.
On the other hand, when it is determined that the designated ambiguity level is n (YES in step S60), the result output unit 16 outputs the search results held in the storage unit of the XML data search unit 15, for example, for display to the user (step S61). The result output unit 16 outputs the search results for each ambiguity level, for example.

  On the other hand, if it is determined in step S54 that the designated ambiguity level is 0, the process of step S61 is executed.

  As described above, in this modification, standard XPath expressions are generated using the templates of ambiguity levels up to the level designated by the user, and the search process is executed with them. As a result, the search results desired by the user can be obtained.
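Steps S51 to S61 amount to a loop over ambiguity levels 1 to N after the level-0 search. The following is a minimal sketch under the same kind of assumptions as before: the function name and the `extract`, `fill`, and `search` callables are hypothetical placeholders, the stub index is invented for demonstration, and a level with no stored template simply yields an empty result list.

```python
def search_up_to_level(input_xpath, designated_level, templates, extract, fill, search):
    """Search with the input expression, then with templates of levels 1..N."""
    results = {0: list(search(input_xpath))}           # steps S52-S53: ambiguity level 0
    if designated_level == 0:                          # step S54, YES branch
        return results                                 # step S61
    components = extract(input_xpath)                  # step S55
    for level in range(1, designated_level + 1):       # steps S56-S60, ascending order
        hits = []
        for template in templates.get(level, []):
            hits.extend(search(fill(template, components)))  # steps S57-S58
        results[level] = hits                          # step S59
    return results                                     # step S61


# Hypothetical stand-ins so the sketch runs on its own.
FAKE_INDEX = {
    "input": [31],
    "level1": [31, 35],
    "level2": [31, 33, 34],
    "level3": [31, 32, 33, 34, 35],
}
TEMPLATES = {1: ["level1"], 2: ["level2"], 3: ["level3"]}

def run(level):
    return search_up_to_level(
        "input", level, TEMPLATES,
        extract=lambda xpath: {},
        fill=lambda template, components: template,
        search=lambda expr: FAKE_INDEX[expr],
    )

print(sorted(run(0)), sorted(run(2)))
```

Unlike the count-limited variant, the stopping condition here depends only on the designated level, so the number of results returned is not bounded in advance.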

  Note that the present invention is not limited to the above-described embodiment or its modifications, and can be embodied by modifying the components in the implementation stage without departing from the scope of the invention. In addition, various inventions can be formed by appropriately combining a plurality of the components disclosed in the above-described embodiment or its modifications. For example, some components may be deleted from all the components shown in the embodiment or each modification. Furthermore, components of different embodiments or modifications may be combined as appropriate.

Block diagram showing the functional configuration of the structured data search device according to the embodiment of the present invention.
Block diagram showing the functional configuration of the XPath expression analysis generation unit 12.
Diagram showing an example of the data structure of the XPath expression generation template storage unit 14.
Flowchart (FIG. 4) showing the processing procedure of the structured data search device 10.
Flowchart showing the detailed processing procedure of step S4.
Diagram for specifically explaining the process of extracting path components.
Flowchart showing the detailed processing procedure of step S6 shown in FIG. 4.
Diagram showing an example of the XML data stored in the XML data storage device 20.
Diagram showing an example of the XML data stored in the XML data storage device 20.
Diagram showing an example of the XML data stored in the XML data storage device 20.
Diagram showing an example of the XML data stored in the XML data storage device 20.
Diagram showing an example of the XML data stored in the XML data storage device 20.
Flowchart showing the processing procedure of the structured data search device 10 according to the first modification of the present embodiment.
Flowchart showing the processing procedure of the structured data search device 10 according to the second modification of the present embodiment.

Explanation of symbols

  10 ... Structured data search device, 11 ... Input unit, 12 ... XPath expression analysis generation unit, 13 ... XPath expression generation template reading unit, 14 ... XPath expression generation template storage unit, 15 ... XML data search unit, 16 ... Result output unit, 20 ... XML data storage device, 121 ... XPath expression analysis unit, 122 ... Template analysis unit, 123 ... Conversion unit, 124 ... XPath expression generation unit.

Claims (6)

  1. A structured data search device for searching for data in structured data storage means in which a plurality of structured data having different schemas are stored, the device comprising:
    input means for inputting a first search expression, including a component designated by a user, for searching for structured data;
    extracting means for analyzing the first search expression and extracting the components included in the first search expression;
    search expression generating means for generating, based on the extracted components, a second search expression for searching for structured data that is stored in the structured data storage means and has a schema different from that of the first search expression;
    search means for searching the structured data storage means for structured data that matches the second search expression, using the second search expression; and
    search result output means for outputting a result of the search means.
  2. A structured data search device for searching for data in structured data storage means in which a plurality of structured data having different schemas are stored, the device comprising:
    input means for inputting a first search expression, including a component specified by a user, for searching for structured data of a schema specified by the user;
    extracting means for analyzing the first search expression and extracting the components included in the first search expression;
    search expression generating means for generating, based on the extracted components, a second search expression for searching for structured data of the specified schema and of a schema different from the specified schema;
    search means for searching the structured data storage means for structured data that matches the second search expression, using the second search expression; and
    search result output means for outputting a result of the search means.
  3. The structured data search device according to claim 1, further comprising template storage means for storing in advance at least one template expressed in a form in which variables are embedded in place of components included in a search expression for searching for structured data stored in the structured data storage means,
    wherein the search expression generating means generates the second search expression by replacing the variables embedded in the template with the extracted components, based on the at least one template stored in the template storage means.
  4. The structured data search device according to claim 3, wherein
    the template storage means stores a plurality of templates, including the at least one template, each having a different combination of the variables,
    each of the plurality of templates is associated with a level indicating the ambiguity of the search expression, and
    the search result output means outputs a search result for each level associated with the templates.
  5. The structured data search device according to claim 4, wherein
    the input means further inputs specified level information indicating an ambiguity level, and
    the search expression generating means generates the second search expression based on a template, among the plurality of templates stored in the template storage means, that is associated with a level indicating an ambiguity lower than the ambiguity indicated by the specified level information.
  6. The structured data search device according to claim 4, wherein
    the input means further inputs specified number information indicating a number of search results,
    the search expression generating means generates the second search expression based on the templates in ascending order of the ambiguity associated with each of the plurality of templates stored in the template storage means, and
    the search means sequentially searches for structured data that matches the second search expression until the number of retrieved structured data exceeds the number indicated by the specified number information.
JP2007003444A 2007-01-11 2007-01-11 Structured data search apparatus Granted JP2008171181A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2007003444A JP2008171181A (en) 2007-01-11 2007-01-11 Structured data search apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2007003444A JP2008171181A (en) 2007-01-11 2007-01-11 Structured data search apparatus

Publications (1)

Publication Number Publication Date
JP2008171181A true JP2008171181A (en) 2008-07-24

Family

ID=39699217

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2007003444A Granted JP2008171181A (en) 2007-01-11 2007-01-11 Structured data search apparatus

Country Status (1)

Country Link
JP (1) JP2008171181A (en)

Legal Events

Date Code Title Description
A300 Withdrawal of application because of no request for examination

Free format text: JAPANESE INTERMEDIATE CODE: A300

Effective date: 20100406