WO2006046523A1 - Document analysis system and document adaptation system - Google Patents
Document analysis system and document adaptation system
- Publication number
- WO2006046523A1 (PCT/JP2005/019531)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- document
- layout
- structured
- title
- semi
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/84—Mapping; Conversion
- G06F16/88—Mark-up to mark-up conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/14—Tree-structured documents
- G06F40/143—Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
- G06F40/16—Automatic learning of transformation rules, e.g. from examples
Definitions
- the present invention relates to a document analysis system, a document analysis method, and a document analysis program that can analyze a layout from a structured or semi-structured document, and to a document adaptation system, a document adaptation method, and a document adaptation program that can adapt a structured or semi-structured document using the layout.
- the document description element refers to an element that is a description unit of a structured or semi-structured document, for example, an HTML tag element such as a TABLE element or an A element in an HTML document.
- the component of the layout refers to a partial area, consisting of related information, that makes up a display surface such as a screen, for example, a partial area containing the information related to a certain heading in an HTML document.
- in one known technique, a table of contents document and documents describing the contents of the table of contents items are generated from document description elements having specific names, according to rules that use the names of the document description elements (see Japanese Patent Application Laid-Open No. 9-251457).
- JP-A-10-289250 discloses a technique that, when displaying a list of registered URLs, displays not only title information but also image information so that the registered URL pages can be recognized intuitively.
- JP-A-11-203285 discloses a technique that assigns to each line a line attribute indicating the position of a document element within the line, and determines the meaning of each document element, line by line, from the meaning of each morpheme constituting the document element and the line attribute of the line to which the document element belongs, so that each document element in the original document is given a precise meaning.
- Japanese Patent Laid-Open No. 2003-85159 discloses a technique that analyzes the head document of a desired group of structured documents, automatically creates a table of contents, and combines the table of contents with image data of related documents to provide users with documents that are easy to read.
- JP-A-2004-86855 discloses a technique for facilitating the creation and editing of a document while referring to the contents of the document and the table of contents.
- a link for generating document content information corresponding to a table of contents item is embedded when generating a table of contents of a document.
- document content information including the table of contents item is generated.
- a link for instructing the output of the table of contents is embedded in the document content information.
- a link for generating document content information corresponding to the table of contents item is embedded in the table of contents.
- Japanese Patent Application Laid-Open No. 2003-288334 discloses a technique for generating, with high accuracy, a tagged structured document from a printed document consisting of multiple pages.
- Japanese Patent Laid-Open No. 2003-330856 discloses a technique that enables comfortable access to both the local information and the global information of content by dynamically performing layout generation and information granularity adjustment in response to operations that change the zoom state.
- the first problem in the prior art is that the layout intended by the document provider cannot always be analyzed in the document analysis system for structured and semi-structured documents.
- the reason for this is that there are a variety of document description methods, and layout analysis that relies on the strength of document description element delimiters cannot always analyze the layout intended by the document provider.
- a second problem in the prior art is that only some titles can be analyzed in a document analysis system for structured and semi-structured documents.
- the reason is that a title is generally expressed using the name, attributes, style, and content of a document description element, so title analysis according to rules that use only the name of the document description element can analyze only some titles.
- a third problem in the prior art is that, in the document analysis system for structured and semi-structured documents, application software cannot be developed using layout information analyzed by a third party. The reason is that the conventional document analysis system does not output the analyzed layout information in a format that can be used by a third party.
- a fourth problem in the prior art is that a document adaptation system for structured and semi-structured documents cannot accurately capture the logical structure of the document intended by the document provider and adapt the document to the network, terminal, or user environment. The reason is that when a table of contents document is generated according to rules using the names of document description elements, some titles cannot be analyzed and the table of contents document cannot be generated correctly. Also, when a composite document is generated according to user-defined rules that use the URL (Uniform Resource Locator) of a document and references to the document description elements indicating the location of the required information, the user must update the rules whenever the document is updated, and the composite document intended by the document provider may not be generated correctly. In both cases, the rules do not accurately capture the logical structure of the document intended by the document provider.
- a first object of the present invention is to provide a document analysis system that can analyze a layout intended by a document provider.
- the second object of the present invention is to provide a document analysis system capable of comprehensively analyzing titles.
- a third object of the present invention is to provide a document analysis system capable of outputting layout information in a format that can be used by a third party to develop application software.
- the fourth object of the present invention is to provide a document adaptation system capable of accurately grasping the logical structure of a document intended by a document provider and adapting the document to a network, terminal, or user environment.
- a document analysis system of the present invention includes a basic layout analysis unit that refers to the arrangement of the document description elements included in a structured document or semi-structured document and analyzes the layout of the document by grouping the document description elements juxtaposed in a certain direction.
- the document analysis system of the present invention further includes: a title analysis rule storage unit that stores title analysis rules based on one or more values among the name, attributes, style, and content of a document description element of a structured document or semi-structured document; a title analysis unit that analyzes titles by comparing one or more of the name, attributes, style, and content of the document description elements contained in the structured document or semi-structured document with the title analysis rules; and a layout analysis unit that groups the layout components using the layout analyzed by the basic layout analysis unit and the titles analyzed by the title analysis unit, and generates a new layout.
- the document analysis system further includes: a block selection unit that selects the main components of the layout using the layout analyzed by the basic layout analysis unit, the titles analyzed by the title analysis unit, and the new layout generated by the layout analysis unit; and a section calculation unit that groups the layout components using the layout analyzed by the basic layout analysis unit, the new layout generated by the layout analysis unit, and the main layout components selected by the block selection unit, and generates a layout.
- the basic layout analysis unit analyzes the layout by repeating, a predetermined number of times, a process of referring to the arrangement of the document description elements below the grouped document description elements and grouping the document description elements juxtaposed in the direction orthogonal to the direction used for the immediately preceding grouping.
- by repeating the same process a predetermined number of times, the basic layout analysis unit may also analyze the layout of the next hierarchy level.
- the basic layout analysis unit refers to an arrangement of only some designated document description elements among the document description elements.
- the document analysis system includes: a title analysis rule storage unit that stores title analysis rules based on one or more values among the name, attributes, style, and content of a document description element of a structured document or a semi-structured document; and a title analysis unit that analyzes titles by comparing the title analysis rules with one or more of the names, attributes, styles, and contents of the document description elements included in the structured document or the semi-structured document.
- the document analysis system includes an output unit that shapes the layout and titles of the structured document or the semi-structured document into a format expressed using references to the document description elements included in that document, and outputs them.
- the document adaptation system of the present invention includes: a table of contents document output unit that generates and outputs a table of contents document using a structured document or semi-structured document and a document in which the layout information of that document is described; and an item document output unit that generates and outputs documents describing the contents of the table of contents items using the structured document or semi-structured document and the document describing the layout information.
- the document adaptation system of the present invention includes: an output component information storage unit that stores, as output component information, a set of combinations of the URI (Uniform Resource Identifier) of a structured document or semi-structured document and the IDs of the output components of that document; and a composite document output unit that generates and outputs a composite document using the output component information, the structured document or semi-structured document corresponding to the URI described in the output component information, and the document in which the layout information corresponding to that document is described.
- the document analysis method and the document analysis program of the present invention refer to the arrangement of document description elements included in a structured document or semi-structured document, and group the document description elements juxtaposed in a certain direction.
- the document analysis method and the document analysis program of the present invention include: a step of storing title analysis rules based on one or more values among the name, attributes, style, and content of the document description element of the structured document or the semi-structured document; a step of analyzing titles by comparing at least one of the name, attributes, style, and content of the document description elements included in the structured document or the semi-structured document with the title analysis rules; and a step of grouping the components of the layout using the analyzed layout and the analyzed titles to generate a new layout.
- in the step of analyzing the layout, the document analysis method and the document analysis program of the present invention analyze the layout by repeating, a predetermined number of times, a process of referring to the arrangement of the document description elements below the grouped document description elements and grouping the document description elements juxtaposed in the direction orthogonal to the direction used for the immediately preceding grouping.
- by repeating the same process a predetermined number of times in the step of analyzing the layout, the document analysis method and the document analysis program of the present invention may also analyze the layout of the next hierarchy level.
- the document analysis method and the document analysis program of the present invention refer to the arrangement of only some designated document description elements among the document description elements in the step of analyzing the layout.
- the document analysis method and the document analysis program of the present invention include: a step of storing title analysis rules based on one or more values among the name, attributes, style, and content of a document description element of a structured document or semi-structured document; and a step of analyzing titles by comparing the title analysis rules with one or more of the names, attributes, styles, and contents of the document description elements included in the structured document or the semi-structured document.
- the layout and titles of the structured document or semi-structured document are shaped into a format expressed using references to the document description elements included in that document, and are output.
- the document adaptation method and the document adaptation program of the present invention comprise: a step of generating and outputting a table of contents document using a structured document or semi-structured document and a document in which the layout information of that document is described; and a step of generating and outputting documents describing the contents of the table of contents items using the structured document or semi-structured document and the document describing the layout information.
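As a rough illustration of the table of contents step, the following Python sketch builds a table of contents document and one item document per entry. The `build_toc` function, the `(title, html_fragment)` input shape, and the `item1.html` style file names are assumptions made for this example; the source does not specify how the layout information is encoded.

```python
from html import escape


def build_toc(sections):
    """Build a table of contents page and one item page per section.

    `sections` is a list of (title, html_fragment) pairs, assumed to have been
    derived beforehand from the document and its layout information.
    """
    links = []
    items = {}
    for index, (title, fragment) in enumerate(sections, start=1):
        filename = f"item{index}.html"  # hypothetical naming scheme
        links.append(f'<li><a href="{filename}">{escape(title)}</a></li>')
        items[filename] = (
            f"<html><body><h1>{escape(title)}</h1>{fragment}</body></html>"
        )
    toc = "<html><body><ul>" + "".join(links) + "</ul></body></html>"
    return toc, items


toc, items = build_toc([("News", "<p>n1</p>"), ("Sports & Games", "<p>s1</p>")])
print(toc)
```

Splitting the output into a small table of contents page plus item pages is one way a document could be adapted to a terminal with a small screen or a slow network.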
- the document adaptation method and document adaptation program of the present invention comprise: a step of storing, in the output component information storage unit as output component information, a set of combinations of the URI of a structured document or semi-structured document and the IDs of the output components of that document; and a step of generating and outputting a composite document using the output component information, the structured document or semi-structured document corresponding to the URI described in the output component information, and a document in which the layout information corresponding to that document is described.
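The composite document step can be pictured with the minimal Python sketch below. It assumes that the layout information maps each component ID to a list of element references and that each document is available as a mapping from element reference to markup fragment; both data shapes, and the name `compose`, are hypothetical and chosen only for illustration.

```python
def compose(output_info, documents, layouts):
    """Assemble a composite document from selected layout components.

    output_info: list of (uri, component_id) pairs (the output component information)
    documents:   uri -> {element_reference: markup fragment}   (assumed shape)
    layouts:     uri -> {component_id: [element_reference]}    (assumed shape)
    """
    parts = []
    for uri, component_id in output_info:
        for reference in layouts[uri][component_id]:
            parts.append(documents[uri][reference])
    return "<html><body>" + "".join(parts) + "</body></html>"


documents = {
    "http://example.com/a": {"/html/body/div[1]": "<div>News</div>",
                             "/html/body/div[2]": "<div>Ads</div>"},
    "http://example.com/b": {"/html/body/table[1]": "<table><tr><td>Weather</td></tr></table>"},
}
layouts = {
    "http://example.com/a": {"c1": ["/html/body/div[1]"], "c2": ["/html/body/div[2]"]},
    "http://example.com/b": {"c1": ["/html/body/table[1]"]},
}
page = compose([("http://example.com/a", "c1"), ("http://example.com/b", "c1")],
               documents, layouts)
print(page)
```

Because the selection refers to component IDs rather than to raw element positions, the stored pairs can stay the same when the source page changes, as long as the layout information is regenerated.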
- the first effect is that the layout intended by the document provider can be analyzed. This is because the layout is analyzed based on the arrangement of the document description elements. In addition, the components of the previously analyzed layout are grouped using both that layout and the titles analyzed from the names, attributes, styles, and contents of the document description elements, so that each title serves as a key for creating a new component consisting of a set of related components, and a new layout made up of those components is generated.
- the second effect is that the title can be comprehensively analyzed.
- the reason for this is the ability to analyze titles using attributes, styles, and contents in addition to the names of document description elements.
- a third effect is that layout information can be provided in a format that can be used by a third party to develop application software.
- the reason is that the analyzed layout and title are output in a format in which the layout components and titles are expressed using references to document description elements.
- the fourth effect is that the document can be adapted to the network, terminal, and user environments by utilizing the logical structure of the document intended by the document provider.
- the reason is that, in addition to the structured or semi-structured document, a document describing layout information that reflects the logical structure intended by the document provider is used to adapt the document to the environment.
- the present invention can be applied to a document browsing system that uses the logical structure of a document, for example through a function for displaying an outline of a document, a function for reading it aloud, or a function for selectively displaying document items, and to a program for realizing such a system on a computer.
- it can also be applied to a document conversion system that uses the logical structure of a document, for example through a function that generates an outline of a document, a function that divides a document according to the outline, or a function that selectively synthesizes document items, and to a program for realizing such a system on a computer.
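The title-keyed grouping described under the first effect can be sketched as follows. This is a minimal Python illustration that assumes the layout components arrive as a flat sequence in document order and that a title test is already available; the function name `group_by_title` and the dictionary shape of a section are hypothetical.

```python
def group_by_title(components, is_title):
    """Group a flat sequence of layout components into title-keyed sections.

    Each component judged to be a title opens a new section; the components
    that follow it are attached to that section until the next title appears.
    Components before the first title go into one untitled section.
    """
    sections = []
    for component in components:
        if is_title(component):
            sections.append({"title": component, "members": []})
        elif sections:
            sections[-1]["members"].append(component)
        else:
            sections.append({"title": None, "members": [component]})
    return sections


titles = {"News", "Sports"}
result = group_by_title(["logo", "News", "n1", "n2", "Sports", "s1"],
                        lambda c: c in titles)
print(result)
```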
- FIG. 1 is a block diagram showing a configuration of an embodiment for carrying out a first invention of the present invention.
- FIG. 2 is a flowchart showing the operation of the embodiment for carrying out the first invention of the present invention.
- FIG. 3 is a diagram showing an example of an HTML document.
- FIG. 4 is a diagram for explaining a first embodiment of the present invention.
- FIG. 5 is a diagram for explaining a first embodiment of the present invention.
- FIG. 6 is a diagram for explaining the first embodiment of the present invention.
- FIG. 7 is a diagram for explaining the first embodiment of the present invention.
- FIG. 8 is a diagram for explaining the first embodiment of the present invention.
- FIG. 9 is a diagram for explaining the first embodiment of the present invention.
- FIG. 10 is a diagram showing an example of an output format according to the first embodiment of the present invention.
- FIG. 11 is a block diagram showing a configuration of an embodiment for carrying out a second invention of the present invention.
- FIG. 12 is a flowchart showing the operation of the embodiment for carrying out the second invention of the present invention.
- FIG. 13 is a diagram showing an example of a title analysis rule according to the second embodiment of the present invention.
- FIG. 14 is a diagram for explaining a second embodiment of the present invention.
- FIG. 15 is a diagram showing an example of an output format according to the second embodiment of the present invention.
- FIG. 16 is a block diagram showing a configuration of an embodiment for carrying out a third invention of the present invention.
- FIG. 17 is a flowchart showing the operation of the embodiment for carrying out the third invention of the present invention.
- FIG. 18 is a diagram for explaining a third embodiment of the present invention.
- FIG. 19 is a diagram for explaining a third embodiment of the present invention.
- FIG. 20 is a diagram for explaining a third embodiment of the present invention.
- FIG. 21 is a diagram showing an example of the output format of the third embodiment of the present invention.
- FIG. 22 is a block diagram showing a configuration of an embodiment for carrying out a fourth invention of the present invention.
- FIG. 24A is a diagram showing an HTML document among examples of an HTML document and an XML document.
- FIG. 24B is a diagram showing an XML document among examples of an HTML document and an XML document.
- FIG. 26 is a diagram showing an example of an item document according to the fourth embodiment of the present invention.
- FIG. 27 is a block diagram showing a configuration of an embodiment for carrying out the fifth invention of the present invention.
- FIG. 29 is a diagram showing an example of information related to an output component of the fifth exemplary embodiment of the present invention.
- FIG. 30A is a diagram showing an HTML document among examples of an HTML document and an XML document.
- FIG. 30B is a diagram showing an XML document among examples of an HTML document and an XML document.
- FIG. 31 is a diagram showing an example of a composite document according to the fifth embodiment of the present invention.
- FIG. 32 is a block diagram showing the configuration of the sixth exemplary embodiment of the present invention.
- FIG. 33 is a block diagram showing the configuration of the seventh exemplary embodiment of the present invention.
- FIG. 34 is a block diagram showing the configuration of the eighth embodiment of the present invention.
- FIG. 35 is a block diagram showing the configuration of the ninth embodiment of the present invention.
- FIG. 37 is a diagram for explaining a sixth embodiment of the present invention.
- FIG. 38 is a diagram for explaining a sixth embodiment of the present invention.
- FIG. 39 is a diagram for explaining a sixth embodiment of the present invention.
- FIG. 40 is a diagram for explaining a sixth embodiment of the present invention.
- FIG. 41 is a diagram for explaining a sixth embodiment of the present invention.
- FIG. 42 is a diagram for explaining a sixth embodiment of the present invention.
- FIG. 43 is a diagram for explaining a sixth embodiment of the present invention.
- FIG. 44 is a diagram for explaining a sixth embodiment of the present invention.
- FIG. 45 is a diagram for explaining a sixth embodiment of the present invention.
- the system of the first embodiment of the present invention includes a data processing device 1 that operates under program control, and a storage device 2 that stores information.
- the data processing apparatus 1 includes an input unit 11, a layout analysis tool 12, and an output unit 13.
- the storage device 2 includes a rendering result storage unit 21 and an analysis result storage unit 22.
- the input unit 11 acquires a structured or semi-structured document from the outside, renders the document, and stores the rendering result in the rendering result storage unit 21.
- the layout analysis tool 12 has a basic layout analysis unit 14.
- the output unit 13 acquires information about the layout components and their hierarchical relationships from the analysis result storage unit 22, shapes the layout components into a format that can be expressed using references to document description elements, and outputs the information to the outside.
- an ID may be given to the layout component and output.
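To make the idea of expressing layout components through references to document description elements more concrete, here is a small Python sketch that serializes components as XML. The element names `layout`, `component`, and `ref`, and the use of XPath-like strings as references, are assumptions for illustration only; the actual output format of the embodiment is the one shown in FIG. 10, which is not reproduced here.

```python
import xml.etree.ElementTree as ET


def layout_to_xml(components):
    """Serialize layout components as XML.

    `components` is a list of (component_id, parent_id_or_None, [element references]).
    A parent must appear in the list before its children.
    """
    root = ET.Element("layout")
    nodes = {}
    for component_id, parent_id, references in components:
        holder = root if parent_id is None else nodes[parent_id]
        node = ET.SubElement(holder, "component", {"id": component_id})
        for reference in references:
            # Each reference points back at a document description element.
            ET.SubElement(node, "ref", {"path": reference})
        nodes[component_id] = node
    return ET.tostring(root, encoding="unicode")


xml_text = layout_to_xml([
    ("c1", None, ["/html/body/table[1]"]),
    ("c2", "c1", ["/html/body/table[1]/tr[1]/td[1]"]),
])
print(xml_text)
```

A third party could read such a file alongside the original document and resolve each reference to recover the component's content, without rerunning the analysis.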
- the basic layout analysis unit 14 obtains the rendering result from the rendering result storage unit 21, refers to the arrangement of the document description elements, and analyzes the layout by grouping the document description elements juxtaposed in a certain direction. Specifically, it refers to the arrangement of designated document description elements, for example the root document description element or the document description elements belonging to an already analyzed layout component, and groups the document description elements juxtaposed in a certain direction. The grouped document description elements, and the ungrouped document description elements that have no child document description elements, are stored in the analysis result storage unit 22 as layout components. Document description elements that have not been grouped are processed recursively, and this is repeated until all document description elements are grouped or no child document description elements remain.
- the rendering result storage unit 21 stores the processing result of the input unit 11, and the analysis result storage unit 22 stores the processing result of the layout analysis tool 12.
- alternatively, only the layout of a specific hierarchy level, that is, a set of layout components, may be analyzed.
- in that case, the arrangement of the document description elements belonging to each layout component is further referred to, and the document description elements juxtaposed in the direction orthogonal to the direction used for the previous grouping are grouped.
- the layout may be analyzed by repeating this grouping a predetermined number of times, each time replacing the components of the previously analyzed layout.
- an analyzed document description element may be replaced with its parent document description element.
- in this way, the layout of any hierarchy level can be analyzed.
- the arrangement of only some designated document description elements may be referred to.
- otherwise, the layout, that is, the layout components and their hierarchical relationships, is analyzed.
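The grouping by direction can be illustrated with the Python sketch below. It assumes each rendered document description element is available as a rectangle, and it uses a projection-gap heuristic (similar in spirit to a recursive XY-cut) to decide which elements are juxtaposed; the source does not state how juxtaposition is detected, so the heuristic, the `Box` class, and the function names are all assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Rendered rectangle of one document description element (hypothetical model)."""
    name: str
    x: float
    y: float
    w: float
    h: float


def split_along(boxes, axis):
    """Split boxes into groups separated by gaps in their projection onto `axis`.

    Splitting along "y" yields horizontal bands, each holding the elements
    juxtaposed side by side; splitting along "x" yields columns.
    """
    def span(box):
        return (box.x, box.x + box.w) if axis == "x" else (box.y, box.y + box.h)

    groups, end = [], None
    for box in sorted(boxes, key=lambda b: span(b)[0]):
        low, high = span(box)
        if end is None or low >= end:
            groups.append([box])
            end = high
        else:
            groups[-1].append(box)
            end = max(end, high)
    return groups


def analyze(boxes, axis="y", depth=2):
    """Alternate the grouping direction for `depth` levels; return nested name lists."""
    if depth == 0 or len(boxes) <= 1:
        return [box.name for box in boxes]
    other = "x" if axis == "y" else "y"
    return [analyze(group, other, depth - 1) for group in split_along(boxes, axis)]


page = [Box("header", 0, 0, 100, 10), Box("nav", 0, 10, 20, 50),
        Box("main", 20, 10, 80, 50), Box("footer", 0, 60, 100, 10)]
print(analyze(page))
```

The `depth` parameter plays the role of the predetermined number of repetitions: each level regroups the members of the previous groups in the orthogonal direction.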
- the input unit 11 acquires a structured or semi-structured document from the outside, renders the document, and stores the rendering result in the rendering result storage unit 21 (step S101).
- the basic layout analysis unit 14 obtains the rendering result from the rendering result storage unit 21, sets the processing target hierarchy n to 1 (step S102), and determines whether or not to continue processing for the processing target hierarchy (step S103). As the judgment criteria, the upper limit of the processing hierarchy or the size of the analyzed basic layout can be used. If the basic layout analysis unit 14 determines that processing is not to continue, the process proceeds to step S107.
- if it is determined in step S103 that processing is to continue, the basic layout analysis unit 14 acquires the document description elements to be processed (step S104).
- the basic layout analysis unit 14 refers to the arrangement of the document description elements to be processed, analyzes the layout by grouping the document description elements juxtaposed in a certain direction, and stores the result in the analysis result storage unit 22 in association with the layout component of the upper hierarchy level (step S105).
- the basic layout analysis unit 14 sets the processing target hierarchy n to n + 1 (step S106), and repeats the processing after step S103.
- the output unit 13 acquires information on the layout components and their hierarchical relationships from the analysis result storage unit 22, shapes the layout components into a format expressed using references to document description elements, and outputs them to the outside (step S107).
- in this embodiment, the layout is analyzed by referring to the arrangement of the document description elements of the structured or semi-structured document and grouping the juxtaposed document description elements, so a layout based on the arrangement of the document description elements can be analyzed. Therefore, the layout intended by the document provider can be analyzed even for structured or semi-structured documents written with various description methods. In addition, since the layout is output in a format expressed using references to document description elements, a third party can develop applications using the layout information.
- the second embodiment of the present invention includes a data processing device 1 that operates under program control, and a storage device 2 that stores information.
- the data processing apparatus 1 includes an input unit 11, a layout analysis tool 12, and an output unit 13.
- the storage device 2 includes a rendering result storage unit 21, an analysis result storage unit 22, and a title analysis rule storage unit 23.
- the input unit 11 acquires a structured or semi-structured document from the outside, renders the document, and stores the rendering result in the rendering result storage unit 21.
- the layout analysis tool 12 has a title analysis unit 15.
- the output unit 13 acquires a set of titles from the analysis result storage unit 22, shapes the titles into a format expressed using references to document description elements, and outputs them to the outside.
- the rendering result storage unit 21 stores the processing result of the input unit 11, and the analysis result storage unit 22 stores the processing result of the layout analysis tool 12.
- the title analysis rule storage unit 23 stores in advance a title analysis rule, that is, a rule based on the value of the name, attribute, style, and content of the document description element, which is a criterion for title determination.
- the title analysis unit 15 obtains the rendering result from the rendering result storage unit 21 and the title analysis rule from the title analysis rule storage unit 23, and adds the name of the document description element to the title analysis of the attribute, style, and content. Analyze the title by checking against the rules. Specifically, the specified document description element, for example, the root document description element is set as the processing target element, and the name, attribute, style, and content of the document description element are checked against the title analysis rule for the processing target element. If it is determined as a title as a result of collation, the document description element is stored as a title in the analysis result storage unit 22, and if there is an unprocessed element to be processed, the process is continued.
- the specified document description element for example, the root document description element is set as the processing target element, and the name, attribute, style, and content of the document description element are checked against the title analysis rule for the processing target element. If it is determined as a title as a result of collation, the document description element is stored as a title in the analysis result storage unit 22,
- the child document description element of the document description element is newly set as a processing target element, and the processing is continued until there is no unprocessed processing target element.
- as title analysis rules, rules based on the distinctiveness of a document description element in the rendered image can be used, such as “the attribute is unique in the document”, “a background color or background image is used”, or “a character color or character size that is used infrequently in the document is used”.
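As a non-limiting illustration, the traversal and rule check described above might be sketched as follows. The `Element` class, the `find_titles` function, the `rare_ratio` threshold, and the exact way each rule is tested are assumptions made for this sketch and are not taken from the embodiment; only three rules of the kind named in the text are shown.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical model of a rendered document description element."""
    name: str
    attrs: dict = field(default_factory=dict)
    style: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


def walk(element):
    """Yield the element, then its descendants, starting from the root."""
    yield element
    for child in element.children:
        yield from walk(child)


def find_titles(root, rare_ratio=0.2):
    """Return the elements judged to be titles by three illustrative rules."""
    elements = list(walk(root))
    # How often each attribute set and each character color occurs in the document.
    attr_counts = Counter(frozenset(e.attrs.items()) for e in elements if e.attrs)
    color_counts = Counter(e.style["color"] for e in elements if "color" in e.style)
    titles = []
    for e in elements:
        if not e.text:
            continue  # assumption: only elements carrying their own text can be titles
        # Rule: the attribute is unique in the document.
        unique_attr = bool(e.attrs) and attr_counts[frozenset(e.attrs.items())] == 1
        # Rule: a background color or background image is used.
        has_background = "background-color" in e.style or "background-image" in e.style
        # Rule: a character color that is used infrequently in the document is used.
        rare_color = ("color" in e.style
                      and color_counts[e.style["color"]] / len(elements) <= rare_ratio)
        if unique_attr or has_background or rare_color:
            titles.append(e)
    return titles
```

In this sketch a `div` with a unique `class` attribute and a background color would be reported as a title, while plain paragraphs without attributes or distinctive styles would not.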
- a plurality of document description elements having the same name, attribute, and style may be used as document description elements.
- the input unit 11 acquires a structured/semi-structured document from the outside, renders the document, and stores the rendering result in the rendering result storage unit 21 (step S201).
- the title analysis unit 15 acquires the rendering result from the rendering result storage unit 21 and the title analysis rule from the title analysis rule storage unit 23, and acquires the document description element to be processed (step S202).
- the title analysis unit 15 checks whether there is a document description element to be processed. If it determines that there is no document description element to be processed, the process proceeds to step S208 (step S203).
- if the title analysis unit 15 determines in step S203 that there is a document description element to be processed, it checks the name, attributes, style, and content of the document description element against the title analysis rules (step S204).
- if the document description element is determined not to be a title by the collation in step S204, or after the document description element has been stored as a title in step S206, the title analysis unit 15 acquires the document description element to be processed next, and the processing from step S203 onward is performed on that element (step S205).
- if the document description element is determined to be a title by the collation in step S204, the title analysis unit 15 stores the document description element as a title in the analysis result storage unit 22 and proceeds to step S205 (step S206).
- the output unit 13 acquires a set of titles from the analysis result storage unit 22, formats the titles into a format that can be expressed using references to document description elements, and outputs them to the outside (step S207).
- because titles are analyzed using attributes, style, and content in addition to the names of document description elements, titles expressed by means of attributes and styles can also be analyzed.
- because titles are output in a format expressed using references to document description elements, third parties can develop applications that use the title information.
- the third embodiment of the present invention includes a data processing device 1 that operates under program control, and a storage device 2 that stores information.
- the data processing apparatus 1 includes an input unit 11, a layout analysis tool 12, and an output unit 13.
- the storage device 2 includes a rendering result storage unit 21, an analysis result storage unit 22, and a title analysis rule storage unit 23.
- the input unit 11 obtains a structured/semi-structured document from the outside and renders the document.
- the rendering result is stored in the rendering result storage unit 21.
- the layout analysis tool 12 includes a basic layout analysis unit 14, a title analysis unit 15, and a layout analysis unit 16.
- the output unit 13 acquires from the analysis result storage unit 22 the new layout components and their hierarchical relationships, and the correspondence between each component and its title, formats the new layout components and titles into a format that can be expressed using references to document description elements, and outputs them to the outside. Here, an ID may be given to each new layout component when it is output.
- the basic layout analysis unit 14 obtains the rendering result from the rendering result storage unit 21, refers to the arrangement of the document description elements, and analyzes the layout by grouping the document description elements juxtaposed in a certain direction.
- the functions of the basic layout analysis unit 14 are the same as the functions shown in the basic layout analysis unit 14 of the first embodiment of the present invention.
- the title analysis unit 15 obtains the rendering result from the rendering result storage unit 21 and the title analysis rules from the title analysis rule storage unit 23, and analyzes titles by checking the attributes, style, and content of each document description element, in addition to its name, against the title analysis rules.
- the function of the title analysis unit 15 is the same as the function shown in the title analysis unit 15 of the second embodiment of the present invention.
- the layout analysis unit 16 acquires from the analysis result storage unit 22 the layout components analyzed by the basic layout analysis unit 14 and their hierarchical relationships, and the titles analyzed by the title analysis unit 15.
- a new layout is then generated by grouping each layout component without a title with a layout component with a title or with another layout component without a title. Specifically, the layout components of the first hierarchy are acquired first, and the titles contained in each component are associated with that component.
- a component without a title is grouped with a component with a title, for example the nearest one preceding it in the source. If there is no such component with a title, it is grouped with, for example, the nearest component without a title.
- the grouped layout components are stored in the analysis result storage unit 22 as new layout components, together with the titles belonging to them. Furthermore, by repeating the above process for each of the layout hierarchies analyzed by the basic layout analysis unit 14, the components of the new layout, their hierarchical relationships, and the correspondence between each component and its title are analyzed.
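As a non-limiting illustration, the grouping of one hierarchy described above might be sketched as follows. The function name `group_by_title` and the representation of a component as an `(id, title)` pair listed in source order are assumptions made for this sketch. It reads "the nearest one preceding it in the source" as the most recent titled component encountered so far.

```python
def group_by_title(components):
    """Group untitled layout components with titled ones.

    `components` is a list of (component_id, title_or_None) pairs in source
    order. Each returned group is a dict with a "title" and its "members".
    """
    groups = []
    for comp_id, title in components:
        if title is not None:
            # A titled component always starts a new layout component.
            groups.append({"title": title, "members": [comp_id]})
        elif groups:
            # Join the nearest preceding group: the latest titled component if
            # one exists, otherwise the leading run of untitled components.
            groups[-1]["members"].append(comp_id)
        else:
            # No earlier component at all: start an untitled group.
            groups.append({"title": None, "members": [comp_id]})
    return groups
```

For example, the sequence untitled, untitled, "News", untitled, "Sports" yields three new layout components: one untitled pair, one headed by "News" that absorbs the following untitled component, and one headed by "Sports".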
- the rendering result storage unit 21 stores the processing result of the input unit 11, and the analysis result storage unit 22 stores the processing results of the layout analysis tool 12.
- the title analysis rule storage unit 23 stores a title analysis rule in advance.
- the input unit 11 acquires a structured/semi-structured document from the outside, renders the document, and stores the rendering result in the rendering result storage unit 21 (step S301).
- the operation of the basic layout analysis unit 14 is the same as the operation of the basic layout analysis unit (14 in FIG. 1) shown in the first embodiment of the present invention (steps S302 to S306).
- the operation of the title analysis unit 15 is the same as the operation of the title analysis unit 15 shown in the second embodiment of the present invention (steps S311 to S316).
- the layout analysis unit 16 acquires from the analysis result storage unit 22 the layout components analyzed by the basic layout analysis unit 14 and their hierarchical relationships, and the titles analyzed by the title analysis unit 15, and sets the processing target hierarchy n to 1 (step S321).
- the layout analysis unit 16 determines whether there is a layout component of the processing target hierarchy, and if it determines that there is no layout component of the processing target hierarchy, the layout analysis unit 16 proceeds to step S331 (step S322).
- if the layout analysis unit 16 determines in step S322 that there is a layout component of the processing target hierarchy, it acquires the layout components of the nth hierarchy (step S323) and associates the layout components of the nth hierarchy with the titles (step S324).
- the layout analysis unit 16 analyzes new layout components by grouping each layout component without a title with a layout component with a title or with another layout component without a title, and stores them in the analysis result storage unit 22 (step S325).
- the layout analysis unit 16 sets the processing target hierarchy n to n + 1 and repeats the processing from step S322 onward (step S326).
- the output unit 13 acquires from the analysis result storage unit 22 the new layout components and their hierarchical relationships, and the correspondence between each component and its title, formats the new layout components and titles into a format that can be expressed using references to document description elements, and outputs them to the outside (step S331).
- the execution order of the operations of the basic layout analysis unit 14 (steps S302 to S306) and the operations of the title analysis unit 15 (steps S311 to S316) may be interchanged. Specifically, steps S311 to S316 are executed immediately after step S301. When the result of step S312 is “No”, steps S302 to S306 are executed. In this case, when the result of step S303 is “No”, the process proceeds to step S321.
- the layout is analyzed by referring to the arrangement of the document description elements and grouping the document description elements placed side by side, titles are analyzed using attributes, styles, and contents in addition to the names of the document description elements, and each layout component without a title is grouped with a layout component with a title or with another layout component without a title. As a result, a layout that captures the logical structure can be analyzed. It is therefore possible to analyze a layout that reflects the intention of the document provider.
- because layouts and titles are output in a format that uses references to document description elements, third parties can develop applications that use the layout information.
- the fourth embodiment of the present invention includes a data processing device 5 that operates under program control, and a storage device 6 that stores information.
- the data processing device 5 includes an input unit 51, a document input unit 52, a table of contents document output unit 53, and an item document output unit 54.
- the storage device 6 includes a structured / semi-structured document storage unit 61 and a layout document storage unit 62.
- the input unit 51 obtains user input from an input device such as a keyboard, or via a network, and obtains the URI (Uniform Resource Identifier) of the structured/semi-structured document desired by the user.
- the input unit 51 also acquires user input from an input device such as a keyboard, or via a network, and controls output. Specifically, according to the acquired input, it determines whether to output a table of contents document or a document describing the contents of a table of contents item. In the latter case, the table of contents item to be output is also determined.
- the document input unit 52 acquires the document using the URI, obtained by the input unit 51, of the structured/semi-structured document desired by the user, and stores it in the structured/semi-structured document storage unit 61.
- the document input unit 52 also acquires the document, specified by the acquired structured/semi-structured document, in which the layout information is described using references to document description elements, and stores it in the layout document storage unit 62.
- to acquire the document in which the layout information of the acquired structured/semi-structured document is described, the document input unit 52 may use header information of the communication protocol, or may use another method.
- alternatively, the layout information of the acquired structured/semi-structured document may be analyzed using the system shown in the third embodiment of the present invention and stored in the layout document storage unit 62.
- the table of contents document output unit 53 acquires the structured/semi-structured document desired by the user from the structured/semi-structured document storage unit 61 and the document describing the layout information of that document from the layout document storage unit 62, and generates a table of contents document.
- specifically, all terminal layout components are extracted from the document in which the layout information is described, the title specified for each is extracted, and the table of contents document is created by using the references to document description elements to extract the document description elements corresponding to the titles from the original structured/semi-structured document, formatting them, and arranging them in order.
- here, a certain number of characters may also be extracted from the content under the document description element corresponding to each component and arranged.
- decorations may be added to the table of contents, such as inserting divider lines between the components of the layout of a specific hierarchy, or inserting a title given separately to the structured/semi-structured document.
- upper-layer components may be used instead of the terminal components.
- the generated table of contents document is provided to the user from an output device such as a display or a speaker, or via a network.
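As a non-limiting illustration, the generation of a table of contents from the terminal layout components might be sketched as follows. The dictionary shape of the layout document (`children`, `title_ref`, `content_refs`), the `text_by_ref` lookup from a reference to the text of the referenced document description element, and the function names are all assumptions made for this sketch; the embodiment does not prescribe a concrete data format.

```python
def leaf_components(component):
    """Yield the terminal (childless) components of a layout tree in order."""
    children = component.get("children", [])
    if not children:
        yield component
        return
    for child in children:
        yield from leaf_components(child)


def build_toc(layout_root, text_by_ref, snippet_chars=0):
    """Return one table of contents entry per titled terminal component.

    `text_by_ref` maps a document description element reference to the text
    of that element in the original document.
    """
    entries = []
    for comp in leaf_components(layout_root):
        title_ref = comp.get("title_ref")
        if title_ref is None:
            continue  # assumption: untitled components produce no entry
        entry = text_by_ref[title_ref]
        if snippet_chars:
            # Optionally append a fixed number of characters of the content.
            body = " ".join(text_by_ref[r] for r in comp.get("content_refs", []))
            entry += ": " + body[:snippet_chars]
        entries.append(entry)
    return entries
```

The `snippet_chars` option corresponds to the variation in which a certain number of characters of the underlying content is arranged alongside each title.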
- when the input unit 51 has determined that a document describing the contents of a table of contents item is to be output and has determined the item to be output, the item document output unit 54 acquires the structured/semi-structured document desired by the user from the structured/semi-structured document storage unit 61 and the document describing the layout information of that document from the layout document storage unit 62, and generates a document describing the contents of the specified table of contents item. Specifically, the layout component having the specified table of contents item as its title is extracted, the document description elements corresponding to that component are extracted from the original structured/semi-structured document using the references to document description elements, and the document is generated by formatting them and arranging them in order.
- here, the contents of the extracted document description elements may be further extracted and arranged as the contents of the item.
- the document description elements of the structured/semi-structured document may also be replaced with other document description elements and arranged.
- alternatively, the structured/semi-structured document desired by the user may itself be used as the document describing the contents of the table of contents item, and output positioned at the area corresponding to the specified item.
- the generated document describing the contents of the table of contents item is provided to the user from an output device such as a display or a speaker, or via a network.
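As a non-limiting illustration, selecting the contents of one table of contents item might be sketched as follows. The function name `build_item_document`, the component fields `title_ref` and `content_refs`, and the `text_by_ref` lookup are assumptions made for this sketch.

```python
def build_item_document(item_title, components, text_by_ref):
    """Return, in order, the element texts of the component titled `item_title`.

    `components` is a list of layout components, each a dict holding a
    `title_ref` and the `content_refs` of the elements that belong to it.
    """
    for comp in components:
        if text_by_ref.get(comp["title_ref"]) == item_title:
            # Follow the references back into the original document.
            return [text_by_ref[ref] for ref in comp["content_refs"]]
    return []  # assumption: an unknown item yields an empty document
```

Returning an empty list for an unknown item is a choice made for the sketch; an implementation could equally report an error to the input unit.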
- the structured/semi-structured document storage unit 61 and the layout document storage unit 62 store the processing results of the document input unit 52.
- alternatively, the table of contents document output unit 53 may generate and store the table of contents document in advance, the item document output unit 54 may generate and store in advance all the documents describing the contents of the table of contents items, and the table of contents document or the document describing the contents of a table of contents item that corresponds to the user input may be selected and output by the table of contents document output unit 53 or the item document output unit 54.
- the input unit 51 obtains user input from an input device such as a keyboard, or via a network, and obtains the URI of the structured/semi-structured document desired by the user (step S401).
- the document input unit 52 acquires a document using the acquired URI, and stores it in the structured / semi-structured document storage unit 61. Further, the document input unit 52 acquires a document in which layout information specified in the acquired structured / semi-structured document is described, and stores the document in the layout document storage unit 62 (step S402).
- the input unit 51 determines whether or not to continue the process, and when determining that the process is not continued, the input unit 51 ends (step S403).
- if the input unit 51 determines in step S403 to continue the process, it determines whether the content to be output is a table of contents (step S404).
- if it is determined in step S404 that the table of contents is to be output, the table of contents document output unit 53 acquires the structured/semi-structured document desired by the user from the structured/semi-structured document storage unit 61 and the document describing the layout information of that document from the layout document storage unit 62, and generates a table of contents document. The generated table of contents document is provided to the user from an output device such as a display or a speaker, or via a network (step S405).
- if it is determined in step S404 that the table of contents is not to be output, the item document output unit 54 further determines the table of contents item to be output, acquires the structured/semi-structured document desired by the user from the structured/semi-structured document storage unit 61 and the document describing the layout information of that document from the layout document storage unit 62, and generates a document describing the contents of the specified table of contents item. The generated document is provided to the user from an output device such as a display or a speaker, or via a network (step S406).
- after the table of contents document is output in step S405 or the document describing the contents of a table of contents item is output in step S406, the input unit 51 acquires user input from an input device such as a keyboard, or via a network, and the processing from step S403 onward is repeated (step S407).
- because a table of contents document and documents describing the contents of the table of contents items are generated and output using the structured/semi-structured document and the document describing its layout information, the user can browse the document through a table of contents that accurately captures the logical structure intended for the document, and can more easily grasp the overall picture of the document even on a small screen. It is therefore possible to provide a document adapted to the terminal environment.
- the fifth embodiment of the present invention includes a data processing device 7 that operates under program control, and a storage device 8 that stores information.
- the data processing device 7 includes a document input unit 71 and a composite document output unit 72.
- Storage device 8 includes an output component storage unit 81, a structured / semi-structured document storage unit 82, and a layout document storage unit 83.
- the document input unit 71 obtains the information on the output components from the output component storage unit 81, acquires the documents corresponding to the URIs described in that information, and stores them in the structured/semi-structured document storage unit 82. In addition, the document input unit 71 acquires the documents, specified by each acquired structured/semi-structured document, in which layout information is described using references to document description elements, and stores them in the layout document storage unit 83.
- to acquire the document in which the layout information of an acquired structured/semi-structured document is described, the document input unit 71 may use header information of the communication protocol, or may use another method. Alternatively, the layout information of the acquired structured/semi-structured document may be analyzed using the system shown in the third embodiment of the present invention and stored in the layout document storage unit 83.
- the composite document output unit 72 acquires the information on the output components from the output component storage unit 81, the structured/semi-structured documents desired by the user from the structured/semi-structured document storage unit 82, and the documents describing their layout information from the layout document storage unit 83, and generates a composite document. Specifically, it obtains all combinations of URI and component ID from the information on the output components, takes the document corresponding to each URI, extracts the component corresponding to the component ID, extracts the document description elements corresponding to that component from the original structured/semi-structured document using the references to document description elements, and generates the composite document by formatting them and arranging them in order. Here, for each component, the contents of the document description elements belonging to it may be further extracted and arranged.
- the generated composite document is provided to the user from an output device such as a display or a speaker, or via a network.
- text information representing the title of each component may further be stored in the information on the output components. When the composite document output unit 72 extracts the component corresponding to the ID of a component to be output, it compares the title of that component with the text information stored in the information on the output components. If they differ, it searches for the correct component using the text information as a clue and updates the ID of the component to be output stored in the information on the output components. In this way, an appropriate composite document can be generated even if the layout has changed.
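As a non-limiting illustration, the title check and ID update described above might be sketched as follows. The function name `resolve_component`, the shape of the stored entry (`uri`, `component_id`, `title`), the shape of the layout (a mapping from component ID to a dict with `title` and `content`), and the fallback behavior when no title matches are assumptions made for this sketch.

```python
def resolve_component(entry, layout):
    """Return the component an output entry refers to, repairing a stale ID.

    `entry` is one stored output component: {"uri", "component_id", "title"}.
    `layout` maps component IDs of the current layout to
    {"title": ..., "content": ...}.
    """
    comp = layout.get(entry["component_id"])
    if comp is not None and comp["title"] == entry["title"]:
        return comp  # the stored ID still points at the intended component
    # The layout has changed: search by title text and update the stored ID.
    for comp_id, candidate in layout.items():
        if candidate["title"] == entry["title"]:
            entry["component_id"] = comp_id
            return candidate
    # Assumption: if no title matches, fall back to whatever the old ID gives.
    return comp
```

Updating `entry["component_id"]` in place mirrors the update of the stored ID, so that later requests resolve directly without a second search.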
- display position information may further be stored in the information on the output components, and the composite document output unit 72 may use that display position information.
- the output component storage unit 81 stores the information on the components to be output, that is, a set of combinations of the URI of a structured/semi-structured document to be output and the ID of the component to be output within that document.
- the structured/semi-structured document storage unit 82 and the layout document storage unit 83 store the processing results of the document input unit 71.
- the document input unit 71 acquires the information on the output components from the output component storage unit 81 (step S501).
- the document input unit 71 acquires the documents corresponding to the URIs described in the acquired information on the output components, and stores them in the structured/semi-structured document storage unit 82. In addition, the document input unit 71 acquires the documents in which the layout information designated in each of the acquired structured/semi-structured documents is described, and stores them in the layout document storage unit 83 (step S502).
- the composite document output unit 72 acquires the information on the output components from the output component storage unit 81, the structured/semi-structured documents desired by the user from the structured/semi-structured document storage unit 82, and the documents in which their layout information is described from the layout document storage unit 83, and generates a composite document. The generated composite document is provided to the user from an output device such as a display or a speaker, or via a network (step S503).
- referring to Figure 32, the sixth embodiment of the present invention includes a data processing device 1 and a storage device 2, as in the first, second, and third embodiments of the present invention.
- the document analysis program 3 is read into the data processing device 1, controls the operation of the data processing device 1, and generates the rendering result storage unit 21, the analysis result storage unit 22, and the title analysis rule storage unit 23 in the storage device 2.
- the data processing device 1 executes the same processing as the processing by the data processing device 1 in the first, second, and third embodiments under the control of the document analysis program 3.
- the seventh embodiment of the present invention includes a data processing device 5 and a storage device 6 as in the fourth embodiment of the present invention.
- the document adaptation program 4 is read into the data processing device 5, controls the operation of the data processing device 5, and generates the structured/semi-structured document storage unit 61 and the layout document storage unit 62 in the storage device 6.
- the data processing device 5 executes the same processing as the processing by the data processing device 5 in the fourth embodiment under the control of the document adaptation program 4.
- the eighth embodiment of the present invention includes a data processing device 7 and a storage device 8 as in the fifth embodiment of the present invention.
- the document adaptation program 9 is read into the data processing device 7, controls the operation of the data processing device 7, and generates the output component storage unit 81, the structured/semi-structured document storage unit 82, and the layout document storage unit 83 in the storage device 8.
- the data processing device 7 executes the same processing as the processing by the data processing device 7 in the fifth embodiment under the control of the document adaptation program 9.
- the ninth embodiment of the present invention includes a data processing device 1 that operates under program control, and a storage device 2 that stores information.
- the data processing device 1 includes an input unit 11, a layout analysis tool 12, and an output unit 13.
- the storage device 2 includes a rendering result storage unit 21, an analysis result storage unit 22, and a title analysis rule storage unit 23.
- the input unit 11 acquires a structured/semi-structured document from the outside, renders the document, and stores the rendering result in the rendering result storage unit 21.
- the layout analysis tool 12 includes a basic layout analysis unit 14, a title analysis unit 15, and a layout analysis unit 16.
- the output unit 13 obtains from the analysis result storage unit 22 the layout components and their hierarchical relationships, and the correspondence between each component and its title, formats the layout components and titles into a format that can be expressed using references to document description elements, and outputs them to the outside.
- the basic layout analysis unit 14 obtains the rendering result from the rendering result storage unit 21, refers to the arrangement of the document description elements, and analyzes the layout by grouping the document description elements juxtaposed in a certain direction.
- the functions of the basic layout analysis unit 14 are the same as the functions shown in the basic layout analysis unit 14 of the first embodiment of the present invention.
- the title analysis unit 15 obtains the rendering result from the rendering result storage unit 21 and the title analysis rules from the title analysis rule storage unit 23, and analyzes titles by checking the attributes, style, and content of each document description element, in addition to its name, against the title analysis rules.
- the function of the title analysis unit 15 is the same as the function shown in the title analysis unit 15 of the second embodiment of the present invention.
- the layout analysis unit 16 includes a block selection unit 17 and a section calculation unit 18.
- the block selection unit 17 acquires from the analysis result storage unit 22 the components of the first layout analyzed by the basic layout analysis unit 14 and their hierarchical relationships, and the components of the second layout analyzed by the layout analysis unit 16 and their hierarchical relationships. It selects one component of the second layout that can be divided, and takes the components of the first layout that constitute it as the analysis target.
- the block selection unit 17 further acquires the titles analyzed by the title analysis unit 15, associates the components of the first layout to be analyzed with the titles, and selects main titles from the associated titles based on the names, attributes, and styles of the document description elements constituting them. The components having a main title are taken as the main components.
- alternatively, based on the name, attributes, style, and contents of the document description elements constituting each component of the first layout to be analyzed, it may be determined whether the component is a boundary line, and the component determined to be a boundary line, or the component following it, may be taken as a main component. Based on the distance between the components of the first layout to be analyzed, a component separated from the preceding component by a predetermined distance or more may also be taken as a main component.
- the section calculation unit 18 groups each non-main component with a main component or with another non-main component, thereby generating the components of the second layout, and stores them in the analysis result storage unit 22.
- a non-main component is grouped with a main component, for example the nearest one preceding it in the source. If there is no such main component, it is grouped with, for example, the nearest non-main component.
- the newly generated components of the second layout may be stored in place of the original component of the second layout, or may be stored as children of the original component of the second layout. Further, the components of the first layout determined to be boundary lines may be excluded when the components of the second layout are stored.
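As a non-limiting illustration, the selection of main components and the subsequent grouping might be sketched as follows. The `Block` class, the `split_into_sections` function, the use of vertical coordinates for the distance test, and the `min_gap` threshold are assumptions made for this sketch. It adopts the variations in which the component following a boundary line is treated as a main component and the boundary line itself is excluded from the result.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical component of the first layout, in source order."""
    id: str
    top: float
    bottom: float
    has_main_title: bool = False
    is_boundary: bool = False  # e.g. a horizontal-rule-like separator


def split_into_sections(blocks, min_gap=20.0):
    """Divide first-layout components into components of the second layout."""
    sections = []
    prev = None
    after_boundary = False
    for block in blocks:
        if block.is_boundary:
            after_boundary = True  # the next component becomes a main component
            prev = block
            continue               # the boundary line itself is not stored
        gap = block.top - prev.bottom if prev is not None else 0.0
        is_main = block.has_main_title or after_boundary or gap >= min_gap
        if is_main or not sections:
            sections.append([block.id])       # a main component opens a section
        else:
            sections[-1].append(block.id)     # join the nearest preceding section
        after_boundary = False
        prev = block
    return sections
```

Each returned list corresponds to one newly generated component of the second layout, which could then be stored either in place of the original component or as its children.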
- the input unit 11 acquires a structured/semi-structured document from the outside, renders the document, and stores the rendering result in the rendering result storage unit 21 (step S901).
- the operation of the basic layout analysis unit 14 is the same as that of the basic layout analysis unit (14 in FIG. 1) shown in the first embodiment of the present invention (steps S902 to S906).
- the operation of the title analysis unit 15 is the same as the operation of the title analysis unit (15 in FIG. 1) shown in the second embodiment of the present invention (steps S911 to S916).
- the block selection unit 17 acquires from the analysis result storage unit 22 the components of the first layout analyzed by the basic layout analysis unit 14 and their hierarchical relationships, the titles analyzed by the title analysis unit 15, and the components of the second layout analyzed by the layout analysis unit 16 and their hierarchical relationships (step S921).
- the block selection unit 17 determines whether there is a component of the second layout that can be divided, and if it determines that there is none, the process proceeds to step S931 (step S922).
- to determine whether a component can be divided, the number of titles included in the component of the second layout, the number of components having a title among the components of the first layout constituting the component of the second layout, and the area, width, height, and the like of the component of the second layout can be used.
- if it is determined in step S922 that there is a component of the second layout that can be divided, the block selection unit 17 selects one such component and takes the components of the first layout constituting it as the analysis target (step S923).
- the block selection unit 17 associates the components of the first layout to be analyzed with the titles (step S924). From the associated titles, main titles are selected based on the names, attributes, and styles of the document description elements constituting the titles (step S925).
- the block selection unit 17 takes the components of the first layout having a main title selected in step S925 as the main components. Alternatively, based on the name, attributes, style, and contents of the document description elements constituting each component of the first layout to be analyzed, it may be determined whether the component is a boundary line, and the component determined to be a boundary line, or the component following it, may be taken as a main component. Based on the distance between the components of the first layout to be analyzed, a component separated from the preceding component by a predetermined distance or more may also be taken as a main component (step S926).
- the section calculation unit 18 groups a non-major component with a major component or another non-major component. As a result, the components of the second layout are generated and stored in the analysis result storage unit 22 (step S927).
- the output unit 13 acquires from the analysis result storage unit 22 the components of the second layout, their hierarchical relationships, and the correspondence between each component and its title, formats the components of the second layout and the titles into a format that expresses them using references to the document description elements, and outputs the result to the outside (step S931).
- the execution order of the operation of the basic layout analysis unit 14 (steps S902 to S906) and the operation of the title analysis unit 15 (steps S911 to S916) may be interchanged.
- the components of the first layout are grouped to generate new components of the second layout, so layouts that capture more of the logical structure can be analyzed. Therefore, it is possible to analyze a layout that reflects the intention of the document provider.
- third parties can develop applications that use the layout information.
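The divisibility test of step S922 can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the specification: the `Component` class, the `min_titled` and `min_area` thresholds, and their default values are hypothetical. The text only says that the number of titled components and the area, width, and height may be used.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    """A layout component; all field names here are hypothetical."""
    ref: str                       # reference to the document description element
    width: int = 0                 # rendered width in pixels
    height: int = 0                # rendered height in pixels
    has_title: bool = False        # True if a title was associated with it
    children: List["Component"] = field(default_factory=list)


def is_divisible(second: Component, min_titled: int = 2, min_area: int = 0) -> bool:
    """Return True if a second-layout component is a candidate for division.

    Assumed criteria: enough of its first-layout components carry a title,
    and its rendered area is at least min_area.
    """
    titled = sum(1 for child in second.children if child.has_title)
    return titled >= min_titled and second.width * second.height >= min_area


def next_divisible(components: List[Component]) -> Optional[Component]:
    """Pick one divisible component (step S923), or None to go on to step S931."""
    return next((c for c in components if is_divisible(c)), None)
```

Returning `None` from `next_divisible` corresponds to the branch in which no divisible component remains and the result is output.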
- a personal computer is provided as a data processing device and a data storage device.
- the personal computer has a central processing unit that functions as an input unit, a layout analysis unit, an output unit, and a basic layout analysis unit, a memory device that functions as a rendering result storage unit and an analysis result storage unit, and a magnetic disk storage device.
- the analysis of the layout up to the second layer is described using the HTML document shown in FIG. 3 as the structured/semi-structured document.
- the central processing unit obtains an HTML document from the outside, renders the document, and stores the rendering result shown in FIG. 3 in the memory device.
- the central processing unit obtains the rendering result and first refers to the arrangement of the body element.
- the body element has no elements of the HTML document juxtaposed in the horizontal direction and cannot be grouped. Therefore, the central processing unit refers to the arrangement of the elements of the child HTML document.
- the h1 element and the two table elements, which are the child HTML elements, also have no HTML elements juxtaposed in the horizontal direction, so grouping cannot be performed, and the central processing unit refers to the arrangement of their child HTML elements.
- since the h1 element has no child HTML elements, as shown in FIG. 5, the h1 element is stored in the memory device as a component of the first layout.
- the tr element which is the element of the child's HTML document, cannot be grouped because there is no element of the HTML document juxtaposed in the horizontal direction.
- referring further to the arrangement of the child HTML elements, as shown in FIG. 7, the td element, which is the child HTML element, has no HTML elements juxtaposed in the horizontal direction and cannot be grouped, so the td element is stored in the memory device as a component of the second layout.
- the two tr elements that are the elements of the child's HTML document cannot be grouped because there are no elements of the HTML document juxtaposed in the horizontal direction.
- the arithmetic unit further refers to the arrangement of the elements of the child's HTML document.
- the two td elements that are the child HTML elements of the first tr element are juxtaposed in the horizontal direction, so they are grouped and stored in the memory device as a component of the third layout. Likewise for the second tr element, its two child td elements are juxtaposed in the horizontal direction as shown in FIG. 7, so they are grouped and stored in the memory device as a component of the fourth layout.
- the layout of the first layer shown in FIG. 8 is analyzed.
- the elements of the HTML document arranged in the vertical direction are grouped by referring to the arrangement of the HTML document elements belonging to the layout elements of the first layer.
- the central processing unit further refers to the arrangement of the elements of the child HTML document. Since the first td element has no child HTML document element as shown in FIG. 9, the td element is stored in the memory device as a component of the layout. As for the second td element, there is no child HTML document element as shown in FIG. 9, so the td element is stored in the memory device as a component of the layout.
- the layout of the second layer is analyzed.
- the central processing unit obtains information on the analyzed layout components and their hierarchical relationships from the memory device, formats it into a format that expresses the layout components using references to the elements of the HTML document, for example the format shown in FIG. 10, and outputs it to the outside.
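The first-layer walk described in this example can be sketched as below. It is a rough illustration under stated assumptions rather than the specified implementation: the `Node` class, the box coordinates, and the choice to record the parent element when its children sit side by side are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) from the rendering result


@dataclass
class Node:
    name: str
    box: Box
    children: List["Node"] = field(default_factory=list)


def side_by_side(a: Box, b: Box) -> bool:
    """True if two boxes overlap vertically but not horizontally."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    rows_overlap = ay < by + bh and by < ay + ah
    cols_disjoint = ax + aw <= bx or bx + bw <= ax
    return rows_overlap and cols_disjoint


def analyze(node: Node, path: str, out: List[str]) -> None:
    """Collect first-layer layout components as XPath-like references."""
    kids = node.children
    juxtaposed = any(
        side_by_side(a.box, b.box)
        for i, a in enumerate(kids)
        for b in kids[i + 1:]
    )
    if not kids or juxtaposed:
        out.append(path)           # a leaf, or a group of juxtaposed children
        return
    counts: Dict[str, int] = {}
    for kid in kids:               # nothing to group here, so descend
        counts[kid.name] = counts.get(kid.name, 0) + 1
        analyze(kid, f"{path}/{kid.name}[{counts[kid.name]}]", out)


def row(y: int) -> Node:
    """A tr holding two td cells placed left and right (made-up coordinates)."""
    return Node("tr", (0, y, 200, 50),
                [Node("td", (0, y, 100, 50)), Node("td", (100, y, 100, 50))])


body = Node("body", (0, 0, 200, 190), [
    Node("h1", (0, 0, 200, 40)),
    Node("table", (0, 40, 200, 50),
         [Node("tr", (0, 40, 200, 50), [Node("td", (0, 40, 200, 50))])]),
    Node("table", (0, 90, 200, 100), [row(90), row(140)]),
])
components: List[str] = []
analyze(body, "/body[1]", components)
print(components)
```

With this made-up geometry the walk yields four first-layer components: the h1 element, the single td of the first table, and the two tr elements of the second table.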
- a personal computer is provided as a data processing device and a data storage device.
- the personal computer includes a central processing unit that functions as an input unit, a layout analysis unit, an output unit, and a title analysis unit, a memory device that functions as a rendering result storage unit and an analysis result storage unit, and a magnetic disk storage device.
- the magnetic disk storage device stores a title analysis rule as shown in FIG.
- the description uses the HTML document shown in FIG. 3 as the structured/semi-structured document.
- the central processing unit acquires an HTML document from the outside, renders the document, and stores the rendering result shown in FIG. 3 in a memory device.
- the central processing unit acquires the rendering result and the title analysis rule, and takes the body element as the HTML element to be processed. The name, attribute, style, and content of the body element are checked against the title analysis rule, and the rule does not match. Therefore, the child elements of the body element, that is, the h1 element and the two table elements, are added as HTML elements to be processed, and the h1 element becomes the next HTML element to be processed.
- the h1 element is stored in the memory device as a title, and the first table element becomes the next HTML element to be processed. The above process is repeated until there are no more HTML elements to process.
- suppose the HTML element to be processed is the first td element of the first tr element of the second table element. The name, attribute, style, and content of the td element are checked against the title analysis rule. If the td element has a background color specified, a height of 50px, and content of 5 characters, it matches the rule and is stored in the memory device as a title. With the above process, the titles shown in FIG. 14 are analyzed.
- the central processing unit obtains the analyzed title information from the memory device, formats the titles into a format that expresses them using references to the elements of the HTML document, for example the format shown in the corresponding figure, and outputs it to the outside.
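The rule matching walked through above can be sketched as follows. The `Element` class and the concrete rule (heading names, plus a colored box under assumed height and length limits) are hypothetical; the text gives only one matching instance, a td element with a background color, a height of 50px, and 5 characters of content.

```python
from dataclasses import dataclass, field
from typing import Dict, List

HEADING_NAMES = {"h1", "h2", "h3", "h4", "h5", "h6"}


@dataclass
class Element:
    name: str
    style: Dict[str, str] = field(default_factory=dict)
    height_px: int = 0             # rendered height taken from the rendering result
    text: str = ""                 # text content of the element
    children: List["Element"] = field(default_factory=list)


def matches_title_rule(el: Element, max_height_px: int = 60, max_chars: int = 20) -> bool:
    """Hypothetical rule: a heading element, or a short, low, colored box."""
    if el.name in HEADING_NAMES:
        return True
    return ("background-color" in el.style
            and 0 < el.height_px <= max_height_px
            and 0 < len(el.text) <= max_chars)


def find_titles(root: Element, root_path: str = "/body[1]") -> List[str]:
    """Walk the elements in source order; a matching element is not descended into."""
    titles: List[str] = []
    pending = [(root, root_path)]
    while pending:
        el, path = pending.pop(0)
        if matches_title_rule(el):
            titles.append(path)
            continue
        counts: Dict[str, int] = {}
        kids = []
        for kid in el.children:
            counts[kid.name] = counts.get(kid.name, 0) + 1
            kids.append((kid, f"{path}/{kid.name}[{counts[kid.name]}]"))
        pending[:0] = kids         # children go first: depth-first, source order
    return titles
```

Titles are returned as references to the matching elements, which is the form the output unit is described as using.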
- a third embodiment of the present invention will be described with reference to the drawings.
- this example corresponds to the third embodiment of the present invention.
- a personal computer is provided as a data processing device and a data storage device.
- the personal computer includes a central processing unit that functions as an input unit, a layout analysis unit, an output unit, a basic layout analysis unit, a title analysis unit, and a layout analysis unit F, a memory device that functions as a rendering result storage unit and an analysis result storage unit, and a magnetic disk storage device. A title analysis rule is stored in the magnetic disk storage device.
- the description uses the HTML document shown in FIG. 3 as the structured/semi-structured document.
- the central processing unit acquires an HTML document from the outside, renders the document, and stores the rendering result shown in FIG. 3 in a memory device.
- the central processing unit obtains the rendering result and analyzes the layout components and their hierarchical relationships. This function and operation are the same as those shown in the first embodiment.
- the central processing unit obtains the rendering result and the title analysis rule, and analyzes the title. This function and operation are the same as those shown in the second embodiment.
- the central processing unit obtains the analyzed layout components, their hierarchical relationships, and titles, and first analyzes the layout components of the first hierarchy.
- the layout component given by the first td element under the first tr element under the first table element under the body element has no title, so it is grouped with a layout component that has a title or with other layout components that do not have titles.
- here, it is grouped with the layout component that has a title and is closest toward the top of the source as seen from that component, that is, the layout component given by the h1 element under the body element, and the group is stored in the memory device as a new layout component.
- the above processing is performed for all layout components that do not have titles, and the new layout components of the first layer shown in FIG. 20 are analyzed.
- the new layout elements of the second hierarchy can be analyzed.
- the central processing unit obtains information on the analyzed layout components and their hierarchical relationships, and on the correspondence between each component and its title, formats it into a format that expresses the new layout components and titles using references to the elements of the HTML document, for example the format shown in FIG. 21, and outputs it to the outside.
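The grouping of untitled layout components with the closest titled component toward the top of the source can be sketched like this. It is only an illustration: the `(reference, has_title)` input shape is hypothetical, and the handling of untitled components that come before any title is an assumption the text does not cover.

```python
from typing import List, Tuple


def group_by_titles(components: List[Tuple[str, bool]]) -> List[List[str]]:
    """Group layout components given in source order as (reference, has_title) pairs.

    Each untitled component joins the closest titled component above it.
    Untitled components that precede every title form a group of their own.
    """
    groups: List[List[str]] = []
    for ref, has_title in components:
        if has_title or not groups:
            groups.append([ref])   # a title (or the very first item) starts a group
        else:
            groups[-1].append(ref)
    return groups


first_layer = [
    ("/body[1]/h1[1]", True),
    ("/body[1]/table[1]/tr[1]/td[1]", False),
]
print(group_by_titles(first_layer))
```

For the two components named in the text, the untitled td element ends up in the same new layout component as the titled h1 element.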
- This embodiment includes a personal computer as a data storage device.
- the personal computer has a central processing unit that functions as an input unit, a document input unit, a table of contents document output unit, and an item document output unit, a memory device that functions as a structured/semi-structured document storage unit and a layout document storage unit, and a magnetic disk device.
- the description uses the HTML document shown in FIG. 24A as the structured/semi-structured document and the XML document shown in FIG. 24B as the document in which the layout information of the HTML document is described.
- the central processing unit obtains a URL via the network, acquires the corresponding HTML document shown in FIG. 24A, and stores it in the memory device.
- the central processing unit also analyzes the HTML document, obtains the URL of the document describing the layout information of the document, http://www.nec.co.jp/news.rdf, acquires the XML document shown in FIG. 24B, and stores it in the memory device.
- the central processing unit obtains the HTML document and the XML document from the memory device, extracts all layout components from the XML document (urn:layout:1, urn:layout:2, urn:layout:2:1, urn:layout:2:2), and extracts the elements of the HTML document corresponding to the title of each layout component: /body[1]/h1[1], /body[1]/table[2]/tr[1]/td[1]/table[1]/tr[1], and /body[1]/table[2]/tr[1]/td[2]/table[1]/tr[1].
- the HTML document elements corresponding to the titles are checked against the HTML document, their contents, "Main News", "Politics", and "Economy", are extracted, HTML document elements such as the A element are added, and the results are arranged in order to generate the table of contents document with the rendering image shown in FIG. 25.
- the generated table of contents document is presented to the user via a network or a mobile phone web browser.
- the central processing unit acquires this information (the item selected by the user, here "Politics") via the network, extracts from the XML document the layout component having "Politics" as its title, urn:layout:2:1, and further extracts the element of the HTML document belonging to the component, /body[1]/table[2]/tr[1]/td[1]. The HTML document elements corresponding to the component are checked against the HTML document, their contents are extracted, HTML document elements for navigation including a link to the table of contents are added, and the results are arranged in order to generate a document on the contents of "Politics" with the rendered image shown in FIG. 26. The generated document is presented to the user via a network or a mobile phone web browser.
- if the user selects the table of contents from the presented document, the central processing unit again generates the table of contents document and presents it to the user. In addition, if "Previous" or "Next" is selected from the same document, the central processing unit generates a document on the contents of "Main News" or "Economy", the items before and after "Politics", and presents it to the user.
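The table of contents generation and the "Previous"/"Next" navigation described in this example can be sketched with Python's standard `xml.etree.ElementTree`. The sample page, the pairing of layout IDs with title references (the text confirms only that urn:layout:2:1 is titled "Politics"), and the `?item=` link scheme are assumptions made for illustration.

```python
import html
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

PAGE = """<body>
<h1>Main News</h1>
<table><tr><td>banner</td></tr></table>
<table><tr>
<td><table><tr><td>Politics</td></tr><tr><td>politics story</td></tr></table></td>
<td><table><tr><td>Economy</td></tr><tr><td>economy story</td></tr></table></td>
</tr></table>
</body>"""

# (layout component ID, reference to its title element); the pairing is assumed
TITLE_REFS = [
    ("urn:layout:1", "/body[1]/h1[1]"),
    ("urn:layout:2:1", "/body[1]/table[2]/tr[1]/td[1]/table[1]/tr[1]"),
    ("urn:layout:2:2", "/body[1]/table[2]/tr[1]/td[2]/table[1]/tr[1]"),
]


def resolve(body: ET.Element, ref: str) -> ET.Element:
    """Resolve a /body[1]/... reference relative to the parsed body element."""
    prefix = "/body[1]"
    if not ref.startswith(prefix):
        raise ValueError(f"unsupported reference: {ref}")
    rest = ref[len(prefix):]
    found = body if not rest else body.find("." + rest)
    if found is None:
        raise LookupError(f"no element for {ref}")
    return found


def toc_entries(body: ET.Element) -> List[Tuple[str, str]]:
    """Return (layout ID, title text) pairs in the order of TITLE_REFS."""
    return [(lid, "".join(resolve(body, ref).itertext()).strip())
            for lid, ref in TITLE_REFS]


def toc_html(entries: List[Tuple[str, str]]) -> str:
    """Render the entries as a list of A elements; the href scheme is made up."""
    items = "".join(
        f'<li><a href="?item={html.escape(lid)}">{html.escape(text)}</a></li>'
        for lid, text in entries
    )
    return f"<ul>{items}</ul>"


def neighbours(entries: List[Tuple[str, str]], lid: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the IDs of the previous and next items, or None at either end."""
    ids = [entry_id for entry_id, _ in entries]
    i = ids.index(lid)
    return (ids[i - 1] if i > 0 else None,
            ids[i + 1] if i + 1 < len(ids) else None)


body = ET.fromstring(PAGE)
entries = toc_entries(body)
print(toc_html(entries))
print(neighbours(entries, "urn:layout:2:1"))
```

The sketch parses well-formed markup only; real HTML pages would need an HTML parser before references of this kind could be resolved.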
- the personal computer has a central processing unit that functions as a document input unit and a composite document output unit, a memory device that functions as a structured/semi-structured document storage unit and a layout document storage unit, and a magnetic disk device. Further, the magnetic disk device stores information on output components as shown in the corresponding figure.
- the description uses the HTML documents shown in FIGS. 24A and 30A as the structured/semi-structured documents and the XML documents shown in FIGS. 24B and 30B as the documents in which the layout information of each HTML document is described.
- the central processing unit acquires the two URLs described in the information about the output components, http://www.nec.co.jp/news.html and http://www.nec.co.jp/stock.html, acquires the corresponding HTML documents shown in FIGS. 24A and 30A, and stores them in the memory device.
- the central processing unit analyzes each HTML document, acquires the URLs of the documents in which the layout information is described, http://www.nec.co.jp/news.rdf and http://www.nec.co.jp/stock.rdf, acquires the corresponding XML documents shown in FIGS. 24B and 30B, and stores them in the memory device.
- the central processing unit obtains the ID of the component to be output of the document corresponding to the two URLs, urn: layout: 2: 1, urn: layout: 1, from the information about the output component.
- the central processing unit obtains the HTML and XML documents from the memory device and extracts from the XML documents the HTML document elements belonging to the components to be output, ... http://www.nec.co.jp/stock.html ...[1]/table[1]. The HTML document elements corresponding to the components to be output are checked against the HTML documents, formatted, and arranged in order to generate a composite document with the rendered image shown in FIG. 31.
- the generated composite document is presented to the user via, for example, a network or a Web browser of a mobile phone.
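The composition step can be sketched as below, with in-memory stand-ins replacing the network fetches. The page contents are invented, the element path for urn:layout:1 of stock.html is an assumption (the source text is truncated at that point), and wrapping each extracted element in a div stands in for the unspecified formatting step.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Stand-ins for the fetched HTML documents, keyed by URL (contents are made up).
PAGES: Dict[str, str] = {
    "http://www.nec.co.jp/news.html": (
        "<body><h1>Main News</h1><table><tr><td>banner</td></tr></table>"
        "<table><tr><td>Politics story</td><td>Economy story</td></tr></table></body>"
    ),
    "http://www.nec.co.jp/stock.html": (
        "<body><table><tr><td>Stock prices</td></tr></table></body>"
    ),
}

# Stand-ins for the layout documents: component ID -> element reference.
LAYOUTS: Dict[str, Dict[str, str]] = {
    "http://www.nec.co.jp/news.html": {"urn:layout:2:1": "/body[1]/table[2]/tr[1]/td[1]"},
    "http://www.nec.co.jp/stock.html": {"urn:layout:1": "/body[1]/table[1]"},  # assumed path
}

# Information about the output components: which component of which page, in order.
OUTPUT_COMPONENTS: List[Tuple[str, str]] = [
    ("http://www.nec.co.jp/news.html", "urn:layout:2:1"),
    ("http://www.nec.co.jp/stock.html", "urn:layout:1"),
]


def extract(url: str, component_id: str) -> ET.Element:
    """Return the element of the page at url that the layout document points to."""
    body = ET.fromstring(PAGES[url])
    ref = LAYOUTS[url][component_id]
    element = body.find("." + ref[len("/body[1]"):])
    if element is None:
        raise LookupError(f"{ref} not found in {url}")
    return element


def compose() -> str:
    """Concatenate the selected components, one div per component."""
    out = ET.Element("body")
    for url, component_id in OUTPUT_COMPONENTS:
        section = ET.SubElement(out, "div")
        section.append(extract(url, component_id))
    return ET.tostring(out, encoding="unicode")


print(compose())
```

Placing a td element directly inside a div is not valid HTML; a real formatter would re-wrap the extracted content before presenting it.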
- a sixth embodiment of the present invention will be described with reference to the drawings.
- this example corresponds to the ninth embodiment of the present invention.
- a personal computer is provided as a data processing device and a data storage device.
- the personal computer includes a central processing unit that functions as an input unit, a layout analysis unit, an output unit, a basic layout analysis unit, a title analysis unit, a layout analysis unit F, a block selection unit, and a section calculation unit, a memory device that functions as a rendering result storage unit and an analysis result storage unit, and a magnetic disk storage device. In addition, the title analysis rules are stored in the magnetic disk storage device.
- the central processing unit acquires an HTML document from the outside, renders the document, and stores the rendering result in a memory device.
- the central processing unit obtains the rendering result and analyzes the components of the first layout and their hierarchical relationships. This function and operation are the same as those shown in the first embodiment.
- the central processing unit obtains the rendering result and the title analysis rule, and analyzes the title. This function and operation are the same as those shown in the second embodiment.
- the central processing unit acquires the analyzed first layout and second layout, selects one component of the second layout that can be divided, and analyzes the components of the first layout that constitute it.
- Figure 38 shows the state during the analysis.
- the second layout component consisting of /body[1]/div[2] to /body[1]/div[6] is a component of the second layout that can be divided, because two or more of the first layout components that compose it, /body[1]/div[2], /body[1]/div[3], ..., /body[1]/div[6], have titles.
- the central processing unit obtains the analyzed titles and selects the titles included in the components of the first layout to be analyzed, as shown in the corresponding figure. Furthermore, the positions of the selected titles are checked and, as shown in FIG. 40, the titles whose left ends lie within a specified horizontal distance of the left end of the leftmost title, the title consisting of /body[1]/div[2], are selected as main titles.
- here, the title consisting of /body[1]/div[2] and the title consisting of /body[1]/div[4] are selected as the main titles.
- the components of the first layout containing the main titles are the main components.
- as the method of selecting main titles, a method of selecting titles with similar styles such as background color as shown in FIG. 42, a method of selecting a certain number or a certain percentage of titles based on position as shown in FIG. 43, or a method of selecting main titles by combining similarity of position and style may be used.
- the boundary line may be determined using the width and height of the first layout component to be analyzed, the border style of the document description element that constitutes the component, and the width or height of the image that is the content of the document description element that constitutes the component.
- the central processing unit groups each non-major component with a major component or with other non-major components based on their distance in the source, and generates the components of the second layout. For example, the non-major component consisting of /body[1]/div[3] is grouped with the closest major component toward the top of the source, /body[1]/div[2].
- the two new second layout components analyzed here are stored in the memory device as children of the original second layout components.
- the analyzed second layout components may instead be stored in the memory device in place of the original second layout component, or whether to store them as a replacement or as children may be chosen depending on the style and content of the original or the analyzed second layout components.
- the components of the second layout and their hierarchical relationships can be completely analyzed.
- the central processing unit obtains from the memory device information on the analyzed second layout components and their hierarchical relationships, and on the correspondence between each component and its title, formats it into a format that expresses the second layout components and titles using references to the elements of the HTML document, for example the format shown in the third embodiment, and outputs it to the outside.
- IDs may be given to the layout components and output.
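The position-based selection of main titles in this example can be sketched as follows. The `Title` tuple, the pixel coordinates, and the `tolerance_px` default are hypothetical; the text states only that titles whose left ends lie within a specified horizontal distance of the leftmost title are selected.

```python
from typing import List, NamedTuple


class Title(NamedTuple):
    ref: str    # reference to the element that holds the title
    left: int   # x coordinate of the title's left edge in the rendering result


def select_main_titles(titles: List[Title], tolerance_px: int = 10) -> List[Title]:
    """Keep the titles whose left edge is within tolerance_px of the leftmost one."""
    if not titles:
        return []
    leftmost = min(t.left for t in titles)
    return [t for t in titles if t.left - leftmost <= tolerance_px]


titles = [
    Title("/body[1]/div[2]", 0),    # coordinates are made up for illustration
    Title("/body[1]/div[3]", 40),
    Title("/body[1]/div[4]", 2),
    Title("/body[1]/div[5]", 40),
    Title("/body[1]/div[6]", 40),
]
main = select_main_titles(titles)
print([t.ref for t in main])
```

With these made-up coordinates the titles of /body[1]/div[2] and /body[1]/div[4] are kept, matching the selection described in the text; the remaining components would then be grouped with the closest main component above them.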
- the first document analysis system of the present invention has a basic layout analysis unit (14 in FIG. 1) that obtains a rendering result of a structured/semi-structured document, refers to the arrangement of the document description elements, and analyzes the layout by grouping the juxtaposed document description elements, and an output unit (13 in FIG. 1) that outputs the analyzed layout in a format that expresses the components of the layout using references to document description elements.
- the second document analysis system of the present invention has a title analysis rule storage unit (23 in FIG. 11) that stores title analysis rules, a title analysis unit (15 in FIG. 11) that acquires a rendering result of the structured/semi-structured document and the title analysis rules and analyzes titles by matching the name, attribute, style, and content of the document description elements against the title analysis rules, and an output unit (13 in FIG. 11) that outputs the analyzed titles in a format that expresses them using references to the document description elements.
- by employing this structure, titles are analyzed comprehensively using the attribute, style, and content in addition to the name of the document description element, and the analyzed titles are output in a format that can be used by a third party.
- the third document analysis system of the present invention acquires a title analysis rule storage unit (23 in FIG. 16) that stores title analysis rules, a rendering result of the structured “semi-structured document”, and a document description
- the basic layout analysis unit (14 in Fig. 16) analyzes the layout by grouping the document description elements that are juxtaposed with reference to the arrangement of the elements, and the rendering result and title analysis rules of the structured 'semi-structured document' In addition to the name of the document description element, the title analysis unit (15 in Fig.
- the layout analysis unit (16 in Fig. 16) generates a new layout by grouping it with the component of the layout, and represents the analyzed new layout using the reference of the document description element as the layout component and title.
- the first document adaptation system of the present invention obtains URIs (Uniform Resource Identifiers) of the user's desired structured 'semi-structured document', obtains user input, Acquires the document corresponding to the URI of the structured 'semi-structured document' desired by the user and the input unit that controls the output of the document describing the contents of the item (51 in Fig. 22), and corresponds to the acquired document
- a document input unit (52 in FIG. 22) that acquires a document that describes layout information to be acquired, and a document that describes the structured semi-structured document desired by the user and the layout information of the document are acquired.
- a table of contents document output section (53 in Fig.
- the second document adaptation system of the present invention acquires an output component storage unit (81 in FIG. 27) that stores information related to the output component, and acquires information related to the output component, and corresponds to the URI described in the information.
- Structured ⁇ Acquires a semi-structured document, and a document input part (71 in Fig. 27) for acquiring a document describing layout information corresponding to the acquired document, and information and structure on output components It has a synthetic document output unit (72 in Fig. 27) that acquires a semi-structured document and a document that describes the layout information of the document, generates a synthetic document, and outputs it.
- the information about the output component and the structured 'semi-structured document' and the document with the layout information that appropriately reflects the logical structure of the document are used to specify the specified layout.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Document Processing Apparatus (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2005800366943A CN101048773B (zh) | 2004-10-25 | 2005-10-25 | 文件分析系统以及文件分析方法 |
US11/577,984 US8051371B2 (en) | 2004-10-25 | 2005-10-25 | Document analysis system and document adaptation system |
JP2006543141A JP4124261B2 (ja) | 2004-10-25 | 2005-10-25 | 文書解析システム、文書解析方法、及びそのプログラム |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-310238 | 2004-10-25 | ||
JP2004310238 | 2004-10-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006046523A1 (ja) | 2006-05-04 |
Family
ID=36227763
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/019531 WO2006046523A1 (ja) | 2004-10-25 | 2005-10-25 | 文書解析システム、及び文書適応システム |
Country Status (4)
Country | Link |
---|---|
US (1) | US8051371B2 (ja) |
JP (1) | JP4124261B2 (ja) |
CN (1) | CN101048773B (ja) |
WO (1) | WO2006046523A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011523133A (ja) * | 2008-06-05 | 2011-08-04 | 北大方正集▲団▼有限公司 | レイアウトファイルの構造処理方法及び装置 |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7712021B2 (en) * | 2005-03-25 | 2010-05-04 | Red Hat, Inc. | System, method and medium for component based web user interface frameworks |
US7464078B2 (en) * | 2005-10-25 | 2008-12-09 | International Business Machines Corporation | Method for automatically extracting by-line information |
TWI386817B (zh) * | 2006-05-24 | 2013-02-21 | Kofax Inc | 提供電腦軟體應用程式之使用者介面的系統及其方法 |
JP4768537B2 (ja) * | 2006-07-18 | 2011-09-07 | 株式会社リコー | コンテンツ閲覧システムおよびプログラム |
WO2008121986A1 (en) * | 2007-03-30 | 2008-10-09 | Google Inc. | Document processing for mobile devices |
CN101354705B (zh) * | 2007-07-23 | 2012-06-13 | 夏普株式会社 | 文档图像处理装置和文档图像处理方法 |
US8289333B2 (en) | 2008-03-04 | 2012-10-16 | Apple Inc. | Multi-context graphics processing |
US8477143B2 (en) | 2008-03-04 | 2013-07-02 | Apple Inc. | Buffers for display acceleration |
US9418171B2 (en) * | 2008-03-04 | 2016-08-16 | Apple Inc. | Acceleration of rendering of web-based content |
AU2009311452A1 (en) * | 2008-10-28 | 2012-07-19 | Vistaprint Schweiz Gmbh | Method and system for calculating weight of variable shape product manufactured from product blank |
CN102918523A (zh) * | 2010-05-26 | 2013-02-06 | 诺基亚公司 | 在应用中指定用户接口元素呈现的映射参数的方法和装置 |
US20120137233A1 (en) * | 2010-05-26 | 2012-05-31 | Nokia Corporation | Method and Apparatus for Enabling Generation of Multiple Independent User Interface Elements from a Web Page |
US9576068B2 (en) * | 2010-10-26 | 2017-02-21 | Good Technology Holdings Limited | Displaying selected portions of data sets on display devices |
US9317491B2 (en) * | 2010-11-22 | 2016-04-19 | Webydo Systems Ltd. | Methods and systems of generating and editing adaptable and interactive network documents |
US10803233B2 (en) * | 2012-05-31 | 2020-10-13 | Conduent Business Services Llc | Method and system of extracting structured data from a document |
JP2014128836A (ja) * | 2012-12-27 | 2014-07-10 | Brother Ind Ltd | 切断装置、保持部材、及び切断部材 |
CN103164520B (zh) * | 2013-03-08 | 2014-04-16 | 山东大学 | 一种面向层次化数据的交互可视方法及装置 |
US10089388B2 (en) | 2015-03-30 | 2018-10-02 | Airwatch Llc | Obtaining search results |
US10229209B2 (en) | 2015-03-30 | 2019-03-12 | Airwatch Llc | Providing search results based on enterprise data |
US10318582B2 (en) * | 2015-03-30 | 2019-06-11 | Vmware Inc. | Indexing electronic documents |
RU2638015C2 (ru) * | 2015-06-30 | 2017-12-08 | Общество С Ограниченной Ответственностью "Яндекс" | Способ идентификации целевого объекта на веб-странице |
CN108009137B (zh) * | 2017-12-22 | 2021-01-29 | 鼎富智能科技有限公司 | 一种基于配置文件的规范文书处理方法、装置及系统 |
US12056331B1 (en) | 2019-11-08 | 2024-08-06 | Instabase, Inc. | Systems and methods for providing a user interface that facilitates provenance tracking for information extracted from electronic source documents |
CN111178771B (zh) * | 2019-12-31 | 2022-03-29 | 中国石油天然气股份有限公司 | 体系构建方法和装置 |
US11315353B1 (en) * | 2021-06-10 | 2022-04-26 | Instabase, Inc. | Systems and methods for spatial-aware information extraction from electronic source documents |
US12067039B1 (en) | 2023-06-01 | 2024-08-20 | Instabase, Inc. | Systems and methods for providing user interfaces for configuration of a flow for extracting information from documents via a large language model |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07282053A (ja) * | 1994-04-15 | 1995-10-27 | Matsushita Electric Ind Co Ltd | 文書編集装置 |
JP2000148788A (ja) * | 1998-11-05 | 2000-05-30 | Ricoh Co Ltd | 文書画像からのタイトル領域抽出装置およびタイトル領域抽出方法,並びに文書検索方法 |
JP2000172680A (ja) * | 1998-12-08 | 2000-06-23 | Ricoh Co Ltd | 文書登録システム、文書登録方法、その方法を実行させるための記録媒体、文書閲覧システム、文書閲覧方法、その方法を実行させるための記録媒体および文書取出しシステム |
JP2000357170A (ja) * | 1999-06-15 | 2000-12-26 | Fujitsu Ltd | 文書の参照理由を用いて情報検索を行う装置 |
JP2003085160A (ja) * | 2001-09-12 | 2003-03-20 | Seiko Epson Corp | ソースファイル生成装置 |
JP2004086855A (ja) * | 2002-06-28 | 2004-03-18 | Fuji Xerox Co Ltd | 文書処理装置及び文書処理方法、文書処理プログラム |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04124261A (ja) | 1990-09-17 | 1992-04-24 | Canon Inc | 化合物薄膜製造装置 |
JPH09251457A (ja) | 1996-03-18 | 1997-09-22 | Dainippon Screen Mfg Co Ltd | 文書変換装置 |
JPH10289252A (ja) | 1997-02-17 | 1998-10-27 | Dainippon Screen Mfg Co Ltd | 画像表示装置およびその処理を実行するためのプログラムを記録した記録媒体 |
JPH11203285A (ja) | 1998-01-14 | 1999-07-30 | Sanyo Electric Co Ltd | 文書構造解析装置、方法、及び記録媒体 |
JP3896702B2 (ja) | 1998-09-18 | 2007-03-22 | 富士ゼロックス株式会社 | 文書管理システム |
JP2001184344A (ja) | 1999-12-21 | 2001-07-06 | Internatl Business Mach Corp <Ibm> | 情報処理システム、プロキシサーバ、ウェブページ表示制御方法、記憶媒体、及びプログラム伝送装置 |
JP2003085159A (ja) | 2001-09-14 | 2003-03-20 | Fuji Xerox Co Ltd | 文書処理装置および画像出力装置ならびにそれらの方法 |
JP2003288334A (ja) | 2002-03-28 | 2003-10-10 | Toshiba Corp | 文書処理装置及び文書処理方法 |
JP3969176B2 (ja) | 2002-05-10 | 2007-09-05 | 日本電気株式会社 | ブラウザシステム及びその制御方法 |
US20030222921A1 (en) * | 2002-06-04 | 2003-12-04 | Bernard Rummel | Automatic layout generation using algorithms |
JP3941610B2 (ja) | 2002-07-08 | 2007-07-04 | 日本電気株式会社 | 情報抽出方法、情報抽出装置および情報抽出プログラム |
JP2004139275A (ja) | 2002-10-16 | 2004-05-13 | Nippon Telegr & Teleph Corp <Ntt> | Www文書表示方法及び閲覧者端末 |
US20040100509A1 (en) * | 2002-11-27 | 2004-05-27 | Microsoft Corporation | Web page partitioning, reformatting and navigation |
US7203901B2 (en) * | 2002-11-27 | 2007-04-10 | Microsoft Corporation | Small form factor web browsing |
JP4014160B2 (ja) * | 2003-05-30 | 2007-11-28 | インターナショナル・ビジネス・マシーンズ・コーポレーション | 情報処理装置、プログラム、及び記録媒体 |
GB0320278D0 (en) * | 2003-08-29 | 2003-10-01 | Hewlett Packard Development Co | Constrained document layout |
US7392473B2 (en) * | 2005-05-26 | 2008-06-24 | Xerox Corporation | Method and apparatus for determining logical document structure |
-
2005
- 2005-10-25 WO PCT/JP2005/019531 patent/WO2006046523A1/ja active Application Filing
- 2005-10-25 JP JP2006543141A patent/JP4124261B2/ja not_active Expired - Fee Related
- 2005-10-25 CN CN2005800366943A patent/CN101048773B/zh not_active Expired - Fee Related
- 2005-10-25 US US11/577,984 patent/US8051371B2/en not_active Expired - Fee Related
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
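JPH07282053A (ja) * | 1994-04-15 | 1995-10-27 | Matsushita Electric Ind Co Ltd | Document editing device |
JP2000148788A (ja) * | 1998-11-05 | 2000-05-30 | Ricoh Co Ltd | Device and method for extracting a title region from a document image, and document retrieval method |
JP2000172680A (ja) * | 1998-12-08 | 2000-06-23 | Ricoh Co Ltd | Document registration system, document registration method, recording medium for executing the method, document browsing system, document browsing method, recording medium for executing the method, and document retrieval system |
JP2000357170A (ja) * | 1999-06-15 | 2000-12-26 | Fujitsu Ltd | Device for performing information retrieval using reasons for referencing documents |
JP2003085160A (ja) * | 2001-09-12 | 2003-03-20 | Seiko Epson Corp | Source file generation device |
JP2004086855A (ja) * | 2002-06-28 | 2004-03-18 | Fuji Xerox Co Ltd | Document processing device, document processing method, and document processing program |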
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2011523133A (ja) * | 2008-06-05 | 2011-08-04 | Peking University Founder Group Co., Ltd. | Method and device for structure processing of layout files |
Also Published As
Publication number | Publication date |
---|---|
JP4124261B2 (ja) | 2008-07-23 |
CN101048773A (zh) | 2007-10-03 |
US20080148144A1 (en) | 2008-06-19 |
CN101048773B (zh) | 2012-01-11 |
JPWO2006046523A1 (ja) | 2008-05-22 |
US8051371B2 (en) | 2011-11-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2006046523A1 (ja) | Document analysis system and document adaptation system | |
CA2372544C (en) | Information access method, information access system and program therefor | |
JP2009524883A (ja) | Presentation of digital content to a network | |
US20130262968A1 (en) | Apparatus and method for efficiently reviewing patent documents | |
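JP4009971B2 (ja) | Program for displaying an electronic service manual, recording medium storing the program, electronic service manual display control method, and electronic service manual display control device | |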
US20070150494A1 (en) | Method for transformation of an extensible markup language vocabulary to a generic document structure format | |
US9286272B2 (en) | Method for transformation of an extensible markup language vocabulary to a generic document structure format | |
US7934157B2 (en) | Utilization of tree view for printing data | |
JP4666996B2 (ja) | Electronic filing system and electronic filing method | |
KR100522186B1 (ko) | Method for dynamically creating a home page and device for implementing the method on the web | |
Edhlund et al. | NVivo for Mac essentials | |
JP2002215519A (ja) | Web page generation method and system, web page generation program, and recording medium | |
EP1237094A1 (en) | A method for determining rubies | |
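JP2007011973A (ja) | Information retrieval device and information retrieval program | |
JP7438769B2 (ja) | Text structure drawing device | |
JPH09282218A (ja) | HTML document book-style formatting method and device therefor | |
JPH08106464A (ja) | Document generation device | |
JP2004164134A (ja) | Electronic document creation device, electronic document creation method, and program causing a computer to execute the method | |
JP2006155593A (ja) | Document analysis system and document adaptation system | |
JP4221620B2 (ja) | Document analysis system, document analysis method, and program | |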
US20030191770A1 (en) | System and method for formatting, proofreading and annotating information to be printed in directories | |
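JP2009098829A (ja) | Comic panel retrieval device | |
JP2005276159A (ja) | Circuit diagram display data generation device, program therefor, and circuit diagram display data generation method | |
JP2021039579A (ja) | Document creation support system | |
JP4119413B2 (ja) | Knowledge information collection system, knowledge retrieval system, and knowledge information collection method |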
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A1. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BW BY BZ CA CH CN CO CR CU CZ DK DM DZ EC EE EG ES FI GB GD GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV LY MD MG MK MN MW MX MZ NA NG NO NZ OM PG PH PL PT RO RU SC SD SG SK SL SM SY TJ TM TN TR TT TZ UG US UZ VC VN YU ZA ZM |
AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): BW GH GM KE LS MW MZ NA SD SZ TZ UG ZM ZW AM AZ BY KG MD RU TJ TM AT BE BG CH CY DE DK EE ES FI FR GB GR HU IE IS IT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (PCT application filed from 20040101) | |
WWE | WIPO information: entry into national phase | Ref document number: 2006543141. Country of ref document: JP |
WWE | WIPO information: entry into national phase | Ref document number: 11577984. Country of ref document: US. Ref document number: 200580036694.3. Country of ref document: CN |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 05805247. Country of ref document: EP. Kind code of ref document: A1 |