CN103136304A - Article processing method and device - Google Patents
Article processing method and device
- Publication number
- CN103136304A
- Authority
- CN
- China
- Prior art keywords
- index
- xml document
- territory
- module
- xpath
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Document Processing Apparatus (AREA)
Abstract
The invention discloses an article processing method comprising the following steps: creating Extensible Markup Language (XML) documents to record the content of articles, wherein the XPATH of each element in an XML document corresponds to the chapter hierarchy of the article content; storing each XML document in the XML document field of an article data table; and creating an index on the XML document field according to the XPATH of the elements in the XML documents. The invention also provides an article processing device comprising a structuring module, a database module and an index module. The structuring module creates the XML documents that record the content of the articles, wherein the XPATH of each element in an XML document corresponds to the chapter hierarchy of the article content. The database module stores each XML document in the XML document field of the article data table. The index module creates the index on the XML document field according to the XPATH of the elements in the XML documents. The method and the device improve the efficiency of article retrieval.
Description
Technical field
The present invention relates to the field of Internet publishing, and in particular to an article processing method and device.
Background art
Article-type data has a chapter hierarchy. To preserve the integrity and the hierarchical relationships of the article content, the entire content can be stored in XML form as an attribute in one field of a database, forming an XML document field; together with the other attributes of the article, it makes up a complete record.
When articles are retrieved, their attributes are organized into search conditions field by field, and the articles are then retrieved. When a search condition includes a restriction on an element within the article content, the records that satisfy the other conditions must be obtained first, the complete XML fragment of the article content is then loaded, the element is looked up by XPATH, and the qualifying records are finally obtained by filtering.
The inventors found that this retrieval approach causes XML documents to be loaded frequently and consumes considerable resources.
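The cost of the prior-art approach can be illustrated with a minimal Python sketch. It is not taken from the patent: the record layout, the element names (written with underscores, since XML names cannot contain spaces) and the function name `search_by_traversal` are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: (record id, XML text held in the XML document field).
RECORDS = [
    (1, "<paper><industry_background>digital publishing trends</industry_background></paper>"),
    (2, "<paper><industry_background>retail logistics</industry_background></paper>"),
]

def search_by_traversal(records, xpath, keyword):
    """Baseline: parse every full XML document, then filter by element text."""
    hits = []
    for record_id, xml_text in records:
        root = ET.fromstring(xml_text)        # the whole document is loaded each time
        for element in root.findall(xpath):   # path is relative to the root element
            if keyword in (element.text or ""):
                hits.append(record_id)
                break
    return hits
```

Every query parses every candidate document in full, which is the repeated loading the inventors describe.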
Summary of the invention
The present invention aims to provide an article processing method and device that improve the efficiency of article retrieval.
An embodiment of the present invention provides an article processing method, comprising: creating XML documents to record the content of articles, wherein the XPATH of each element in an XML document corresponds to the chapter hierarchy of the article content; storing each XML document in the XML document field of an article data table; and creating an index on the XML document field of the database according to the XPATH of the elements in the XML documents.
An embodiment of the present invention also provides an article processing device, comprising: a structuring module, configured to create XML documents to record the content of articles, wherein the XPATH of each element in an XML document corresponds to the chapter hierarchy of the article content; a database module, configured to store each XML document in the XML document field of an article data table; and an index module, configured to create an index on the XML document field of the database according to the XPATH of the elements in the XML documents.
Because the article processing method and device of the above embodiments create an index on the XML document field, they overcome the low article retrieval efficiency of the prior art and improve the efficiency of article retrieval.
Description of drawings
The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 shows an article processing method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the index relationship according to a preferred embodiment of the present invention;
Fig. 3 is a flowchart of index-based retrieval according to a preferred embodiment of the present invention;
Fig. 4 is a screenshot of the index management interface according to a preferred embodiment of the present invention;
Fig. 5 is a schematic diagram of an article processing device according to an embodiment of the present invention.
Embodiment
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an article processing method according to an embodiment of the present invention, comprising:
Step S10: create XML documents to record the content of articles, wherein the XPATH of each element in an XML document corresponds to the chapter hierarchy of the article content;
Step S20: store each XML document in the XML document field of an article data table;
Step S30: create an index on the XML document field of the database according to the XPATH of the elements in the XML documents.
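Steps S10 and S20 can be sketched in Python as follows. This is only an illustration under stated assumptions: the patent does not name a database product, so SQLite stands in for it, and the table name `article`, the `title` column, the helper names and the chapter names are all hypothetical. Index creation (step S30) is left out of this sketch.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_article_xml(chapters, root_tag="paper"):
    """S10: nest elements so that each element's path mirrors the chapter hierarchy."""
    def add_children(parent, items):
        for name, value in items.items():
            child = ET.SubElement(parent, name)
            if isinstance(value, dict):
                add_children(child, value)   # sub-chapters become child elements
            else:
                child.text = value           # leaf chapter: store its text
    root = ET.Element(root_tag)
    add_children(root, chapters)
    return ET.tostring(root, encoding="unicode")

def store_article(conn, title, xml_text):
    """S20: store the whole XML document in the XML document field of the article table."""
    cursor = conn.execute(
        "INSERT INTO article (title, DOC_XMLDATA) VALUES (?, ?)", (title, xml_text)
    )
    return cursor.lastrowid

# Usage with an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, DOC_XMLDATA TEXT)")
xml_text = build_article_xml({
    "industry_background": "digital publishing",
    "key_characteristic": {"functional_characteristic": "full-text search"},
})
record_id = store_article(conn, "Sample article", xml_text)
```

Keeping the whole document in one field preserves the original XML storage structure, which the description later lists as one of the technical effects.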
In the prior art, retrieving an article with XML technology requires obtaining the complete XML fragment of the article content and then searching it by XPATH. The article processing method of this embodiment instead creates an index on the XML document field, so articles can be retrieved through the index without reloading the whole XML document. This reduces resource consumption, markedly improves retrieval efficiency and shortens retrieval time.
In addition, the prior art retrieves elements by traversal and addressing, which is slow, whereas this method retrieves articles through the index without traversing and addressing the elements again, which further shortens retrieval time.
Preferably, step S30 comprises: creating a corresponding index for each element in the XML document field, wherein the name of the index = name of the XML document field + field-name separator + XPATH of the element. This embodiment is simple to implement.
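The naming rule can be sketched in a few lines of Python. The sample XML below is an assumed document shape, with underscores in the tag names because XML names cannot contain spaces; the choice to emit names only for leaf elements, the `_` separator default and the function name `index_names` are likewise assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Assumed document shape for illustration only.
SAMPLE_XML = """<paper>
  <industry_background/>
  <product_orientation/>
  <key_characteristic>
    <functional_characteristic/>
    <performance_characteristic/>
    <technical_characteristic/>
  </key_characteristic>
  <market_outlook/>
  <risk_assessment/>
</paper>"""

def index_names(xml_text, field_name="DOC_XMLDATA", separator="_"):
    """Return one index name per leaf element: field name + separator + element path."""
    root = ET.fromstring(xml_text)
    names = []

    def walk(element, path):
        children = list(element)
        if not children:                      # leaf element: emit an index name
            names.append(f"{field_name}{separator}{path}")
        for child in children:                # descend, extending the path
            walk(child, f"{path}/{child.tag}")

    walk(root, f"/{root.tag}")
    return names
```

Because the name is derived mechanically from the field name and the element path, each index name identifies exactly one element position in the document.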
Fig. 2 is a schematic diagram of the index relationship according to a preferred embodiment of the present invention. As the figure shows, the link between an index field and the XML document is uniquely determined, so retrieval of an element (whose content belongs to the article) can be equivalently converted into retrieval of the index field. At the same time, the management of element index data is converted into management of the index data table, which makes element retrieval fast and efficient.
For example, consider the following data table:
In this data table, the XML stored in the field DOC_XMLDATA has the following structure:
According to this preferred embodiment, the names of the generated indexes are as follows:
<node text="DOC_XMLDATA_/paper/industry background"/>
<node text="DOC_XMLDATA_/paper/product orientation"/>
<node text="DOC_XMLDATA_/paper/key characteristic/functional characteristic"/>
<node text="DOC_XMLDATA_/paper/key characteristic/performance characteristic"/>
<node text="DOC_XMLDATA_/paper/key characteristic/technical characteristic"/>
<node text="DOC_XMLDATA_/paper/market outlook"/>
<node text="DOC_XMLDATA_/paper/risk assessment"/>
Preferably, step S30 further comprises: storing the set of indexes as an index data table, wherein the name of each index is stored in the index field of the index data table.
Preferably, a title field is also created in the index data table to record the short name of each index field for presentation to the user.
The index data table created according to the above preferred embodiment is as follows:
CLOB refers to a variable-length text field.
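An index data table with an index field and a title field might look like the following Python sketch. The table layout is not reproduced in this text, so the table name `xml_index`, the column names, the use of SQLite and the rule that the short name is the last step of the path are all assumptions.

```python
import sqlite3

def create_index_table(conn, index_names):
    """Store each index name in the index field, with a short name in the title field."""
    conn.execute("CREATE TABLE xml_index (index_name TEXT PRIMARY KEY, title TEXT)")
    for name in index_names:
        title = name.rsplit("/", 1)[-1]   # assumed rule: short name = last path step
        conn.execute("INSERT INTO xml_index VALUES (?, ?)", (name, title))

def titles_for_user(conn):
    """Short names to present to the user, in insertion order."""
    return [row[0] for row in conn.execute("SELECT title FROM xml_index ORDER BY rowid")]

def index_for_title(conn, title):
    """Map a short name chosen by the user back to the full index name."""
    row = conn.execute(
        "SELECT index_name FROM xml_index WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else None

# Usage with two index names in the style described above.
index_conn = sqlite3.connect(":memory:")
create_index_table(index_conn, [
    "DOC_XMLDATA_/paper/industry background",
    "DOC_XMLDATA_/paper/key characteristic/functional characteristic",
])
```

The title field spares the user from reading full path-based index names, while the index field keeps the exact name the search engine needs.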
Preferably, the method further comprises:
Presenting to the user the short names recorded in the title field;
Receiving the user's selection of a short name and the search string the user enters;
Searching the index field corresponding to the selected short name, using the search string as the keyword;
Submitting to the user the content of the XML document field pointed to by the retrieved index.
Based on the search conditions entered by the user, this preferred embodiment assembles the search syntax of the search engine; the user only needs to select the items to search and enter a keyword. For example, when the user wants to find documents whose industry background or product orientation relates to digital publishing, the assembled search syntax is as follows:
((DOC_XMLDATA_/paper/industry background LIKE 'digital publishing') OR (DOC_XMLDATA_/paper/product orientation LIKE 'digital publishing'))
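Assembling such an expression from the user's selections can be sketched as below. The function name `build_query` is hypothetical, and because the source does not describe how the search engine escapes quotes, this sketch simply rejects keywords that contain a single quote.

```python
def build_query(index_names, keyword):
    """Join one LIKE clause per selected index field with OR."""
    if not index_names:
        raise ValueError("at least one index field must be selected")
    if "'" in keyword:
        # Escaping rules are not given in the source, so quotes are refused here.
        raise ValueError("keyword must not contain a single quote")
    clauses = [f"({name} LIKE '{keyword}')" for name in index_names]
    return "(" + " OR ".join(clauses) + ")"

query = build_query(
    ["DOC_XMLDATA_/paper/industry background", "DOC_XMLDATA_/paper/product orientation"],
    "digital publishing",
)
```

The user supplies only the selected items and the keyword; the full index names come from the index data table.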
The syntax converter converts the search statement into the element-search syntax and sends it to the retrieval service. The element-search syntax is as follows:
The retrieval service receives the search condition, calls the syntax conversion service, converts the search statement, executes the search and obtains the result set. The search engine returns the result set to the human-computer interaction interface.
Fig. 3 is a flowchart of index-based retrieval according to a preferred embodiment of the present invention, comprising:
Step 1: the search engine receives the search request sent by the front-end page;
Step 2: the search engine calls the syntax converter to convert the search condition from the page into the element-search syntax;
Step 3: the search engine initiates the search request and passes the search statement to the retrieval service;
Step 4: the retrieval service parses the search syntax, executes the search and obtains the result set;
Step 5: the retrieval service returns the indexed result set to the search engine;
Step 6: the search engine parses the result set, obtains the result documents according to the index rules and returns them to the front end for processing.
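The six steps can be sketched end to end in Python. Everything here is a stand-in chosen for illustration: the in-memory dictionaries replace the real index and article table, a substring test replaces the engine's matching, and the function names and request format are hypothetical.

```python
# Hypothetical in-memory stand-ins for the index, the title field and the articles.
INDEX = {
    "DOC_XMLDATA_/paper/industry_background": {
        1: "digital publishing trends", 2: "retail logistics", 3: "retail logistics",
    },
    "DOC_XMLDATA_/paper/product_orientation": {
        1: "libraries", 2: "digital publishing houses", 3: "supermarkets",
    },
}
TITLE_TO_INDEX = {
    "industry background": "DOC_XMLDATA_/paper/industry_background",
    "product orientation": "DOC_XMLDATA_/paper/product_orientation",
}
ARTICLES = {1: "Article A", 2: "Article B", 3: "Article C"}

def convert_syntax(request):
    """Step 2: turn the page's search condition into (index name, keyword) pairs."""
    return [(TITLE_TO_INDEX[title], request["keyword"]) for title in request["titles"]]

def retrieval_service(conditions):
    """Steps 4 and 5: evaluate each condition on the index and OR the hits together."""
    hits = set()
    for index_name, keyword in conditions:
        for record_id, text in INDEX[index_name].items():
            if keyword in text:
                hits.add(record_id)
    return hits

def search_engine(request):
    """Steps 1, 3 and 6: accept the request, delegate, and resolve hits to documents."""
    conditions = convert_syntax(request)
    record_ids = retrieval_service(conditions)
    return [ARTICLES[record_id] for record_id in sorted(record_ids)]

results = search_engine({
    "titles": ["industry background", "product orientation"],
    "keyword": "digital publishing",
})
```

No XML document is parsed during the search; only the index entries are consulted, and the article records are fetched once the matching record ids are known.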
Fig. 4 is a screenshot of the index management interface according to a preferred embodiment of the present invention.
This preferred embodiment provides a friendlier interactive interface: the title field helps the user select the appropriate index field, so that articles are retrieved through the index, which is easier for the user.
Fig. 5 is a schematic diagram of an article processing device according to an embodiment of the present invention. As described in the summary above, the device comprises a structuring module for creating the XML documents, a database module for storing each XML document in the XML document field of the article data table, and an index module for creating the index on the XML document field.
This device reduces resource consumption, markedly improves retrieval efficiency and shortens retrieval time.
Preferably, the index module is configured to create a corresponding index for each element in the XML document field, wherein the name of the index = name of the XML document field + field-name separator + XPATH of the element.
Preferably, the index module is further configured to store the set of indexes as an index data table, wherein the name of each index is stored in the index field of the index data table.
Preferably, the index module is further configured to create a title field in the index data table to record the short name of each index field for presentation to the user.
Preferably, the device further comprises: an interface module, configured to present to the user the short names recorded in the title field; a receiving module, configured to receive the user's selection of a short name and the search string the user enters; a retrieval module, configured to search the index field corresponding to the selected short name, using the search string as the keyword; and a submission module, configured to submit to the user the content of the XML document field pointed to by the retrieved index.
As can be seen from the above description, the present invention achieves the following technical effects:
Direct element retrieval: the elements of the XML are retrieved directly, without changing the original XML storage structure.
Reduced repeated loading: because elements are retrieved directly, repeated loading of complete XML documents is reduced, which saves resources and improves resource utilization.
Improved retrieval efficiency: the original approach of traversal and addressing is abandoned in favor of index-based retrieval that reaches elements directly, which improves retrieval efficiency.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they can each be made into a separate integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is therefore not restricted to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
1. An article processing method, characterized by comprising:
Creating XML documents to record the content of articles, wherein the XPATH of each element in the XML document corresponds to the chapter hierarchy of the content of the article;
Storing each XML document in the XML document field of an article data table;
Creating an index on the XML document field of the database according to the XPATH of the elements in the XML document.
2. The method according to claim 1, characterized in that creating an index on the XML document field of the database according to the XPATH of the elements in the XML document comprises:
Creating a corresponding index for each element in the XML document field, wherein the name of the index = the name of the XML document field + a field-name separator + the XPATH of the element.
3. The method according to claim 2, characterized in that creating an index on the XML document field of the database according to the XPATH of the elements in the XML document further comprises:
Storing the set of indexes as an index data table, wherein the name of each index is stored in the index field of the index data table.
4. The method according to claim 3, characterized in that a title field is also created in the index data table to record the short name of the index field for presentation to the user.
5. The method according to claim 4, characterized by further comprising:
Presenting to the user the short names recorded in the title field;
Receiving the user's selection of a short name and the search string the user enters;
Searching the index field corresponding to the selected short name, using the search string as the keyword;
Submitting to the user the content of the XML document field pointed to by the retrieved index.
6. An article processing device, characterized by comprising:
A structuring module, configured to create XML documents to record the content of articles, wherein the XPATH of each element in the XML document corresponds to the chapter hierarchy of the content of the article;
A database module, configured to store each XML document in the XML document field of an article data table;
An index module, configured to create an index on the XML document field of the database according to the XPATH of the elements in the XML document.
7. The device according to claim 6, characterized in that the index module is configured to create a corresponding index for each element in the XML document field, wherein the name of the index = the name of the XML document field + a field-name separator + the XPATH of the element.
8. The device according to claim 7, characterized in that the index module is further configured to store the set of indexes as an index data table, wherein the name of each index is stored in the index field of the index data table.
9. The device according to claim 8, characterized in that the index module is further configured to create a title field in the index data table to record the short name of the index field for presentation to the user.
10. The device according to claim 9, characterized by further comprising:
An interface module, configured to present to the user the short names recorded in the title field;
A receiving module, configured to receive the user's selection of a short name and the search string the user enters;
A retrieval module, configured to search the index field corresponding to the selected short name, using the search string as the keyword;
A submission module, configured to submit to the user the content of the XML document field pointed to by the retrieved index.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110401386.3A CN103136304B (en) | 2011-12-05 | 2011-12-05 | Article processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110401386.3A CN103136304B (en) | 2011-12-05 | 2011-12-05 | Article processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103136304A true CN103136304A (en) | 2013-06-05 |
CN103136304B CN103136304B (en) | 2017-02-22 |
Family
ID=48496136
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110401386.3A Expired - Fee Related CN103136304B (en) | 2011-12-05 | 2011-12-05 | Article processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103136304B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107193849A (en) * | 2016-03-15 | 2017-09-22 | 北大方正集团有限公司 | XML file full-text search index generation method and device |
CN109460394A (en) * | 2018-11-20 | 2019-03-12 | 北京广利核系统工程有限公司 | A kind of simplification method of multistage document entry tracing matrix |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050228768A1 (en) * | 2004-04-09 | 2005-10-13 | Ashish Thusoo | Mechanism for efficiently evaluating operator trees |
US20050228828A1 (en) * | 2004-04-09 | 2005-10-13 | Sivasankaran Chandrasekar | Efficient extraction of XML content stored in a LOB |
CN1965316A (en) * | 2004-04-09 | 2007-05-16 | 甲骨文国际公司 | Index for accessing XML data |
Also Published As
Publication number | Publication date |
---|---|
CN103136304B (en) | 2017-02-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20170222 Termination date: 20171205 |