CN103136304B - Article processing method and device - Google Patents
- Publication number
- CN103136304B (application CN201110401386.3A)
- Authority
- CN
- China
- Prior art keywords
- index
- domain
- xml document
- module
- xpath
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Document Processing Apparatus (AREA)
Abstract
The invention discloses an article processing method comprising the following steps: creating Extensible Markup Language (XML) documents to record the content of articles, wherein the XPATH of each element in an XML document corresponds to the chapter hierarchy of the article content; storing each XML document in the XML document field of an article data table; and creating an index on the XML document field according to the XPATHs of the elements in the XML documents. The invention also provides an article processing device comprising a structuring module, a database module and an index module. The structuring module creates the XML documents that record the article content, wherein the XPATH of each element in an XML document corresponds to the chapter hierarchy of the article content. The database module stores each XML document in the XML document field of the article data table. The index module creates the index on the XML document field according to the XPATHs of the elements in the XML documents. The method and device improve the efficiency of article retrieval.
Description
Technical field
The present invention relates to the field of Internet publishing, and in particular to a method and apparatus for processing entries.
Background
Entry-type data has a chapter hierarchy. To preserve the integrity and hierarchical relationships of the entry content, the entire entry content can be stored as XML in a single field of the database, forming an XML document field; together with the entry's other attributes, this field makes up one complete record.
When entries are retrieved, the entry attributes are organized into search conditions field by field, and the entries are then searched. If the search conditions include a restriction on an element within the entry content, it is first necessary to obtain the records that satisfy the other conditions and fetch the complete XML fragment of each entry's content, then search the element by XPATH, and finally obtain the qualifying records by filtering.
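To make the cost of this approach concrete, here is a minimal Python sketch of the prior-art flow, assuming a hypothetical in-memory list of records whose `DOC_XMLDATA` field holds the whole entry as XML, with underscore tag names standing in for chapter titles. Every record that passes the ordinary filters has its complete XML fragment parsed before the XPath is evaluated:

```python
import xml.etree.ElementTree as ET

# Hypothetical records: ordinary attributes plus the whole entry body
# serialized as XML in a single field (the XML document field).
RECORDS = [
    {"id": 1, "category": "report",
     "DOC_XMLDATA": "<paper><industry_background>digital publishing market</industry_background></paper>"},
    {"id": 2, "category": "report",
     "DOC_XMLDATA": "<paper><industry_background>mobile games</industry_background></paper>"},
    {"id": 3, "category": "memo",
     "DOC_XMLDATA": "<paper><industry_background>digital publishing tools</industry_background></paper>"},
]


def prior_art_search(records, category, xpath, keyword):
    """Filter on ordinary attributes, then parse each surviving XML
    fragment and evaluate the XPath against it."""
    hits = []
    for rec in records:
        if rec["category"] != category:          # 1. match the other conditions
            continue
        root = ET.fromstring(rec["DOC_XMLDATA"])  # 2. load the complete XML fragment
        for elem in root.findall(xpath):          # 3. locate elements by XPath
            if keyword in (elem.text or ""):      # 4. filter on element content
                hits.append(rec["id"])
                break
    return hits


print(prior_art_search(RECORDS, "report", "./industry_background", "digital publishing"))
```

Step 2 runs once per candidate record on every query, which is where the repeated XML loading comes from.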
The inventor found that this retrieval approach causes XML documents to be loaded frequently and consumes considerable resources.
Summary of the invention
The present invention aims to provide a method and apparatus for processing entries, so as to improve the efficiency of entry retrieval.
In an embodiment of the present invention, a method for processing entries is provided, comprising: creating an XML document to record the content of an entry, wherein the XPATH of each element in the XML document corresponds to the chapter hierarchy of the entry content; storing each XML document in the XML document field of an entry data table; and creating an index on the XML document field of the database according to the XPATH of the elements in the XML document.
In an embodiment of the present invention, an apparatus for processing entries is provided, comprising: a structuring module, configured to create an XML document to record the content of an entry, wherein the XPATH of each element in the XML document corresponds to the chapter hierarchy of the entry content; a database module, configured to store each XML document in the XML document field of an entry data table; and an index module, configured to create an index on the XML document field of the database according to the XPATH of the elements in the XML document.
Because the entry processing method and apparatus of the above embodiments create an index on the XML document field, they overcome the low entry retrieval efficiency of the prior art and improve the efficiency of entry retrieval.
Brief description of the drawings
The accompanying drawings described herein provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 shows the entry processing method according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of index relationships according to a preferred embodiment of the invention;
Fig. 3 shows a flowchart of performing an index search according to a preferred embodiment of the invention;
Fig. 4 shows a screenshot of the index management interface according to a preferred embodiment of the invention;
Fig. 5 shows a schematic diagram of the entry processing apparatus according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows the entry processing method according to an embodiment of the present invention, including:
Step S10: create an XML document to record the content of an entry, wherein the XPATH of each element in the XML document corresponds to the chapter hierarchy of the entry content;
Step S20: store each XML document in the XML document field of the entry data table;
Step S30: create an index on the XML document field of the database according to the XPATH of the elements in the XML document.
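Steps S10 and S20 can be illustrated with a minimal Python sketch. It assumes the chapters arrive as a nested mapping, uses hypothetical underscore tag names in place of chapter titles, and uses an in-memory SQLite table named `entry` (hypothetical) as a stand-in for the entry data table:

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_entry_xml(root_tag, chapters):
    """Step S10: turn a nested {chapter: text or sub-chapters} mapping into
    XML so that each element's path mirrors the chapter hierarchy."""
    root = ET.Element(root_tag)

    def add(parent, mapping):
        for tag, body in mapping.items():
            child = ET.SubElement(parent, tag)
            if isinstance(body, dict):
                add(child, body)   # sub-chapters become child elements
            else:
                child.text = body  # a leaf chapter carries the text

    add(root, chapters)
    return ET.tostring(root, encoding="unicode")


chapters = {
    "industry_background": "Growth of digital publishing",
    "key_characteristic": {
        "functional_characteristic": "Full-text search",
        "performance_characteristic": "Fast queries",
    },
}
xml_text = build_entry_xml("paper", chapters)

# Step S20: store the whole document in the XML document field of the
# entry data table (table and column names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT, DOC_XMLDATA TEXT)")
conn.execute("INSERT INTO entry (title, DOC_XMLDATA) VALUES (?, ?)",
             ("Sample entry", xml_text))
conn.commit()
```

With this layout the sub-chapter "performance characteristic" sits at `/paper/key_characteristic/performance_characteristic`, so the XPATH directly encodes its place in the chapter hierarchy.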
In the prior art, when entries are retrieved using XML technology, the complete XML fragment of the entry content is fetched and then searched by XPATH. By contrast, the entry processing method of this embodiment creates an index on the XML document field, so entries can be retrieved through the index without reloading the whole XML document. This reduces resource consumption, considerably improves retrieval efficiency, and shortens retrieval time.
In addition, the prior art retrieves elements by traversal and addressing, which is slow. This method retrieves entries through the index, without traversing and re-addressing the elements, which further shortens retrieval time.
Preferably, step S30 includes: creating a corresponding index for each element in the XML document field, where index name = XML document field name + field-name separator + XPATH of the element. This embodiment is simple to implement.
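The naming rule can be sketched in Python as follows. The `_` separator is inferred from the example names in this description, and both function names are hypothetical. Because an absolute XPATH begins with `/`, the name can also be split back into field name and path:

```python
def index_name(field_name: str, xpath: str, separator: str = "_") -> str:
    """Index name = XML document field name + separator + element XPATH."""
    return f"{field_name}{separator}{xpath}"


def parse_index_name(name: str, separator: str = "_") -> tuple[str, str]:
    """Recover (field name, XPATH). An absolute XPATH starts with '/', so the
    first occurrence of separator + '/' marks the boundary (this assumes the
    field name itself never contains that pair)."""
    field, _, rest = name.partition(separator + "/")
    return field, "/" + rest


name = index_name("DOC_XMLDATA", "/paper/market prospect")
print(name)                    # DOC_XMLDATA_/paper/market prospect
print(parse_index_name(name))  # ('DOC_XMLDATA', '/paper/market prospect')
```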
Fig. 2 shows a schematic diagram of index relationships according to a preferred embodiment of the invention. As it shows, the association between an index field and the XML document is uniquely determined, so retrieving an element (whose content is part of the entry) can be equivalently converted into retrieving the corresponding index field. At the same time, managing element indexes becomes managing the data of the index data table, which makes element retrieval fast and efficient.
For example, consider the following data table:
In this data table, the XML stored in the field DOC_XMLDATA has the following structure:
According to this preferred embodiment, the generated index names are as follows:
<Node text="DOC_XMLDATA_/paper/industry background"/>
<Node text="DOC_XMLDATA_/paper/product orientation"/>
<Node text="DOC_XMLDATA_/paper/key characteristic/functional characteristic"/>
<Node text="DOC_XMLDATA_/paper/key characteristic/performance characteristic"/>
<Node text="DOC_XMLDATA_/paper/key characteristic/technical characteristic"/>
<Node text="DOC_XMLDATA_/paper/market prospect"/>
<Node text="DOC_XMLDATA_/paper/risk assessment"/>
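The list above holds one name per leaf chapter. A minimal Python sketch that derives such names by walking the stored XML is shown below. The tag names are hypothetical stand-ins (XML element names cannot contain spaces, so underscores replace them), and indexing only leaf elements is an assumption drawn from the example list, in which "key characteristic" itself has no index:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the DOC_XMLDATA structure; underscores replace
# spaces because XML element names cannot contain whitespace.
SAMPLE = """<paper>
  <industry_background/>
  <product_orientation/>
  <key_characteristic>
    <functional_characteristic/>
    <performance_characteristic/>
    <technical_characteristic/>
  </key_characteristic>
  <market_prospect/>
  <risk_assessment/>
</paper>"""


def leaf_xpaths(elem, prefix=""):
    """Yield the absolute path of every leaf element in document order."""
    path = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        yield path
    for child in children:
        yield from leaf_xpaths(child, path)


FIELD, SEPARATOR = "DOC_XMLDATA", "_"
names = [FIELD + SEPARATOR + p for p in leaf_xpaths(ET.fromstring(SAMPLE))]
for name in names:
    print(name)
```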
Preferably, step S30 further includes: storing all the indexes together as an index data table, wherein the name of each index is stored in the index field of the index data table.
Preferably, a title field is also created in the index data table to record the short name of each index field, for presentation to the user.
The index data table created according to the above preferred embodiments is as follows:
CLOB denotes a variable-length text field (character large object).
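As a rough illustration of the index data table, the sketch below assumes a SQLite table named `entry_index` (hypothetical) with an index-name field, a title field for the short name, and a TEXT column standing in for the CLOB, and fills it with one row per leaf element of a stored document. Deriving the short name from the tag name is purely for the example:

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Hypothetical schemas; SQLite TEXT stands in for the CLOB type.
conn.executescript("""
CREATE TABLE entry (id INTEGER PRIMARY KEY, DOC_XMLDATA TEXT);
CREATE TABLE entry_index (
    doc_id     INTEGER REFERENCES entry(id),  -- record the index points to
    index_name TEXT,  -- index field: field name + '_' + XPATH
    title      TEXT,  -- title field: short name shown to users
    content    TEXT   -- element text (assumed CLOB column)
);
""")


def index_document(conn, doc_id, xml_text, field="DOC_XMLDATA"):
    """Write one index row per leaf element of a stored XML document."""
    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        children = list(elem)
        if not children:
            conn.execute(
                "INSERT INTO entry_index VALUES (?, ?, ?, ?)",
                (doc_id, f"{field}_{path}", elem.tag.replace("_", " "), elem.text or ""),
            )
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")


doc = ("<paper><industry_background>digital publishing</industry_background>"
       "<market_prospect>steady growth</market_prospect></paper>")
cur = conn.execute("INSERT INTO entry (DOC_XMLDATA) VALUES (?)", (doc,))
index_document(conn, cur.lastrowid, doc)
conn.commit()
```

Keeping the index rows in their own table means that later searches touch only these small rows rather than the full XML stored in `entry`.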
Preferably, the method further includes:
presenting the short names recorded in the title field to the user;
receiving the user's selection of a short name and an input search string;
searching, with the search string as the keyword, the index field corresponding to the selected short name;
returning to the user the content of the XML document field pointed to by the matching index.
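A minimal Python sketch of these four steps follows. It assumes a hypothetical SQLite layout with an `entry` table holding the XML and an `entry_index` table holding index name, short name and element text, and it treats the search string as a substring match (`LIKE '%term%'`), which this description does not specify:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (id INTEGER PRIMARY KEY, DOC_XMLDATA TEXT);
CREATE TABLE entry_index (doc_id INTEGER, index_name TEXT, title TEXT, content TEXT);
INSERT INTO entry VALUES (1, '<paper>first entry</paper>'), (2, '<paper>second entry</paper>');
INSERT INTO entry_index VALUES
  (1, 'DOC_XMLDATA_/paper/industry background', 'industry background', 'digital publishing market'),
  (2, 'DOC_XMLDATA_/paper/industry background', 'industry background', 'mobile games'),
  (2, 'DOC_XMLDATA_/paper/market prospect', 'market prospect', 'digital publishing outlook');
""")


def list_short_names(conn):
    """Step 1: the short names from the title field, for display."""
    return [r[0] for r in conn.execute(
        "SELECT DISTINCT title FROM entry_index ORDER BY title")]


def search(conn, short_name, keyword):
    """Steps 2-4: search the index field selected via its short name and
    return the XML document content that the matching index rows point to."""
    rows = conn.execute(
        """SELECT DISTINCT e.id, e.DOC_XMLDATA
             FROM entry_index AS i JOIN entry AS e ON e.id = i.doc_id
            WHERE i.title = ? AND i.content LIKE ?
            ORDER BY e.id""",
        (short_name, f"%{keyword}%"),
    )
    return [xml for _, xml in rows]


print(list_short_names(conn))  # ['industry background', 'market prospect']
print(search(conn, "industry background", "digital publishing"))  # ['<paper>first entry</paper>']
```

Entry 2 is not returned even though its market prospect mentions digital publishing, because only the index field the user selected is searched.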
Based on the search conditions entered by the user, this preferred embodiment organizes the retrieval syntax for the search engine; the user only needs to select the items to search and enter a keyword. For example, if the user wants to find documents whose industry background or product orientation concerns digital publishing, the organized retrieval syntax is as follows:
((DOC_XMLDATA_/paper/industry background LIKE 'digital publishing') OR (DOC_XMLDATA_/paper/product orientation LIKE 'digital publishing'))
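Organizing this syntax amounts to joining one `LIKE` clause per selected index with `OR`. A small Python sketch that reproduces the string above follows; the function name and the quote-doubling rule are assumptions:

```python
def build_retrieval_syntax(index_names, keyword):
    """OR together one LIKE clause per selected index, in the form shown
    above. Single quotes in the keyword are doubled (assumed escaping rule)."""
    term = keyword.replace("'", "''")
    clauses = [f"({name} LIKE '{term}')" for name in index_names]
    return "(" + " OR ".join(clauses) + ")"


print(build_retrieval_syntax(
    ["DOC_XMLDATA_/paper/industry background",
     "DOC_XMLDATA_/paper/product orientation"],
    "digital publishing"))
```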
The syntax converter converts this syntax into an element retrieval statement and sends it to the retrieval service. The element retrieval syntax is as follows:
The retrieval service receives the search conditions, calls the syntax conversion service to convert them into a retrieval statement, executes the retrieval, and obtains a result set. The search engine returns the result set to the human-computer interaction interface.
Fig. 3 shows a flowchart of performing an index search according to a preferred embodiment of the invention, including:
The first step: the search engine receives the retrieval request sent by the front-end page;
The second step: the search engine calls the syntax converter to convert the page's search conditions into element retrieval syntax;
The third step: the search engine initiates a retrieval request and passes the retrieval statement to the retrieval service;
The fourth step: the retrieval service parses the retrieval syntax, executes the retrieval, and obtains the result set;
The fifth step: the retrieval service returns the index result set to the search engine;
The sixth step: the search engine analyzes the result set, obtains the result documents according to the index rules, and returns them to the front end.
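The six steps can be sketched as three cooperating Python functions. This sketch assumes the element retrieval statement is a parameterized SQL query over a hypothetical `entry_index` table (`doc_id`, `index_name`, `content`) and that a keyword matches by substring; all names are illustrative:

```python
import sqlite3


def convert(conditions):
    """Syntax converter (second step): OR-ed (index_name, keyword) pairs
    become a parameterized SQL query over the index data table."""
    where = " OR ".join("(index_name = ? AND content LIKE ?)" for _ in conditions)
    params = [p for name, kw in conditions for p in (name, f"%{kw}%")]
    return f"SELECT DISTINCT doc_id FROM entry_index WHERE {where}", params


def retrieval_service(conn, sql, params):
    """Fourth and fifth steps: execute the statement, return the index result set."""
    return sorted(row[0] for row in conn.execute(sql, params))


def search_engine(conn, conditions):
    """First, third and sixth steps: accept the request, hand the converted
    statement to the service, then resolve hit IDs to stored XML documents."""
    if not conditions:
        return []
    sql, params = convert(conditions)
    doc_ids = retrieval_service(conn, sql, params)
    return [conn.execute("SELECT DOC_XMLDATA FROM entry WHERE id = ?", (i,)).fetchone()[0]
            for i in doc_ids]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (id INTEGER PRIMARY KEY, DOC_XMLDATA TEXT);
CREATE TABLE entry_index (doc_id INTEGER, index_name TEXT, content TEXT);
INSERT INTO entry VALUES (1, '<paper>A</paper>'), (2, '<paper>B</paper>'), (3, '<paper>C</paper>');
INSERT INTO entry_index VALUES
  (1, 'DOC_XMLDATA_/paper/industry background', 'digital publishing market'),
  (2, 'DOC_XMLDATA_/paper/product orientation', 'digital publishing platform'),
  (3, 'DOC_XMLDATA_/paper/risk assessment', 'digital publishing risk');
""")

hits = search_engine(conn, [
    ("DOC_XMLDATA_/paper/industry background", "digital publishing"),
    ("DOC_XMLDATA_/paper/product orientation", "digital publishing"),
])
print(hits)  # ['<paper>A</paper>', '<paper>B</paper>']
```

Entry 3 is excluded because its match lies in an index field the user did not select; the full XML is read only for the final hits.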
Fig. 4 shows a screenshot of the index management interface according to a preferred embodiment of the invention.
This preferred embodiment provides a user-friendly interactive interface that uses the title field to help the user select a suitable index field, so that entries are retrieved through the index. For the user, this is flexible and easy to use.
Fig. 5 shows a schematic diagram of the entry processing apparatus according to an embodiment of the present invention, including:
a structuring module 10, configured to create an XML document to record the content of an entry, wherein the XPATH of each element in the XML document corresponds to the chapter hierarchy of the entry content;
a database module 20, configured to store each XML document in the XML document field of the entry data table;
an index module 30, configured to create an index on the XML document field of the database according to the XPATH of the elements in the XML document.
This apparatus reduces resource consumption, considerably improves retrieval efficiency, and shortens retrieval time.
Preferably, the index module is configured to create a corresponding index for each element in the XML document field, where index name = XML document field name + field-name separator + XPATH of the element.
Preferably, the index module is further configured to store all the indexes together as an index data table, wherein the name of each index is stored in the index field of the index data table.
Preferably, the index module is further configured to create a title field in the index data table to record the short name of each index field, for presentation to the user.
Preferably, the apparatus further includes: an interface module, configured to present the short names recorded in the title field to the user; a receiving module, configured to receive the user's selection of a short name and an input search string; a retrieval module, configured to search, with the search string as the keyword, the index field corresponding to the selected short name; and a submission module, configured to return to the user the content of the XML document field pointed to by the matching index.
As can be seen from the above description, the present invention achieves the following technical effects:
Direct element retrieval: XML elements can be retrieved directly without changing the original XML storage structure.
Less repeated loading: retrieval targets elements directly, which reduces repeated loading of complete XML documents, saves resources, and improves resource utilization.
Improved retrieval efficiency: instead of the original traversal-and-addressing approach, elements are retrieved directly through the index, which improves retrieval efficiency.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing device. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may each be fabricated as an individual integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (6)
1. A method for processing entries, characterized by comprising:
creating an XML document to record the content of an entry, wherein the XPATH of each element in the XML document corresponds to the chapter hierarchy of the entry content;
storing each XML document in the XML document field of an entry data table;
creating an index on the XML document field of the data table according to the XPATH of the elements in the XML document, wherein a corresponding index is created for each element in the XML document field, the name of the index = the XML document field name + a field-name separator + the XPATH of the element, all the indexes are stored together as an index data table, and the name of each index is stored in the index field of the index data table.
2. The method according to claim 1, characterized in that a title field is also created in the index data table to record the short name of the index field, for presentation to the user.
3. The method according to claim 2, characterized by further comprising:
presenting the short names recorded in the title field to the user;
receiving the user's selection of a short name and an input search string;
searching, with the search string as the keyword, the index field corresponding to the selected short name;
returning to the user the content of the XML document field pointed to by the matching index.
4. An apparatus for processing entries, characterized by comprising:
a structuring module, configured to create an XML document to record the content of an entry, wherein the XPATH of each element in the XML document corresponds to the chapter hierarchy of the entry content;
a storage module, configured to store each XML document in the XML document field of an entry data table;
an index module, configured to create an index on the XML document field of the data table according to the XPATH of the elements in the XML document, wherein a corresponding index is created for each element in the XML document field, the name of the index = the XML document field name + a field-name separator + the XPATH of the element, all the indexes are stored together as an index data table, and the name of each index is stored in the index field of the index data table.
5. The apparatus according to claim 4, characterized in that the index module is further configured to create a title field in the index data table to record the short name of the index field, for presentation to the user.
6. The apparatus according to claim 5, characterized by further comprising:
an interface module, configured to present the short names recorded in the title field to the user;
a receiving module, configured to receive the user's selection of a short name and an input search string;
a retrieval module, configured to search, with the search string as the keyword, the index field corresponding to the selected short name;
a submission module, configured to return to the user the content of the XML document field pointed to by the matching index.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110401386.3A CN103136304B (en) | 2011-12-05 | 2011-12-05 | Article processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110401386.3A CN103136304B (en) | 2011-12-05 | 2011-12-05 | Article processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103136304A (en) | 2013-06-05 |
CN103136304B (en) | 2017-02-22 |
Family ID: 48496136
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110401386.3A (CN103136304B, Expired - Fee Related) | Article processing method and device | 2011-12-05 | 2011-12-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103136304B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107193849A (en) * | 2016-03-15 | 2017-09-22 | 北大方正集团有限公司 | XML file full-text search index generation method and device |
CN109460394B (en) * | 2018-11-20 | 2020-06-16 | 北京广利核系统工程有限公司 | Simplification method of multi-level document entry tracking matrix |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1965316A (en) * | 2004-04-09 | 2007-05-16 | 甲骨文国际公司 | Index for accessing XML data |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7366735B2 (en) * | 2004-04-09 | 2008-04-29 | Oracle International Corporation | Efficient extraction of XML content stored in a LOB |
US7603347B2 (en) * | 2004-04-09 | 2009-10-13 | Oracle International Corporation | Mechanism for efficiently evaluating operator trees |
- 2011-12-05: application CN201110401386.3A filed (CN103136304B); status: not active, Expired - Fee Related
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1965316A (en) * | 2004-04-09 | 2007-05-16 | 甲骨文国际公司 | Index for accessing XML data |
Also Published As
Publication number | Publication date |
---|---|
CN103136304A (en) | 2013-06-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
EP2901318B1 (en) | Evaluating xml full text search | |
KR101183312B1 (en) | Dispersing search engine results by using page category information | |
US8751505B2 (en) | Indexing and searching entity-relationship data | |
US7548912B2 (en) | Simplified search interface for querying a relational database | |
US9152697B2 (en) | Real-time search of vertically partitioned, inverted indexes | |
US8983931B2 (en) | Index-based evaluation of path-based queries | |
CN106126648B (en) | It is a kind of based on the distributed merchandise news crawler method redo log | |
US10810181B2 (en) | Refining structured data indexes | |
CN110222110A (en) | A kind of resource description framework data conversion storage integral method based on ETL tool | |
US8156144B2 (en) | Metadata search interface | |
KR101224800B1 (en) | Crawling database for infomation | |
CN102193798A (en) | Method for automatically acquiring Open application programming interface (API) based on Internet | |
Kamali et al. | Structural similarity search for mathematics retrieval | |
Xiao et al. | A Multi-Ontology Approach for Personal Information Management. | |
US8108421B2 (en) | Query throttling during query translation | |
Mass et al. | IQ: The Case for Iterative Querying for Knowledge. | |
CN103136304B (en) | Article processing method and device | |
Patil et al. | Semantic search using ontology and RDBMS for cricket | |
CN105740250B (en) | A kind of method and device for the property index creating XML node | |
Ghebghoub et al. | Learning object indexing tool based on a LOM ontology | |
CN1588371A (en) | Forming method for package device | |
Qu et al. | Searching SCORM metadata in a RDF-based E-learning P2P network using Xquery and Query by Example | |
Peng et al. | A folksonomy-ontology-based digital gazetteer service | |
Ding et al. | Design and implementation of Educational Resources Database System based on SQL SERVER 2005 and ASP. NET 2.0 XML | |
Bahreini et al. | SDISSASA: A multiagent-Based web mining via semantic access to Web resources in Enterprise Architecture |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2017-02-22; termination date: 2017-12-05 |