CN103136304B - Article processing method and device - Google Patents

Info

Publication number
CN103136304B
Authority
CN
China
Prior art keywords
index
domain
xml document
module
xpath
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201110401386.3A
Other languages
Chinese (zh)
Other versions
CN103136304A (en)
Inventor
刘浩
翟因为
陈长刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Original Assignee
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Founder Group Co Ltd and Beijing Founder Electronics Co Ltd
Priority to CN201110401386.3A
Publication of CN103136304A
Application granted
Publication of CN103136304B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses an article processing method comprising the following steps: creating Extensible Markup Language (XML) documents to record the content of articles, wherein the XPATH of each element in an XML document corresponds to the chapter hierarchy of the article's content; storing each XML document in the XML document domain of an article data table; and creating an index on the XML document domain according to the XPATH of the elements in the XML documents. The invention also provides an article processing device comprising a structuring module, a database module and an index module. The structuring module is used for creating the XML documents to record the content of the articles, wherein the XPATH of each element in an XML document corresponds to the chapter hierarchy of the article's content. The database module is used for storing each XML document in the XML document domain of the article data table. The index module is used for creating the index on the XML document domain according to the XPATH of the elements in the XML documents. The article processing method and device improve the efficiency of article retrieval.

Description

Method and device for processing entries
Technical field
The present invention relates to the field of Internet publishing, and in particular to a method and device for processing entries.
Background art
Entry-type data has a chapter hierarchy. To preserve the integrity and hierarchical relationship of an entry's content, the whole content can be stored in XML form as an attribute in one domain of the database. This domain forms the XML document domain and, together with the entry's other attributes, makes up one complete record.
When entries are retrieved, the attributes of the entry are organized into a search condition domain by domain, and the entries are then searched. When the search condition includes a restriction on an element within the entry content, the records that satisfy the other conditions must first be obtained, together with the complete XML fragment of each entry's content. The element is then searched by means of XPATH, and the qualifying records are obtained by filtering.
The inventors found that this retrieval approach causes XML documents to be loaded frequently and consumes considerable resources.
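The prior-art retrieval described above can be illustrated with a minimal Python sketch. The record layout, the `author` attribute, and the element name `industry_background` are hypothetical and are not taken from the patent; the sketch only shows why every candidate record forces a full XML parse.

```python
import xml.etree.ElementTree as ET

# Hypothetical records: one row per entry, with the whole content held as XML text.
RECORDS = [
    {"id": 1, "author": "A",
     "xml": "<paper><industry_background>digital publishing trends</industry_background></paper>"},
    {"id": 2, "author": "A",
     "xml": "<paper><industry_background>print logistics</industry_background></paper>"},
    {"id": 3, "author": "B",
     "xml": "<paper><industry_background>digital publishing tools</industry_background></paper>"},
]


def search_without_index(records, author, xpath, keyword):
    """Filter on ordinary attributes first, then parse each candidate's XML."""
    hits = []
    parsed = 0
    for rec in records:
        if rec["author"] != author:
            continue
        # The complete XML document is loaded for every candidate record.
        root = ET.fromstring(rec["xml"])
        parsed += 1
        node = root.find(xpath)
        if node is not None and keyword in (node.text or ""):
            hits.append(rec["id"])
    return hits, parsed


hits, parsed = search_without_index(
    RECORDS, "A", "./industry_background", "digital publishing"
)
print(hits, parsed)
```

The `parsed` counter grows with the number of candidate records rather than with the number of matches, which is the cost the invention sets out to avoid.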
Summary of the invention
The present invention aims to provide a method and device for processing entries, so as to improve the efficiency of entry retrieval.
An embodiment of the invention provides a method for processing entries, including: creating XML documents to record the content of entries, where the XPATH of each element in an XML document corresponds to the chapter hierarchy in the entry's content; storing each XML document in the XML document domain of an entry data table; and creating an index on the XML document domain of the database according to the XPATH of the elements in the XML documents.
An embodiment of the invention provides a device for processing entries, including: a structuring module for creating XML documents to record the content of entries, where the XPATH of each element in an XML document corresponds to the chapter hierarchy in the entry's content; a database module for storing each XML document in the XML document domain of an entry data table; and an index module for creating an index on the XML document domain of the database according to the XPATH of the elements in the XML documents.
Because the method and device of the above embodiments create an index on the XML document domain, they overcome the low entry retrieval efficiency of the prior art and improve the efficiency of entry retrieval.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The exemplary embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 shows a method for processing entries according to an embodiment of the invention;
Fig. 2 shows a schematic diagram of the index relationship according to a preferred embodiment of the invention;
Fig. 3 shows a flow chart of performing an index search according to a preferred embodiment of the invention;
Fig. 4 shows a screenshot of the index management interface according to a preferred embodiment of the invention;
Fig. 5 shows a schematic diagram of a device for processing entries according to an embodiment of the invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows a method for processing entries according to an embodiment of the invention, including:
Step S10: creating XML documents to record the content of entries, where the XPATH of each element in an XML document corresponds to the chapter hierarchy in the entry's content;
Step S20: storing each XML document in the XML document domain of an entry data table;
Step S30: creating an index on the XML document domain of the database according to the XPATH of the elements in the XML documents.
In the prior art, when entries are retrieved using XML, the complete XML fragment of the entry content is obtained and then searched by means of XPATH. The method of this embodiment instead creates an index on the XML document domain, so entries can be retrieved through the index without reloading the whole XML document. This reduces resource consumption, improves retrieval efficiency considerably, and shortens retrieval time.
In addition, the prior art searches elements by traversal and addressing, which is slow. This method can retrieve entries through the index without traversing and addressing elements again, which also shortens retrieval time.
Preferably, step S30 includes: creating a corresponding index for each element in the XML document domain, where the name of the index = the name of the XML document domain + the domain-name separator + the XPATH of the element. This embodiment is simple to implement.
Fig. 2 shows a schematic diagram of the index relationship according to a preferred embodiment of the invention. As the figure shows, the association between an index domain and the XML document is uniquely determined. Retrieval of an element (whose content is entry content) can therefore be converted equivalently into retrieval of an index domain, and management of element indexes is converted into management of the data in the index data table, which makes element retrieval fast and efficient.
For example, suppose there is the following data table:
In this data table, the XML stored in the domain DOC_XMLDATA has the following structure:
According to this preferred embodiment, the names of the generated indexes are as follows:
<Node text="DOC_XMLDATA_/paper/industry background"/>
<Node text="DOC_XMLDATA_/paper/product orientation"/>
<Node text="DOC_XMLDATA_/paper/key characteristic/functional characteristic"/>
<Node text="DOC_XMLDATA_/paper/key characteristic/Performance Characteristics"/>
<Node text="DOC_XMLDATA_/paper/key characteristic/technical characteristic"/>
<Node text="DOC_XMLDATA_/paper/market prospect"/>
<Node text="DOC_XMLDATA_/paper/risk assessment"/>
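The naming rule (index name = XML document domain name + separator + XPATH of the element) can be sketched in Python as follows. This is an illustration under assumptions, not the patent's implementation: the sample XML is hypothetical, its element names are English stand-ins written with underscores so that they are valid XML names, and indexing only leaf elements is inferred from the index names listed above, where `/paper` and `/paper/key characteristic` do not appear on their own.

```python
import xml.etree.ElementTree as ET

FIELD_NAME = "DOC_XMLDATA"  # name of the XML document domain, from the example
SEPARATOR = "_"             # separator as it appears in the index names above

# Hypothetical entry content; element names are illustrative stand-ins.
SAMPLE_XML = """
<paper>
  <industry_background>...</industry_background>
  <product_orientation>...</product_orientation>
  <key_characteristic>
    <functional_characteristic>...</functional_characteristic>
    <performance_characteristic>...</performance_characteristic>
  </key_characteristic>
  <market_prospect>...</market_prospect>
</paper>
""".strip()


def leaf_xpaths(element, prefix=""):
    """Yield the absolute path of every element that has no child elements."""
    path = f"{prefix}/{element.tag}"
    children = list(element)
    if not children:
        yield path
    for child in children:
        yield from leaf_xpaths(child, path)


def index_names(xml_text, field_name=FIELD_NAME, separator=SEPARATOR):
    """Build one index name per leaf element of the document."""
    root = ET.fromstring(xml_text)
    return [f"{field_name}{separator}{xpath}" for xpath in leaf_xpaths(root)]


names = index_names(SAMPLE_XML)
for name in names:
    print(name)
```

Because each name embeds the full path from the root, two elements with the same tag under different chapters receive different index names.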
Preferably, step S30 further includes: storing the indexes collectively as an index data table, where the name of each index is stored in the index domain of the index data table.
Preferably, a title domain is also created in the index data table to record a simple name for each index domain, for presentation to the user.
The index data table created according to the above preferred embodiment is as follows:
CLOB refers to a variable-length text field.
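One possible shape for such an index data table is sketched below with Python's built-in `sqlite3` module. The table name, column names, and sample rows are hypothetical, and SQLite's `TEXT` type stands in for the CLOB type mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per index.
conn.execute(
    "CREATE TABLE index_table ("
    " index_field TEXT PRIMARY KEY,"  # index name: domain name + separator + XPATH
    " simple_name TEXT NOT NULL)"     # title domain: the name shown to the user
)

rows = [
    ("DOC_XMLDATA_/paper/industry_background", "Industry background"),
    ("DOC_XMLDATA_/paper/product_orientation", "Product orientation"),
]
conn.executemany("INSERT INTO index_table VALUES (?, ?)", rows)


def simple_names(connection):
    """List the user-facing names recorded in the title domain."""
    cursor = connection.execute(
        "SELECT simple_name FROM index_table ORDER BY index_field"
    )
    return [row[0] for row in cursor]


def index_field_for(connection, simple_name):
    """Map a simple name chosen by the user back to its index domain."""
    row = connection.execute(
        "SELECT index_field FROM index_table WHERE simple_name = ?",
        (simple_name,),
    ).fetchone()
    return row[0] if row else None


print(simple_names(conn))
print(index_field_for(conn, "Product orientation"))
```

Keeping the simple name beside the index name lets the interface show readable labels while the search itself still addresses the index domain.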
Preferably, the method further includes:
presenting the simple names recorded in the title domain to the user;
receiving the user's selection of a simple name and the search string the user enters;
searching the index domain corresponding to the selected simple name, using the search string as the keyword;
submitting to the user the content of the XML document domain pointed to by the retrieved index.
In this preferred embodiment, the retrieval grammar of the search engine is assembled from the search condition the user enters; the user only needs to select the items to search and enter a keyword. For example, if the user wants to find documents whose industry background or product orientation relates to digital publishing, the assembled retrieval grammar is as follows:
((DOC_XMLDATA_/paper/industry background LIKE 'digital publishing') OR (DOC_XMLDATA_/paper/product orientation LIKE 'digital publishing'))
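Assembling a condition of this form from the user's selections can be sketched as below. The function name `build_condition` is hypothetical, and no quoting or escaping is attempted because the source does not describe the grammar's escaping rules.

```python
def build_condition(index_fields, keyword):
    """Join one LIKE clause per selected index domain with OR."""
    clauses = [f"({field} LIKE '{keyword}')" for field in index_fields]
    return "(" + " OR ".join(clauses) + ")"


selected = [
    "DOC_XMLDATA_/paper/industry background",
    "DOC_XMLDATA_/paper/product orientation",
]
condition = build_condition(selected, "digital publishing")
print(condition)
```

With these two index domains and the keyword `digital publishing`, the function reproduces the expression shown above.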
The syntax converter converts this into a retrieval statement in the element retrieval grammar and sends it to the retrieval service. The element retrieval grammar is as follows:
The retrieval service receives the search condition, calls the syntax conversion service to convert it into a retrieval statement, executes the retrieval, and obtains the result set. The search engine returns the result set to the human-computer interaction interface.
Fig. 3 shows a flow chart of performing an index search according to a preferred embodiment of the invention, including:
Step 1: the search engine receives the retrieval request sent by the front-end page;
Step 2: the search engine calls the syntax converter to convert the page's search condition into the element retrieval grammar;
Step 3: the search engine initiates a retrieval request and passes the retrieval statement to the retrieval service;
Step 4: the retrieval service parses the retrieval grammar, executes the retrieval, and obtains the result set;
Step 5: the retrieval service returns the index result set it obtained to the search engine;
Step 6: the search engine parses the result set, obtains the result documents according to the index rule, and returns them to the front end.
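The six steps can be sketched end to end in Python. Everything here is an assumption made for illustration: the in-memory dictionaries stand in for the index data and the entry data table, the retrieval statement is modeled as a list of (index domain, keyword) pairs because the actual element retrieval grammar is not given in the text, and a plain substring test stands in for LIKE matching.

```python
# Hypothetical mapping from simple names to index domains.
SIMPLE_TO_INDEX = {
    "Industry background": "DOC_XMLDATA_/paper/industry_background",
    "Product orientation": "DOC_XMLDATA_/paper/product_orientation",
}

# Hypothetical index data: index domain -> {document id: indexed element text}.
INDEX = {
    "DOC_XMLDATA_/paper/industry_background": {
        1: "digital publishing trends",
        2: "print logistics",
    },
    "DOC_XMLDATA_/paper/product_orientation": {
        1: "newspaper layout",
        2: "digital publishing platform",
        3: "archive tooling",
    },
}

# Hypothetical stored documents, keyed by document id.
DOCUMENTS = {1: "<paper>doc 1</paper>", 2: "<paper>doc 2</paper>", 3: "<paper>doc 3</paper>"}


def convert_syntax(page_condition):
    """Step 2: turn the page's condition into a retrieval statement."""
    keyword = page_condition["keyword"]
    return [(SIMPLE_TO_INDEX[name], keyword) for name in page_condition["simple_names"]]


def retrieval_service(statement):
    """Steps 4 and 5: evaluate the statement against the index, OR semantics."""
    hits = set()
    for index_field, keyword in statement:
        for doc_id, text in INDEX.get(index_field, {}).items():
            if keyword in text:
                hits.add(doc_id)
    return hits


def search_engine(page_condition):
    """Steps 1, 3 and 6: receive the request, dispatch it, resolve documents."""
    statement = convert_syntax(page_condition)
    result_ids = retrieval_service(statement)
    return {doc_id: DOCUMENTS[doc_id] for doc_id in sorted(result_ids)}


results = search_engine(
    {"simple_names": ["Industry background", "Product orientation"],
     "keyword": "digital publishing"}
)
print(results)
```

In this sketch the stored documents are read only in the last step, after the index has already decided which ones qualify.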
Fig. 4 shows a screenshot of the index management interface according to a preferred embodiment of the invention.
This preferred embodiment provides a fairly friendly interactive interface: the title domain helps the user select the appropriate index domain, so that entries are retrieved through the index. For the user, this is flexible and easy to use.
Fig. 5 shows a schematic diagram of a device for processing entries according to an embodiment of the invention, including:
a structuring module 10 for creating XML documents to record the content of entries, where the XPATH of each element in an XML document corresponds to the chapter hierarchy in the entry's content;
a database module 20 for storing each XML document in the XML document domain of an entry data table;
an index module 30 for creating an index on the XML document domain of the database according to the XPATH of the elements in the XML documents.
This device reduces resource consumption, improves retrieval efficiency considerably, and shortens retrieval time.
Preferably, the index module is used for creating a corresponding index for each element in the XML document domain, where the name of the index = the name of the XML document domain + the domain-name separator + the XPATH of the element.
Preferably, the index module is also used for storing the indexes collectively as an index data table, where the name of each index is stored in the index domain of the index data table.
Preferably, the index module is also used for creating a title domain in the index data table to record a simple name for each index domain, for presentation to the user.
Preferably, the device further includes: an interface module for presenting the simple names recorded in the title domain to the user; a receiving module for receiving the user's selection of a simple name and the search string the user enters; a retrieval module for searching the index domain corresponding to the selected simple name, using the search string as the keyword; and a submitting module for submitting to the user the content of the XML document domain pointed to by the retrieved index.
As can be seen from the above description, the present invention achieves the following technical effects:
Direct retrieval of elements: the elements of the XML are retrieved directly, without changing the original XML storage structure.
Less repeated loading of resources: retrieval works directly on elements, which reduces repeated loading of complete XML documents, saves resources, and improves resource utilization.
Improved retrieval efficiency: the original approach of traversal and addressing is abandoned in favor of retrieving elements directly through the index, which improves retrieval efficiency.
Obviously, those skilled in the art should understand that the modules or steps of the invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Alternatively, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; or they can each be made into an individual integrated circuit module; or several of the modules or steps can be made into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (6)

1. A method for processing entries, characterized by comprising:
creating XML documents to record the content of entries, wherein the XPATH of each element in the XML document corresponds to the chapter hierarchy in the content of the entry;
storing each XML document in the XML document domain of an entry data table;
creating an index on the XML document domain of the data table according to the XPATH of the elements in the XML document, wherein a corresponding index is created for each element in the XML document domain, the name of the index = the name of the XML document domain + the domain-name separator + the XPATH of the element, the indexes are stored collectively as an index data table, and the name of each index is stored in the index domain of the index data table.
2. The method according to claim 1, characterized in that a title domain is also created in the index data table to record a simple name for the index domain, for presentation to the user.
3. The method according to claim 2, characterized by further comprising:
presenting the simple names recorded in the title domain to the user;
receiving the user's selection of a simple name and the search string the user enters;
searching the index domain corresponding to the selected simple name, using the search string as the keyword;
submitting to the user the content of the XML document domain pointed to by the retrieved index.
4. A device for processing entries, characterized by comprising:
a structuring module for creating XML documents to record the content of entries, wherein the XPATH of each element in the XML document corresponds to the chapter hierarchy in the content of the entry;
a storage module for storing each XML document in the XML document domain of an entry data table;
an index module for creating an index on the XML document domain of the data table according to the XPATH of the elements in the XML document, wherein a corresponding index is created for each element in the XML document domain, the name of the index = the name of the XML document domain + the domain-name separator + the XPATH of the element, the indexes are stored collectively as an index data table, and the name of each index is stored in the index domain of the index data table.
5. The device according to claim 4, characterized in that the index module is also used for creating a title domain in the index data table to record a simple name for the index domain, for presentation to the user.
6. The device according to claim 5, characterized by further comprising:
an interface module for presenting the simple names recorded in the title domain to the user;
a receiving module for receiving the user's selection of a simple name and the search string the user enters;
a retrieval module for searching the index domain corresponding to the selected simple name, using the search string as the keyword;
a submitting module for submitting to the user the content of the XML document domain pointed to by the retrieved index.
CN201110401386.3A 2011-12-05 2011-12-05 Article processing method and device Expired - Fee Related CN103136304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110401386.3A CN103136304B (en) 2011-12-05 2011-12-05 Article processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110401386.3A CN103136304B (en) 2011-12-05 2011-12-05 Article processing method and device

Publications (2)

Publication Number Publication Date
CN103136304A CN103136304A (en) 2013-06-05
CN103136304B true CN103136304B (en) 2017-02-22

Family

ID=48496136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110401386.3A Expired - Fee Related CN103136304B (en) 2011-12-05 2011-12-05 Article processing method and device

Country Status (1)

Country Link
CN (1) CN103136304B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193849A (en) * 2016-03-15 2017-09-22 北大方正集团有限公司 XML file full-text search index generation method and device
CN109460394B (en) * 2018-11-20 2020-06-16 北京广利核系统工程有限公司 Simplification method of multi-level document entry tracking matrix

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1965316A (en) * 2004-04-09 2007-05-16 甲骨文国际公司 Index for accessing XML data

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7366735B2 (en) * 2004-04-09 2008-04-29 Oracle International Corporation Efficient extraction of XML content stored in a LOB
US7603347B2 (en) * 2004-04-09 2009-10-13 Oracle International Corporation Mechanism for efficiently evaluating operator trees

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1965316A (en) * 2004-04-09 2007-05-16 甲骨文国际公司 Index for accessing XML data

Also Published As

Publication number Publication date
CN103136304A (en) 2013-06-05

Similar Documents

Publication Publication Date Title
KR101183312B1 (en) Dispersing search engine results by using page category information
EP2901318B1 (en) Evaluating xml full text search
US8751505B2 (en) Indexing and searching entity-relationship data
Andrews et al. A classification of semantic annotation systems
US7548912B2 (en) Simplified search interface for querying a relational database
CN106126648B (en) It is a kind of based on the distributed merchandise news crawler method redo log
US8983931B2 (en) Index-based evaluation of path-based queries
US8775356B1 (en) Query enhancement of semantic wiki for improved searching of unstructured data
US10810181B2 (en) Refining structured data indexes
US8156144B2 (en) Metadata search interface
CN110222110A (en) A kind of resource description framework data conversion storage integral method based on ETL tool
KR101224800B1 (en) Crawling database for infomation
Xiao et al. A Multi-Ontology Approach for Personal Information Management.
Kamali et al. Structural similarity search for mathematics retrieval
US8108421B2 (en) Query throttling during query translation
Mass et al. IQ: The Case for Iterative Querying for Knowledge.
CN103136304B (en) Article processing method and device
Patil et al. Semantic search using ontology and RDBMS for cricket
Francisco‐Revilla et al. Encoded archival description: Data quality and analysis
Ghebghoub et al. Learning object indexing tool based on a LOM ontology
Zhong et al. 3SEPIAS: A semi-structured search engine for personal information in dataspace system
Qu et al. Searching SCORM metadata in a RDF-based E-learning P2P network using Xquery and Query by Example
CN1588371A (en) Forming method for package device
TWI423053B (en) Domain Interpretation Data Retrieval Method and Its System
Bahreini et al. SDISSASA: A multiagent-Based web mining via semantic access to Web resources in Enterprise Architecture

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170222

Termination date: 20171205