CN107818140A - Semantic annotation optimization method for digital publications - Google Patents


Info

Publication number
CN107818140A
Authority
CN
China
Prior art keywords
document
mark
weights
data
individual
Legal status
Pending
Application number
CN201710926504.XA
Other languages
Chinese (zh)
Inventor
艾顺刚
张蕾
何卫东
姚进德
孙晓翠
王林林
刘焱
Current Assignee
Jiangsu Retech Digital Industry Park Co Ltd
Original Assignee
Jiangsu Retech Digital Industry Park Co Ltd
Priority date
2017-10-07
Filing date
2017-10-07
Publication date
2018-03-20
Application filed by Jiangsu Retech Digital Industry Park Co Ltd
Priority to CN201710926504.XA
Publication of CN107818140A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides a semantic annotation optimization method for digital publications, comprising the following steps: preprocessing document content; building an ontology model; constructing individuals and filling in data property values; adjusting document annotations and annotation weights; storing the annotations and annotation weights; and, for an input term, performing a knowledge query, matching data and sorting the results by weight. The method improves the accuracy of document annotation, lets users find relevant documents faster when querying the ontology knowledge base, and improves the annotation accuracy of other related electronic documents.

Description

A semantic annotation optimization method for digital publications
Technical field
The present invention relates to the technical field of digital publishing, and in particular to a semantic annotation optimization method for digital publications.
Background art
Knowledge processing is an inevitable trend in the development of information technology. As the requirements placed on knowledge applications keep rising, traditional knowledge base systems can no longer meet new needs; ontologies have therefore been introduced into knowledge engineering, and ontology-related theory and techniques are applied to the development of knowledge bases.
Regarding ontology knowledge systems: in the late 1970s, the construction techniques for expert systems, knowledge systems and knowledge-intensive information systems developed into knowledge engineering, and the systems built this way are called knowledge systems (knowledge-based systems). Knowledge systems are the most important industrialized and commercialized products of artificial intelligence. They help people solve problems such as detecting credit card fraud, accelerating ship design, assisting medical diagnosis, making scientific software more intelligent, providing financial services to front offices (F/O), evaluating product quality and advertising, and supporting service restoration in electric power networks.
With the continuous development of digital publishing and the explosive growth of Internet digital content resources, techniques have emerged that extract and annotate the content of electronic publications. However, these annotations are extracted from a basic lexicon and the context of the content, without regard to the domain background of the publication. As a result, much domain-relevant key information is filtered out, annotation accuracy within a specific domain is reduced, and the annotations cannot fully represent the core and main content of the document.
When information in the field is retrieved using these annotations, both recall and precision suffer considerably: content annotation information is not fully exploited, and the relations and structure among pieces of information are not adequately expressed, so users must spend a great deal of time filtering information.
Summary of the invention
The technical problem to be solved by the invention is to overcome the above technical deficiencies by providing a semantic annotation optimization method for digital publications that improves the accuracy of document annotation, lets users find relevant documents faster when querying an ontology knowledge base, and improves the annotation accuracy of other related electronic documents.
The technical solution adopted by the present invention to solve this technical problem is as follows:
A semantic annotation optimization method for digital publications, characterized by comprising the following steps:
Document content preprocessing: the document is parsed in a computer system, keywords are extracted with a keyword extraction tool, and the weights of the keywords are calculated based on word position, providing the data basis for subsequently constructing individuals.
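The method does not say how word position translates into a weight. The sketch below is one possible reading, assuming that the keywords have already been produced by some external extraction tool and that occurrences in the title count more than those in the first or last paragraph, which in turn count more than those in the body. The function name `position_weights` and the three factor values are hypothetical.

```python
from collections import defaultdict

# Hypothetical position factors. The method only states that keyword
# weights are "calculated based on word position"; these values are assumed.
TITLE_FACTOR = 3.0
EDGE_PARAGRAPH_FACTOR = 2.0  # first or last paragraph
BODY_FACTOR = 1.0


def position_weights(title: str, paragraphs: list[str],
                     keywords: list[str]) -> dict[str, float]:
    """Score keywords (already extracted by an external tool) by location.

    Each occurrence adds a factor that depends on where it appears:
    title > first/last paragraph > other body paragraphs.
    Matching is a case-insensitive substring count, a deliberate simplification.
    """
    weights: dict[str, float] = defaultdict(float)
    last = len(paragraphs) - 1
    for kw in keywords:
        needle = kw.lower()
        weights[kw] += TITLE_FACTOR * title.lower().count(needle)
        for i, para in enumerate(paragraphs):
            factor = EDGE_PARAGRAPH_FACTOR if i in (0, last) else BODY_FACTOR
            weights[kw] += factor * para.lower().count(needle)
    return dict(weights)
```

For example, with the title "Ontology design", the paragraphs "An ontology has classes.", "Classes have properties." and "Ontology tools help.", and the keywords `["ontology", "classes"]`, the sketch returns `{"ontology": 7.0, "classes": 3.0}`. A real implementation would use a proper tokenizer, especially for Chinese text, rather than substring counts.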
Building the ontology model: an ontology is an explicit, formal representation of the classes in a domain; the properties of each class describe various aspects of the class, together with the constraints on its characteristics and attributes, so an ontology comprises classes, object properties and data properties. The content annotation optimization method is ontology-based: the ontology is built in a computer system with an ontology editing tool following a top-down methodology, and the classes, object properties and data properties are constructed in that tool.
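In practice this step is done in an ontology editing tool. The in-memory model below is only meant to show what the tool produces: a class hierarchy built top-down, object properties linking classes, and data properties attached to classes. The class names `OntClass` and `Ontology` are hypothetical, and so is the integer priority attached to each data property. The method does say that data properties have priorities, which are used later when weighting annotations, but it does not say how they are stored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntClass:
    name: str
    parent: Optional["OntClass"] = None
    # Data property name -> priority (an assumed representation).
    data_properties: dict[str, int] = field(default_factory=dict)

    @property
    def level(self) -> int:
        """Depth in the class hierarchy; the root class is level 0."""
        return 0 if self.parent is None else self.parent.level + 1


@dataclass
class Ontology:
    classes: dict[str, OntClass] = field(default_factory=dict)
    # Object properties as (domain class, property name, range class) triples.
    object_properties: list[tuple[str, str, str]] = field(default_factory=list)

    def add_class(self, name: str, parent: Optional[str] = None) -> OntClass:
        # Top-down construction: a parent class must exist before its subclasses.
        if parent is not None and parent not in self.classes:
            raise KeyError(f"parent class {parent!r} must be defined first")
        cls = OntClass(name, self.classes[parent] if parent is not None else None)
        self.classes[name] = cls
        return cls

    def add_data_property(self, cls_name: str, prop: str, priority: int) -> None:
        self.classes[cls_name].data_properties[prop] = priority

    def add_object_property(self, domain: str, prop: str, range_: str) -> None:
        if domain not in self.classes or range_ not in self.classes:
            raise KeyError("domain and range classes must exist")
        self.object_properties.append((domain, prop, range_))

    def data_properties_of(self, cls_name: str) -> dict[str, int]:
        """Data properties declared on a class or inherited from its ancestors."""
        chain: list[OntClass] = []
        cls: Optional[OntClass] = self.classes[cls_name]
        while cls is not None:
            chain.append(cls)
            cls = cls.parent
        props: dict[str, int] = {}
        for c in reversed(chain):  # root first, so subclasses can override
            props.update(c.data_properties)
        return props
```

Building `Publication`, then `Book` under it, then `Textbook` under `Book` gives `Textbook` a level of 2. The level is the quantity the method later combines with property priority. A production system would more likely keep the ontology in OWL and query it with an RDF/OWL library.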
Constructing individuals and filling in data property values: an individual is an instance created from an existing class in the ontology, and constructing an individual is the process by which the user models a document according to its content. When the individual's data property information is filled in, each data property has a corresponding text box for entering and displaying that property's information. Data property values are taken from the document's annotations; where a property value cannot be obtained from the annotations, key information retrieved by full-text search is used as the data property value.
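The fill order described above is: use the annotations first, and fall back to a full-text search only when the annotations do not supply a value. The sketch below makes two assumptions that the method does not state. First, each data property comes with a vocabulary that decides which annotations can fill it. Second, a regular expression stands in for the full-text search. The function name `fill_data_properties` and the `(vocabulary, pattern)` pairs are hypothetical.

```python
import re


def fill_data_properties(
    properties: dict[str, tuple[set[str], str]],
    annotations: list[str],
    full_text: str,
) -> dict[str, str]:
    """Fill an individual's data property values for one document.

    `properties` maps each data property to (vocabulary, fallback regex).
    `annotations` is assumed to be sorted by descending weight.
    Properties that neither source can fill are left out of the result.
    """
    values: dict[str, str] = {}
    for prop, (vocabulary, pattern) in properties.items():
        # 1) Prefer the highest-weighted annotation that belongs to the vocabulary.
        match = next((a for a in annotations if a in vocabulary), None)
        if match is not None:
            values[prop] = match
            continue
        # 2) Otherwise fall back to a full-text search for key information.
        found = re.search(pattern, full_text)
        if found:
            values[prop] = found.group(1).strip()
    return values
```

With the annotations `["semantic web", "ontology", "OWL"]` and the text `"Title: Intro\nAuthor: J. Smith\nBody text"`, a `subject` property whose vocabulary contains both "ontology" and "semantic web" is filled with "semantic web". An `author` property with an empty vocabulary and the pattern `Author:\s*([^\n]+)` falls back to the text search and gets "J. Smith".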
Adjusting document annotations and annotation weights: obtain the document's original annotation information, the property values filled in for the individual above, and the documents corresponding to those property values, and adjust the annotations of the document. Each property value is assigned a weight according to the level of the class to which the individual belongs and the priority of the data property, and becomes a new annotation of the document. If a property value is already an original annotation of the document, its original weight is combined with the new weight. The new and old annotations are then sorted by weight, and those with the highest weights are selected as the document's annotations.
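The weight formula is not given. The sketch below assumes a linear combination of data property priority and class level, with hypothetical coefficients. It also assumes that "combining" an original weight with a new one means adding them, and that "the high-weight annotations" means the top `top_k` entries. `property_weight`, `adjust_annotations`, `PRIORITY_COEF`, `LEVEL_COEF` and `top_k` are all illustrative names.

```python
# Hypothetical coefficients for the weight formula (not given in the method).
PRIORITY_COEF = 0.5
LEVEL_COEF = 0.25


def property_weight(priority: int, class_level: int) -> float:
    """Assumed linear weight: higher property priority and deeper class
    level both raise the weight of a property value."""
    return PRIORITY_COEF * priority + LEVEL_COEF * class_level


def adjust_annotations(
    original: dict[str, float],
    property_values: dict[str, str],
    priorities: dict[str, int],
    class_level: int,
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Merge property values into a document's annotations and keep the top ones."""
    merged = dict(original)  # start from the document's original annotations
    for prop, value in property_values.items():
        w = property_weight(priorities[prop], class_level)
        # A value that is already an original annotation gets its original
        # weight and the new weight combined (summed here, an assumption).
        merged[value] = merged.get(value, 0.0) + w
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]
```

Take an individual at class level 2 with `subject = "ontology"` (priority 2) and `author = "J. Smith"` (priority 1), and original annotations `{"ontology": 1.0, "OWL": 0.5, "XML": 0.25}`. With `top_k=3`, the sketch returns `[("ontology", 2.5), ("J. Smith", 1.0), ("OWL", 0.5)]`: the existing annotation "ontology" is reinforced, and the author becomes a new annotation.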
Storing annotations and annotation weights: the original annotations of the document are deleted, and the adjusted annotations and weights are stored in the data table for annotations. When other documents are annotated, the data in the annotation table are added to the annotation weight formula as an influence factor.
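The following SQLite sketch shows the delete-then-store step and one possible way of feeding stored data back into later weight calculations. The table layout is hypothetical. So is the choice of influence factor, which here is the average stored weight of a term across annotated documents, added to the base weight with an assumed damping coefficient `gamma`. The method does not specify either. Only the standard-library `sqlite3` module is used.

```python
import sqlite3

# Hypothetical table layout; the method only refers to "the data table
# for annotations".
SCHEMA = """
CREATE TABLE IF NOT EXISTS annotation (
    doc_id TEXT NOT NULL,
    term   TEXT NOT NULL,
    weight REAL NOT NULL
)
"""


def store_annotations(conn: sqlite3.Connection, doc_id: str,
                      ranked: list[tuple[str, float]]) -> None:
    """Replace a document's annotations with the adjusted ones."""
    with conn:  # one transaction: delete the originals, insert the new set
        conn.execute("DELETE FROM annotation WHERE doc_id = ?", (doc_id,))
        conn.executemany(
            "INSERT INTO annotation (doc_id, term, weight) VALUES (?, ?, ?)",
            [(doc_id, term, weight) for term, weight in ranked],
        )


def influence_factor(conn: sqlite3.Connection, term: str) -> float:
    """Average stored weight of a term across already-annotated documents."""
    (avg,) = conn.execute(
        "SELECT AVG(weight) FROM annotation WHERE term = ?", (term,)
    ).fetchone()
    return avg if avg is not None else 0.0


def weight_with_influence(conn: sqlite3.Connection, term: str,
                          base_weight: float, gamma: float = 0.5) -> float:
    # Assumed form: base weight plus a damped influence factor.
    return base_weight + gamma * influence_factor(conn, term)
```

Suppose "doc-1" has been stored with `[("ontology", 2.5), ("OWL", 0.5)]`. A new document whose base weight for "ontology" is 1.0 then receives 2.25. A term that has never been stored keeps its base weight.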
Entering a term for knowledge query, matching data and sorting by weight: the user performs a knowledge query; individuals are matched against data property information, and when all information of a matched individual is displayed, it is sorted by the weight of the looked-up property value in each document, so that the query results are displayed in descending order of weight.
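A minimal sketch of the query step follows. Individuals are matched by a case-insensitive substring test on their data property values. Each matched value is then looked up in the stored annotations, and the rows are ordered by that value's weight in each document, highest first. The dictionary-based inputs and the name `knowledge_query` are assumptions made for illustration.

```python
def knowledge_query(
    query: str,
    individuals: dict[str, dict[str, str]],
    doc_annotations: dict[str, dict[str, float]],
) -> list[tuple[str, str, str, float]]:
    """Match individuals by data property value, then rank documents.

    Returns (individual, property value, document, weight) rows sorted by
    the weight of that property value in each document, highest first.
    """
    q = query.lower()
    rows: list[tuple[str, str, str, float]] = []
    for name, props in individuals.items():
        for value in props.values():
            if q not in value.lower():
                continue  # this data property does not match the query
            for doc_id, terms in doc_annotations.items():
                if value in terms:
                    rows.append((name, value, doc_id, terms[value]))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

Querying "Ontology" against an individual `book-1` with `subject = "ontology"`, where "doc-2" annotates "ontology" with 0.75 and "doc-1" with 2.5, lists doc-1 before doc-2 regardless of insertion order.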
The beneficial effects of the present invention are:
The weights of annotations in digital publications are calibrated with the property information of individuals in the ontology, which improves the accuracy of document annotation and lets users find relevant documents faster when querying the ontology knowledge base;
The optimized annotations serve as an influence factor in the weight formula used when annotations are extracted from other documents, which improves the annotation accuracy of other electronic documents.
The invention further offers the following benefits:
Annotation information can be provided for viewing alongside a digital publication, enabling annotated preview and reading of the publication and helping readers view the subject information in a document quickly and effectively.
At the same time, a concept network can be established among electronic documents, providing effective data support for building an ontology library.
Brief description of the drawings
Fig. 1 is the flow chart of the embodiment of the present invention.
Embodiment
The present invention is further described below with reference to an embodiment:
A semantic annotation optimization method for digital publications, as shown in Fig. 1, characterized by comprising the following steps:
Document content preprocessing: the document is parsed in a computer system, keywords are extracted with a keyword extraction tool, and the weights of the keywords are calculated based on word position, providing the data basis for subsequently constructing individuals.
Building the ontology model: an ontology is an explicit, formal representation of the classes in a domain; the properties of each class describe various aspects of the class, together with the constraints on its characteristics and attributes, so an ontology comprises classes, object properties and data properties. The content annotation optimization method is ontology-based and requires that the ontology first be built with an ontology editing tool; a top-down methodology is used to construct the classes, object properties and data properties in that tool.
Constructing individuals and filling in data property values: an individual is an instance created from an existing class in the ontology, and constructing an individual is the process by which the user models a document according to its content. When the individual's data property information is filled in, each data property has a corresponding text box for entering and displaying that property's information. Data property values are taken from the document's annotations; where a property value cannot be obtained from the annotations, key information retrieved by full-text search is used as the data property value.
Adjusting document annotations and annotation weights: obtain the document's original annotation information, the property values filled in for the individual above, and the documents corresponding to those property values, and adjust the annotations of the document. The level of the class to which the individual belongs and the priority of the data property are added to the weight formula as weighting terms to obtain the weight of each property value, which then becomes a new annotation of the document; the new and old annotations are sorted by weight, and the highest-weighted ones are selected as the document's new annotations. If a property value is already an original annotation of the document, its original weight is combined with the new weight, the new and old annotations are sorted by weight, and the highest-weighted ones are selected as the document's annotations.
Storing annotations and annotation weights: the original annotations of the document are deleted, and the adjusted annotations and weights are stored in the data table for annotations. When other documents are annotated, the data in the annotation table are added to the annotation weight formula as an influence factor.
Entering a term for knowledge query, matching data and sorting by weight: the user performs a knowledge query; individuals are matched against data property information, and when all information of a matched individual is displayed, it is sorted by the weight of the looked-up property value in each document, so that the query results are displayed in descending order of weight.
The protection scope of the present invention is not limited to the above embodiment. Those skilled in the art can obviously make various changes and modifications to the invention without departing from its scope and spirit. If such changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the invention is intended to encompass them as well.

Claims (3)

1. A semantic annotation optimization method for digital publications, characterized by comprising the following steps:
Document content preprocessing: the document is parsed in a computer system, keywords are extracted with a keyword extraction tool, and the weights of the keywords are calculated based on word position, providing the data basis for subsequently constructing individuals;
Building the ontology model: the ontology is built in a computer system with an ontology editing tool following a top-down methodology; the classes, object properties and data properties are constructed in that tool, forming an ontology that comprises classes, object properties and data properties;
Constructing individuals and filling in data property values: an individual is an instance created from an existing class in the ontology; constructing an individual is the process by which the user models a document according to its content; the individual's data properties are then filled in, with data property values taken from the document's annotations;
Adjusting document annotations and annotation weights: obtain the document's original annotation information, the property values filled in for the individual above, and the documents corresponding to those property values, and adjust the annotations of the document; the level of the class to which the individual belongs and the priority of the data property are added to the weight formula as weighting terms to obtain the weight of each property value, which becomes a new annotation of the document;
Storing annotations and annotation weights: the original annotations of the document are deleted, and the adjusted annotations and weights are stored in the data table for annotations; when other documents are annotated, the data in the annotation table are added to the annotation weight formula as an influence factor;
Entering a term for knowledge query, matching data and sorting by weight: the user performs a knowledge query; individuals are matched against data property information, and when all information of a matched individual is displayed, it is sorted by the weight of the looked-up property value in each document, so that the query results are displayed in descending order of weight.
2. The semantic annotation optimization method for digital publications according to claim 1, characterized in that, in the step of constructing individuals and filling in data property values, each data property has a corresponding text box for entering and displaying that property's information; data property values are taken from the document's annotations, and where a property value cannot be obtained from the annotations, key information retrieved by full-text search is used as the data property value.
3. The semantic annotation optimization method for digital publications according to claim 1, characterized in that, in the step of adjusting document annotations and annotation weights, if a property value is an original annotation of the document, its original weight is combined with the new weight; the new and old annotations are then sorted by weight, and the highest-weighted ones are selected as the document's annotations.
CN201710926504.XA 2017-10-07 2017-10-07 Semantic annotation optimization method for digital publications Pending CN107818140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710926504.XA CN107818140A (en) 2017-10-07 2017-10-07 Semantic annotation optimization method for digital publications

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710926504.XA CN107818140A (en) 2017-10-07 2017-10-07 Semantic annotation optimization method for digital publications

Publications (1)

Publication Number Publication Date
CN107818140A (en) 2018-03-20

Family

ID=61607773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710926504.XA Pending CN107818140A (en) 2017-10-07 2017-10-07 Semantic annotation optimization method for digital publications

Country Status (1)

Country Link
CN (1) CN107818140A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109087223A (en) * 2018-08-03 2018-12-25 广州大学 (Guangzhou University) Ontology-based method for building educational resource models

Similar Documents

Publication Publication Date Title
CN104866572B (en) Network short-text clustering method
CN108446367A (en) Knowledge-graph-based data search method and device for the packaging industry
CN104899273B (en) Web personalization method based on topics and relative entropy
US20160179931A1 (en) System And Method For Supplementing Search Queries
US8832102B2 (en) Methods and apparatuses for clustering electronic documents based on structural features and static content features
CN103593425B (en) Preference-based intelligent retrieval method and system
CN103927358A (en) Text search method and system
CN103455487B (en) Method and device for extracting search terms
CN105893641A (en) Job recommendation method
CN103440329A (en) Recommendation system and method for authoritative authors and high-quality papers
CN109711925A (en) Cross-domain recommendation data processing method and cross-domain recommender system with multiple auxiliary domains
CN102289523A (en) Method for intelligently extracting text labels
CN104484431A (en) Multi-source personalized news web page recommendation method based on domain ontology
CN107665217A (en) Vocabulary processing method and system for search services
CN109711887A (en) Method, device, electronic equipment and computer medium for generating a store recommendation list
CN107180075A (en) Automatic label generation method for text classification integrating hierarchical clustering
CN105975596A (en) Query expansion method and system of search engine
CN105843796A (en) Microblog sentiment tendency analysis method and device
CN106547864A (en) Personalized search method based on query expansion
CN104899229A (en) Swarm intelligence based behavior clustering system
CN105205163B (en) Multi-level binary classification method with incremental learning for science and technology news
CN107679208A (en) Picture search method, terminal device and storage medium
CN101241504A (en) Content-based intelligent search method for remote sensing image data
CN103425740A (en) Semantic-clustering-based material information retrieval method for the IoT (Internet of Things)
CN103927177A (en) Characteristic-interface digraph establishment method based on LDA model and PageRank algorithm

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180320
