CN107818140A - Semantic annotation optimization method for digital publications - Google Patents
Semantic annotation optimization method for digital publications
- Publication number
- CN107818140A (application number CN201710926504.XA)
- Authority
- CN
- China
- Prior art keywords
- document
- mark
- weights
- data
- individual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Machine Translation (AREA)
Abstract
The present invention provides a semantic annotation optimization method for digital publications, comprising the following steps: preprocessing document content; building an ontology model; building individuals and filling in data property values; adjusting document annotations and annotation weights; storing the annotations and annotation weights; and entering a query term to perform a knowledge query, matching data, and sorting the results by weight. The method improves the accuracy of document annotations, lets users find relevant documents faster when querying the ontology knowledge base, and improves the annotation accuracy of other related electronic documents.
Description
Technical field
The present invention relates to the technical field of digital publishing, and in particular to a semantic annotation optimization method for digital publications.
Background
Knowledge processing is an inevitable trend in the development of information technology. As the demands placed on knowledge applications keep rising, traditional knowledge database systems can no longer meet the new requirements. Ontologies were therefore introduced into knowledge engineering, and ontology theory and techniques are applied to the development of knowledge bases.
In the late 1970s, the techniques for constructing expert systems, knowledge systems, and knowledge-intensive information systems developed into knowledge engineering, and the systems built in this way are called knowledge-based systems. Knowledge systems are the most important industrial and commercial product of the artificial intelligence discipline. They assist people in problem solving, for example detecting credit card fraud, accelerating ship design, assisting medical diagnosis, making scientific software more intelligent, providing financial services, product quality evaluation, and advertising to the front office (F/O), and supporting service restoration in electrical networks.
With the continuous development of digital publishing and the explosive growth of digital content on the Internet, some techniques have appeared that extract annotations from the content of electronic publications. These techniques, however, extract annotations using only a basic lexicon and the context of the content. Because this kind of extraction does not take the domain background of the publication into account, a great deal of domain-relevant key information is filtered out, which reduces the accuracy of the annotations within a specific domain, so that the annotations cannot fully represent the core and main content of the document.
When information in the domain is retrieved by annotation, both recall and precision suffer considerably. The content annotation information is not fully used, and the relationships and structure between pieces of information are not adequately exposed, so users must spend a great deal of time filtering information.
Summary of the invention
The technical problem to be solved by the invention is to address the above technical deficiencies by providing a semantic annotation optimization method for digital publications that improves the accuracy of document annotations, lets users find relevant documents faster when querying the ontology knowledge base, and improves the annotation accuracy of other related electronic documents.
The technical solution adopted by the invention to solve this technical problem is:
A semantic annotation optimization method for digital publications, characterized in that it comprises the following steps:
Document content preprocessing: The document is parsed in a computer system, keywords are extracted with a keyword extraction tool, and a weight is calculated for each keyword based on word position, providing the data basis for the subsequent construction of individuals.
Building the ontology model: An ontology is a formal, explicit description of the classes in a domain, of the properties of each class (which describe the various features and attributes of the class), and of the constraints on those properties; an ontology therefore comprises classes, object properties, and data properties. The content annotation optimization method is based on an ontology. The ontology is built in a computer system with an ontology editing tool, following a top-down approach, and the classes, object properties, and data properties are constructed in that tool.
Building individuals and filling in data property values: An individual is an instance created from an existing class in the ontology, and building an individual is the process by which the user models a document according to its content. When the data property information of an individual is filled in, each data property has a corresponding text box used to enter and display that property's information. Data property values are obtained from the document's annotations; property values that cannot be obtained from the annotations are filled with key information obtained through full-text search.
Adjusting document annotations and annotation weights: The document's original annotation information, the property values filled in for the above individual, and the document corresponding to those property values are obtained, and the annotations in the document are adjusted. A weight is assigned to each property value according to the level of the class to which the individual belongs and the priority of the data property, and the property value becomes a new annotation of the document. If a property value is already an original annotation of the document, its original weight is merged with the new weight. The new and old annotations are then sorted by weight, and the highest-weighted ones are selected as the document's annotations.
Storing annotations and annotation weights: The document's original annotations are deleted, and the adjusted annotations and weights are stored in the data table for annotations. When other documents are annotated, the data in the annotation table is added to the annotation weight calculation formula as an influence factor.
Entering a query term to perform a knowledge query, matching data, and sorting by weight: The user searches through a knowledge query. When an individual is matched according to its data property information and all information of that individual is displayed, the results can be ranked by the weights that the matched property values carry in the documents, and the query results are displayed in descending order of weight.
The beneficial effects of the invention are:
The weights of the annotations in a digital publication are calibrated using the property information of individuals in the ontology, which improves the accuracy of document annotations and lets users find relevant documents faster when querying the ontology knowledge base.
The optimized annotations are used as an influence factor in the weight calculation formula when annotations are extracted from other documents, which improves the annotation accuracy of other electronic documents.
Further beneficial effects of the invention are:
Annotation information can be provided for viewing in digital publications, enabling annotated preview and reading of digital publications and helping readers view the subject information of a document quickly and effectively.
At the same time, a concept network can be established between electronic documents, providing effective data support for building the ontology library.
Brief description of the drawings
Fig. 1 is the flow chart of the embodiment of the present invention.
Detailed description of the embodiment
The invention is further described below with reference to an embodiment:
A semantic annotation optimization method for digital publications, as shown in Fig. 1, characterized in that it comprises the following steps:
Document content preprocessing: The document is parsed in a computer system, keywords are extracted with a keyword extraction tool, and a weight is calculated for each keyword based on word position, providing the data basis for the subsequent construction of individuals.
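The preprocessing step can be illustrated with a minimal sketch. The patent does not give the weighting formula or name the keyword extraction tool, so the position weights, the stopword list, the regex tokenizer, and the function name `keyword_weights` below are all assumptions made for illustration. Chinese text would need a word segmenter in place of the regex.

```python
import re
from collections import defaultdict

# Hypothetical position weights; the patent does not give concrete values.
POSITION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is"}


def keyword_weights(sections, top_n=5):
    """Score each word by summing the weight of every position it occurs in."""
    scores = defaultdict(float)
    for position, text in sections.items():
        weight = POSITION_WEIGHTS.get(position, 1.0)
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                scores[word] += weight
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


doc = {
    "title": "Ontology annotation",
    "abstract": "Annotation weights for ontology individuals",
    "body": "Each annotation has a weight",
}
keywords = keyword_weights(doc)
print(keywords)
```

Here a word that appears in the title contributes more than the same word in the body, which is one simple way to make the weight depend on word position.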
Building the ontology model: An ontology is a formal, explicit description of the classes in a domain, of the properties of each class (which describe the various features and attributes of the class), and of the constraints on those properties; an ontology therefore comprises classes, object properties, and data properties. The content annotation optimization method is based on an ontology, so the ontology must first be built with an ontology editing tool. A top-down approach is used, and the classes, object properties, and data properties are constructed in that tool.
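As a rough illustration of the three building blocks, the following sketch models classes, object properties, and data properties as plain Python dataclasses rather than in an ontology editing tool. The class names, the `priority` field, and the convention that the root class has level 1 are assumptions made for this example and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntClass:
    name: str
    parent: Optional["OntClass"] = None

    @property
    def level(self) -> int:
        """Depth in the class hierarchy; a root class has level 1."""
        return 1 if self.parent is None else self.parent.level + 1


@dataclass
class DataProperty:
    name: str
    domain: OntClass
    priority: int  # hypothetical convention: 1 is the most important


@dataclass
class ObjectProperty:
    name: str
    domain: OntClass
    range_: OntClass


@dataclass
class Ontology:
    classes: dict = field(default_factory=dict)
    data_properties: list = field(default_factory=list)
    object_properties: list = field(default_factory=list)

    def add_class(self, name, parent=None):
        """Register a class; the parent must already exist (top-down order)."""
        parent_cls = self.classes[parent] if parent else None
        cls = OntClass(name, parent_cls)
        self.classes[name] = cls
        return cls


# Top-down construction: general classes first, then their subclasses.
onto = Ontology()
publication = onto.add_class("Publication")
book = onto.add_class("Book", parent="Publication")
textbook = onto.add_class("Textbook", parent="Book")
person = onto.add_class("Person")
onto.data_properties.append(DataProperty("title", publication, priority=1))
onto.object_properties.append(ObjectProperty("hasAuthor", publication, person))
```

Because `add_class` looks up the parent by name, a subclass cannot be added before its superclass, which mirrors the top-down order described above.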
Building individuals and filling in data property values: An individual is an instance created from an existing class in the ontology, and building an individual is the process by which the user models a document according to its content. When the data property information of an individual is filled in, each data property has a corresponding text box used to enter and display that property's information. Data property values are obtained from the document's annotations; property values that cannot be obtained from the annotations are filled with key information obtained through full-text search.
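The fill-then-fall-back logic of this step might look like the sketch below. It assumes that annotations are available as a mapping from property name to value and approximates full-text search with per-property regular expressions; both choices, along with the function name `fill_data_properties` and the sample patterns, are hypothetical.

```python
import re


def fill_data_properties(property_names, annotations, full_text, patterns):
    """Take each value from the annotations, else from a full-text search."""
    values = {}
    for name in property_names:
        if name in annotations:
            values[name] = annotations[name]
            continue
        # Fallback: search the full text with a pattern for this property.
        pattern = patterns.get(name)
        match = re.search(pattern, full_text) if pattern else None
        values[name] = match.group(1) if match else None
    return values


annotations = {"title": "Intro to Ontologies"}
full_text = "Intro to Ontologies. Author: Li Ming. Published 2017."
patterns = {
    "author": r"Author:\s*([A-Za-z ]+)\.",
    "year": r"Published\s+(\d{4})",
}
individual = fill_data_properties(
    ["title", "author", "year", "isbn"], annotations, full_text, patterns
)
print(individual)
```

A property that is found neither in the annotations nor in the text stays `None`, which corresponds to a text box the user would still have to fill in by hand.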
Adjusting document annotations and annotation weights: The document's original annotation information, the property values filled in for the above individual, and the document corresponding to those property values are obtained, and the annotations in the document are adjusted. The level of the class to which the individual belongs and the priority of the data property are added to the weight calculation formula as weighting terms, giving the weight of each property value, which becomes a new annotation of the document; the new and old annotations are sorted by weight, and the high-weight ones are selected as the document's new annotations. If a property value is already an original annotation of the document, its original weight is merged with the new weight; the new and old annotations are then sorted by weight, and the highest-weighted ones are selected as the document's annotations.
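The adjustment step can be sketched as follows. The patent does not disclose the weight calculation formula or how weights are merged, so this example assumes a weight of `1/class_level + 1/priority` for each property value and merges by addition; the function name `adjust_annotations` and the `top_k` cut-off are likewise illustrative.

```python
def adjust_annotations(original, property_values, top_k=3):
    """Merge property-derived annotations into the original ones and re-rank.

    original: {term: weight} from preprocessing.
    property_values: iterable of (term, class_level, property_priority).
    """
    merged = dict(original)
    for term, class_level, priority in property_values:
        # Hypothetical formula: shallower classes and higher-priority
        # properties (smaller numbers) yield larger weights.
        new_weight = 1.0 / class_level + 1.0 / priority
        # If the term was already an annotation, the weights are merged.
        merged[term] = merged.get(term, 0.0) + new_weight
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


original = {"ontology": 0.5, "publishing": 0.25, "index": 0.125}
property_values = [
    ("ontology", 2, 1),             # already an original annotation
    ("semantic annotation", 1, 2),  # new annotation from a property value
]
adjusted = adjust_annotations(original, property_values)
print(adjusted)
```

In this run "ontology" keeps its original weight and gains the property-derived weight, a new term enters the ranking, and the lowest-weighted original annotation drops out of the top three.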
Storing annotations and annotation weights: The document's original annotations are deleted, and the adjusted annotations and weights are stored in the data table for annotations. When other documents are annotated, the data in the annotation table is added to the annotation weight calculation formula as an influence factor.
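One possible shape for the storage step is sketched below with an in-memory SQLite table. The table name `annotation`, its columns, and the way the influence factor enters the formula (the average stored weight of a term, scaled by a hypothetical `alpha`) are assumptions; the patent only states that the stored data acts as an influence factor.

```python
import sqlite3


def store_annotations(conn, doc_id, annotations):
    """Replace a document's original annotations with the adjusted ones."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM annotation WHERE doc_id = ?", (doc_id,))
        conn.executemany(
            "INSERT INTO annotation (doc_id, term, weight) VALUES (?, ?, ?)",
            [(doc_id, term, weight) for term, weight in annotations],
        )


def influence_factor(conn, term):
    """Average stored weight of a term across documents, 0.0 if unseen."""
    row = conn.execute(
        "SELECT AVG(weight) FROM annotation WHERE term = ?", (term,)
    ).fetchone()
    return row[0] if row[0] is not None else 0.0


def boosted_weight(conn, term, base_weight, alpha=0.5):
    """Hypothetical formula: scale the base weight by the influence factor."""
    return base_weight * (1.0 + alpha * influence_factor(conn, term))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotation (doc_id TEXT, term TEXT, weight REAL)")
store_annotations(conn, "doc1", [("ontology", 0.5)])
store_annotations(conn, "doc1", [("ontology", 2.0), ("semantic annotation", 1.5)])
print(boosted_weight(conn, "ontology", 1.0))
```

The second call to `store_annotations` overwrites the first, so only the adjusted annotations of a document influence the weights computed for other documents.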
Entering a query term to perform a knowledge query, matching data, and sorting by weight: The user searches through a knowledge query. When an individual is matched according to its data property information and all information of that individual is displayed, the results can be ranked by the weights that the matched property values carry in the documents, and the query results are displayed in descending order of weight.
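The query step can be approximated by the following sketch. It assumes each individual is a dictionary whose data properties map to `(value, weight)` pairs and uses case-insensitive substring matching; the data layout and the function name `knowledge_query` are illustrative and not prescribed by the patent.

```python
def knowledge_query(individuals, query):
    """Return (individual name, weight) pairs in descending order of weight."""
    needle = query.lower()
    results = []
    for individual in individuals:
        matched = [
            weight
            for value, weight in individual["properties"].values()
            if needle in value.lower()
        ]
        if matched:
            # Rank an individual by its best-matching property value.
            results.append((individual["name"], max(matched)))
    return sorted(results, key=lambda r: r[1], reverse=True)


individuals = [
    {"name": "doc_a", "properties": {
        "subject": ("ontology engineering", 1.5),
        "title": ("Publishing basics", 0.5),
    }},
    {"name": "doc_b", "properties": {
        "title": ("Ontology for publishing", 2.0),
    }},
    {"name": "doc_c", "properties": {
        "subject": ("typography", 1.0),
    }},
]
hits = knowledge_query(individuals, "ontology")
print(hits)
```

Documents whose matching property value carries a higher annotation weight appear first, and documents without a matching property value are omitted.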
The scope of protection of the invention is not limited to the above embodiment. Those skilled in the art can evidently make various changes and modifications to the invention without departing from its scope and spirit. If such changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.
Claims (3)
1. A semantic annotation optimization method for digital publications, characterized in that it comprises the following steps:
Document content preprocessing: the document is parsed in a computer system, keywords are extracted with a keyword extraction tool, and a weight is calculated for each keyword based on word position, providing the data basis for the subsequent construction of individuals;
Building the ontology model: the ontology is built in a computer system with an ontology editing tool, following a top-down approach; the classes, object properties, and data properties are constructed in the tool, forming an ontology that comprises classes, object properties, and data properties;
Building individuals and filling in data property values: an individual is an instance created from an existing class in the ontology; building an individual is the process by which the user models a document according to its content; the data properties of the individual are filled in, with the data property values obtained from the document's annotations;
Adjusting document annotations and annotation weights: the document's original annotation information, the property values filled in for the above individual, and the document corresponding to those property values are obtained; the annotations in the document are adjusted; the level of the class to which the individual belongs and the priority of the data property are added to the weight calculation formula as weighting terms, giving the weight of each property value, which becomes a new annotation of the document;
Storing annotations and annotation weights: the document's original annotations are deleted, and the adjusted annotations and weights are stored in the data table for annotations; when other documents are annotated, the data in the annotation table is added to the annotation weight calculation formula as an influence factor;
Entering a query term to perform a knowledge query, matching data, and sorting by weight: the user searches through a knowledge query; when an individual is matched according to its data property information and all information of that individual is displayed, the results can be ranked by the weights that the matched property values carry in the documents, and the query results are displayed in descending order of weight.
2. The semantic annotation optimization method for digital publications according to claim 1, characterized in that: in the step of building individuals and filling in data property values, each data property has a corresponding text box used to enter and display that property's information; data property values are obtained from the document's annotations, and property values that cannot be obtained from the annotations are filled with key information obtained through full-text search.
3. The semantic annotation optimization method for digital publications according to claim 1, characterized in that: in the step of adjusting document annotations and annotation weights, if a property value is already an original annotation of the document, its original weight is merged with the new weight; the new and old annotations are then sorted by weight, and the highest-weighted ones are selected as the document's annotations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710926504.XA CN107818140A (en) | 2017-10-07 | 2017-10-07 | A kind of digital publication semantic tagger optimization method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710926504.XA CN107818140A (en) | 2017-10-07 | 2017-10-07 | A kind of digital publication semantic tagger optimization method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107818140A (en) | 2018-03-20 |
Family
ID=61607773
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710926504.XA Pending CN107818140A (en) | 2017-10-07 | 2017-10-07 | A kind of digital publication semantic tagger optimization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107818140A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109087223A (en) * | 2018-08-03 | 2018-12-25 | 广州大学 | A kind of educational resource model building method based on ontology |
-
2017
- 2017-10-07 CN CN201710926504.XA patent/CN107818140A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104866572B (en) | A kind of network short text clustering method | |
CN108446367A (en) | A kind of the packaging industry data search method and equipment of knowledge based collection of illustrative plates | |
CN104899273B (en) | A kind of Web Personalization method based on topic and relative entropy | |
US20160179931A1 (en) | System And Method For Supplementing Search Queries | |
US8832102B2 (en) | Methods and apparatuses for clustering electronic documents based on structural features and static content features | |
CN103593425B (en) | Preference-based intelligent retrieval method and system | |
CN103927358A (en) | Text search method and system | |
CN103455487B (en) | The extracting method and device of a kind of search term | |
CN105893641A (en) | Job recommending method | |
CN103440329A (en) | Authoritative author and high-quality paper recommending system and recommending method | |
CN109711925A (en) | Cross-domain recommending data processing method, cross-domain recommender system with multiple auxiliary domains | |
CN102289523A (en) | Method for intelligently extracting text labels | |
CN104484431A (en) | Multi-source individualized news webpage recommending method based on field body | |
CN107665217A (en) | A kind of vocabulary processing method and system for searching service | |
CN109711887A (en) | Generation method, device, electronic equipment and the computer media of store recommendation list | |
CN107180075A (en) | The label automatic generation method of text classification integrated level clustering | |
CN105975596A (en) | Query expansion method and system of search engine | |
CN105843796A (en) | Microblog emotional tendency analysis method and device | |
CN106547864A (en) | A kind of Personalized search based on query expansion | |
CN104899229A (en) | Swarm intelligence based behavior clustering system | |
CN105205163B (en) | A kind of multi-level two sorting technique of the incremental learning of science and technology news | |
CN107679208A (en) | A kind of searching method of picture, terminal device and storage medium | |
CN101241504A (en) | Remote sense image data intelligent search method based on content | |
CN103425740A (en) | IOT (Internet Of Things) faced material information retrieval method based on semantic clustering | |
CN103927177A (en) | Characteristic-interface digraph establishment method based on LDA model and PageRank algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180320 |