CN104750836B - Ontology-based semantic annotation optimization method for digital publications - Google Patents

Ontology-based semantic annotation optimization method for digital publications

Info

Publication number
CN104750836B
Authority
CN
China
Prior art keywords
document
weight
mark
data
ontology
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510156576.1A
Other languages
Chinese (zh)
Other versions
CN104750836A (en)
Inventor
刘永坚
白立华
杨朝阳
曾瑞
李文忠
杨慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Epoch Publish Medium Inc Co
Wuhan University of Technology WUT
Original Assignee
Epoch Publish Medium Inc Co
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-04-03
Filing date
2015-04-03
Publication date
2019-04-26

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an ontology-based semantic annotation optimization method for digital publications. The method comprises the following steps: preprocessing the document content; constructing an ontology model; constructing individuals and filling in their data attribute values; adjusting the document annotations and annotation weights; storing the annotations and annotation weights; and entering a term to run a knowledge query, matching data, and sorting the results by weight. The method improves the accuracy of document annotation, lets users find relevant documents faster when querying the ontology knowledge base, and improves the annotation accuracy of other related electronic documents.

Description

Ontology-based semantic annotation optimization method for digital publications
Technical field
The present invention relates to the technical field of digital publishing, and in particular to an ontology-based semantic annotation optimization method for digital publications.
Background
Knowledge processing is an inevitable trend in the development of information technology. As the demands on knowledge applications grow, traditional knowledge base systems can no longer meet them, so ontologies have been introduced into knowledge engineering, and ontology theory and techniques are applied to the development of knowledge bases.
Regarding ontology knowledge systems: in the late 1970s, the construction techniques of expert systems, knowledge systems, and knowledge-intensive information systems developed into knowledge engineering, and the systems built this way are called knowledge systems (knowledge-based systems). Knowledge systems are the most important industrial and commercial product of the field of artificial intelligence. They assist people in problem solving, for example detecting credit card fraud, speeding up ship design, assisting medical diagnosis, making scientific software more intelligent, providing financial services to F/O, evaluating product quality and advertising, and supporting service restoration in power networks.
With the continuous development of digital publishing and the explosive growth of digital content on the Internet, techniques have emerged that refine annotations from the content of electronic publications. These techniques, however, extract annotations using only a basic lexicon and the context of the content. Such extraction does not take the domain background of the publication into account, so much domain-relevant key information is filtered out. This reduces annotation accuracy in the specific domain, and the annotations cannot fully represent the core and main content of the document.
When information in the domain is retrieved by annotation, recall and precision suffer considerably: the annotation information is not fully used, and the relationships and structure among pieces of information are not adequately exposed, so users spend a great deal of time filtering information.
Summary of the invention
The technical problem to be solved by the present invention is to address the above technical deficiencies by providing an ontology-based semantic annotation optimization method for digital publications that improves the accuracy of document annotation, lets users find relevant documents faster when querying an ontology knowledge base, and improves the annotation accuracy of other related electronic documents.
The technical solution adopted by the present invention to solve the technical problems is:
An ontology-based semantic annotation optimization method for digital publications, characterized by comprising the following steps:
Document content preprocessing: the document is parsed in a computer system, keywords are extracted with a keyword extraction tool, and a weight is computed for each keyword based on the position of the word, providing the data basis for the subsequent construction of individuals.
Constructing the ontology model: an ontology is a formal, explicit description of the classes in a domain; the properties of each class describe its various features, attributes, and constraints, so an ontology comprises classes, object properties, and data attributes. The content annotation optimization method is implemented on the basis of an ontology: the ontology is built in the computer system with an ontology construction tool, following a top-down approach, and the classes, object properties, and data attributes are created in that tool.
Constructing individuals and filling in data attribute values: an individual is an instance created from a class that exists in the ontology, and constructing an individual is the process by which the user models a document according to its content. When the data attribute information of an individual is filled in, each data attribute has a corresponding text box for entering and displaying that information. Data attribute values are taken from the document annotations; for attribute values that cannot be obtained from the annotations, key information obtained by full-text search is used as the data attribute value.
Adjusting the document annotations and annotation weights: the original annotation information of the document is obtained, together with the attribute values filled into the above individual and the document corresponding to each attribute value, and the annotations of the document are adjusted. Each attribute value is assigned a weight according to the level of the class to which the individual belongs and the priority of the data attribute, and becomes a new annotation of the document. If an attribute value is already an original annotation of the document, its original weight is merged with its new weight. The old and new annotations are then sorted by weight, and those with the highest weights are selected as the annotations of the document.
Storing the annotations and annotation weights: the original annotations of the document are deleted, and the adjusted annotations and weights are stored in the corresponding annotation data table. When other documents are annotated, the data in the annotation table is added to the annotation weight formula as an impact factor.
Entering a term for a knowledge query, matching data, and sorting by weight: the user runs a knowledge query; when individuals are matched on data attribute information and all information about an individual is displayed, the results can be ordered by the weight of the matched attribute value in the document, and the query results are displayed in descending order of weight.
The beneficial effects of the present invention are:
The annotation weights of a digital publication are calibrated using the attribute information of individuals in the ontology, which improves the accuracy of document annotation and lets users find relevant documents faster when querying the ontology knowledge base;
The optimized annotations serve as an impact factor in the weight formula used when annotations are extracted for other documents, which improves the annotation accuracy of other electronic documents.
Further beneficial effects of the present invention are:
Annotation information can be provided for viewing digital publications, enabling preview and reading of digital publications with annotations, which helps readers find the subject information in a document quickly and effectively.
A concept network can also be established among electronic documents, providing effective data support for building the ontology library.
Brief description of the drawings
Fig. 1 is a flowchart of an embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to an embodiment:
The ontology-based semantic annotation optimization method for digital publications, as shown in Fig. 1, is characterized by comprising the following steps:
Document content preprocessing: the document is parsed in a computer system, keywords are extracted with a keyword extraction tool, and a weight is computed for each keyword based on the position of the word, providing the data basis for the subsequent construction of individuals.
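The preprocessing step can be illustrated with a minimal sketch. The source states only that keyword weights depend on word position; the zone names, the numeric values in `ZONE_WEIGHTS`, the stop-word list, and the function `keyword_weights` are all hypothetical choices made for illustration, and a simple regular expression stands in for the unnamed keyword extraction tool.

```python
import re
from collections import defaultdict

# Hypothetical position weights: the source does not give concrete values.
ZONE_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}
# Tiny illustrative stop-word list, not part of the described method.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "is"}


def keyword_weights(zones, top_k=5):
    """Score each word by the zones it occurs in and return the top_k words.

    zones maps a zone name (e.g. "title") to the text found in that zone.
    """
    scores = defaultdict(float)
    for zone, text in zones.items():
        zone_weight = ZONE_WEIGHTS.get(zone, 1.0)
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOPWORDS:
                scores[token] += zone_weight
    # Highest weight first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


sample = {"title": "Ontology annotation", "body": "annotation of documents"}
ranked_keywords = keyword_weights(sample)
```

A word that appears in the title and again in the body accumulates both zone weights, so it outranks a word that appears in the title only.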
Constructing the ontology model: an ontology is a formal, explicit description of the classes in a domain; the properties of each class describe its various features, attributes, and constraints, so an ontology comprises classes, object properties, and data attributes. The content annotation optimization method is implemented on the basis of an ontology, which presupposes that an ontology has been built with an ontology construction tool. A top-down approach is followed, and the classes, object properties, and data attributes are created in that tool.
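As a rough illustration of what such a model holds, the sketch below represents classes, object properties, and data attributes as plain Python objects and builds them top-down, from a general class to a more specific one. The class name `OntologyClass`, its fields, the example classes, and the numeric priorities are assumptions for illustration; the source does not name a specific ontology construction tool or schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OntologyClass:
    """One class in a hypothetical, simplified ontology model."""
    name: str
    parent: Optional["OntologyClass"] = None
    # data attribute name -> priority (a higher number means more important)
    data_attributes: dict = field(default_factory=dict)
    # object property name -> name of the class it points to
    object_properties: dict = field(default_factory=dict)

    @property
    def level(self):
        """Depth below the root class; the root has level 0."""
        return 0 if self.parent is None else self.parent.level + 1

    def all_data_attributes(self):
        """Data attributes declared here plus those inherited from ancestors."""
        inherited = self.parent.all_data_attributes() if self.parent else {}
        return {**inherited, **self.data_attributes}


# Top-down construction: the general class first, then its refinement.
publication = OntologyClass("Publication", data_attributes={"title": 3})
textbook = OntologyClass(
    "Textbook",
    parent=publication,
    data_attributes={"subject": 2},
    object_properties={"publishedBy": "Publisher"},
)
```

Keeping the level and the attribute priorities on the class makes them available later, when the weight of an attribute value is computed.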
Constructing individuals and filling in data attribute values: an individual is an instance created from a class that exists in the ontology, and constructing an individual is the process by which the user models a document according to its content. When the data attribute information of an individual is filled in, each data attribute has a corresponding text box for entering and displaying that information. Data attribute values are taken from the document annotations; for attribute values that cannot be obtained from the annotations, key information obtained by full-text search is used as the data attribute value.
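The fallback from annotations to full-text search might look like the following sketch. It assumes, purely for illustration, that existing annotations are already keyed by data attribute name and that the full-text search is a regular expression per attribute; `fill_data_attributes` and the `patterns` argument are hypothetical and do not come from the source, where a user fills the values through text boxes.

```python
import re


def fill_data_attributes(attribute_names, annotations, full_text, patterns):
    """Return a value for each data attribute of an individual.

    annotations: attribute name -> value taken from the document annotations.
    patterns:    attribute name -> regex with one capture group, used as a
                 stand-in for full-text search when no annotation exists.
    """
    values = {}
    for name in attribute_names:
        if name in annotations:
            # Preferred source: the existing document annotations.
            values[name] = annotations[name]
            continue
        pattern = patterns.get(name)
        match = re.search(pattern, full_text) if pattern else None
        # None marks an attribute that neither source could fill.
        values[name] = match.group(1).strip() if match else None
    return values


filled = fill_data_attributes(
    ["subject", "author", "isbn"],
    {"subject": "ontology"},
    "Author: Li Wei\nThis book introduces ontology engineering.",
    {"author": r"Author:\s*(.+)"},
)
```

Attributes that stay `None` can be left for the user to enter by hand.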
Adjusting the document annotations and annotation weights: the original annotation information of the document is obtained, together with the attribute values filled into the above individual and the document corresponding to each attribute value, and the annotations of the document are adjusted. The level of the class to which the individual belongs and the priority of the data attribute are added to the weight formula as weights, yielding the weight of each attribute value, which becomes a new annotation of the document. If an attribute value is already an original annotation of the document, its original weight is merged with its new weight. The old and new annotations are then sorted by weight, and those with the highest weights are selected as the annotations of the document.
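The source does not give the weight formula, so the sketch below assumes a simple linear one: the new weight of an attribute value is `level_coeff * class_level + priority_coeff * priority`, and merging an existing annotation means adding the two weights. The function `adjust_annotations`, both coefficients, and the choice of addition for the merge are hypothetical.

```python
def adjust_annotations(original, filled, class_level, priorities,
                       top_k=3, level_coeff=0.5, priority_coeff=1.0):
    """Merge original annotations with attribute values and keep the top_k.

    original:   annotation term -> weight from preprocessing.
    filled:     data attribute name -> value filled into the individual.
    priorities: data attribute name -> priority of that attribute.
    """
    merged = dict(original)
    for attribute, value in filled.items():
        if value is None:
            continue  # nothing was filled in for this attribute
        # Assumed linear formula; the source only names the two inputs.
        new_weight = (level_coeff * class_level
                      + priority_coeff * priorities.get(attribute, 0))
        # A value that is already an annotation keeps its old weight as well.
        merged[value] = merged.get(value, 0.0) + new_weight
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


adjusted = adjust_annotations(
    original={"ontology": 2.0, "publishing": 1.0, "index": 0.5},
    filled={"subject": "ontology", "author": "Li Wei"},
    class_level=1,
    priorities={"subject": 2, "author": 1},
)
```

In this example "ontology" was already an annotation, so its weight grows from 2.0 to 4.5, while "Li Wei" enters as a new annotation with weight 1.5 and displaces the lowest-ranked original term.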
Storing the annotations and annotation weights: the original annotations of the document are deleted, and the adjusted annotations and weights are stored in the corresponding annotation data table. When other documents are annotated, the data in the annotation table is added to the annotation weight formula as an impact factor.
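A minimal sketch of the storage step, using Python's built-in `sqlite3` module. The table layout, the function names, and the way the impact factor enters the formula (the average stored weight of the term, scaled by `factor_coeff` and added to the position-based weight) are assumptions; the source says only that the stored data is added to the weight formula as an impact factor.

```python
import sqlite3


def store_annotations(conn, doc_id, annotations):
    """Replace the stored annotations of one document with the adjusted ones."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation "
        "(doc_id TEXT, term TEXT, weight REAL)"
    )
    # Delete the original annotations before storing the adjusted ones.
    conn.execute("DELETE FROM annotation WHERE doc_id = ?", (doc_id,))
    conn.executemany(
        "INSERT INTO annotation VALUES (?, ?, ?)",
        [(doc_id, term, weight) for term, weight in annotations],
    )
    conn.commit()


def impact_factor(conn, term):
    """Average stored weight of a term across documents (assumed definition)."""
    row = conn.execute(
        "SELECT AVG(weight) FROM annotation WHERE term = ?", (term,)
    ).fetchone()
    return row[0] or 0.0


def weight_with_impact(conn, term, position_weight, factor_coeff=0.5):
    """Position-based weight of a term in a new document plus the impact factor."""
    return position_weight + factor_coeff * impact_factor(conn, term)


conn = sqlite3.connect(":memory:")
store_annotations(conn, "doc1", [("ontology", 4.5)])
# Storing again for the same document replaces the earlier rows.
store_annotations(conn, "doc1", [("ontology", 4.0), ("publishing", 1.0)])
```

Terms that were confirmed with high weights in already optimized documents thereby receive a boost when they show up in a document that is annotated later.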
Entering a term for a knowledge query, matching data, and sorting by weight: the user runs a knowledge query; when individuals are matched on data attribute information and all information about an individual is displayed, the results can be ordered by the weight of the matched attribute value in the document, and the query results are displayed in descending order of weight.
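The query step can be sketched as a substring match over the data attribute values of all individuals, followed by a sort on the weight that the matched value carries in its document. The data layout, the function `knowledge_query`, and the use of case-insensitive substring matching are assumptions for illustration.

```python
def knowledge_query(term, individuals, annotation_weights):
    """Return (doc_id, matched value, weight) tuples, highest weight first.

    individuals:        list of {"doc_id": ..., "attributes": {name: value}}.
    annotation_weights: doc_id -> {annotation term: weight}.
    """
    needle = term.lower()
    hits = []
    for individual in individuals:
        for value in individual["attributes"].values():
            if value and needle in str(value).lower():
                doc_weights = annotation_weights.get(individual["doc_id"], {})
                hits.append((individual["doc_id"], value, doc_weights.get(value, 0.0)))
                break  # one hit per individual is enough for ranking
    # Descending order of weight, as described for the result display.
    return sorted(hits, key=lambda hit: -hit[2])


individuals = [
    {"doc_id": "doc1", "attributes": {"subject": "ontology"}},
    {"doc_id": "doc2", "attributes": {"subject": "ontology engineering"}},
    {"doc_id": "doc3", "attributes": {"subject": "poetry"}},
]
weights = {"doc1": {"ontology": 1.5}, "doc2": {"ontology engineering": 4.0}}
results = knowledge_query("ontology", individuals, weights)
```

Here `doc2` is listed before `doc1` because its matched attribute value has the higher annotation weight, and `doc3` is not returned at all.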
The scope of protection of the present invention is not limited to the above embodiment. Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its scope and spirit. If such changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to cover them as well.

Claims (1)

1. An ontology-based semantic annotation optimization method for digital publications, characterized by comprising the following steps:
Document content preprocessing: the document is parsed in a computer system, keywords are extracted with a keyword extraction tool, and a weight is computed for each keyword based on the position of the word, providing the data basis for the subsequent construction of individuals;
Constructing the ontology model: the ontology is built in the computer system with an ontology construction tool, following a top-down approach; the classes, object properties, and data attributes are created in the tool, forming an ontology that comprises classes, object properties, and data attributes;
Constructing individuals and filling in data attribute values: an individual is an instance created from a class that exists in the ontology; constructing an individual is the process by which the user models a document according to its content and fills in the data attributes of the individual, the data attribute values being obtained from the document annotations;
Adjusting the document annotations and annotation weights: the original annotation information of the document is obtained, together with the attribute values filled into the above individual and the document corresponding to each attribute value; the annotations of the document are adjusted; the level of the class to which the individual belongs and the priority of the data attribute are added to the weight formula as weights, yielding the weight of the attribute value, which becomes a new annotation of the document;
Storing the annotations and annotation weights: the original annotations of the document are deleted, and the adjusted annotations and weights are stored in the corresponding annotation data table; when other documents are annotated, the data in the annotation table is added to the annotation weight formula as an impact factor;
Entering a term for a knowledge query, matching data, and sorting by weight: the user runs a knowledge query; when individuals are matched on data attribute information and all information about an individual is displayed, the results can be ordered by the weight of the matched attribute value in the document, and the query results are displayed in descending order of weight;
In the step of constructing individuals and filling in data attribute values, each data attribute has a corresponding text box for entering and displaying the data attribute information; data attribute values are obtained from the document annotations, and for attribute values that cannot be obtained from the annotations, key information obtained by full-text search is used as the data attribute value;
In the step of adjusting the document annotations and annotation weights, if an attribute value is already an original annotation of the document, its original weight is merged with its new weight, the old and new annotations are then sorted by weight, and those with the highest weights are selected as the annotations of the document.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510156576.1A CN104750836B (en) 2015-04-03 2015-04-03 Digital publication semantic tagger optimization method based on ontology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510156576.1A CN104750836B (en) 2015-04-03 2015-04-03 Digital publication semantic tagger optimization method based on ontology

Publications (2)

Publication Number Publication Date
CN104750836A CN104750836A (en) 2015-07-01
CN104750836B true CN104750836B (en) 2019-04-26

Family

ID=53590520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510156576.1A Active CN104750836B (en) 2015-04-03 2015-04-03 Digital publication semantic tagger optimization method based on ontology

Country Status (1)

Country Link
CN (1) CN104750836B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704636A (en) * 2017-11-09 2018-02-16 安徽教育网络出版有限公司 Dynamic digital publishing system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593335A (en) * 2013-09-05 2014-02-19 姜赢 Chinese semantic proofreading method based on ontology consistency verification and reasoning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060277176A1 (en) * 2005-06-01 2006-12-07 Mydrew Inc. System, method and apparatus of constructing user knowledge base for the purpose of creating an electronic marketplace over a public network
KR100966651B1 (en) * 2008-01-16 2010-06-29 재단법인서울대학교산학협력재단 Ontology-based Semantic Annotation System and Method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
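Automatic semantic annotation technology for linked data and knowledge representation; Xie Ming; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2013-06-15 (No. 06); I138-65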

Also Published As

Publication number Publication date
CN104750836A (en) 2015-07-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Liu Yongjian

Inventor after: Bai Lihua

Inventor after: Wei Min

Inventor after: Wu Lei

Inventor after: Yang Chaoyang

Inventor after: Zeng Rui

Inventor after: Li Wenzhong

Inventor after: Yang Hui

Inventor before: Liu Yongjian

Inventor before: Bai Lihua

Inventor before: Yang Chaoyang

Inventor before: Zeng Rui

Inventor before: Li Wenzhong

Inventor before: Yang Hui