CN107818140A

CN107818140A - A semantic annotation optimization method for digital publications

Info

Publication number: CN107818140A
Application number: CN201710926504.XA
Authority: CN
Inventors: 艾顺刚; 张蕾; 何卫东; 姚进德; 孙晓翠; 王林林; 刘焱
Original assignee: Jiangsu Retech Digital Industry Park Co Ltd
Current assignee: Jiangsu Retech Digital Industry Park Co Ltd
Priority date: 2017-10-07
Filing date: 2017-10-07
Publication date: 2018-03-20

Abstract

The present invention provides a semantic annotation optimization method for digital publications, comprising the following steps: preprocessing the document content; building an ontology model; building individuals and filling in their data property values; adjusting the document's annotations and annotation weights; storing the annotations and their weights; and entering a term to run a knowledge query, matching the data and sorting the results by weight. The method improves the accuracy of document annotation, lets users find relevant documents faster when running knowledge queries against the ontology knowledge base, and improves the annotation accuracy of other related electronic documents.

Description

A semantic annotation optimization method for digital publications

Technical field

The present invention relates to the technical field of digital publishing, and in particular to a semantic annotation optimization method for digital publications.

Background technology

Knowledge processing is an inevitable trend in the development of information technology. As the requirements placed on knowledge applications keep rising, traditional knowledge database systems can no longer meet the new needs. Ontologies have therefore been introduced into knowledge engineering, and ontology-related theory and technology are applied to the development of knowledge bases.

Regarding ontology knowledge systems: in the late 1970s, the techniques for constructing expert systems, knowledge systems, and knowledge-intensive information systems developed into knowledge engineering, and the systems so built are called knowledge systems (knowledge-based systems). Knowledge systems are the most important industrial and commercial product of the discipline of artificial intelligence. They are used to assist people in problem solving, for example detecting credit card fraud, accelerating ship design, assisting medical diagnosis, making scientific software more intelligent, providing front-office financial services, evaluating product quality and advertising, and supporting the service restoration of electrical networks.

With the continuous development of digital publishing and the explosive growth of digital content on the modern Internet, some techniques for extracting annotations from the content of electronic publications have appeared. However, these techniques extract annotations from a basic lexicon and the content's context alone. Because such extraction does not take the publication's domain background into account, much domain-related key information is filtered out, which lowers annotation accuracy in the specific domain, so the annotations cannot fully represent the core and main content of the document.

When information in the domain is retrieved by annotation, recall and precision both suffer considerably: the annotation information is not fully used, and the relationships and structure among pieces of information are not adequately exposed, so users must spend a great deal of time filtering information.

Summary of the invention

The technical problem to be solved by the present invention is, in view of the above technical deficiencies, to provide a semantic annotation optimization method for digital publications that improves the accuracy of document annotation, lets users find relevant documents faster when running knowledge queries against an ontology knowledge base, and improves the annotation accuracy of other related electronic documents.

The technical solution adopted by the present invention to solve this technical problem is:

A semantic annotation optimization method for digital publications, characterized by comprising the following steps:

Document content preprocessing: the document is parsed in a computer system, keywords are extracted with a keyword extraction tool, and each keyword's weight is calculated from the position of the word, providing the data basis for subsequently building individuals.
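As a minimal sketch of position-based keyword weighting in the preprocessing step, the following Python assumes the document has already been parsed into named sections. The section names, the position weights in `POSITION_WEIGHTS`, and the fixed `vocabulary` that stands in for the keyword extraction tool are illustrative assumptions; the source does not specify any of them.

```python
import re
from collections import defaultdict

# Hypothetical position weights; the source gives no concrete values.
POSITION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def extract_weighted_keywords(sections, vocabulary):
    """Return {keyword: weight}, adding one position weight per occurrence.

    sections: {position name: text} for one parsed document.
    vocabulary: set of lowercase keywords, a stand-in for a real
    keyword extraction tool.
    """
    weights = defaultdict(float)
    for position, text in sections.items():
        factor = POSITION_WEIGHTS.get(position, 1.0)
        for token in re.findall(r"\w+", text.lower()):
            if token in vocabulary:
                weights[token] += factor
    return dict(weights)


doc = {
    "title": "Ontology based annotation",
    "abstract": "An ontology improves annotation accuracy.",
    "body": "Each annotation receives a weight.",
}
keywords = extract_weighted_keywords(doc, {"ontology", "annotation", "weight"})
```

Here a keyword that appears in the title counts for more than the same keyword in the body, which is one simple way to realize "weight based on word position".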

Building the ontology model: an ontology is an explicit formal representation of the classes in a domain; the properties of each class describe the various aspects of the class, together with the constraints on those properties and attributes, so an ontology comprises classes, object properties, and data properties. The content annotation optimization method is implemented on top of an ontology: the ontology is built in a computer system with an ontology editing tool, following a top-down approach, and the classes, object properties, and data properties are constructed in that tool.
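The three building blocks named above (classes, object properties, data properties) can be sketched as plain Python data structures. This is an illustration only, not the ontology editing tool the method relies on; the class names, the `priority` field, and the `level` property are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataProperty:
    name: str
    priority: int  # hypothetical convention: 1 is the highest priority


@dataclass
class OntologyClass:
    name: str
    parent: Optional["OntologyClass"] = None
    data_properties: list = field(default_factory=list)

    @property
    def level(self):
        """Depth in the class hierarchy; a root class has level 0."""
        return 0 if self.parent is None else self.parent.level + 1


@dataclass
class ObjectProperty:
    """A named relation from one class (domain) to another (range)."""
    name: str
    domain_cls: OntologyClass
    range_cls: OntologyClass


# Top-down construction: the general class first, then its specialization.
publication = OntologyClass("Publication", data_properties=[DataProperty("title", 1)])
book = OntologyClass("Book", parent=publication, data_properties=[DataProperty("isbn", 2)])
person = OntologyClass("Person", data_properties=[DataProperty("name", 1)])
written_by = ObjectProperty("writtenBy", domain_cls=publication, range_cls=person)
```

The `level` of a class and the `priority` of a data property are the two quantities the later weight adjustment step draws on.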

Building individuals and filling in data property values: an individual is an instance created from an existing class in the ontology, and building an individual is the process by which the user models a document according to its content. When the data property information of an individual is filled in, each data property corresponds to one text box, used to enter and display that property's information. Data property values are taken from the document's annotations; for a property value that cannot be obtained from the annotations, key information obtained by full-text search is used as the data property value.
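The fill-in rule (use the annotation when one exists, otherwise fall back to full-text search) can be sketched as follows. The assumption that annotations are keyed by data property name, and the regular expressions that stand in for the full-text search, are illustrative and not taken from the source.

```python
import re


def fill_data_properties(property_names, annotations, full_text, patterns):
    """Fill each data property from annotations, else from a text search.

    annotations: {property name: value} taken from the document's annotations.
    patterns: {property name: regex with one capture group}, a hypothetical
    stand-in for the full-text search.
    """
    values = {}
    for name in property_names:
        if name in annotations:
            values[name] = annotations[name]
            continue
        pattern = patterns.get(name)
        match = re.search(pattern, full_text) if pattern else None
        # None marks a property that neither source could fill.
        values[name] = match.group(1) if match else None
    return values


individual = fill_data_properties(
    ["title", "publisher", "year"],
    {"title": "Digital Publishing Basics"},
    "Digital Publishing Basics. Publisher: Example Press. Printed in 2017.",
    {"publisher": r"Publisher:\s*([^.]+)", "year": r"\b(\d{4})\b"},
)
```

In an interactive tool the resulting values would populate the per-property text boxes, where the user can still correct them.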

Adjusting document annotations and annotation weights: the document's original annotation information is obtained, together with the property values filled into the above individual and the document corresponding to those property values, and the document's annotations are adjusted. A weight is assigned to each property value according to the level of the class to which the individual belongs and the priority of the data property, and the property value becomes a new annotation of the document. If the property value is already an original annotation of the document, its original weight is merged with the newly assigned weight. The old and new annotations are then sorted by weight, and the highest-weighted ones are selected as the document's annotations.
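The adjustment step can be sketched as a merge followed by a sort. The source does not give the weight formula, so `property_weight` below is a hypothetical choice (shallower classes and smaller priority numbers weigh more), and "merging" the original and new weights is assumed to mean adding them.

```python
def property_weight(class_level, priority):
    """Hypothetical formula; the source does not specify one."""
    return 1.0 / ((1 + class_level) * priority)


def adjust_annotations(original, filled, top_n):
    """Merge original annotations with property values and keep the top_n.

    original: {annotation: weight} from the preprocessing step.
    filled: list of (property value, class level, property priority).
    """
    merged = dict(original)
    for value, class_level, priority in filled:
        # Assumption: an existing weight and the new weight are summed.
        merged[value] = merged.get(value, 0.0) + property_weight(class_level, priority)
    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


original = {"ontology": 0.5, "weather": 0.125, "index": 0.25}
filled = [("ontology", 0, 1), ("semantic annotation", 1, 1)]
adjusted = adjust_annotations(original, filled, top_n=3)
```

In this example "ontology" is both an original annotation and a filled property value, so its weights combine and it rises to the top, while the low-weight original annotation "weather" drops out.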

Storing annotations and annotation weights: the original annotations of the document are deleted, and the annotations and weights adjusted in the preceding step are stored in the data table for annotations. When other documents are annotated, the data in the annotation table is added as an influence factor to the annotation weight calculation formula.
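A minimal sketch of the storage step using Python's built-in `sqlite3` module. The table layout and the `influence_factor` formula (1 plus the mean stored weight of a term) are assumptions made for illustration; the source only states that stored annotations feed into the weight formula of other documents.

```python
import sqlite3


def store_annotations(conn, doc_id, annotations):
    """Delete the document's old annotations and store the adjusted ones."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM annotation WHERE doc_id = ?", (doc_id,))
        conn.executemany(
            "INSERT INTO annotation (doc_id, term, weight) VALUES (?, ?, ?)",
            [(doc_id, term, weight) for term, weight in annotations],
        )


def influence_factor(conn, term):
    """Hypothetical factor: 1 plus the mean stored weight of the term."""
    row = conn.execute(
        "SELECT AVG(weight) FROM annotation WHERE term = ?", (term,)
    ).fetchone()
    return 1.0 + (row[0] or 0.0)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotation (doc_id TEXT, term TEXT, weight REAL)")
store_annotations(conn, "doc-1", [("ontology", 0.9), ("weather", 0.1)])
# Storing again replaces the original annotations with the adjusted ones.
store_annotations(conn, "doc-1", [("ontology", 1.5), ("semantic annotation", 0.5)])

# Base weight 2.0 of a keyword in another document, scaled by stored evidence.
scaled = 2.0 * influence_factor(conn, "ontology")
```

A term with no stored annotation yields a factor of 1.0, so it leaves the other document's base weight unchanged.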

Entering a term to run a knowledge query, matching data, and sorting by weight: the user searches through the knowledge query. When an individual is matched on its data property information and all information about that individual is displayed, the results can be sorted by the weight of the matched property value in each document, and the query results are displayed in descending order of weight.
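The query step can be sketched as a match over data property values followed by a descending sort on weight. The in-memory dictionaries and the case-insensitive substring match are illustrative assumptions; a real system would query the ontology knowledge base and the annotation table.

```python
def knowledge_query(term, individuals, annotation_weights):
    """Return (doc_id, weight) pairs for matching individuals, heaviest first.

    individuals: {doc_id: {data property name: value}}.
    annotation_weights: {(doc_id, value): weight of that value in the document}.
    """
    needle = term.lower()
    hits = []
    for doc_id, properties in individuals.items():
        weights = [
            annotation_weights.get((doc_id, value), 0.0)
            for value in properties.values()
            if value and needle in value.lower()
        ]
        if weights:
            # Use the strongest matching property value for ranking.
            hits.append((doc_id, max(weights)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


individuals = {
    "doc-1": {"subject": "Ontology engineering"},
    "doc-2": {"subject": "Ontology"},
    "doc-3": {"subject": "Typography"},
}
annotation_weights = {
    ("doc-1", "Ontology engineering"): 0.5,
    ("doc-2", "Ontology"): 1.5,
}
results = knowledge_query("ontology", individuals, annotation_weights)
```

Documents whose matched property value carries a higher weight appear first, which is what lets the user reach the most relevant document sooner.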

The beneficial effects of the present invention are：

The weights of annotations in a digital publication are calibrated using the property information of individuals in the ontology, which improves the accuracy of document annotation and lets users find relevant documents faster when running knowledge queries against the ontology knowledge base;

The optimized annotations serve as an influence factor in the weight calculation formula used when extracting annotations from other documents, which improves the annotation accuracy of other electronic documents.

Further beneficial effects of the present invention are:

Annotation information can be displayed for a digital publication, enabling annotated preview and reading of the publication and helping readers find the subject information in a document quickly and effectively.

At the same time, a concept network can be established among electronic documents, providing effective data support for building an ontology library.

Brief description of the drawings

Fig. 1 is the flow chart of the embodiment of the present invention.

Embodiment

The present invention is further described below with reference to an embodiment:

A semantic annotation optimization method for digital publications, as shown in Fig. 1, characterized by comprising the following steps:

Document content preprocessing: the document is parsed in a computer system, keywords are extracted with a keyword extraction tool, and each keyword's weight is calculated from the position of the word, providing the data basis for subsequently building individuals.

Building the ontology model: an ontology is an explicit formal representation of the classes in a domain; the properties of each class describe the various aspects of the class, together with the constraints on those properties and attributes, so an ontology comprises classes, object properties, and data properties. The content annotation optimization method is implemented on top of an ontology, so an ontology must first be built with an ontology editing tool. We follow a top-down approach and construct the classes, object properties, and data properties in that tool.

Adjusting document annotations and annotation weights: the document's original annotation information is obtained, together with the property values filled into the above individual and the document corresponding to those property values, and the document's annotations are adjusted. The level of the class to which the individual belongs and the priority of the data property are added as weighting terms to the weight calculation formula, yielding the weight of the property value, which becomes a new annotation of the document. If the property value is already an original annotation of the document, its original weight is merged with the newly calculated weight. The old and new annotations are then sorted by weight, and the highest-weighted ones are selected as the document's annotations.

The scope of protection of the present invention is not limited to the above embodiment. Those skilled in the art can obviously make various changes and modifications to the present invention without departing from its scope and spirit. If such changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims

1. A semantic annotation optimization method for digital publications, characterized by comprising the following steps:

Document content preprocessing: the document is parsed in a computer system, keywords are extracted with a keyword extraction tool, and each keyword's weight is calculated from the position of the word, providing the data basis for subsequently building individuals;

Building the ontology model: the ontology is built in a computer system with an ontology editing tool, following a top-down approach; the classes, object properties, and data properties are constructed in that tool, forming an ontology that comprises classes, object properties, and data properties;

Building individuals and filling in data property values: an individual is an instance created from an existing class in the ontology; building an individual is the process by which the user models a document according to its content; the data properties of the individual are filled in, with data property values taken from the document's annotations;

Adjusting document annotations and annotation weights: the document's original annotation information is obtained, together with the property values filled into the above individual and the document corresponding to those property values; the document's annotations are adjusted; the level of the class to which the individual belongs and the priority of the data property are added as weighting terms to the weight calculation formula, yielding the weight of the property value, which becomes a new annotation of the document;

Storing annotations and annotation weights: the original annotations of the document are deleted, and the annotations and weights adjusted in the preceding step are stored in the data table for annotations; when other documents are annotated, the data in the annotation table is added as an influence factor to the annotation weight calculation formula;

Entering a term to run a knowledge query, matching data, and sorting by weight: the user searches through the knowledge query; when an individual is matched on its data property information and all information about that individual is displayed, the results can be sorted by the weight of the matched property value in each document, and the query results are displayed in descending order of weight.

2. The semantic annotation optimization method for digital publications as claimed in claim 1, characterized in that: in the step of building individuals and filling in data property values, each data property corresponds to one text box, used to enter and display that property's information; data property values are taken from the document's annotations; for a property value that cannot be obtained from the annotations, key information obtained by full-text search is used as the data property value.

3. The semantic annotation optimization method for digital publications as claimed in claim 1, characterized in that: in the step of adjusting document annotations and annotation weights, if the property value is already an original annotation of the document, its original weight is merged with the newly calculated weight; the old and new annotations are then sorted by weight, and the highest-weighted ones are selected as the document's annotations.