CN102332013A - OWL (ontology web language)-based Internet language ontology learning system

Info

Publication number
CN102332013A
Authority
CN
China
Prior art keywords
owl
ontology
internet
storehouse
inference engine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201110270784A
Other languages
Chinese (zh)
Inventor
王楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2011-09-14
Filing date
2011-09-14
Publication date
2012-01-25
Application filed by Individual
Priority to CN201110270784A
Publication of CN102332013A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses an OWL (ontology web language)-based Internet language ontology learning system, characterized by comprising a web page acquisition subsystem, an OWL ontology conversion subsystem, an OWL inference engine subsystem and an original document management subsystem. The web page acquisition subsystem is responsible for acquiring Internet web pages and converting them into formatted text; the OWL ontology conversion subsystem is responsible for performing OWL ontology conversion on the formatted text and establishing an OWL ontology instance; the OWL inference engine subsystem is responsible for creating and maintaining an OWL inference engine, reasoning over the OWL ontology instance with the OWL inference engine and establishing the OWL knowledge description corresponding to the Internet web pages; and the original document management subsystem is responsible for storing and maintaining the Internet web pages acquired by the web page acquisition subsystem. The system can perform OWL analysis, extraction and conversion on the massive information of the existing Internet, which consists mainly of HTML (hypertext markup language) or XML (extensible markup language) text. The invention provides a brand-new idea for OWL Internet applications and therefore has high practical value and broad application prospects.

Description

OWL-based Internet language ontology learning system
Technical field
The invention belongs to the field of computer technology and specifically relates to a learning-type Internet ontology language conversion system.
Background art
Today the Internet has penetrated every corner of human society, and it can be expected to play an increasingly important role in the development of human civilization. Letting computers "understand" the Internet and making the Internet more intelligent is a long-standing human dream. At present, the ways in which computers process Internet information can be roughly divided into the following types:
1) The content and form of Internet information are not analyzed; that is, Internet information is received, stored, queried, sent and so on as an information stream (bytes). In this case Internet information is mainly data, including numbers, character strings, media streams and the like. The computer executes a flow designed by the software engineer; at each step it determines the current state from a pre-designed set of known states and decides the execution strategy for the next step. All execution strategies are designed by the software engineer and input to the computer in advance.
2) Keyword matching is performed on Internet information content. Here the computer's "understanding" of an Internet text amounts to this: the text may contain certain keywords specified by the user. The computer does not really understand the article content; it merely compares character codes and leaves all tasks of understanding to people. Yet this level alone has produced a huge search engine industry and created industry giants such as Google and Baidu.
3) Simple semantic recognition is performed on the content and form of Internet information. At present the RDF protocol is mainly used to define and parse simple semantics between words. To the computer all words are still character strings whose meaning it does not know, but it knows that certain strings are bound to other strings by (structural) relations, and it can perform simple reasoning on the basis of these relations. The work of understanding the content is still done by the people using the computer.
4) An attempt is made to rebuild the Internet as a knowledge structure, namely the Semantic Web, mainly by using the W3C Web Ontology Language (OWL) to build a semantic network. If everyone created Internet content in OWL, the Internet itself would become a knowledge structure that computers "understand" to a certain extent. Software engineers would design a series of inference rules and engines for the computer so that, on the OWL semantic network, the computer could "understand" Internet information content by itself and make correct judgments and take correct actions.
The design of OWL represents the direction of future development. It is a very important attempt to give computers a knowledge structure of the Internet and to create computers that can "understand" Internet information content. However, realizing this vision with traditional programming methods would mean building a complete OWL knowledge structure, then establishing complete inference rules, and then constructing a powerful inference engine on top of those rules, so that the computer makes each judgment in the reasoning process according to intermediate states constructed in advance by the software engineer and thereby "understands" Internet information content. First, this requires great ingenuity and skill, and even the cleverest engineer cannot foresee the complexity of the knowledge on the whole Internet. It is therefore hardly possible for ordinary website builders to construct an OWL Semantic Web in this way; even if individual website engineers could do so, it would be difficult to reach large-scale application, let alone give rise to an industry. Second, the amount of data on the Internet is already too large to estimate, and converting all of it into OWL form is essentially an impossible task. Therefore, the key to making OWL practical is not to turn the whole Internet into an Internet described in OWL, but to perform OWL analysis, extraction and conversion on the massive information of the existing Internet, which consists mainly of HTML or XML text.
 
Summary of the invention
To solve the above problems, the invention provides a practical OWL-based Internet language ontology learning system.
The invention adopts the following design ideas to let the computer "understand" Internet information content to a certain extent:
First, as in the design concept of OWL, the computer is given a knowledge structure based on OWL. However, the invention does not attempt to construct the whole Internet in OWL; instead it extracts OWL information from ordinary Internet text, that is, it translates the ontology information implicit in the text into OWL form. The computer compares and reasons over its own OWL-based knowledge structure and the OWL ontology information extracted from the Internet, thereby achieving the goal of "understanding" the Internet.
Second, unlike traditional programming methods, the invention does not attempt to input the knowledge structure, inference rules and inference engine into the computer all at once. Instead it adopts machine learning, including repeated interaction with engineers, and gradually improves itself, proceeding from the simple to the complex until it reaches the level of large-scale application.
Third, through learning, the OWL-based Internet language learning system can not only maintain, update and improve the computer's Internet ontology knowledge model, but also maintain, update and improve the inference rules and inference engines that use this OWL Internet ontology knowledge base, so that they can be applied flexibly in all kinds of intelligent Internet applications.
Based on the above ideas, the invention provides an OWL-based Internet language ontology learning system, characterized in that it comprises:
a web page acquisition subsystem, responsible for acquiring Internet web pages and converting them into formatted text;
an OWL ontology conversion subsystem, responsible for performing OWL ontology conversion on said Internet web pages and establishing an OWL ontology instance;
an OWL inference engine subsystem, responsible for creating and maintaining an OWL inference engine, reasoning over said OWL ontology instance with said OWL inference engine, and establishing the OWL knowledge description corresponding to said Internet web pages;
an original document management subsystem, responsible for storing and maintaining said Internet web pages acquired by said web page acquisition subsystem.
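The four subsystems listed above could be wired together roughly as follows. This is a minimal sketch in Python under stated assumptions: the class `LearningSystem`, its method names and the tuple representation of ontology instances are hypothetical and do not appear in the original text, and each stage is a placeholder for the real processing described later.

```python
from dataclasses import dataclass, field


@dataclass
class LearningSystem:
    """Hypothetical wiring of the four subsystems; all names are illustrative."""

    original_documents: dict = field(default_factory=dict)  # original document library
    ontology_instances: dict = field(default_factory=dict)  # OWL ontology instance library
    knowledge: dict = field(default_factory=dict)           # OWL knowledge descriptions

    def acquire(self, url: str, raw_page: str) -> str:
        """Web page acquisition: keep the raw page, return whitespace-normalized text."""
        self.original_documents[url] = raw_page
        return " ".join(raw_page.split())

    def convert(self, url: str, text: str) -> list:
        """Ontology conversion placeholder: one (subject, predicate, object) per sentence."""
        triples = [(url, "mentions", s.strip()) for s in text.split(".") if s.strip()]
        self.ontology_instances[url] = triples
        return triples

    def infer(self, url: str) -> list:
        """Inference placeholder: copies the instance into the knowledge description."""
        self.knowledge[url] = list(self.ontology_instances[url])
        return self.knowledge[url]

    def process(self, url: str, raw_page: str) -> list:
        """Run acquisition, conversion and inference in sequence for one page."""
        text = self.acquire(url, raw_page)
        self.convert(url, text)
        return self.infer(url)


system = LearningSystem()
description = system.process("http://example.org/a", "OWL is a language.  It has classes.")
```

The raw page is stored before any processing so that the knowledge description can always be traced back to its source, which is the role the text assigns to the original document management subsystem.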
Preferably, said web page acquisition subsystem comprises a web page acquisition module and a text pre-processing module. Said web page acquisition module acquires the formatted text information in said Internet web pages, and said text pre-processing module performs body text extraction, word segmentation, disambiguation, deduplication and grammatical tagging on the data in said formatted text information. Here, word segmentation means inserting spaces between the words of a Chinese sentence so that words are separated as in English; disambiguation means resolving cases where a character or word has more than one meaning ("ambiguity"); deduplication means identifying identical articles that may have been acquired from different websites and keeping only one copy; grammatical tagging means labeling each word with its part of speech and grammatical attributes on the basis of word segmentation.
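Three of the pre-processing steps described above (body text extraction, word segmentation and deduplication) can be sketched with the Python standard library alone. The function names are hypothetical. The tokenizer assumes space-delimited text; Chinese word segmentation, disambiguation and grammatical tagging would need a dedicated segmenter and tagger and are not shown.

```python
import hashlib
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the content of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Body text extraction: strip markup and join the remaining text nodes."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text: str) -> list:
    """Naive segmentation for space-delimited languages (an assumption)."""
    return re.findall(r"\w+", text.lower())


def deduplicate(pages: dict) -> dict:
    """Keep one copy of pages whose normalized token sequence is identical."""
    seen, unique = set(), {}
    for url, html in pages.items():
        text = extract_text(html)
        digest = hashlib.sha256(" ".join(tokenize(text)).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[url] = text
    return unique
```

Hashing the normalized tokens only catches exact duplicates after normalization; near-duplicate detection would need a similarity measure instead.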
Preferably, said web page acquisition module acquires formatted text information in HTML or XML format from said Internet web pages.
Preferably, said OWL ontology conversion subsystem comprises an OWL conversion module, an OWL conversion rule library, an OWL conversion rule maintenance module and an OWL ontology instance library. The OWL conversion rule maintenance module creates and maintains OWL conversion rules through human-computer interaction; created OWL conversion rules are stored in the OWL conversion rule library; the OWL conversion module performs OWL ontology conversion on said formatted text according to the OWL conversion rules in the OWL conversion rule library to establish an OWL ontology instance, and stores this OWL ontology instance in the OWL ontology instance library.
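The text does not specify what a conversion rule looks like. One way to picture the rule library and the conversion module is a set of sentence patterns, each mapped to a predicate. The sketch below assumes that form; `ConversionRuleLibrary`, `add_rule` and `convert` are hypothetical names, and plain (subject, predicate, object) tuples stand in for a real OWL serialization.

```python
import re


class ConversionRuleLibrary:
    """Hypothetical rule library: each rule is a sentence pattern plus a predicate."""

    def __init__(self):
        self._rules = []  # list of (compiled pattern, predicate)

    def add_rule(self, pattern: str, predicate: str) -> None:
        """The pattern must define the named groups 'subject' and 'object'."""
        self._rules.append((re.compile(pattern), predicate))

    def rules(self) -> list:
        return list(self._rules)


def convert(text: str, library: ConversionRuleLibrary) -> list:
    """Apply every rule to every sentence and collect the resulting triples."""
    triples = []
    for sentence in re.split(r"[.!?]\s*", text):
        for pattern, predicate in library.rules():
            match = pattern.fullmatch(sentence.strip())
            if match:
                triples.append((match.group("subject"), predicate, match.group("object")))
    return triples


library = ConversionRuleLibrary()
library.add_rule(r"(?P<subject>\w+) is an? (?P<object>\w+)", "rdf:type")
library.add_rule(r"Every (?P<subject>\w+) is an? (?P<object>\w+)", "rdfs:subClassOf")

triples = convert("Rex is a dog. Every dog is an animal.", library)
# triples == [("Rex", "rdf:type", "dog"), ("dog", "rdfs:subClassOf", "animal")]
```

Keeping the rules as data rather than code is what lets an operator add or change them at run time, as the rule maintenance module requires.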
Preferably, the system further comprises an OWL modeling subsystem, which comprises an OWL modeling module and an OWL model library. Said OWL modeling module creates and maintains OWL ontology models for various domains through human-computer interaction and stores the created OWL ontology models in the OWL model library.
Preferably, said OWL conversion module first retrieves from said OWL model library an OWL ontology model whose domain is close to that of said Internet web page, and then performs OWL ontology conversion on said formatted text according to said OWL ontology model to establish the OWL ontology instance.
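How the "close" domain model is retrieved is not specified in the text. As one possible reading, the sketch below scores each model in the library by the Jaccard overlap between its vocabulary and the terms of the page. The similarity measure, the function names and the sample vocabularies are all assumptions made for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_model(page_terms: set, model_library: dict):
    """Return the name of the model whose vocabulary best overlaps the page, or None."""
    best_name, best_score = None, 0.0
    for name, vocabulary in model_library.items():
        score = jaccard(page_terms, vocabulary)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical model library: domain name -> vocabulary of the ontology model.
model_library = {
    "medicine": {"doctor", "patient", "drug"},
    "finance": {"bank", "loan", "interest"},
}

print(closest_model({"patient", "drug", "dose"}, model_library))  # medicine
print(closest_model({"football"}, model_library))                 # None
```

A `None` result is the natural point at which the system would ask the operator to create or extend a model through the OWL modeling module.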
Preferably, said OWL inference engine subsystem comprises an OWL inference engine module, an OWL inference engine maintenance module, an OWL inference engine library, an OWL inference rule maintenance module and an OWL inference rule library. Said OWL inference engine maintenance module creates and maintains OWL inference engines through human-computer interaction and stores the created OWL inference engines in the OWL inference engine library; said OWL inference rule maintenance module creates and maintains OWL inference rules through human-computer interaction and stores the created OWL inference rules in the OWL inference rule library. Said OWL inference engine module invokes said OWL inference engine from said OWL inference engine library, reasons over said OWL ontology instance according to the OWL inference rules in said OWL inference rule library, and obtains the OWL knowledge description corresponding to said Internet web pages according to the inferred propositions.
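The separation between an inference engine and a library of inference rules can be illustrated with a small forward-chaining loop. In this sketch the engine is the generic `infer` function and the rules are two standard RDFS-style entailments (subclass transitivity and type inheritance). The function names are hypothetical, and a real OWL reasoner covers far more of the language than these two rules.

```python
def subclass_transitivity(facts: set) -> set:
    """If A subClassOf B and B subClassOf C, then A subClassOf C."""
    return {
        (a, "rdfs:subClassOf", c)
        for (a, p1, b) in facts if p1 == "rdfs:subClassOf"
        for (b2, p2, c) in facts if p2 == "rdfs:subClassOf" and b2 == b
    }


def type_inheritance(facts: set) -> set:
    """If x has type B and B subClassOf C, then x has type C."""
    return {
        (x, "rdf:type", c)
        for (x, p1, b) in facts if p1 == "rdf:type"
        for (b2, p2, c) in facts if p2 == "rdfs:subClassOf" and b2 == b
    }


def infer(instance: set, rules: list) -> set:
    """Forward chaining: apply all rules until no new fact is derived."""
    facts = set(instance)
    while True:
        derived = set()
        for rule in rules:
            derived |= rule(facts)
        if derived <= facts:
            return facts
        facts |= derived


instance = {
    ("Rex", "rdf:type", "Dog"),
    ("Dog", "rdfs:subClassOf", "Mammal"),
    ("Mammal", "rdfs:subClassOf", "Animal"),
}
knowledge = infer(instance, [subclass_transitivity, type_inheritance])
# knowledge now also contains ("Rex", "rdf:type", "Animal")
```

Because the rules are passed in as a list, adding a rule to the rule library changes what is inferred without touching the engine itself.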
Preferably, said original document management subsystem comprises an original document management module and an original document library. Said original document management module stores said Internet web pages in the original document library and establishes the index between said Internet web pages and said OWL knowledge description.
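The index between stored pages and knowledge descriptions can be sketched as a pair of dictionaries that allow lookup in both directions. The class name, the method names and the choice of a truncated SHA-256 hash of the URL as document identifier are assumptions for illustration only.

```python
import hashlib


class OriginalDocumentLibrary:
    """Hypothetical store for raw pages plus a two-way index to knowledge descriptions."""

    def __init__(self):
        self._pages = {}         # document id -> raw page
        self._to_knowledge = {}  # document id -> knowledge description id
        self._to_document = {}   # knowledge description id -> document id

    def store(self, url: str, raw_page: str) -> str:
        """Save the raw page and return a stable identifier derived from its URL."""
        doc_id = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
        self._pages[doc_id] = raw_page
        return doc_id

    def link(self, doc_id: str, knowledge_id: str) -> None:
        """Record which knowledge description was derived from which page."""
        if doc_id not in self._pages:
            raise KeyError(doc_id)
        self._to_knowledge[doc_id] = knowledge_id
        self._to_document[knowledge_id] = doc_id

    def knowledge_for(self, doc_id: str):
        """Return the linked knowledge description id, or None if not linked yet."""
        return self._to_knowledge.get(doc_id)

    def page_for(self, knowledge_id: str) -> str:
        """Return the raw page from which a knowledge description was derived."""
        return self._pages[self._to_document[knowledge_id]]


docs = OriginalDocumentLibrary()
doc_id = docs.store("http://example.org/a", "<p>Rex is a dog.</p>")
docs.link(doc_id, "kd-001")
```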
The OWL-based Internet language ontology learning system of the invention does not turn the whole Internet into an Internet described in OWL. Instead, it performs OWL analysis, extraction and conversion on the massive information of the existing Internet, which consists mainly of HTML or XML text, converts it into OWL information content that computers can readily process, and keeps learning and improving in the process. The invention offers a new and practical approach for OWL Internet applications and therefore has high practical value and broad application prospects.
Description of the drawings
Fig. 1 is a schematic block diagram of the OWL-based Internet language ontology learning system of the invention.
Embodiment
Specific embodiments of the invention are described in further detail below.
As shown in Fig. 1, the workflow of the OWL-based Internet language ontology learning system of the invention is as follows:
1. The web page acquisition module of the system collects Internet web pages and extracts the formatted text information in HTML or XML format from them. The text pre-processing module then performs body text extraction, word segmentation, filtering, deduplication, disambiguation, grammatical tagging and similar processing so that OWL ontology information can be extracted. The web page acquisition module outputs the pre-processed formatted text to the OWL conversion module, and at the same time the original text is stored in the Internet original document library through the original document management module for later use.
2. The OWL conversion module receives the pre-processed formatted text, parses it into an OWL ontology instance described by an OWL ontology, and stores it in the OWL ontology instance library. The OWL conversion module relies on the OWL model library and the OWL conversion rule library to convert formatted text into OWL ontology instances. When a problem arises during OWL conversion, the OWL conversion module raises a question through the human-computer interfaces provided by the OWL conversion rule maintenance module and the OWL modeling module and interacts with the operator. The operator helps the OWL conversion module complete the conversion by modifying the OWL model and the OWL conversion rules.
3. OWL conversion rules are configurable and maintainable. The operator creates and modifies OWL conversion rules through the OWL conversion rule human-computer interface, continuously improving the OWL conversion capability of the system. OWL conversion rules are stored in the OWL conversion rule library for later use.
4. The OWL ontology model is the core basis of OWL conversion and OWL reasoning. The system provides a dedicated OWL modeling subsystem to create and maintain OWL ontology models. Problems and requirements arising in OWL conversion and OWL reasoning can be handled manually through the human-computer interface provided by OWL modeling. Through repeated practice in OWL conversion and reasoning, the OWL modeling subsystem continuously learns and corrects the OWL ontology models. OWL ontology models are stored in the OWL model library for later use.
5. With the OWL model library and the OWL ontology instance library, the computer system has obtained content that can be "understood"; the act of "understanding" depends on OWL ontology reasoning. Only when it can reason over OWL ontologies can the computer be said to have truly "understood" the Internet. Reasoning in this system is performed by the OWL inference engine, which uses the OWL ontology model, the inference rules and the Internet OWL ontology instances; finally the OWL knowledge description corresponding to said Internet web pages is established and output as the result.
6. The OWL inference engine and inference rules can be created and maintained according to the needs of practical applications, so that the system can be used flexibly in a wide range of intelligent Internet processing fields. Inference engines are the responsibility of the inference engine maintenance module, whose results are stored in the inference engine library. Inference rules are the responsibility of the inference rule maintenance module, whose results are stored in the inference rule library.
7. The original document management module stores said Internet web pages in the original document library and establishes the index between Internet web pages and OWL knowledge descriptions.
Because the inference engine library, the OWL model library, the OWL ontology instance library and the like keep learning and growing during use, the system can continuously improve its own conversion efficiency and conversion quality.
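The learning behaviour described in the workflow, where a failed conversion becomes a question for the operator and the operator's answer becomes a new rule, can be pictured as a small feedback loop. Everything in the sketch below is hypothetical: the module-level `rules` and `pending` lists stand in for the conversion rule library and the queue of open questions, and `operator_adds_rule` stands in for the human-computer interface.

```python
import re

rules = []    # stand-in for the conversion rule library: (pattern, predicate)
pending = []  # sentences that could not be converted and await an operator


def try_convert(sentence: str):
    """Return a triple if some rule matches, otherwise queue a question."""
    for pattern, predicate in rules:
        match = re.fullmatch(pattern, sentence)
        if match:
            return (match.group(1), predicate, match.group(2))
    pending.append(sentence)
    return None


def operator_adds_rule(pattern: str, predicate: str) -> list:
    """Add the operator's rule, then retry every queued sentence."""
    rules.append((pattern, predicate))
    backlog = pending[:]
    pending.clear()
    return [t for t in map(try_convert, backlog) if t is not None]


first = try_convert("Beijing is the capital of China")    # no rule yet: queued
resolved = operator_adds_rule(r"(\w+) is the capital of (\w+)", "capitalOf")
second = try_convert("Paris is the capital of France")    # handled without help
```

After one interaction the rule library covers a whole class of sentences, which is the sense in which the system's conversion ability grows with use.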
The above embodiment is only one embodiment of the invention. Although it is described specifically and in detail, it should not be construed as limiting the scope of the claims. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the invention, and all of these fall within the scope of protection of the invention. The scope of protection of this patent is therefore defined by the appended claims.

Claims (8)

1. An OWL-based Internet language ontology learning system, characterized in that it comprises:
a web page acquisition subsystem, responsible for acquiring Internet web pages and converting them into formatted text;
an OWL ontology conversion subsystem, responsible for performing OWL ontology conversion on said formatted text and establishing an OWL ontology instance;
an OWL inference engine subsystem, responsible for creating and maintaining an OWL inference engine, reasoning over said OWL ontology instance with said OWL inference engine, and establishing the OWL knowledge description corresponding to said Internet web pages;
an original document management subsystem, responsible for storing and maintaining said Internet web pages acquired by said web page acquisition subsystem.
2. The OWL-based Internet language ontology learning system according to claim 1, characterized in that: said web page acquisition subsystem comprises a web page acquisition module and a text pre-processing module; said web page acquisition module acquires the formatted text information in said Internet web pages, and said text pre-processing module performs body text extraction, word segmentation, disambiguation, deduplication and grammatical tagging on the data in said formatted text information.
3. The OWL-based Internet language ontology learning system according to claim 2, characterized in that: said web page acquisition module acquires the text information in said Internet web pages.
4. The OWL-based Internet language ontology learning system according to claim 1, characterized in that: said OWL ontology conversion subsystem comprises an OWL conversion module, an OWL conversion rule library, an OWL conversion rule maintenance module and an OWL ontology instance library; said OWL conversion rule maintenance module creates and maintains OWL conversion rules through human-computer interaction; created OWL conversion rules are stored in the OWL conversion rule library; said OWL conversion module performs OWL ontology conversion on said formatted text according to the OWL conversion rules in the OWL conversion rule library to establish an OWL ontology instance, and stores this OWL ontology instance in the OWL ontology instance library.
5. The OWL-based Internet language ontology learning system according to claim 4, characterized in that: it further comprises an OWL modeling subsystem, said OWL modeling subsystem comprising an OWL modeling module and an OWL model library; said OWL modeling module creates and maintains OWL ontology models for various domains through human-computer interaction and stores the created OWL ontology models in the OWL model library.
6. The OWL-based Internet language ontology learning system according to claim 5, characterized in that: said OWL conversion module first retrieves from said OWL model library an OWL ontology model whose domain is close to that of said Internet web page, and then performs OWL ontology conversion on said formatted text according to said OWL ontology model to establish the OWL ontology instance.
7. The OWL-based Internet language ontology learning system according to claim 1, characterized in that: said OWL inference engine subsystem comprises an OWL inference engine module, an OWL inference engine maintenance module, an OWL inference engine library, an OWL inference rule maintenance module and an OWL inference rule library; said OWL inference engine maintenance module creates and maintains OWL inference engines through human-computer interaction and stores the created OWL inference engines in the OWL inference engine library; said OWL inference rule maintenance module creates and maintains OWL inference rules through human-computer interaction and stores the created OWL inference rules in the OWL inference rule library; said OWL inference engine module invokes said OWL inference engine from said OWL inference engine library, reasons over said OWL ontology instance according to the OWL inference rules in said OWL inference rule library, and obtains the OWL knowledge description corresponding to said Internet web pages.
8. The OWL-based Internet language ontology learning system according to claim 7, characterized in that: said original document management subsystem comprises an original document management module and an original document library; said original document management module stores said Internet web pages in the original document library and establishes the index between said Internet web pages and said OWL knowledge description.
CN201110270784A 2011-09-14 2011-09-14 OWL (ontology web language)-based Internet language ontology learning system Pending CN102332013A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110270784A CN102332013A (en) 2011-09-14 2011-09-14 OWL (ontology web language)-based Internet language ontology learning system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110270784A CN102332013A (en) 2011-09-14 2011-09-14 OWL (ontology web language)-based Internet language ontology learning system

Publications (1)

Publication Number Publication Date
CN102332013A true CN102332013A (en) 2012-01-25

Family

ID=45483789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110270784A Pending CN102332013A (en) 2011-09-14 2011-09-14 OWL (ontology web language)-based Internet language ontology learning system

Country Status (1)

Country Link
CN (1) CN102332013A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102932417A (en) * 2012-09-28 2013-02-13 浪潮(北京)电子信息产业有限公司 Data storage method and device
CN106156143A (en) * 2015-04-13 2016-11-23 富士通株式会社 Page processor and web page processing method
CN108021703A (en) * 2017-12-26 2018-05-11 广西师范大学 A kind of talk formula intelligent tutoring system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101251841A (en) * 2007-05-17 2008-08-27 华东师范大学 Method for establishing and searching feature matrix of Web document based on semantics

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
丁剑飞等: "基于本体的分布式实例推理技术研究", 《计算机仿真》, vol. 25, no. 2, 28 February 2008 (2008-02-28) *
甘健候: "基于本体的语义Web知识发现及应用的研究", <中国优秀博硕士学位论文全文数据库(硕士)信息科技辑>, 15 March 2005 (2005-03-15), pages 139 - 186 *

Similar Documents

Publication Publication Date Title
CN109271626A (en) Text semantic analysis method
Yong-Gui et al. Research on semantic Web mining
CN104268200A (en) Unsupervised named entity semantic disambiguation method based on deep learning
CN101710343A (en) Body automatic build system and method based on text mining
CN102591612B (en) General webpage text extraction method based on punctuation continuity and system thereof
CN101127042A (en) Sensibility classification method based on language model
CN103106227A (en) System and method of looking up new word based on webpage text
Parekh et al. Mining domain specific texts and glossaries to evaluate and enrich domain ontologies
CN103324700A (en) Noumenon concept attribute learning method based on Web information
CN109947921A (en) A kind of intelligent Answer System based on natural language processing
CN113312922B (en) Improved chapter-level triple information extraction method
CN104933162A (en) Method for converting CSV (Comma-Separated Values) data labeled by metadata to RDF (Resource Description Framework) data
CN102332013A (en) OWL (ontology web language)-based Internet language ontology learning system
Al-Arfaj et al. Towards ontology construction from Arabic texts-a proposed framework
CN102298639A (en) Ontology of web language (OWL)-based internet OWL transformator
CN102436467B (en) Self-learning type OWL (Ontology of Web Language) inference engine
Zhiwei et al. Research for information extraction based on wrapper model algorithm
CN102147731A (en) Automatic functional requirement extraction system based on extended functional requirement description framework
CN102521239B (en) Question-answering information matching system and method based on OWL (web ontology language) for Internet
CN102521241B (en) Semiautomatic learning type OWL (web ontology language) modeling system
Harrag et al. Quran intelligent ontology construction approach using association rules mining
Liu et al. An automatic mark-up approach for structured document retrieval in engineering design
CN102346772A (en) Directional acquisition system based on OWL (ontology web language) semantic analysis
Altenbek et al. Identification of basic phrases for kazakh language using maximum entropy model
Zhao et al. Construction of power industry corpus based on data mining and machine learning intelligent algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20120125