CN104090958A - Semantic information retrieval system and method based on domain ontology - Google Patents


Info

Publication number
CN104090958A
Authority
CN
China
Legal status
Pending
Application number
CN201410329258.6A
Other languages
Chinese (zh)
Inventor
姬朝阳
姚林
陈雪
Current Assignee
Xuchang University
Original Assignee
Xuchang University
Application filed by Xuchang University
Priority to CN201410329258.6A
Publication of CN104090958A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates


Abstract

The embodiments of the invention provide a semantic information retrieval system and method based on domain ontology, in the field of intelligent information retrieval. The system and method can make personalized recommendations for a user's retrieval needs, optimize the retrieval results, and improve precision. In the method, the system analyzes the initial query request entered by the user to obtain a final query request. It then matches qualifying network information documents from a resource base according to the final query request. With the help of a domain ontology base, it analyzes and semantically processes these documents and filters out useless and irrelevant information. Next, it performs personalized processing according to a user interest model to find the information that meets the user's individual needs. Finally, it ranks the retrieval results by relevance, combining semantic relevance and interest relevance, and returns the ranked results to the user.

Description

Semantic information retrieval system and method based on domain ontology
Technical field
The present invention relates to the field of intelligent information retrieval, and in particular to a semantic information retrieval system and method based on domain ontology.
Background art
With the rapid development of the Internet and mobile communication technology, the Web has become a global information source. Users now struggle to find the information they need quickly and accurately among vast information resources. Traditional information retrieval matches documents against the keywords the user enters. In most cases, however, such simple keyword matching cannot capture the user's real retrieval intent, so the accuracy of this approach is low.
Obtaining the needed information quickly and effectively is therefore a research topic of practical significance. Network resources keep growing, and user information needs are becoming more personalized and complex. As a result, keyword-based retrieval increasingly suffers from problems such as "information overload" and "information disorientation". Results obtained with existing semantic retrieval may contain much irrelevant information that does not match the user's interests. This increases the user's workload in screening these results for the information of interest.
Summary of the invention
The embodiments of the invention provide a semantic information retrieval system and method based on domain ontology. They can make personalized recommendations for a user's retrieval needs, optimize retrieval results, and improve precision.
To achieve the above objectives, the embodiments of the invention adopt the following technical solutions:
A semantic information retrieval system based on domain ontology, comprising:
A user interface agent module, configured to receive the initial information query request entered by the user;
A personalized demand analysis module, configured to:
- perform lexical analysis on the initial information query request received by the user interface agent module to obtain keyword-level content;
- predict the content the user is interested in, based on the domain ontology base and in combination with the user interest ontology base;
- form a final query request and submit it to the information search agent module.
The domain ontology base contains the most basic concepts in each domain, the definitions of those concepts, and the semantic relation network among them. The user interest ontology base contains the information the user is interested in, together with information of potential interest mined from the user's points of interest;
The information search agent module, configured to search a resource base for network information documents relevant to the final query request. The resource base comprises local information resources and resources shared from remote sites;
A document analysis and semantic filtering module, configured to process the network information documents found by the information search agent module as follows:
- based on the domain ontology base, determine the domain of each document from its extracted keywords;
- convert the document into terms of the domain ontology base to obtain a converted document;
- according to the semantic relevance between the final query request and the converted document, filter out information in the document that is irrelevant to the user's query request, yielding semantically filtered result documents;
A personalization processing module, configured to evaluate the result documents against the user knowledge stored in the user interest ontology base and to give the degree of association between each result document and the user's interests;
A ranking module, configured to optimize the result documents using an incremental semantic sequential pattern mining algorithm based on a frequent sequence tree. The optimization uses the semantic relevance obtained by the document analysis and semantic filtering module and the degree of association obtained by the personalization processing module. The module finally returns the query results that meet the user's needs and interests to the user through the user interface agent module;
The user interface agent module is further configured to display the query results to the user;
The user interest ontology base is configured to update the user's entries in the user interest ontology base according to the user's feedback on the query results.
A semantic information retrieval method based on domain ontology, comprising the following steps:
101. Receive the initial information query request entered by the user;
102. Perform lexical analysis on the received initial information query request to obtain keyword-level content. Then predict the content the user is interested in, based on the domain ontology base and in combination with the user interest ontology base, and form a final query request. The domain ontology base contains the most basic concepts in each domain, the definitions of those concepts, and the semantic relation network among them. The user interest ontology base contains the information the user is interested in, together with information of potential interest mined from the user's points of interest;
103. Search a resource base for network information documents relevant to the final query request. The resource base comprises local information resources and resources shared from remote sites;
104. Process the network information documents as follows. Based on the domain ontology base, determine the domain of each document from its extracted keywords. Convert the document into terms of the domain ontology base to obtain a converted document. According to the semantic relevance between the final query request and the converted document, filter out information in the document that is irrelevant to the user's query request, yielding semantically filtered result documents;
105. Evaluate the result documents against the user knowledge stored in the user interest ontology base, and give the degree of association between each result document and the user's interests;
106. Using the semantic relevance and the degree of association, optimize the result documents with the incremental semantic sequential pattern mining algorithm based on a frequent sequence tree, and return the query results that meet the user's needs and interests to the user;
107. Update the user's entries in the user interest ontology base according to the user's feedback on the query results.
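The steps above form a linear pipeline. The minimal Python sketch below shows only how the output of each step feeds the next. Every step is an injected callable, and the function name `retrieve` and its parameter names are hypothetical, not taken from the patent. Step 107 (feedback) runs after the user has seen the results, so it is left outside the function.

```python
from typing import Callable, Dict, List

Terms = List[str]            # a query as a list of terms (assumption)
Scores = Dict[str, float]    # document id -> score

def retrieve(raw_query: str,
             analyze: Callable[[str], Terms],
             search: Callable[[Terms], List[str]],
             semantic_filter: Callable[[Terms, List[str]], Scores],
             interest: Callable[[str], float],
             rank: Callable[[Scores, Scores], List[str]]) -> List[str]:
    """Wire steps 101-106 together; each step is supplied by the caller."""
    final_query = analyze(raw_query)                      # steps 101-102
    candidates = search(final_query)                      # step 103
    semantic = semantic_filter(final_query, candidates)   # step 104: surviving docs -> relevance
    personal = {doc: interest(doc) for doc in semantic}   # step 105: association with interests
    return rank(semantic, personal)                       # step 106: ordered query results
```

Passing the steps in as functions keeps the orchestration independent of how each module is actually implemented.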
Optionally, the incremental semantic sequential pattern mining algorithm based on a frequent sequence tree comprises:
(1) The semantic sequence database changes, and the minimum support is not less than the frequent sequence tree support threshold:
First, find the set I-db of all items in the newly added semantic sequence database db. Next, take the sequences in the original semantic sequence database that contain items in I-db, remove from them the items not in I-db, and form a new semantic sequence database from the result. Then construct projected databases for the new semantic sequence database and the newly added semantic sequence database, find all sequential patterns that meet the support threshold, and update the frequent sequence tree.
(2) The semantic sequence database changes, and the minimum support is less than the frequent sequence tree support threshold:
Construct projected databases for the original semantic sequence database. Find all sequential patterns whose support is not less than the user-given minimum support but less than the frequent sequence tree support threshold. Store these sequential patterns and their support information in the frequent sequence tree, and set the frequent sequence tree support threshold to the user-given minimum support. Then proceed as in case (1).
In the system and method provided by the above technical solutions, the initial query request entered by the user is first analyzed to obtain a final query request. Qualifying network information documents are then matched from the resource base according to the final query request. With the help of the domain ontology base, the documents are analyzed and semantically processed, and useless and irrelevant information is filtered out. Personalized processing is then performed according to the user interest ontology base to find the information that meets the user's individual needs. Finally, the retrieval results are ranked by relevance, combining semantic relevance and interest relevance, and the ranked results are returned to the user. Using the user interest ontology base enables personalized recommendation for the user's retrieval needs. During ranking, the retrieval results are optimized with the incremental semantic sequential pattern mining algorithm based on a frequent sequence tree, which improves precision.
Brief description of the drawings
Fig. 1 is a system block diagram provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a semantic information retrieval method based on domain ontology provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a semantic information retrieval system based on domain ontology. As shown in Fig. 1, the system comprises: a user interface agent module 1, a personalized demand analysis module 2, a domain ontology base 3, a user interest ontology base 4, an information search agent module 5, a document analysis and semantic filtering module 6, a personalization processing module 7, and a ranking module 8.
User interface agent module 1 is configured to receive the initial information query request entered by the user.
A new user logs in to the system first. Before receiving the initial information query request, user interface agent module 1 provides a registration function that creates the new user's profile and builds the new user's user interest ontology base. A returning user who wants to run a query can log in directly and enter the initial information query request through user interface agent module 1.
Personalized demand analysis module 2 is configured to perform lexical analysis on the initial information query request received by user interface agent module 1 to obtain keyword-level content. It then predicts the content the user is interested in, based on domain ontology base 3 and in combination with user interest ontology base 4, forms a final query request, and submits it to information search agent module 5.
Optionally, personalized demand analysis module 2 is specifically configured to map the keyword-level content onto user interest ontology base 4 and check whether a similar query request already exists.
- If one exists, the content mapped in user interest ontology base 4 is used as the final query request.
- Otherwise, domain ontology base 3 is queried to find each domain in which the keyword-level content appears. The related concepts in those domains are listed to the user through user interface agent module 1, so that the user can determine the intended domain and meaning according to the query intent. The final query request is obtained from this choice, and user interest ontology base 4 records this new demand information of the user.
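The branch in module 2 can be sketched as follows. This is an illustrative Python sketch under assumptions the patent does not state:
- earlier requests are stored as keyword sets;
- similarity is measured with the Jaccard coefficient against a hypothetical threshold of 0.5;
- the domain ontology is reduced to a mapping from domain name to concept terms;
- the user's choice is simulated by a callback.

```python
from typing import Callable, Dict, FrozenSet, List, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_final_query(keywords: List[str],
                      history: List[FrozenSet[str]],
                      domain_ontology: Dict[str, Set[str]],
                      choose_domain: Callable[[Dict[str, List[str]]], str],
                      min_similarity: float = 0.5) -> FrozenSet[str]:
    """Reuse a similar earlier request, otherwise let the user pick a domain (hypothetical helper)."""
    kw = frozenset(keywords)
    # 1. Is a similar request already recorded in the user interest ontology?
    best = max(history, key=lambda past: jaccard(kw, past), default=None)
    if best is not None and jaccard(kw, best) >= min_similarity:
        return best
    # 2. Otherwise list every domain whose concepts contain a keyword,
    #    with its related concepts, and let the user choose one.
    candidates = {d: sorted(c) for d, c in domain_ontology.items() if kw & c}
    domain = choose_domain(candidates)
    final = kw | {domain}      # assumption: the chosen domain disambiguates the query
    history.append(final)      # record the new demand in the user interest ontology
    return final
```

With this design the domain ontology is consulted only when the user's history offers no close match. A returning user therefore skips the disambiguation dialogue.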
The domain ontology base contains the most basic concepts in each domain, the definitions of those concepts, and the semantic relation network among them. The user interest ontology base contains the information the user is interested in, together with information of potential interest mined from the user's points of interest. Information search agent module 5 is configured to search the resource base for network information documents relevant to the final query request. The resource base comprises local information resources, resources shared from remote sites, and other network resources.
Document analysis and semantic filtering module 6 processes the network information documents found by information search agent module 5 as follows:
- based on domain ontology base 3, determine the domain of each document from its extracted keywords;
- convert the document into terms of domain ontology base 3;
- according to the semantic relevance between the final query request and the document, filter out information in the document that is irrelevant to the user's query request, yielding semantically filtered result documents.
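A rough sketch of module 6 follows. It assumes that documents arrive as keyword lists and that synonyms are resolved through a plain dictionary. It also assumes that a document belongs to the domain sharing the most concepts with it, and that semantic relevance is the Jaccard coefficient between query concepts and document concepts. The patent gives no relevance formula, so this measure and the 0.2 cut-off are placeholders.

```python
from typing import Dict, List, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def semantic_filter(query_terms: List[str],
                    docs: Dict[str, List[str]],
                    domain_ontology: Dict[str, Set[str]],
                    synonyms: Dict[str, str],
                    min_relevance: float = 0.2) -> Dict[str, float]:
    """Return {doc_id: semantic relevance} for the documents that survive filtering."""
    query = {synonyms.get(t, t) for t in query_terms}
    kept: Dict[str, float] = {}
    for doc_id, words in docs.items():
        terms = {synonyms.get(w, w) for w in words}   # normalize words to ontology terms
        # the document's domain is the one sharing the most concepts with its keywords
        domain = max(domain_ontology, key=lambda d: len(terms & domain_ontology[d]))
        concepts = terms & domain_ontology[domain]    # drop words outside that domain ontology
        relevance = jaccard(query, concepts)
        if relevance >= min_relevance:
            kept[doc_id] = relevance
    return kept
```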
Personalization processing module 7 is configured to evaluate the result documents against the user knowledge stored in user interest ontology base 4 and to give the degree of association between each result document and the user's interests.
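The patent leaves the association measure open. One common choice, used here purely as an assumption, is the cosine similarity between a document's term frequencies and per-term interest weights held in the user interest ontology base. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def interest_degree(doc_terms: List[str], interest_weights: Dict[str, float]) -> float:
    """Cosine similarity between document term frequencies and user interest weights."""
    tf = Counter(doc_terms)
    dot = sum(tf[t] * w for t, w in interest_weights.items())   # missing terms count as 0
    norm = (math.sqrt(sum(v * v for v in tf.values()))
            * math.sqrt(sum(w * w for w in interest_weights.values())))
    return dot / norm if norm else 0.0
```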
Ranking module 8 optimizes the result documents using the incremental semantic sequential pattern mining algorithm based on a frequent sequence tree. The optimization uses the semantic relevance obtained by document analysis and semantic filtering module 6 and the degree of association obtained by personalization processing module 7. The module finally returns the query results that meet the user's needs and interests to the user through user interface agent module 1.
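The patent says only that the two scores are combined. The sketch below assumes a linear blend with a hypothetical weight `alpha`. It illustrates the combination of semantic relevance and interest degree, not the frequent-sequence-tree optimization that the ranking module also applies.

```python
from typing import Dict, List

def rank(semantic: Dict[str, float], interest: Dict[str, float], alpha: float = 0.6) -> List[str]:
    """Order documents by alpha * semantic relevance + (1 - alpha) * interest degree."""
    def score(doc: str) -> float:
        return alpha * semantic[doc] + (1 - alpha) * interest.get(doc, 0.0)
    # highest score first; ties broken by document id for a stable order
    return sorted(semantic, key=lambda doc: (-score(doc), doc))
```

Raising `alpha` favours documents that match the query closely. Lowering it favours documents that match the user's recorded interests.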
User interface agent module 1 is further configured to display the query results to the user.
User interest ontology base 4 is configured to update the user's entries in the user interest ontology base according to the user's feedback on the query results.
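A minimal sketch of the feedback update follows. It assumes that interests are stored as term weights in [0, 1]. A hypothetical learning rate moves the weights toward 1 for terms in results the user liked and toward 0 for terms in results the user rejected.

```python
from typing import Dict, List

def update_interests(weights: Dict[str, float],
                     liked_terms: List[str],
                     disliked_terms: List[str],
                     rate: float = 0.5) -> Dict[str, float]:
    """Return new interest weights after one round of user feedback (hypothetical rule)."""
    updated = dict(weights)
    for term in set(liked_terms):
        w = updated.get(term, 0.0)
        updated[term] = w + rate * (1.0 - w)   # move toward 1
    for term in set(disliked_terms):
        if term in updated:                    # only decay interests the user already has
            updated[term] *= (1.0 - rate)      # move toward 0
    return updated
```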
An embodiment of the present invention further provides a semantic information retrieval method based on domain ontology. As shown in Fig. 2, the method comprises the following steps:
101. Receive the initial information query request entered by the user.
102. Perform lexical analysis on the received initial information query request to obtain keyword-level content. Then predict the content the user is interested in, based on the domain ontology base and in combination with the user interest ontology base, and form a final query request.
The domain ontology base contains the most basic concepts in each domain, the definitions of those concepts, and the semantic relation network among them. The user interest ontology base contains the information the user is interested in, together with information of potential interest mined from the user's points of interest. The keyword-level content can be mapped onto the user interest ontology base to check whether a similar query request already exists.
- If one exists, the content mapped in the user interest ontology base is used as the final query request.
- Otherwise, the domain ontology base is queried to find each domain in which the keyword-level content appears. The related concepts in those domains are listed to the user, so that the user can determine the intended domain and meaning according to the query intent. The final query request is obtained from this choice, and the user interest ontology base records this new demand information of the user.
103. Search the related resources for network information documents relevant to the final query request.
The related resources include local information resources, resources shared from remote sites, and other network resources.
104. Process the network information documents as follows. Based on the domain ontology base, determine the domain of each document from its extracted keywords. Convert the document into terms of the domain ontology base to obtain a converted document. According to the semantic relevance between the final query request and the converted document, filter out information in the document that is irrelevant to the user's query request, yielding semantically filtered result documents.
105. Evaluate the result documents against the user knowledge stored in the user interest ontology base, and give the degree of association between each result document and the user's interests.
106. Using the semantic relevance and the degree of association, optimize the result documents with the incremental semantic sequential pattern mining algorithm based on a frequent sequence tree, and return the query results that meet the user's needs and interests to the user.
107. Update the user's entries in the user interest ontology base according to the user's feedback on the query results.
The construction process of the domain ontology base is described in detail below.
An ontology can be composed of six kinds of elements: concepts, a taxonomic hierarchy, relations, functions, axioms, and instances.
- Concepts are meant in the broad sense. Besides concepts in the ordinary sense, they can also be tasks, functions, behaviors, strategies, reasoning processes, and so on.
- The concepts in an ontology usually form a taxonomic hierarchy.
- Relations represent associations between concepts. They cover all associations between concepts other than the taxonomic relation.
- A function is a special kind of relation.
- Axioms, in many domains, represent the associations or constraints that exist between functions or between relations.
- Instances are the basic elements belonging to a concept class, that is, the concrete entities that a concept class refers to. Together they make up all the instances of the specific domain.
Formally, an ontology can be described as a six-tuple $O := (C, H^C, R, rel, A, I)$, where:
- $C$ is the set of all concepts in the domain, $C = \{c_1, c_2, \ldots, c_m\}$.
- $H^C$ is a taxonomic hierarchy, a directed transitive relation $H^C \subseteq C \times C$. It represents the superordinate/subordinate hierarchical relations between concepts, or between concepts and instances. For example, $H^C(c_i, c_j)$ indicates that $c_i$ is superordinate to $c_j$.
- $R$ is the set of semantic relations between concepts, $R = \{r_1, r_2, \ldots, r_n\}$, which represents the class associations between concepts. For example, Composed-of represents the part-whole relation between concepts, Is-a represents inheritance between concepts, and Instance-of represents the membership of an instance in its concept.
- $rel: R \to C \times C$ is a function, also written $rel(r) = (c_1, c_2)$ or $r(c_1, c_2)$.
- $A$ is the set of axioms in the domain, $A = \{a_1, a_2, \ldots, a_p\}$. Axioms express the associations or constraints between functions or relations in the domain, for example in first-order logic.
- $I$ is the set of instances of the concepts, $I = \{i_1, i_2, \ldots, i_q\}$, where an instance is a concrete entity referred to by a concept.
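The six-tuple maps directly onto plain containers. The Python sketch below is an illustrative encoding, not part of the patent. It holds $H^C$ as (superconcept, subconcept) pairs and queries them transitively, and it makes $rel$ a dictionary from relation names to concept pairs. Axioms are opaque strings, and each instance points to its concept. The class name `Ontology` and the example concepts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

@dataclass
class Ontology:
    """O := (C, H^C, R, rel, A, I) as plain Python containers."""
    concepts: Set[str]                                      # C
    hierarchy: Set[Tuple[str, str]]                         # H^C as (superconcept, subconcept)
    relations: Dict[str, Tuple[str, str]]                   # R, with rel: R -> C x C as values
    axioms: Set[str] = field(default_factory=set)           # A, kept as opaque formulas
    instances: Dict[str, str] = field(default_factory=dict) # I, instance -> its concept

    def __post_init__(self) -> None:
        # H^C is a subset of C x C, and rel maps into C x C: reject unknown concepts
        for a, b in list(self.hierarchy) + list(self.relations.values()):
            if a not in self.concepts or b not in self.concepts:
                raise ValueError(f"unknown concept in pair ({a}, {b})")
        for inst, concept in self.instances.items():
            if concept not in self.concepts:
                raise ValueError(f"instance {inst} refers to unknown concept {concept}")

    def rel(self, r: str) -> Tuple[str, str]:
        return self.relations[r]

    def is_super(self, ci: str, cj: str) -> bool:
        """H^C(ci, cj) under transitivity: is ci a direct or indirect superconcept of cj?"""
        stack, seen = [ci], set()
        while stack:
            current = stack.pop()
            for sup, sub in self.hierarchy:
                if sup == current and sub not in seen:
                    if sub == cj:
                        return True
                    seen.add(sub)
                    stack.append(sub)
        return False
```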
Based on this formal description of an ontology, the domain ontology base is built in the following steps:
(1) Determine the domain;
(2) Determine the concepts in the domain and construct each concept class;
(3) Determine the properties of the concepts and construct each property class;
(4) Establish the semantic relations between properties;
(5) Establish the semantic relations between concepts;
(6) Create the instances of the concepts;
(7) Build the ontology from the concepts, properties, relations, and instances established above.
To determine the concepts in the domain, the Jena API can be used. Call the ModelFactory.createOntologyModel method to build an instance of the ontology model class OntModel. Then call that instance's createClass method to construct an OntClass instance for each concept class, which yields all the concepts in the domain.
To determine the properties of the concepts, call the createObjectProperty method of the OntModel instance already created. This constructs an ObjectProperty instance for each property class and yields the properties of all concepts in the domain.
When building the domain ontology base, the hierarchical relations between concepts can be set with the addSubClass or addSuperClass methods of the concept class OntClass. Two other kinds of semantic relation are also frequently used:
1. synonyms, i.e., different words expressing the same meaning;
2. homographs, i.e., the same word expressing different meanings.
Relations between homographs can be described with OWL vocabulary such as differentFrom and AllDifferent, and set with the addDisjointWith method of OntClass. Relations between synonyms can be described with equivalentClass, equivalentProperty, sameAs, and similar vocabulary, and set with the addEquivalentClass method of OntClass.
Hierarchical relations between properties can be set with the addSuperProperty method of ObjectProperty. The domain and range of each property also need to be defined, using the addDomain and addRange methods of ObjectProperty.
The instances of each concept class can be created by calling the createIndividual method of the OntModel instance, which constructs Individual instances.
Finally, the ontology itself can be created by calling the createOntology method of the OntModel instance, which builds an instance of the class Ontology. Additional properties can be set on it by calling the addProperty method of Ontology.
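The patent performs these steps with Jena in Java. To keep the example self-contained, the Python sketch below is a Jena-free, in-memory analogue. Its class and method names (`OntologyModel`, `create_class`, `add_sub_class`, and so on) are hypothetical and only echo the Jena calls named above; they are not the Jena API. Equivalent classes (synonyms) are merged with a union-find structure, and disjoint classes (homographs) are recorded as pairs.

```python
from typing import Dict, Optional, Set

class OntologyModel:
    """In-memory stand-in for an ontology model (NOT the Jena API)."""

    def __init__(self) -> None:
        self.classes: Set[str] = set()
        self.supers: Dict[str, Set[str]] = {}    # class -> direct superclasses
        self.parent: Dict[str, str] = {}         # union-find over equivalent classes (synonyms)
        self.disjoint: Set[frozenset] = set()    # disjoint class pairs (homographs)
        self.properties: Dict[str, dict] = {}
        self.individuals: Dict[str, str] = {}

    def create_class(self, name: str) -> str:
        self.classes.add(name)
        self.supers.setdefault(name, set())
        return name

    def add_sub_class(self, sup: str, sub: str) -> None:
        self.supers.setdefault(sub, set()).add(sup)

    def is_sub_class(self, sub: str, sup: str) -> bool:
        """Follow superclass links transitively from `sub`."""
        stack, seen = [sub], set()
        while stack:
            for s in self.supers.get(stack.pop(), set()):
                if s == sup:
                    return True
                if s not in seen:
                    seen.add(s)
                    stack.append(s)
        return False

    def canonical(self, name: str) -> str:
        while self.parent.get(name, name) != name:
            name = self.parent[name]
        return name

    def add_equivalent_class(self, a: str, b: str) -> None:
        ra, rb = self.canonical(a), self.canonical(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)   # deterministic representative

    def add_disjoint_with(self, a: str, b: str) -> None:
        self.disjoint.add(frozenset((a, b)))

    def are_disjoint(self, a: str, b: str) -> bool:
        target = {self.canonical(a), self.canonical(b)}
        return any({self.canonical(x) for x in pair} == target for pair in self.disjoint)

    def create_object_property(self, name: str, domain: str, range_: str,
                               super_property: Optional[str] = None) -> None:
        self.properties[name] = {"domain": domain, "range": range_, "super": super_property}

    def create_individual(self, name: str, cls: str) -> None:
        if cls not in self.classes:
            raise KeyError(cls)
        self.individuals[name] = cls
```

Resolving synonyms to one canonical class lets later steps treat "Car" and "Automobile" as the same concept. Homographs stay separate classes that are explicitly marked as disjoint.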
On the basis of the domain ontology, a corresponding application ontology must be built for each heterogeneous data source. An application ontology can be regarded as the mapping of the domain ontology onto a data source. It is obtained by converting the heterogeneous data in the data source into instances described by the OWL ontology. In a concrete implementation, an OWL document is created first. Each time a piece of metadata is obtained, it is inserted into the OWL document as a child node, together with its attribute information. The application ontology is then stored in a database (such as MySQL) through the Jena API. In addition, an ontology described in OWL can be converted into RDF triples and stored in an RDF triple store. A complex class or property defined in OWL can be expressed as one or more corresponding RDF triples.
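The conversion into RDF triples can be illustrated as follows. This Python sketch rests on several assumptions:
- it uses a hypothetical namespace `http://example.org/app#`;
- it treats each metadata record as a dictionary with an `id` key;
- it stores literal values as plain strings;
- it uses an in-memory SQLite table in place of the MySQL database mentioned above.

The `rdf:type`, `rdfs:subClassOf` and `owl:Class` URIs are the standard W3C ones.

```python
import sqlite3
from typing import Dict, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
BASE = "http://example.org/app#"   # hypothetical namespace for the application ontology

Triple = Tuple[str, str, str]

def class_to_triples(cls: str, superclasses: List[str]) -> List[Triple]:
    """An OWL class declaration plus its subclass axioms, as RDF triples."""
    subject = BASE + cls
    return [(subject, RDF_TYPE, OWL_CLASS)] + [
        (subject, RDFS_SUBCLASS_OF, BASE + s) for s in superclasses]

def metadata_to_triples(record: Dict[str, str], cls: str) -> List[Triple]:
    """One metadata record becomes an instance of `cls`, with one triple per attribute."""
    subject = BASE + record["id"]
    triples = [(subject, RDF_TYPE, BASE + cls)]
    # literals are stored as plain strings here (simplification: no datatypes)
    triples += [(subject, BASE + key, value) for key, value in record.items() if key != "id"]
    return triples

def store(conn: sqlite3.Connection, triples: List[Triple]) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    conn.commit()
```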
This concludes the construction of the domain ontology base. The user interest ontology base can be constructed in the same way.
Step 106 specifically comprises converting the result documents obtained in step 105 into semantic sequences to form a semantic sequence database. The incremental semantic sequential pattern mining algorithm based on a frequent sequence tree is then applied to optimize the results and obtain query results that meet the user's needs and interests. This algorithm comprises:
(1) The semantic sequence database changes, and the minimum support is not less than the frequent sequence tree support threshold:
First, find the set I-db of all items in the newly added semantic sequence database db. Next, take the sequences in the original semantic sequence database that contain items in I-db, remove from them the items not in I-db, and form a new semantic sequence database from the result. Then construct projected databases for the new semantic sequence database and the newly added semantic sequence database, find all sequential patterns that meet the support threshold, and update the frequent sequence tree.
(2) The semantic sequence database changes, and the minimum support is less than the frequent sequence tree support threshold:
Construct projected databases for the original semantic sequence database. Find all sequential patterns whose support is not less than the user-given minimum support but less than the frequent sequence tree support threshold. Store these sequential patterns and their support information in the frequent sequence tree, and set the frequent sequence tree support threshold to the user-given minimum support. Then proceed as in case (1).
The incremental semantic sequential pattern mining algorithm based on a frequent sequence tree (ISSFST for short) converts the result documents into semantic sequences and mines frequent patterns from them. This enables personalized recommendation for the user's retrieval needs, optimizes the retrieval results, and improves precision.
ISSFST uses the frequent sequence tree as the storage structure of the incremental sequential pattern mining algorithm. The frequent sequence tree stores all sequential patterns that satisfy the frequent sequence tree support threshold, together with their supports.
1. Frequent sequence tree structure
The frequent sequence tree is a prefix tree. It stores all sequential patterns of the database that satisfy the frequent sequence tree support threshold, together with their support information. Building the tree resembles mining sequential patterns from the database with the PrefixSpan algorithm. Each time, all frequent items mined from a projected database are inserted into the tree as child nodes. Their parent node is the node for the last item of that projected database's prefix.
Definition of the frequent sequence tree:

- The root node has one attribute, which stores the frequent sequence tree support threshold.
- Every other node has two attributes. They store, respectively, a sequential pattern of the database that satisfies the frequent sequence tree support threshold and the support of that pattern.
- Every path from a child of the root node to any leaf node represents a sequential pattern of the database, and its support equals the support of the leaf node.
- The support of any node is not less than the support of its child nodes.
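Below is a minimal Python sketch of the frequent sequence tree and its PrefixSpan-style construction (Con-FST). It assumes that every element of a semantic sequence is a single concept rather than an itemset, and that supports are absolute counts. The class and function names (`FSTNode`, `FrequentSequenceTree`, `build_fst`, `patterns`) are hypothetical. In this sketch every path from a child of the root down to any node, not only to a leaf, spells a frequent pattern, because every prefix of a frequent pattern is itself frequent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FSTNode:
    item: str      # item appended to the parent's pattern
    support: int   # support of the pattern spelled by the path down to this node
    children: Dict[str, "FSTNode"] = field(default_factory=dict)


@dataclass
class FrequentSequenceTree:
    threshold: int  # the root's single attribute: the tree's support threshold
    children: Dict[str, FSTNode] = field(default_factory=dict)


def build_fst(database: List[List[str]], threshold: int) -> FrequentSequenceTree:
    """PrefixSpan-style construction with absolute supports and single-item elements."""
    tree = FrequentSequenceTree(threshold)

    def mine(projected: List[List[str]], siblings: Dict[str, FSTNode]) -> None:
        counts: Dict[str, int] = {}
        for seq in projected:
            for item in set(seq):  # an item counts at most once per sequence
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < threshold:
                continue
            node = FSTNode(item, count)
            siblings[item] = node  # child of the node for the prefix's last item
            # Projected database of the extended prefix: suffixes after the first occurrence.
            mine([seq[seq.index(item) + 1:] for seq in projected if item in seq], node.children)

    mine(database, tree.children)
    return tree


def patterns(tree: FrequentSequenceTree) -> Dict[Tuple[str, ...], int]:
    """Collect every pattern stored in the tree together with its support."""
    found: Dict[Tuple[str, ...], int] = {}

    def walk(children: Dict[str, FSTNode], prefix: Tuple[str, ...]) -> None:
        for node in children.values():
            pattern = prefix + (node.item,)
            found[pattern] = node.support
            walk(node.children, pattern)

    walk(tree.children, ())
    return found


db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
print(patterns(build_fst(db, threshold=2)))
```

For this four-sequence database with a threshold of 2, the tree stores:

- the single items a, b and c, each with support 3;
- the patterns ab, ac and bc, each with support 2.

Along each path the supports never increase, which matches the last property in the definition.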
2. Semantic sequential pattern mining algorithm based on the frequent sequence tree
ISSFST is a projection-based incremental sequential pattern mining algorithm. Its main idea is to find all sequential patterns that satisfy the support threshold by updating the frequent sequence tree. When the database changes, ISSFST updates the frequent sequence tree in one of the following two cases:
(1) The database changes, and the minimum support is not less than the frequent sequence tree support threshold;
(2) The database changes, and the minimum support is less than the frequent sequence tree support threshold.
Case 1: first, find the set I-db of all items in the newly added database db. Form a new database from the sequences of the original database that contain items in I-db, removing from those sequences the items not in I-db. Then construct projected databases from the newly formed database and the newly added database, find all sequential patterns that satisfy the support threshold, and update the frequent sequence tree.
Case 2: construct projected databases from the original database and find all sequential patterns whose support is not less than the user-specified support but less than the frequent sequence tree support threshold. Store all of these patterns and their support information in the frequent sequence tree, and set the frequent sequence tree support threshold to the user-specified support. Then proceed as in case 1.
ISSFST uses the following pruning strategy. When the database changes, consider the sequences in the original database that share no item with the newly added database. The supports of the sequential patterns generated from these sequences do not change, so no projected databases need to be constructed for them.
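The pruning argument can be checked by brute force on a toy database. The Python sketch below uses hypothetical helper names. It builds a pruned view of the original database: the sequences that share an item with db, restricted to those items. It then verifies two claims for every pattern of up to three items:

- A pattern that uses an item absent from db keeps its old support.
- A pattern built only from db items has the same support in the pruned view as in the full original database.

The exhaustive enumeration is only for illustration and is exponential in the pattern length.

```python
from itertools import product
from typing import List, Sequence, Set


def contains(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` is a (not necessarily contiguous) subsequence of `sequence`."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[str], database: List[List[str]]) -> int:
    """Absolute support of a pattern in a database."""
    return sum(contains(seq, pattern) for seq in database)


def items_of(database: List[List[str]]) -> Set[str]:
    return {item for seq in database for item in seq}


def prune_original(DB: List[List[str]], item_db: Set[str]) -> List[List[str]]:
    """Keep only old sequences that share an item with db, restricted to those items."""
    return [[x for x in s if x in item_db] for s in DB if any(x in item_db for x in s)]


def pruning_is_safe(DB: List[List[str]], db: List[List[str]], max_len: int = 3) -> bool:
    """Brute-force check of the pruning argument for every pattern up to max_len items."""
    item_db = items_of(db)
    pruned = prune_original(DB, item_db)
    alphabet = sorted(items_of(DB) | item_db)
    for length in range(1, max_len + 1):
        for pattern in product(alphabet, repeat=length):
            if set(pattern) <= item_db:
                # Built only from db items: the pruned old sequences count it exactly.
                ok = support(pattern, pruned) == support(pattern, DB)
            else:
                # Uses an item db lacks, so no new sequence contains it: support is unchanged.
                ok = support(pattern, DB + db) == support(pattern, DB)
            if not ok:
                return False
    return True


DB = [["a", "b", "c"], ["c", "d"], ["a", "d"]]
db = [["a", "c"]]
print(prune_original(DB, items_of(db)))  # [['a', 'c'], ['c'], ['a']]
print(pruning_is_safe(DB, db))           # True
```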
Algorithm 1: ISSFST(DB, db, min-sup, FST)
Input: original database DB, incremental database db, minimum support min-sup, frequent sequence tree FST (FST-sup denotes the frequent sequence tree support threshold stored at its root).
Output: the updated frequent sequence tree FST, the updated database DB, and the set FS' of frequent sequences of the updated database.
Method:
(1) If FST is empty
(2)     Con-FST(DB, min-sup, FST);  /* construct the frequent sequence tree of DB */
(3) Else If the incremental database db is not empty
(4)     If min-sup >= FST-sup
(5)         Tree-updated(DB, db, FST);
(6)     Else If min-sup < FST-sup
(7)         Construct projected databases for DB; find all sequential patterns whose support is not less than min-sup and less than the frequent sequence tree support threshold, together with their supports, and store these patterns in the frequent sequence tree;
(8)         FST-sup = min-sup;
(9)         Tree-updated(DB, db, FST);
(10) Traverse the frequent sequence tree to find FS'; DB = DB + db;
(11) Return;
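Algorithm 1 can be sketched in Python as follows. The sketch makes several simplifying assumptions, and all names are hypothetical:

- The frequent sequence tree is a flat dictionary from pattern tuples to absolute support counts, not a prefix tree.
- Sequence elements are single items.
- A compact PrefixSpan routine stands in for constructing projected databases.

There is one deliberate deviation from the listing. When the tree is empty, this sketch builds it from DB and then also folds in a non-empty db, so the returned tree always matches the returned database.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]
Database = List[List[str]]


def prefixspan(db: Database, min_count: int) -> Dict[Pattern, int]:
    """All sequential patterns with absolute support >= min_count (single-item elements)."""
    result: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: Database) -> None:
        counts: Dict[str, int] = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, count in counts.items():
            if count >= min_count:
                pattern = prefix + (item,)
                result[pattern] = count
                mine(pattern, [s[s.index(item) + 1:] for s in projected if item in s])

    mine((), db)
    return result


def tree_updated(DB: Database, db: Database, fst: Dict[Pattern, int], fst_sup: int) -> None:
    """Algorithm 2: re-mine only db plus the old sequences that share items with db."""
    item_db = {x for s in db for x in s}
    db_new = [[x for x in s if x in item_db] for s in DB if any(x in item_db for x in s)]
    # Patterns found in db' get their exact supports over DB + db; all others are unchanged.
    fst.update(prefixspan(db + db_new, fst_sup))


def issfst(DB: Database, db: Database, min_sup: int, fst: Dict[Pattern, int], fst_sup: int):
    """Algorithm 1 sketch. Returns (fst, fst_sup, updated DB, FS')."""
    if not fst:                                   # steps (1)-(2): build the tree from DB
        fst, fst_sup = prefixspan(DB, min_sup), min_sup
    if db:                                        # step (3)
        if min_sup < fst_sup:                     # steps (6)-(8): case 2, lower the threshold
            for pattern, count in prefixspan(DB, min_sup).items():
                if count < fst_sup:
                    fst[pattern] = count
            fst_sup = min_sup
        tree_updated(DB, db, fst, fst_sup)        # steps (5)/(9)
    fs = {p: c for p, c in fst.items() if c >= min_sup}  # step (10)
    return fst, fst_sup, DB + db, fs


DB: Database = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
fst, fst_sup, DB, fs = issfst(DB, [], 2, {}, 0)                                # initial build
fst, fst_sup, DB, fs = issfst(DB, [["a", "c"], ["d", "c"]], 2, fst, fst_sup)  # case (1)
print(fst == prefixspan(DB, 2))  # True
fst, fst_sup, DB, fs = issfst(DB, [["b", "d"]], 1, fst, fst_sup)              # case (2)
print(fst == prefixspan(DB, 1), fst_sup)  # True 1
```

Each `print` compares the incrementally maintained dictionary with a full re-mine of the updated database: once after case (1) and once after case (2), which lowers the threshold.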
Algorithm 2: Tree-updated(DB, db, FST)
Input: original database DB, incremental database db, frequent sequence tree FST.
Output: the updated frequent sequence tree FST.
Method:
(1) Find the set Item-db of all items in db;
(2) For each s in DB
(3)     If s contains an item in Item-db
(4)         Delete from s the items not contained in Item-db, forming s';
(5)         DB-new = DB-new + s';
(6) db' = db + DB-new;
(7) Construct projected databases for db'; find all sequential patterns whose support is not less than the frequent sequence tree support threshold, together with their supports, and update the frequent sequence tree;
(8) Return;
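Algorithm 2 can also be sketched directly on a prefix-tree representation, which makes the reduction in projected-database size visible. The Python below is a hypothetical illustration. It assumes absolute supports and single-item sequence elements, and it uses a compact PrefixSpan routine for step (7). The names `Node`, `FST`, `prefixspan`, `tree_updated` and `to_dict` are not from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]
Database = List[List[str]]


@dataclass
class Node:
    support: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


@dataclass
class FST:
    threshold: int                                  # root attribute: support threshold
    children: Dict[str, Node] = field(default_factory=dict)


def prefixspan(db: Database, min_count: int) -> Dict[Pattern, int]:
    """All sequential patterns with absolute support >= min_count (single-item elements)."""
    result: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: Database) -> None:
        counts: Dict[str, int] = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, count in counts.items():
            if count >= min_count:
                pattern = prefix + (item,)
                result[pattern] = count
                mine(pattern, [s[s.index(item) + 1:] for s in projected if item in s])

    mine((), db)
    return result


def tree_updated(DB: Database, db: Database, tree: FST) -> int:
    """Algorithm 2 sketch on a prefix tree; returns |db'| to expose the saving."""
    item_db = {x for s in db for x in s}                      # step (1)
    db_new = [[x for x in s if x in item_db]                  # steps (2)-(5)
              for s in DB if any(x in item_db for x in s)]
    db_prime = db + db_new                                    # step (6)
    for pattern, count in prefixspan(db_prime, tree.threshold).items():  # step (7)
        node = tree.children.setdefault(pattern[0], Node())
        for item in pattern[1:]:
            node = node.children.setdefault(item, Node())
        # Every prefix of a mined pattern is mined too, so any node created above
        # as an intermediate receives its own support in another iteration.
        node.support = count
    return len(db_prime)


def to_dict(tree: FST) -> Dict[Pattern, int]:
    """Flatten the tree into {pattern: support} for inspection."""
    out: Dict[Pattern, int] = {}
    stack = [((item,), node) for item, node in tree.children.items()]
    while stack:
        pattern, node = stack.pop()
        out[pattern] = node.support
        stack.extend((pattern + (item,), child) for item, child in node.children.items())
    return out


DB: Database = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"], ["e", "f"], ["e", "f", "g"]]
db: Database = [["a", "c"], ["d", "c"]]
tree = FST(threshold=2)
tree_updated([], DB, tree)          # empty old database: builds the tree for DB itself
size = tree_updated(DB, db, tree)   # incremental update
print(size, len(DB + db))           # 6 8
print(to_dict(tree) == prefixspan(DB + db, 2))  # True
```

Here db' holds 6 sequences instead of the 8 in DB + db. The two old sequences over e, f and g share no item with db, so they are never re-projected. The updated tree nevertheless matches a full re-mine of DB + db.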
The frequent sequence tree stores all sequential patterns of the sequence database that satisfy its support threshold, together with their support information. Using it in ISSFST therefore lets the algorithm make full use of previous mining results. Consider the case where the minimum support is not less than the frequent sequence tree support threshold and the database changes. ISSFST then does not need to construct projected databases for the whole original database. It only constructs them for the sequences related to the items in the incremental database, which greatly reduces the size of the projected databases.
Optimizing retrieval results is an effective way to improve information retrieval. Practice shows that, in a personalized semantic information retrieval system, optimizing the result documents with the method based on frequent semantic sequences has two effects. It relieves users of the burden of screening the information they are interested in out of many irrelevant retrieval results. It also improves both the personalization and the precision of information retrieval.
To verify the effectiveness of the proposed system model and of the related information retrieval techniques, the project team implemented an experimental semantic information retrieval system based on a computer-domain ontology. The system runs on an Intel Core 2 CPU with 1 GB of memory under Windows XP. It was developed in the Eclipse integrated development environment and uses a MySQL database.
The experimental results show that good query results are obtained when ontologies are used for semantic information retrieval. A comparative analysis against traditional semantic information retrieval methods demonstrates the feasibility, reliability and effectiveness of the incremental semantic sequential frequent pattern mining method based on the frequent sequence tree.
The above is only a specific embodiment of the present invention, and the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. A semantic information retrieval system based on domain ontology, characterized by comprising:
a user interface proxy module, configured to receive an initial information query request input by a user;
a personalized requirement analysis module, configured to:
- perform lexical analysis on the initial information query request received by the user interface proxy module to obtain keyword-level content;
- predict the content the user is interested in based on a domain ontology library combined with a user interest ontology library, and form a final query request;
- submit the final query request to an information search proxy module;
wherein the domain ontology library contains the most basic concepts in each domain, the definitions of those concepts, and the semantic relation network among the concepts; and the user interest ontology library contains the information the user is interested in and the information of potential interest mined according to the user's points of interest;
the information search proxy module, configured to search a resource library for network information documents related to the final query request, the resource library comprising local information resources and remotely shared resources;
a document analysis and semantic filtering module, configured to, for the network information documents found by the information search proxy module and based on the domain ontology library:
- determine the domain to which each network information document belongs according to the extracted keywords;
- convert the network information document into terms of the domain ontology library to obtain a converted network information document;
- according to the semantic relevance between the final query request and the converted network information document, filter out the information in the network information document that is irrelevant to the user's query request, obtaining result documents after semantic filtering;
a personalization processing module, configured to evaluate the result documents in combination with the user knowledge stored in the user interest ontology library and give the degree of association between the result documents and the user's interests;
a ranking module, configured to:
- optimize the result documents with the incremental semantic sequential pattern mining algorithm based on the frequent sequence tree, according to the semantic relevance obtained by the document analysis and semantic filtering module and the degree of association obtained by the personalization processing module;
- finally return query results that meet the user's request and interests to the user through the user interface proxy module;
the user interface proxy module is further configured to display the query results to the user;
the user interest ontology library is configured to update the content corresponding to the user in the user interest ontology library according to the user's feedback on the query results.
2. The system according to claim 1, characterized in that
the personalized requirement analysis module is specifically configured to map the keyword-level content onto the user interest ontology library and check whether a similar query request already exists:
- if so, take the content mapped in the user interest ontology library as the final query request;
- otherwise, query the domain ontology library to find each domain in which the keyword-level content appears, and list the related concepts of those domains to the user through the user interface proxy module, so that the user determines the intended domain and meaning according to his or her own query intention, thereby obtaining the final query request;
the user interest ontology library is further configured to record this new requirement information of the user.
3. A semantic information retrieval method based on domain ontology, characterized by comprising the following steps:
101. receiving an initial information query request input by a user;
102. performing lexical analysis on the received initial information query request to obtain keyword-level content, then predicting the content the user is interested in based on a domain ontology library combined with a user interest ontology library, and forming a final query request; wherein the domain ontology library contains the most basic concepts in each domain, the definitions of those concepts, and the semantic relation network among the concepts; and the user interest ontology library contains the information the user is interested in and the information of potential interest mined according to the user's points of interest;
103. searching a resource library for network information documents related to the final query request, the resource library comprising local information resources and remotely shared resources;
104. for the network information documents and based on the domain ontology library:
- determining the domain to which each network information document belongs according to the extracted keywords;
- converting the network information document into terms of the domain ontology library to obtain a converted network information document;
- according to the semantic relevance between the final query request and the converted network information document, filtering out the information in the network information document that is irrelevant to the user's query request, obtaining result documents after semantic filtering;
105. evaluating the result documents in combination with the user knowledge stored in the user interest ontology library, and giving the degree of association between the result documents and the user's interests;
106. optimizing the result documents with the incremental semantic sequential pattern mining algorithm based on the frequent sequence tree according to the semantic relevance and the degree of association, and finally returning query results that meet the user's request and interests to the user;
107. updating the content corresponding to the user in the user interest ontology library according to the user's feedback on the query results.
4. The method according to claim 3, characterized in that step 102 specifically comprises:
mapping the keyword-level content onto the user interest ontology library and checking whether a similar query request already exists:
- if so, taking the content mapped in the user interest ontology library as the final query request;
- otherwise, querying the domain ontology library to find each domain in which the keyword-level content appears, and listing the related concepts of those domains to the user, so that the user determines the intended domain and meaning according to his or her own query intention, thereby obtaining the final query request;
recording, by the user interest ontology library, this new requirement information of the user.
5. The method according to claim 3, characterized in that the incremental semantic sequential pattern mining algorithm based on the frequent sequence tree comprises:
(1) when the semantic sequence database changes and the minimum support is not less than the frequent sequence tree support threshold:
first, finding the set I-db of all items in the newly added semantic sequence database db; forming a new semantic sequence database from those sequences of the original semantic sequence database that contain items in I-db, while removing from each such sequence the items not contained in I-db; then constructing projected databases from the new semantic sequence database and the newly added semantic sequence database, finding all sequential patterns that satisfy the support threshold, and updating the frequent sequence tree;
(2) when the semantic sequence database changes and the minimum support is less than the frequent sequence tree support threshold:
constructing projected databases from the original semantic sequence database and finding all sequential patterns whose support is not less than the user-specified support but less than the frequent sequence tree support threshold; storing all of these sequential patterns and their support information in the frequent sequence tree, and setting the frequent sequence tree support threshold to the user-specified support; then proceeding as in case (1).
CN201410329258.6A 2014-07-04 2014-07-04 Semantic information retrieval system and method based on domain ontology Pending CN104090958A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410329258.6A CN104090958A (en) 2014-07-04 2014-07-04 Semantic information retrieval system and method based on domain ontology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410329258.6A CN104090958A (en) 2014-07-04 2014-07-04 Semantic information retrieval system and method based on domain ontology

Publications (1)

Publication Number Publication Date
CN104090958A (en) 2014-10-08

Family

ID=51638674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410329258.6A Pending CN104090958A (en) 2014-07-04 2014-07-04 Semantic information retrieval system and method based on domain ontology

Country Status (1)

Country Link
CN (1) CN104090958A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140081900A1 (en) * 2008-08-19 2014-03-20 Northrop Grumman Systems Corporation System and method for information sharing across security boundaries
CN103886099A (en) * 2014-04-09 2014-06-25 中国人民大学 Semantic retrieval system and method of vague concepts

Non-Patent Citations (2)

Title
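刘佳新 (Liu Jiaxin): "An incremental sequential pattern mining algorithm based on frequent sequence tree", Computer and Modernization (计算机与现代化) *
姬朝阳 (Ji Chaoyang) et al.: "Design of an ontology-based personalized information retrieval model", Microcomputer Information (微计算机信息) *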

Cited By (14)

Publication number Priority date Publication date Assignee Title
WO2016177277A1 (en) * 2015-05-04 2016-11-10 阿里巴巴集团控股有限公司 Information recommendation method and apparatus
CN106202087A (en) * 2015-05-04 2016-12-07 阿里巴巴集团控股有限公司 Information recommendation method and device
WO2017088245A1 (en) * 2015-11-27 2017-06-01 小米科技有限责任公司 Method and apparatus for recommending reference document
CN105653661A (en) * 2015-12-29 2016-06-08 云南电网有限责任公司电力科学研究院 Search result re-ranking method and device
CN105956053B (en) * 2016-04-27 2019-07-16 海信集团有限公司 Network information-based search method and device
CN105956053A (en) * 2016-04-27 2016-09-21 海信集团有限公司 Network information-based search method and apparatus
CN106570187A (en) * 2016-11-14 2017-04-19 南京邮电大学 Ontological-concept-similarity-based software component retrieving method
CN106570187B (en) * 2016-11-14 2020-04-21 南京邮电大学 Software component retrieval method based on ontology concept similarity
CN107832312A (en) * 2017-01-03 2018-03-23 北京工业大学 Text recommendation method based on deep semantic discrimination
CN107832312B (en) * 2017-01-03 2023-10-10 北京工业大学 Text recommendation method based on deep semantic analysis
CN107193873A (en) * 2017-04-17 2017-09-22 吉林工程技术师范学院 Network search method based on semantic network technology
CN108829666A (en) * 2018-05-24 2018-11-16 中山大学 Method for solving reading comprehension questions based on semantic parsing and SMT
TWI723782B (en) * 2018-10-12 2021-04-01 張劭農 Method for generating personalized interactive content and system thereof
CN110334325A (en) * 2019-07-16 2019-10-15 同方知网数字出版技术股份有限公司 Full-text similarity analysis method for joint comparison of publishers' remote resources

Similar Documents

Publication Publication Date Title
CN104090958A (en) Semantic information retrieval system and method based on domain ontology
CN107402988B (en) Distributed NewSQL database system and semi-structured data query method
Sevilla Ruiz et al. Inferring versioned schemas from NoSQL databases and its applications
EP2973041B1 (en) Apparatus, systems, and methods for batch and realtime data processing
CN104160394B (en) Scalable analysis platform for semi-structured data
Franklin et al. From databases to dataspaces: a new abstraction for information management
US8024701B2 (en) Visual creation of object/relational constructs
US20140279903A1 (en) Version control system using commit manifest database tables
CN102760058B (en) Massive software project sharing method oriented to large-scale collaborative development
US10929439B2 (en) Taxonomic tree generation
CN103488759A (en) Method and device for searching application programs according to key words
WO2023087673A1 (en) Hierarchical data retrieval method and apparatus, and device
EP3732587B1 (en) Systems and methods for context-independent database search paths
Baas Nosql spatial–neo4j versus postgis
Vajk et al. Automatic NoSQL schema development: A case study
Hoang et al. Retracted: Semantic information integration with linked data mashups approaches
Elbattah et al. Large-scale ontology storage and query using graph database-oriented approach: The case of Freebase
Fernández-García et al. A microservice-based architecture for enhancing the user experience in cross-device distributed mashup UIs with multiple forms of interaction
Barrasa et al. Building Knowledge Graphs
Le-Phuoc et al. Unifying stream data and linked open data
Angelis et al. Generating and exploiting semantically enriched, integrated, linked and open museum data
Lee et al. Ontology management for large-scale e-commerce applications
Zamula et al. MneMojno—Design and deployment of a Semantic web service and a mobile application
Muñoz-Sánchez et al. Managing Physical Schemas in MongoDB Stores
Anelli et al. Querying deep web data sources as linked data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141008