CN101582073A - Intelligent retrieval system and method based on domain ontology - Google Patents

Publication number
CN101582073A
Authority
CN
China
Prior art keywords
semantic
query
domain
index
processing unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA200810306721XA
Other languages
Chinese (zh)
Inventor
吴来
刘鹏
李春梅
黄道雄
范书德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongjikehai Technology & Development Co Ltd
Original Assignee
Beijing Zhongjikehai Technology & Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongjikehai Technology & Development Co Ltd
Priority to CNA200810306721XA
Publication of CN101582073A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of Chinese information retrieval (IR), and in particular to an intelligent retrieval method based on domain ontology and an intelligent retrieval system that includes the method. The system comprises an ontology inference module for analyzing natural-language queries entered by users, an index processing module for creating an index database, a query processing module for performing the specific query, and a result optimization and ranking module for processing query results. The system also comprises a domain ontology library, a data repository and the index database, all built for a particular domain. The system and method make full use of the concepts in the domain ontology library and the relationships among them; they can understand user needs correctly, optimize retrieval results, return professional domain information to users more accurately and comprehensively, and significantly improve the performance of information retrieval in professional technical fields.

Description

Intelligent retrieval system and method based on domain ontology
Technical field
The present invention relates to the field of Chinese information retrieval (IR), and in particular to an intelligent retrieval method based on domain ontology and an intelligent retrieval system that includes this method.
Background art
The emergence of information retrieval technology was a milestone in the history of network development and has brought great convenience to network users; Google and Baidu are typical representatives of this field. When a user enters a search term or search statement, the information retrieval system quickly returns all web pages that contain that term or statement, ordered according to some ranking rule. For an information retrieval system, it is therefore crucial to understand the user's search need correctly and to optimize the way results are ranked.
However, existing general-purpose search engines cannot accurately understand and process all kinds of information, professional domain knowledge in particular. They often fail to retrieve relevant professional domain information, or even return large amounts of irrelevant information, so the recall and precision of the system are not high. The main reasons are as follows:
On the one hand, they interpret the user's search statement by keyword matching. The information retrieval system pays no attention to the concepts and semantics of the professional domain vocabulary entered by the user; it simply matches the segmented keywords literally against the index terms in the index database.
On the other hand, they rank results by retrieval relevance, that is, by how many characters or words the search terms and the index terms have in common.
To improve retrieval efficiency, some information retrieval systems have proposed improvements such as "related search", but these techniques still do not escape the essence of literal matching. In fields such as artificial intelligence (AI), the introduction of ontology has brought an opportunity to solve these problems. An ontology has the following characteristics:
(1) An ontology is a formal, explicit specification of a shared conceptual model (Studer, 1998: "an ontology is a formal, explicit specification of a shared conceptualization").
The goal of an ontology is to capture the knowledge of a domain, determine the vocabulary commonly accepted in that domain, clearly define the vocabulary and the relationships among its terms, provide a common understanding of the domain knowledge, and store it in a computer in a standardized form.
(2) It is scoped to a specific domain.
A domain ontology takes a specific domain as its object of description. It provides the definitions of the concepts of that domain and the relations among them, its main theories and basic principles, the activities that take place in the domain, and so on.
(3) Knowledge representation, sharing and reuse.
The shared knowledge system is expressed as "machine-processable" semantics. Based on RDF, it uses URIs as the naming mechanism and XML as the syntax, integrates different applications, and represents data on the Web abstractly. Through this general framework, an ontology allows data to be shared and reused across the boundaries of different applications, enterprises and communities.
(4) A semantic basis for information exchange.
The commonly accepted body of domain knowledge that an ontology provides comprises a term set, a relation set and a rule set. It can give different subjects a shared understanding, and it makes information exchange possible among people, machines, software systems and so on from different backgrounds and domains.
Because of these characteristics and advantages, ontologies make semantic understanding, intelligent retrieval and the like possible. In the 1990s, ontology attracted wide attention and research in many fields such as knowledge engineering and artificial intelligence, and certain results were achieved.
However, it is currently unrealistic to build a reasonably detailed general ontology that covers the knowledge systems of all domains and to build an information retrieval system on top of it. It is therefore necessary to start from one particular domain, build a domain ontology, and realize intelligent retrieval of the knowledge of that professional domain. At present, existing intelligent retrieval technology offers neither a domain-ontology-based method for matching the user's input against sentence patterns nor a result ranking method based on semantic distance measurement, and there is no intelligent retrieval system that includes such methods. As a result, intelligent retrieval systems face a series of technical problems and have not delivered the expected clear improvement in retrieval performance over traditional retrieval systems.
Summary of the invention
The main purpose of the present invention is to provide a system that realizes intelligent retrieval based on domain ontology, aiming to understand user needs correctly, provide an efficient professional domain information service, and remedy the shortcomings of existing information retrieval systems.
Another purpose of the present invention is to provide methods, such as sentence pattern matching and semantic distance measurement, for use in the above intelligent retrieval system based on domain ontology. With these methods the natural query statement entered by the user can be understood correctly, the semantic relevance of the query results can be computed, and the most relevant professional domain information can be returned to the user.
To achieve the above purposes, the present invention is realized through the following technical solutions:
An embodiment of the invention discloses an intelligent retrieval system based on domain ontology, characterized in that the system comprises: an ontology inference module for analyzing the natural query statement entered by the user, an index processing module for creating the index database, a query processing module for performing the specific query, and a result optimization and ranking module for processing the query results. The system further comprises a domain ontology library built for a particular domain, a data repository and an index database;
wherein the ontology inference module comprises a word segmentation preprocessing unit and a sentence pattern matching unit;
the word segmentation preprocessing unit receives the natural query statement entered by the user and preprocesses it (word segmentation, part-of-speech tagging, domain ontology role tagging), removing weak semantic words to obtain a strong semantic word set;
the sentence pattern matching unit matches the strong semantic word set against the predefined sentence patterns to obtain a new retrieval expression;
the index processing module comprises an ontology semantic index processing unit and a full-text index processing unit;
the ontology semantic index processing unit obtains data resource documents, parses and processes them, extracts the main text content of each document, synthesizes a document semantic vector based on the domain ontology library, and builds an ontology-based semantic index library;
the full-text index processing unit obtains data resource documents, extracts document information, and builds a full-text index library;
the query processing module comprises a semantic query processing unit, an expanded query processing unit and a full-text search processing unit;
the semantic query processing unit performs intelligent queries for professional domain information based on the domain ontology concepts and the associations among them;
the expanded query processing unit performs expanded queries based on the domain ontology concepts and the associations among them;
the full-text search processing unit performs full-text search in the traditional way, that is, according to the keyword matching principle;
the data repository comprises resources in a local domain database or domain resource data crawled from the network;
the index database comprises the ontology semantic index library and the full-text index library built by the index processing module.
An embodiment of the invention also discloses an intelligent retrieval method based on domain ontology, characterized in that the method comprises the following steps:
A. Perform word segmentation and part-of-speech tagging on the natural query statement entered by the user, and tag ontology roles based on the domain ontology;
B. Analyze the word set obtained in step A, determine whether any word has a non-empty ontology role, and perform the corresponding query according to predefined rules;
C. Measure the semantic distance of the query results, optimize the results, and rank and output the retrieval results to the user according to the semantic distance values.
The determination of non-empty ontology roles in step B further comprises:
B1. If the natural query statement entered by the user contains no ontology concept, perform full-text search;
B2. If the natural query statement entered by the user contains an ontology concept, perform sentence pattern matching.
The sentence pattern matching in step B2 further comprises:
B21. If sentence pattern matching succeeds, perform a semantic query;
B22. If sentence pattern matching fails, access the domain ontology library, perform appropriate semantic expansion, and carry out an expanded query.
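The branching in steps B1, B2, B21 and B22 can be sketched as a small dispatch function. This is a minimal illustration, not the patented implementation: the function name choose_retrieval_mode, the tuple layout and the pattern_matches callback are hypothetical, and the sketch assumes, as the detailed description states, that the expanded query is used when sentence pattern matching fails.

```python
from typing import Callable, List, Optional, Tuple

# A tagged word: (word, part of speech, ontology role or None).
TaggedWord = Tuple[str, str, Optional[str]]


def choose_retrieval_mode(
    tagged: List[TaggedWord],
    pattern_matches: Callable[[List[TaggedWord]], bool],
) -> str:
    """Return 'full_text', 'semantic' or 'expanded' for a tagged query."""
    # Keep only words whose ontology role is non-empty (strong semantic words).
    strong = [w for w in tagged if w[2] is not None]
    if not strong:
        return "full_text"   # B1: no ontology concept in the query
    if pattern_matches(strong):
        return "semantic"    # B21: a predefined sentence pattern matched
    return "expanded"        # B22: fall back to semantic expansion
```

The pattern matcher is passed in as a callback so that the set of sentence patterns can grow without changing the dispatch logic.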
Therefore, the intelligent retrieval system and method based on domain ontology provided by the embodiments of the invention have the following advantages: they make full use of the concepts in the domain ontology library and the relationships among them, can understand user needs correctly, optimize retrieval results, return more complete and more accurate professional domain information to the user, and can significantly improve the performance of information retrieval within professional technical fields.
Brief description of the drawings
The features and advantages of the present invention are fully illustrated by the following description of the drawings and embodiments. In the drawings:
Fig. 1 is a structural block diagram of an intelligent retrieval system based on domain ontology according to an embodiment of the invention;
Fig. 2 is a flowchart of the ontology semantic index processing unit creating the semantic index database in an embodiment of the invention;
Fig. 3 is a flowchart of the intelligent retrieval system of the embodiment shown in Fig. 1 performing a professional domain knowledge query for a user;
Fig. 4 is a diagram of the retrieval modes used by an embodiment of the invention; and
Fig. 5 is a schematic diagram of the semantic distance between domain ontology concepts in an embodiment of the invention.
Embodiments
To make the purpose, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the following embodiments are only intended to explain the present invention, not to limit it; that is, the scope of protection of the present invention is not limited to the following embodiments. On the contrary, those of ordinary skill in the art may make suitable changes according to the concept of the present invention, and such changes fall within the scope of the invention defined by the claims.
The basic idea of the present invention is as follows. An embodiment of the invention provides several retrieval modes based on the domain ontology library, as shown in Fig. 4: full-text search 402, expanded retrieval 403 and semantic retrieval 404. If the words entered by the user contain no ontology concept, full-text search is performed; otherwise the natural query statement entered by the user is matched against the sentence patterns in combination with the domain ontology. If the match succeeds, the ontology semantic index library is accessed for semantic retrieval; if the match fails, an appropriate semantically expanded query is performed based on the domain ontology library. Finally, semantic distance is measured on the query results, and the results are optimized, ranked and output, returning professional domain information to the user.
The intelligent retrieval system based on domain ontology provided by the invention, shown in Fig. 1, comprises: ontology inference module 102, index processing module 109, query processing module 115 and result optimization and ranking module 119, as well as domain ontology library 105, data repository 106 and index database 112.
In Fig. 1, the index processing module 109 processes the local data resources 107 or Internet resources 108 in the data repository 106 and, in combination with the domain ontology library 105, generates the index database 112 through the ontology semantic index processing unit 110 and the full-text index processing unit 111. The ontology inference module 102 receives the natural language query statement entered by user 101 and, in combination with the domain ontology library 105, uses the word segmentation preprocessing unit 103 and the sentence pattern matching unit 104 to generate the corresponding retrieval expression, which is then passed to the index database 112. The index database 112 receives the retrieval expression and accesses the corresponding ontology semantic index library 113 or full-text index library 114 according to the corresponding rule. The semantic query processing unit 116, expanded query processing unit 117 and full-text search processing unit 118 of the query processing module 115 then perform the corresponding query. Finally, the result optimization and ranking module 119 optimizes the retrieval results, and the query results are returned to user 101.
The domain ontology library 105 of the embodiment in Fig. 1 is analyzed and built from data of the instrumentation domain. According to the present invention, a tool has been developed that automatically builds a domain ontology library from structured data; this tool can construct the domain ontology knowledge base automatically and greatly improves the efficiency of building the domain ontology library.
The full-text index processing unit 111 in Fig. 1 uses general-purpose processing methods and techniques to index the title, abstract, full text and so on of the data resources to be processed, so as to improve the recall of the system at retrieval time. Because the related techniques are very mature, they are not described in detail here.
Fig. 2 shows the processing flow of the ontology semantic index processing unit 110 of Fig. 1. The steps are as follows:
1) Document acquisition 201 obtains the system data resources of the professional domain. Documents may be in many formats, such as html, asp, pdf, doc, txt, excel, ppt, ps and images, and Web page information is obtained by crawling with a web crawler.
For example, an embodiment of the invention uses the Heritrix crawler framework. Starting from the seeds set by the user, it requests a page and adds the valid URLs to a queue for processing. It then takes the first waiting link from the queue, parses that page, extracts the valid body text with a user-defined extractor (user-defined-extractor), and stores it locally in a mirror storage structure. At the same time, the valid URLs in the page are again added to the queue for processing. The analysis continues in this way until the last link yields no further valid links, which completes one crawl subtask; the cycle repeats until the required predetermined Internet resources have been crawled.
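The queue-driven crawl loop described above can be sketched in a few lines. This is not Heritrix code (Heritrix is a Java framework); it is a minimal Python illustration of the same loop, in which the function name crawl and the injected callables fetch, extract_links and extract_text are hypothetical stand-ins for the page request, link extraction and user-defined text extractor.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], str],
    extract_links: Callable[[str], List[str]],
    extract_text: Callable[[str], str],
    max_pages: int = 100,
) -> Dict[str, str]:
    """Breadth-first crawl; returns {url: extracted text}."""
    queue = deque(seeds)          # URLs waiting to be processed
    seen = set(queue)             # URLs already queued, to avoid revisits
    store: Dict[str, str] = {}    # stands in for the local mirror storage
    while queue and len(store) < max_pages:
        url = queue.popleft()     # take the first waiting link
        page = fetch(url)
        store[url] = extract_text(page)
        for link in extract_links(page):
            if link not in seen:  # only new links re-enter the queue
                seen.add(link)
                queue.append(link)
    return store
```

The max_pages limit is an added safeguard for the sketch; the text itself stops when no valid links remain.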
Data in the local professional domain database can be extracted directly from the local data resources 107 of Fig. 1; the embodiment uses data from the instrument and meter database of the national basic conditions key platform project "Advanced Manufacturing and Automation Scientific Data Sharing Network". Domain ontology files can be obtained directly by accessing the previously created domain ontology library.
2) Content parsing 202 parses the documents obtained in step 1); that is, it analyzes the format of each document to obtain the specific content of the various document types. The procedure is: first read the file into memory as a stream, then analyze the storage format of each file type, and finally extract the effective information of the file from memory according to its storage format.
3) Word segmentation and part-of-speech tagging 203 segments the documents parsed in step 2) into words and tags their parts of speech. Specifically, the system's word segmentation tool splits the text of the document into words and tags the part of speech of each word, with specific handling for the segmentation of professional domain vocabulary. Nouns, verbs, numerals, adjectives, prepositions, auxiliary words, conjunctions and punctuation are tagged n, v, m, a, p, u, c and wp respectively.
For example, take the following document content: "A bimetallic thermometer works on the principle that two different metals expand to different degrees when the temperature changes. The main element of an industrial bimetallic thermometer is a multilayer metal sheet formed by laminating two or more metal sheets together." After word segmentation and part-of-speech tagging (shown here word by word in English, in the original Chinese word order), the result is: "bimetallic thermometer/n is/v use/v two kinds/m different/a metal/n at/p temperature/n change/v time/n degree of expansion/n different/a of/u principle/n work/v of/u ./wp industry/n use/p bimetallic thermometer/n main/b of/u element/n is/v one/m use/p two kinds/m or/c many kinds/m metal sheet/n laminate/v at/p together/nl compose/v of/u multi/a layer/q metal sheet/n ./wp".
4) Ontology role tagging 204 analyzes and tags the role each word plays in the ontology: an ontology class (Class) is tagged C, an object property (ObjectProperty) OP, a datatype property (DatatypeProperty) DP, an ontology instance (Individuals) I, and so on. More detailed tags can also be used as required; for example, an instrument instance (yb_Individuals) is tagged yb_I and a standard instance (bz_Individuals) is tagged bz_I.
For example, the result of step 3) is further tagged with ontology roles, finally giving: "bimetallic thermometer/n/yb_C is/v/null use/v/OP two kinds/m/null different/a/null metal/n/C at/p/null temperature/n/DP change/v/null time/n/null degree of expansion/n/DP different/a/null of/u/null principle/n/DP work/v/null of/u/null ./wp/null industry/n/null use/p/null bimetallic thermometer/n/yb_C main/b/null of/u/null element/n/C is/v/null one/m/null use/p/null two kinds/m/null or/c/null many kinds/m/null metal sheet/n/C laminate/v/null at/p/null together/nl/null compose/v/OP of/u/null multi/a/null layer/q/null metal sheet/n/C ./wp/null".
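Role tagging of this kind can be sketched as a dictionary lookup over the segmented words. The sketch below is only an illustration: ROLE_LEXICON is a hypothetical, hand-written excerpt standing in for the domain ontology library, and the function names tag_roles and format_tagged are not from the source.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical excerpt of a domain ontology lexicon: word -> ontology role.
ROLE_LEXICON: Dict[str, str] = {
    "thermometer": "C",    # ontology class
    "metal": "C",          # ontology class
    "temperature": "DP",   # datatype property
    "measure": "OP",       # object property
}


def tag_roles(
    tokens: List[Tuple[str, str]],
    lexicon: Dict[str, str] = ROLE_LEXICON,
) -> List[Tuple[str, str, Optional[str]]]:
    """Attach an ontology role (or None) to each (word, pos) pair."""
    return [(word, pos, lexicon.get(word)) for word, pos in tokens]


def format_tagged(tagged: List[Tuple[str, str, Optional[str]]]) -> str:
    """Render in word/pos/role notation; a missing role is written 'null'."""
    return " ".join(f"{w}/{p}/{r or 'null'}" for w, p, r in tagged)
```

For instance, tagging the pairs (thermometer, n), (is, v), (metal, n) and formatting the result gives "thermometer/n/C is/v/null metal/n/C".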
5) Core vocabulary extraction 205 takes the tagging result of step 4), removes the words whose ontology role is empty, and keeps the words whose ontology role is non-empty. In general, if a word in a document is not included in the domain ontology library of the domain, it is essentially interference or irrelevant information for professional retrieval in that domain. Therefore, to improve the efficiency of professional domain retrieval, no index information needs to be created for such a word.
The core vocabulary extracted from the result of step 4) is: "bimetallic thermometer/n/yb_C use/v/OP metal/n/C temperature/n/DP degree of expansion/n/DP principle/n/DP bimetallic thermometer/n/yb_C element/n/C metal sheet/n/C compose/v/OP metal sheet/n/C".
6) Semantic vector synthesis 206 merges all the concepts of the document that occur in the domain ontology, that is, the core words extracted in step 5), into a semantic vector. The same concept may occur more than once in the vector, and different positions have different effects on the final document similarity calculation.
Merging the core words of step 5) into a semantic vector gives: "(bimetallic thermometer, use, metal, temperature, degree of expansion, principle, bimetallic thermometer, element, metal sheet, compose, metal sheet)".
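Steps 5) and 6) amount to a filter followed by an order-preserving projection. A minimal sketch, assuming tagged words are represented as (word, part of speech, role) tuples with None for an empty role; the names extract_core and semantic_vector are illustrative only:

```python
from typing import List, Optional, Tuple

Tagged = Tuple[str, str, Optional[str]]  # (word, pos, ontology role)


def extract_core(tagged: List[Tagged]) -> List[Tagged]:
    """Step 5: drop words whose ontology role is empty."""
    return [t for t in tagged if t[2] is not None]


def semantic_vector(tagged: List[Tagged]) -> Tuple[str, ...]:
    """Step 6: ordered concept sequence; duplicates and order are preserved."""
    return tuple(word for word, _pos, _role in extract_core(tagged))
```

A tuple is used rather than a set because the text relies on both repeated concepts and their positions.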
7) Semantic index creation 207 builds an index over the extracted semantic vectors, based on the domain ontology knowledge base.
The way the semantic index of the present invention is created not only saves space and improves retrieval efficiency, but also preserves the document semantics to the greatest extent.
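The source does not specify the layout of the semantic index. One plausible layout, shown here purely as an assumption, is a positional inverted index that maps each concept to the documents containing it and to its positions in each document's semantic vector, since positions are needed later for distance calculations. The name build_semantic_index is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# concept -> {document id -> positions of the concept in that document's vector}
SemanticIndex = Dict[str, Dict[str, List[int]]]


def build_semantic_index(vectors: Dict[str, Sequence[str]]) -> SemanticIndex:
    """Build a positional inverted index from {doc_id: semantic vector}."""
    index: SemanticIndex = defaultdict(dict)
    for doc_id, vector in vectors.items():
        # Positions are 1-based to mirror the a_1 ... a_n vector notation.
        for position, concept in enumerate(vector, start=1):
            index[concept].setdefault(doc_id, []).append(position)
    return dict(index)
```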
Fig. 3 shows the flow of professional domain knowledge queries based on the domain ontology. User input of search statement 301, word segmentation and part-of-speech tagging 302, and ontology role tagging 303 are similar to the processing in the ontology semantic index processing unit 110 described above and are not repeated here. After the steps from user input of search statement 301 through ontology role tagging 303, a segmented word set tagged with parts of speech and roles is obtained.
For example, the user enters the natural query statement "instruments that can measure human body temperature, and their manufacturers". After word segmentation, part-of-speech tagging and ontology role tagging, the result is: {can, v, null}, {measure, v, ObjectProperty}, {human, n, X}, {body temperature, n, X}, {of, u, X}, {instrument, n, yb_Class}, {and, c, null}, {manufacturer, n, ObjectProperty}.
The detailed processing flow starting from non-empty ontology role determination 304 is as follows:
1) Non-empty ontology role determination 304 analyzes the tagged word set and determines whether it contains an ontology concept.
a) If every ontology role is empty, core vocabulary extraction 305 is applied to the segmented word set, and the core vocabulary is then used to access the full-text index library 306 for full-text search matching.
For example, for "children's nutrition and health problems", the segmented word set is "children / of / nutrition / health / problem", the extracted core vocabulary is "children / nutrition / health", and this core word set is used to access the full-text index library for full-text search.
b) If the query statement contains one or more ontology concepts, strong semantic vocabulary extraction 307 is performed, followed by sentence pattern matching 308.
For example, "What kinds of thermometer are there" is segmented as "thermometer/n of/u kind/n have/v which/r"; after ontology role tagging and strong semantic vocabulary extraction, "thermometer/n/C" is finally obtained. It should be noted that sentence patterns are user-defined patterns established in advance according to the concepts in the domain ontology knowledge base, the relationships among them, inference rules and so on. To some extent they must also be formulated and defined on the basis of user requirements analysis and under the guidance of domain experts. The richer the set of sentence patterns, the better the intelligent query performs.
b1) If the strong semantic word set containing ontology concepts matches a sentence pattern M, this step is performed and an intelligent retrieval expression is finally formed.
The following is an example of a successful match:
For example, the user enters "instruments that can measure human body temperature, and their manufacturers". The word set finally obtained after word segmentation and core vocabulary extraction is "measure / human / body temperature / instrument / manufacturer". This search statement matches sentence pattern $M_1$, which is defined as "ontology property $P_1$ + X + ontology class $C$ + ontology property $P_2$", subject to the relation that $C$ has the properties $P_1$ and $P_2$; "X" is any component. The correspondence between the strong semantic word set and the sentence pattern is: "measure (ontology property $P_1$) / human (X) / body temperature (X) / instrument (ontology class $C$) / manufacturer (ontology property $P_2$)".
For this example, the processing rule for pattern $M_1$ is: return, in a given format, every instance of instrument (ontology class $C$) whose value for measure (property $P_1$) contains "human body temperature" (X), together with the corresponding value of manufacturer (property $P_2$) for each such instance. In short, the instrument instances that can measure human body temperature and their manufacturers are output in the prescribed format.
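Matching a word set against the pattern "property + X + class + property" can be sketched as follows. This is a minimal illustration under stated assumptions: each word carries a simplified role ("P" for an ontology property, "C" for an ontology class, "X" for anything else), the ontology is reduced to a hypothetical mapping from class name to its property names, and the function name match_m1 is not from the source.

```python
from typing import Dict, List, Optional, Set, Tuple

Word = Tuple[str, str]  # (word, role): "P" = property, "C" = class, "X" = other


def match_m1(
    words: List[Word],
    class_properties: Dict[str, Set[str]],
) -> Optional[Dict[str, object]]:
    """Match the pattern P1 + X + C + P2, where X is one or more words."""
    if len(words) < 4:
        return None
    (p1, role_p1), (c, role_c), (p2, role_p2) = words[0], words[-2], words[-1]
    if (role_p1, role_c, role_p2) != ("P", "C", "P"):
        return None
    # The pattern also requires that class C actually has both properties.
    if not {p1, p2} <= class_properties.get(c, set()):
        return None
    return {"P1": p1, "X": [w for w, _ in words[1:-2]], "C": c, "P2": p2}
```

A successful match returns the bound slots, which a processing rule can then use to query the ontology; a failed match returns None so that the caller can fall back to query expansion.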
After sentence pattern matching succeeds, the domain ontology library is accessed according to the processing rule of the matched pattern and, through ontology inference, an intelligent semantic retrieval expression that meets the index format requirements of the system is formed.
The retrieval expression is: $[R_1 \cup (F_1, \ldots, F_m)] \cup [R_2 \cup (F_1, \ldots, F_n)] \cup \ldots \cup [R_i \cup (F_1, F_2, \ldots, F_k)]$, where $m \ge 1$, $n \ge 1$, $k \ge 1$, $R$ denotes an instrument that satisfies the condition, and $F$ denotes the one or more manufacturers corresponding to instrument $R$. For example, when $i = 1$ and $k = 3$, the retrieval expression is $R_1 \cup (F_1, F_2, F_3)$, that is, $R_1F_1 \cup R_1F_2 \cup R_1F_3$.
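Expanding the retrieval expression into its individual instrument and manufacturer combinations can be sketched as a flattening step. The function name build_retrieval_pairs and the dictionary input are assumptions made for illustration; the placeholder names R1, F1 and so on follow the notation of the text.

```python
from typing import Dict, List, Tuple


def build_retrieval_pairs(makers: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Expand {R_i: [F_1, ..., F_k]} into the list of (R_i, F_j) pairs."""
    return [(r, f) for r, fs in makers.items() for f in fs]
```

With one instrument R1 and three manufacturers, the result is the three pairs (R1, F1), (R1, F2) and (R1, F3), matching the example in the text.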
b2) If the strong semantic word set containing ontology concepts fails to match any sentence pattern, this step is performed and an expanded retrieval expression is finally formed.
For example, after segmentation, "What kinds of thermometer are there" contains the ontology concept "thermometer", but no sentence pattern is defined for it. Likewise, when the user enters "spectrometer", the segmented word "spectrometer" is an ontology concept, but no sentence pattern is defined for it either.
After pattern matching fails, the domain ontology library 309 is accessed for semantic expansion and an expanded query retrieval expression is formed. The procedure is: map the strong semantic words x and y of the query statement to the related concepts X and Y in the domain ontology library 309, and perform appropriate query expansion according to the superordinate and subordinate relations, synonym relations and other relations among ontology concepts. The resulting retrieval expression is $(X, X_1, \ldots, X_a) \cup (Y, Y_1, \ldots, Y_b)$, where $a$ and $b$ are positive integers. For example, if $X_1$ is a synonym of $X$, and $Y_1$, $Y_2$ are subordinate concepts of $Y$, that is, $a = 1$ and $b = 2$, the query retrieval expression is $(X, X_1) \cup (Y, Y_1, Y_2)$, that is, $XY \cup XY_1 \cup XY_2 \cup X_1Y \cup X_1Y_1 \cup X_1Y_2$.
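The expansion step can be sketched as a Cartesian product over concept groups. This is an illustration under an explicit assumption: every combination of one term from each group is generated, which gives six combinations when a = 1 and b = 2. The function name expand_query and the expansions mapping (concept to its synonyms and subordinate concepts) are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def expand_query(
    concepts: List[str],
    expansions: Dict[str, List[str]],
) -> List[Tuple[str, ...]]:
    """Combine each concept (or one of its expansions) with every other."""
    # Each group holds the original concept followed by its expansion terms.
    groups = [[c] + expansions.get(c, []) for c in concepts]
    return list(product(*groups))
```

A single concept with m expansions yields m + 1 one-element combinations, which corresponds to the single-concept case of the expanded query.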
b3) After steps b1) and b2), the query retrieval expression 311 is formed, specifically the corresponding semantic query retrieval expression or expanded query retrieval expression. The query retrieval expression 311 is used to access the semantic index library 312 and perform the corresponding semantic query or expanded query.
2) Result ranking
a) Semantic distance measurement
a1) Semantic distance measurement when sentence pattern matching succeeds: with reference to the example in b1) of step 1), the "semantic distance" of each $RF$ combination in the retrieval expression is computed. $D_{Rf}$ is the ontology semantic distance between the two concepts $R$ and $F$ in the ontology. $D_{Rf}$ is a positive integer whose value is the number of concept connecting lines when $R$ and $F$ are connected through the fewest ontology concept nodes. As shown in Fig. 5, many semantic relation paths can connect A and B; the shortest connects them through two connecting lines and one ontology node, that is, $D_{Rf} = 2$. $d_{Rf}$ is the dimension difference within the semantic vector of each record in the index library. For example, for a document semantic vector $K = (a_1, a_2, a_3, a_4, a_5, a_6, a_7)$ with $a_3 = R$ and $a_6 = F$, $d_{Rf} = 3$. When $R$ or $F$ does not occur in the document semantic vector, the semantic distance is infinite and is counted as $10^3$ in actual computation; when neither occurs, this $d_{Rf}$ is not computed.
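The two distances can be sketched as a shortest-path search and a position difference. This is an illustration under stated assumptions: the ontology is reduced to an undirected adjacency mapping, unconnected concepts are also treated as infinitely distant, and when a concept repeats in a document vector the smallest position difference is used. None of these choices, nor the function names, are specified in the source.

```python
from collections import deque
from typing import Dict, Optional, Sequence, Set

INFINITE = 10 ** 3  # stand-in for an infinite distance, as in the text


def ontology_distance(graph: Dict[str, Set[str]], r: str, f: str) -> int:
    """Number of edges on the shortest path between r and f (BFS)."""
    if r == f:
        return 0  # edge case not covered by the text
    seen, queue = {r}, deque([(r, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == f:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return INFINITE  # assumption: unconnected concepts count as infinite


def vector_distance(vector: Sequence[str], r: str, f: str) -> Optional[int]:
    """Smallest position difference between r and f in a document vector."""
    pos_r = [i for i, c in enumerate(vector) if c == r]
    pos_f = [i for i, c in enumerate(vector) if c == f]
    if not pos_r and not pos_f:
        return None       # neither occurs: no distance is computed
    if not pos_r or not pos_f:
        return INFINITE   # only one occurs: treated as infinite
    return min(abs(i - j) for i in pos_r for j in pos_f)
```

Breadth-first search is used because every connecting line counts as one unit, so the first time the target is reached the path is already the shortest.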
a2) Semantic distance measurement when sentence-pattern matching fails: the retrieval expression entered by the user contains ontology concepts, but its strong-semantic vocabulary set fails to match any ontology sentence pattern. In that case the semantic distance is measured as follows. As described in b2) of step 1), the strong-semantic vocabulary set may contain one or more ontology concept terms. When the number of ontology concepts is 1, the query retrieval expression is $X \cup X_1 \cup \dots \cup X_m$, where $X_1, \dots, X_m$ are the expanded concepts of X. No semantic distance is involved here, so $D_{rf}=d_{rf}=1$ is set. When there are several ontology concepts, the returned query retrieval expression has the form described earlier, $(X, X_1, \dots, X_a) \cup (Y, Y_1, \dots, Y_b) \cup \dots \cup (Z, Z_1, \dots, Z_b)$; in this case $D_{rf}$ and $d_{rf}$ take the mean of the distances between the concepts of each combination in the retrieval expression.
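One possible reading of the averaging rule in a2) is sketched below. The text does not say exactly which pairs are averaged, so the sketch assumes the mean is taken over all unordered pairs of concepts within one combination. The function name, the `distance` callback, and the sample distance table are hypothetical.

```python
from itertools import combinations
from statistics import mean


def mean_pairwise_distance(concepts, distance):
    """Average distance(a, b) over all unordered pairs of concepts.

    With fewer than two concepts there is no pair to measure, so the
    value 1 is returned, matching the single-concept rule in the text.
    """
    pairs = list(combinations(concepts, 2))
    if not pairs:
        return 1
    return mean(distance(a, b) for a, b in pairs)


# Hypothetical ontology distances for one combination (X, Y, Z).
table = {
    frozenset(("X", "Y")): 2,
    frozenset(("X", "Z")): 4,
    frozenset(("Y", "Z")): 3,
}


def lookup(a, b):
    """Symmetric lookup in the hypothetical distance table."""
    return table[frozenset((a, b))]


multi = mean_pairwise_distance(["X", "Y", "Z"], lookup)
single = mean_pairwise_distance(["X"], lookup)
print(multi, single)
```

The same helper could be applied to $d_{rf}$ by passing a callback that measures position differences in a document semantic vector.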
b) Sorting calculation based on semantic distance
The sorting formula is: $Z = q_1 \cdot \sum_i f_1(q_i A_i, B) + q_2 \cdot f_2(g_1(D_{rf}), g_2(d_{rf}))$.
Here A is the matrix formed by the multiple retrieval vectors generated from one retrieval expression, $A_i$ is one retrieval vector in A, $\sum_i$ is the sum of $f_1$ over all values of i, B is the document semantic vector, and $f_1(q_i A_i, B)$ is the correlation function of the two vectors $A_i$ and B. $q_i$ is the query expansion coefficient, $q_i \in (0,1]$: for an original concept $q_i=1$; for a synonym, subordinate concept, and so on, $q_i$ is set according to the similarity assigned by the query expansion strategy. For example:
$f_1(q_i A_i, B) = q_i \cdot (a_1 + a_2 + \dots + a_j) \cdot (b_1 + b_2 + \dots + b_k)$, where $a_j$ and $b_k$ are the concepts at the respective dimensions of the vectors $A_i$ and B, and $f_1$ increases by $q_i$ if and only if $a_j$ and $b_k$ are the same concept.
$f_2(g_1, g_2)$ is a similarity function of $g_1$ and $g_2$, for example $f_2(g_1, g_2) = \sum q_i / (|g_1(D_{rf}) - g_2(d_{rf})| + 1)$. Here $q_i$ is the query expansion coefficient of the semantic vector corresponding to the distance $D_{rf}$, and $g_1(D_{rf})$ is the normalization function of the ontology semantic distance for the different vectors in the same retrieval expression, for example $g_1(D_{rf}) = 1/D_{rf}$. $g_2(d_{rf})$ has the same meaning as $g_1(D_{rf})$, and $\sum$ sums the above expression over the different $q_i$, $D_{rf}$, $d_{rf}$. $q_1$ and $q_2$ are the weights of the two functions $f_1$ and $f_2$ respectively.
The sorting method can be adjusted by setting the sizes of $q_1$ and $q_2$ and by modifying functions such as $f_1$, $f_2$, $g_1$, and $g_2$. In addition, this sorting algorithm can serve as a kernel and be combined with other common sorting methods to achieve better results.
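The sorting formula can be illustrated with a short Python sketch built from the example functions given in the text. Several points are assumptions made only for illustration: $f_1$ is read as $q_i$ times the number of identical concept pairs, $g_2$ uses the same $1/d$ normalization as $g_1$, the weights `w1` and `w2` (standing for $q_1$ and $q_2$) default to 0.5, and the sample vectors and coefficients are invented.

```python
def f1(retrieval_vector, doc_vector, q_i):
    """Correlation: q_i for every pair of identical concepts in the two vectors."""
    matches = sum(1 for a in retrieval_vector for b in doc_vector if a == b)
    return q_i * matches


def normalise(distance):
    """Example normalisation from the text: g(D) = 1 / D."""
    return 1.0 / distance


def f2(terms):
    """Similarity of ontology and document distances.

    terms: iterable of (q_i, D_rf, d_rf) triples.
    """
    return sum(q / (abs(normalise(D) - normalise(d)) + 1) for q, D, d in terms)


def score(retrieval_vectors, doc_vector, distance_terms, w1=0.5, w2=0.5):
    """Z = w1 * sum_i f1(q_i A_i, B) + w2 * f2(g1(D_rf), g2(d_rf))."""
    part1 = sum(f1(vec, doc_vector, q) for vec, q in retrieval_vectors)
    return w1 * part1 + w2 * f2(distance_terms)


# Hypothetical data: original query (R, F) with q=1 and a synonym variant
# (R1, F) with q=0.8, scored against one document semantic vector.
retrieval_vectors = [(["R", "F"], 1.0), (["R1", "F"], 0.8)]
doc = ["a1", "a2", "R", "a4", "a5", "F", "a7"]
distance_terms = [(1.0, 2, 3)]  # (q_i, D_rf, d_rf) for the pair found in doc
z = score(retrieval_vectors, doc, distance_terms)
print(round(z, 4))
```

Raising `w2` relative to `w1` makes agreement between the ontology distance and the in-document distance count more than the raw number of matched concepts, which is the kind of adjustment the text describes.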
Note on sorting full-text search results: similarity is calculated and the results are sorted according to weights set in advance for the different matching regions, such as title, abstract, and full text, together with information such as the number of keyword hits. The specific sorting algorithm is not described in detail here.
3) The sorted results obtained from the above processing are returned to the user.
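The choice between full-text search, semantic query, and expanded query described in this method can be summarised in a small Python sketch. It is a simplification under assumptions: sentence-pattern matching is stood in for by a set-membership test on the concepts found in the query, and the function name, the returned labels, and the sample ontology are hypothetical.

```python
def choose_query_mode(strong_terms, ontology_concepts, sentence_patterns):
    """Pick the query type for a list of strong-semantic terms.

    ontology_concepts: set of terms known to the domain ontology.
    sentence_patterns: set of frozensets, each a predefined concept pattern.
    """
    concepts = [t for t in strong_terms if t in ontology_concepts]
    if not concepts:
        return "full_text"  # no ontology concept: keyword-based full-text search
    if frozenset(concepts) in sentence_patterns:
        return "semantic"   # pattern matched: semantic query retrieval expression
    return "expanded"       # pattern failed: expand through the domain ontology


# Hypothetical domain ontology and one predefined pattern.
ontology = {"pump", "valve", "pressure"}
patterns = {frozenset({"pump", "pressure"})}

print(choose_query_mode(["pump", "pressure"], ontology, patterns))
print(choose_query_mode(["valve", "pressure"], ontology, patterns))
print(choose_query_mode(["hello"], ontology, patterns))
```

In all three modes the retrieved records would then be sorted before being returned to the user.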
Although the present invention has been described in detail above, it should be understood that the embodiments only illustrate the principle of the invention by way of example. Various changes, substitutions, and modifications may be made to the embodiments without departing from the concept and scope of the invention. Such changes fall within the scope of the invention and should not be regarded as departing from its spirit and scope.

Claims (9)

1. An intelligent retrieval system based on domain ontology, comprising an ontology inference module for analyzing the natural query statement entered by a user, an index processing module for creating index libraries, a query processing module for executing specific queries, and a result optimizing and sorting module for processing query results, characterized in that the system further comprises a data repository, a domain ontology library, and an index database.
2. The intelligent retrieval system based on domain ontology according to claim 1, characterized in that the ontology inference module comprises a word segmentation preprocessing unit and a sentence-pattern matching unit;
the word segmentation preprocessing unit is used to receive the natural query statement entered by the user, preprocess the query statement by word segmentation, part-of-speech tagging, and domain ontology role labeling, and remove weak-semantic vocabulary to obtain the strong-semantic vocabulary set;
the sentence-pattern matching unit is used to match the strong-semantic vocabulary set against predefined sentence patterns to obtain a new retrieval expression.
3. The intelligent retrieval system based on domain ontology according to claim 1, characterized in that the index processing module comprises an ontology semantic index processing unit and a full-text index processing unit;
the ontology semantic index processing unit is used to obtain data resource documents, parse, process, and extract the body content of the documents, synthesize semantic vectors based on the domain ontology library, and build the ontology semantic index library;
the full-text index processing unit is used to obtain data resource documents, extract document information, and build the full-text index library.
4. The intelligent retrieval system based on domain ontology according to claim 1, characterized in that the query processing module comprises a semantic query processing unit, an expanded query processing unit, and a full-text search processing unit;
the semantic query processing unit is used to perform intelligent queries of professional domain information based on the domain ontology concepts and the associations between concepts;
the expanded query processing unit is used to perform expanded queries based on the domain ontology concepts and the associations between concepts;
the full-text search processing unit is used to perform full-text search in the traditional manner, that is, according to the keyword matching principle.
5. The intelligent retrieval system based on domain ontology according to claim 1, characterized in that the data repository comprises resources in a local domain database or domain resource data crawled from the network.
6. The intelligent retrieval system based on domain ontology according to claim 1, characterized in that the index database comprises the ontology semantic index library and the full-text index library built by the index processing module.
7. An intelligent retrieval method based on domain ontology according to claim 1, characterized in that the method comprises the following steps:
A. performing word segmentation, part-of-speech tagging, and domain-ontology-based role labeling on the natural query statement entered by the user;
B. analyzing and evaluating the vocabulary set obtained in step A, determining whether any ontology role item is non-empty, and executing the corresponding query according to defined rules;
C. measuring the semantic distance of the query results, optimizing the results, sorting them according to the semantic distance values, and returning the sorted output to the user.
8. The method according to claim 7, characterized in that the determination of non-empty ontology role items in step B further comprises:
B1. if the natural query statement entered by the user contains no ontology concept, performing full-text search;
B2. if the natural query statement entered by the user contains an ontology concept, determining whether an ontology pattern matches.
9. The method according to claim 8, characterized in that step B2 further comprises:
B21. if the ontology pattern match succeeds, forming the semantic query retrieval expression;
B22. if the ontology pattern match fails, accessing the domain ontology library, performing semantic expansion, and forming the expanded query retrieval expression.
CNA200810306721XA 2008-12-31 2008-12-31 Intelligent retrieval system and method based on domain ontology Pending CN101582073A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA200810306721XA CN101582073A (en) 2008-12-31 2008-12-31 Intelligent retrieval system and method based on domain ontology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA200810306721XA CN101582073A (en) 2008-12-31 2008-12-31 Intelligent retrieval system and method based on domain ontology

Publications (1)

Publication Number Publication Date
CN101582073A true CN101582073A (en) 2009-11-18

Family

ID=41364220

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA200810306721XA Pending CN101582073A (en) 2008-12-31 2008-12-31 Intelligent retrieval system and method based on domain ontology

Country Status (1)

Country Link
CN (1) CN101582073A (en)



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20091118