CN106528875A - Probability mode matching-based keyword query transformation and distribution system and method - Google Patents

Publication number
CN106528875A
Authority
CN
China
Prior art keywords
matching
interface
query
inquiry
integrated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611128587.XA
Other languages
Chinese (zh)
Inventor
姜芳艽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Normal University
Original Assignee
Jiangsu Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Normal University
Priority to CN201611128587.XA
Publication of CN106528875A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/258: Data format conversion from or to a database

Abstract

The invention provides a keyword query transformation and distribution system and method based on probabilistic schema matching. The system comprises a keyword query interface, an integrated query interface, Web database query interfaces, a keyword query transformation module and a query distribution module. A user submits a keyword query at the keyword query interface; the keyword query transformation module transforms it into a query on the integrated query interface; and the query distribution module distributes that query to the individual Web database query interfaces, through which the Web databases are queried. The advantages are that the user is given an interface for submitting keyword queries, which is simple and fast; that matching efficiency, reasonableness and accuracy are improved; and that the whole process is automatic: the user only submits a keyword query, and the system transforms it onto the integrated interface and then distributes it to the Web database query interfaces.

Description

A keyword query transformation and distribution system and method based on probabilistic schema matching
Technical field
The present invention relates to a keyword query transformation and distribution system and method, and in particular to a keyword query transformation and distribution system and method based on probabilistic schema matching.
Background
Transforming a user's keyword query involves two steps: the keyword query is first transformed into a query on a structured integrated query interface, and that query is then distributed to the query interfaces of the individual Web databases. Both steps depend on correct schema matching. The first step performs schema matching using instance information; the second uses both schema information and instance information.
Research on automatic schema matching has made considerable progress over the past ten years. Several prototype systems, such as SEMINT, Cupid, DIKE and COMA, have achieved breakthroughs in the correctness of automatic or semi-automatic schema matching. However, all of these systems look for a single definite matching result and ignore the uncertainty of schema matching, even though the schema matching involved here is inherently uncertain.
On the one hand, a keyword query submitted by a user often contains only attribute values and no attribute names. The missing attribute names make the semantics of each value hard to determine, so mapping the query onto the corresponding structured integrated query interface may yield several matching results that are each reasonable to some degree.
On the other hand, the matching between the schema of the integrated interface and the schemas of the individual Web database query interfaces is also often inaccurate. First, the Web databases are designed by different organizations or individuals at different times and places and are highly autonomous, which makes their content and form complex and diverse and poses a greater challenge to matching accuracy. Second, Web databases change continually; it has been observed that a Web database query interface changes on average every three months, so existing schema matches between the integrated interface and the Web database query interfaces often become invalid. Third, inaccuracies in automatic schema extraction add further uncertainty to the subsequent schema matching. The matching between the integrated interface schema and the schemas of the Web database query interfaces is therefore subject to many uncertainties.
Summary of the invention
The object of the present invention is to provide a keyword query transformation and distribution system and method based on probabilistic schema matching that solves the problem of how to transform and distribute user keyword queries effectively and accurately in the presence of uncertainty.
The object is achieved as follows: for the transformation and distribution of user keyword queries, the invention comprises a keyword query transformation and distribution system and a keyword query transformation and distribution method.
The keyword query transformation and distribution system comprises a keyword query interface, an integrated query interface, Web database query interfaces, a keyword query transformation module and a query distribution module. The user submits a keyword query at the keyword query interface; the keyword query transformation module transforms it into a query on the integrated query interface; the query distribution module then distributes that query to the individual Web database query interfaces, through which the Web databases are queried.
The keyword query interface is where the user submits a query request, that is, the query keywords;
The integrated query interface is a structured query interface obtained by extracting and integrating the individual Web database query interfaces;
The Web database query interfaces are database query interfaces of the same domain crawled from the Web;
The keyword query transformation module transforms the query submitted by the user at the keyword query interface into a query on the structured integrated query interface;
The query distribution module distributes the query on the integrated query interface to the Web database query interfaces.
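The data flow between the five components can be sketched in a few lines. This is only an illustrative skeleton: the class name QueryPipeline, the type aliases, the site names and the two toy functions are hypothetical and do not come from the patent text, and the toy functions merely stand in for the real transformation and distribution modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical aliases, introduced only for this sketch.
StructuredQuery = Dict[str, str]  # concept name -> value on the integrated interface
SourceQuery = Dict[str, str]      # attribute name -> value on one Web database interface


@dataclass
class QueryPipeline:
    # Stand-in for the keyword query transformation module.
    transform: Callable[[List[str]], StructuredQuery]
    # Stand-in for the query distribution module.
    distribute: Callable[[StructuredQuery], Dict[str, SourceQuery]]

    def run(self, keywords: List[str]) -> Dict[str, SourceQuery]:
        integrated = self.transform(keywords)  # keyword query -> integrated query
        return self.distribute(integrated)     # integrated query -> per-database queries


def toy_transform(keywords: List[str]) -> StructuredQuery:
    # Assumption: the matcher has already decided which concept each keyword fills.
    return {"title": keywords[0], "price": keywords[1]}


def toy_distribute(query: StructuredQuery) -> Dict[str, SourceQuery]:
    # Assumption: two Web databases whose attribute names differ from the concepts.
    return {
        "site_a": {"book_name": query["title"], "cost": query["price"]},
        "site_b": {"name": query["title"], "price_usd": query["price"]},
    }


pipeline = QueryPipeline(toy_transform, toy_distribute)
result = pipeline.run(["database systems", "30"])
print(result)
```

Keeping the two modules as separate callables mirrors the two-step structure described above, so either step could be replaced without touching the other.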
The keyword query transformation module further comprises a transformation data-type analysis submodule, a probability-based transformation schema matching submodule and a keyword transformation submodule. The data-type analysis submodule first identifies keywords and integrated-interface concepts that share a data type and treats them as potential matching pairs; the probability-based schema matching submodule then computes the matching probability of each potential pair; finally, the keyword transformation submodule transforms the user's keyword query into a query on the integrated query interface.
The transformation data-type analysis submodule determines the data type of each keyword in the user's query and of each concept of the structured integrated interface. The probability-based transformation schema matching submodule computes the matching probability of each potential match, using a different computation method for each data type. The keyword transformation submodule sorts the matching probabilities in descending order, selects the optimal schema matching result, and transforms the query submitted at the keyword query interface into a query on the structured integrated query interface.
The query distribution module further comprises a distribution data-type analysis submodule, a probability-based distribution schema matching submodule and a query distribution submodule. The data-type analysis submodule first identifies integrated-interface concepts and Web database query interface attributes that share a data type and treats them as potential matching pairs; the probability-based schema matching submodule then computes the matching probability of each potential pair; finally, the query distribution submodule distributes the query on the integrated interface to the attributes of each Web database query interface.
The distribution data-type analysis submodule determines the data type of each concept of the structured integrated interface and of each attribute of each Web database query interface. The probability-based distribution schema matching submodule computes the matching probability of each potential match, using a different computation method for each data type. The query distribution submodule sorts the matching probabilities in descending order, selects the optimal schema matching result, and distributes the query on the integrated query interface to the Web database query interfaces.
The keyword query transformation and distribution method comprises the following steps:
Step A: Transform the user's keyword query into a query on the structured integrated query interface. Using the keyword information of the user's query and the candidate value information of each concept of the integrated query interface, find the schema matching relationship between the keyword query schema and the integrated query interface schema, and establish the correspondence between the user's query keywords and the related concepts of the integrated structured query interface, so that the keyword query is transformed into a query on the structured integrated query interface;
Step B: Distribute the structured query on the integrated interface as queries on the individual Web database query interfaces. Using the schema information and candidate value information of the integrated interface together with the schema information and candidate value information of the Web database query interfaces, find the matching relationship between the integrated query interface schema and each Web database query interface schema, and establish the schema correspondence between each concept of the integrated query interface and each attribute of the Web database query interfaces, so that the structured query is distributed to each Web database query interface. The schema of the integrated interface is the set of concepts and their corresponding label names.
Step A and step B each comprise sub-steps; the sub-steps of step A and step B correspond to one another and are as follows:
Step A1 or step B1: Optimization of matching based on data type. Having the same data type is in fact a precondition for a match between a keyword and a concept of the integrated query interface, or between a concept of the integrated query interface and an attribute of a Web database query interface: a keyword generally matches concepts of its own data type, and a concept likewise matches only attributes of its own data type. Matching optimization based on data type places candidates of the same type in one group and computes matching probabilities within that group; no matching probability is computed across groups. The data types comprise text, numeric and time types.
Step A2 or step B2: When matching candidates of different data types, use a different probability computation method for each type to obtain the optimal matching result based on probabilistic schema matching.
Step A3 or step B3: Map the user's keyword query onto the integrated query interface, and then distribute the query on the integrated query interface to each Web database query interface.
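The grouping in step A1 can be illustrated with a small sketch. The type-detection rules below (a regular expression for numbers, two date formats for time values) and the concept names are assumptions made for this example; the text does not say how data types are detected.

```python
import re
from collections import defaultdict
from datetime import datetime


def infer_type(value: str) -> str:
    """Classify a value as 'numeric', 'time' or 'text' (hypothetical rules)."""
    v = value.strip()
    if re.fullmatch(r"[+-]?\d+(\.\d+)?", v):
        return "numeric"
    for fmt in ("%Y-%m-%d", "%Y/%m/%d"):
        try:
            datetime.strptime(v, fmt)
            return "time"
        except ValueError:
            pass
    return "text"


def candidate_pairs(keywords, concept_types):
    """Pair each keyword only with concepts that have the same data type."""
    by_type = defaultdict(list)
    for concept, ctype in concept_types.items():
        by_type[ctype].append(concept)
    return [(kw, c) for kw in keywords for c in by_type[infer_type(kw)]]


# Hypothetical integrated-interface concepts and their data types.
concept_types = {"author": "text", "title": "text", "price": "numeric", "pub_date": "time"}
pairs = candidate_pairs(["Tolstoy", "1999", "2016-12-09"], concept_types)
print(pairs)
```

Only the pairs returned here would go on to the probability computation of step A2; pairs of differing types are never scored.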
Step A2 and step B2 each comprise sub-steps; the sub-steps of step A2 and step B2 correspond to one another and are as follows:
Step A21 or step B21: For matching of character (text) data, obtain the matching probability with a method based on string similarity. Many mature string similarity measures exist, such as Levenshtein distance, affine gap distance, Jaro distance and q-gram distance; the similarity result is used as the matching probability;
Step A22 or step B22: For matching of numeric data, prune the schema matching according to how the numeric instances cover one another, namely (1) no coverage; (2) loose partial coverage; (3) loose coverage; (4) single-constraint coverage; (5) complex-constraint coverage, and thereby obtain the possible matching pairs;
Step A23 or step B23: Compute the matching probability of each possible numeric matching pair and sort the probabilities in descending order. Select the pair with the highest probability as the first match and delete every matching probability that involves either member of that pair; select the highest remaining pair as the second match; and so on, until all matches have been found.
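The string-similarity scoring and the descending-order selection described in these sub-steps can be sketched as follows. Two points are assumptions of this sketch and are not stated in the text: the Levenshtein distance is normalized by the longer string's length to give a value in [0, 1], and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def string_match_probability(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def greedy_select(probabilities):
    """Take the best pair, drop every pair sharing a member with it, repeat."""
    chosen, used_left, used_right = [], set(), set()
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    for (left, right), p in ranked:
        if left in used_left or right in used_right:
            continue
        chosen.append((left, right, p))
        used_left.add(left)
        used_right.add(right)
    return chosen


# Hypothetical matching probabilities for two keywords and two concepts.
probs = {("k1", "c1"): 0.9, ("k1", "c2"): 0.8, ("k2", "c1"): 0.7, ("k2", "c2"): 0.2}
selected = greedy_select(probs)
print(selected)
```

The selection is greedy, as the step describes: it does not look for the assignment with the highest total probability, only for the best remaining pair at each round.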
Step A23 and step B23 each comprise sub-steps; the sub-steps of step A23 and step B23 are identical and are as follows:
If the given data are two numeric values $m$ and $n$, the matching probability is: [formula not reproduced in the source text]
If the given data are two sets of discrete numeric data $S_1$ and $S_2$, with $S_1 = \{n_1, n_2, n_3, \dots\}$ and $S_2 = \{m_1, m_2, m_3, \dots\}$, the number of values the two sets share reflects their similarity, and the matching probability is: [formula not reproduced in the source text]
If the given data are two sets of numeric ranges $R_1$ and $R_2$, with $R_1 = \{s_1, s_2, s_3, \dots\}$ and $R_2 = \{t_1, t_2, t_3, \dots\}$, the degree to which the two overlap reflects their similarity, and the matching probability is: [formula not reproduced in the source text]
Here $0 \le P \le 1$, and the larger the computed value, the better the two match.
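The three formulas themselves do not survive in this text, so the sketch below should not be read as the patent's method. It substitutes three common measures that satisfy the stated properties (a value between 0 and 1 that grows with similarity): a relative difference for two values, the Jaccard coefficient for discrete sets, and an overlap-to-union length ratio for ranges. All function names are hypothetical, and the intervals within each range set are assumed not to overlap one another.

```python
def value_probability(m: float, n: float) -> float:
    """Assumed measure for two numeric values: 1 - |m - n| / (|m| + |n|)."""
    if m == n:
        return 1.0
    return 1.0 - abs(m - n) / (abs(m) + abs(n))


def discrete_probability(s1: set, s2: set) -> float:
    """Assumed measure for discrete sets: Jaccard coefficient."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


def range_probability(r1, r2) -> float:
    """Assumed measure for range sets: overlap length / union length.

    r1 and r2 are lists of (low, high) intervals; the intervals inside each
    list are assumed to be disjoint.
    """
    total = sum(hi - lo for lo, hi in r1) + sum(hi - lo for lo, hi in r2)
    overlap = sum(max(0.0, min(h1, h2) - max(l1, l2))
                  for l1, h1 in r1 for l2, h2 in r2)
    union = total - overlap
    return overlap / union if union > 0 else 0.0


p_values = value_probability(10, 30)
p_sets = discrete_probability({1, 2, 3}, {2, 3, 4})
p_ranges = range_probability([(0, 10)], [(5, 15)])
print(p_values, p_sets, p_ranges)
```

Any other measure bounded by 0 and 1 could be swapped in for each case without changing the surrounding selection procedure.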
Beneficial effects: with the above scheme, the invention can automatically and efficiently transform a user's keyword query into a query on the structured integrated interface and then distribute it to the individual Web database query interfaces with high accuracy. It is a basis for automatically acquiring Web database information.
The user can submit simple keywords as query conditions, whereas existing systems generally provide only a complex integrated query interface on which the user must enter query conditions according to the interface labels, which is cumbersome. The probabilistic schema matching method helps find more reasonable matching pairs: existing methods usually decide directly whether two items match, whereas the present invention computes the matching probability of each possible schema match. The whole system runs automatically, matching is processed efficiently, and the matching results are more reasonable.
The problem of how to transform and distribute user keyword queries effectively and accurately in the presence of uncertainty is thus solved, and the object of the invention is achieved.
Advantages:
1. The user is given an interface for submitting keyword queries, which is simpler and faster than submitting a structured query on an integrated query interface as in existing systems.
2. Matching optimization based on data type restricts schema matching to candidates of the same data type, which improves matching efficiency over the prior art.
3. Different data types use different probability computation methods, and a matching method for numeric data is proposed. The prior art offers no good matching method for numeric data; the present method improves the reasonableness and accuracy of matching.
4. A pruning method is proposed for numeric matching, which further improves matching accuracy.
5. The whole system runs automatically: the user only submits a keyword query, and the system transforms it onto the integrated interface and then distributes it to each Web database query interface. Existing systems provide only the transformation from the integrated interface to the Web database query interfaces and do not cover the step from keyword query to integrated query interface.
Description of the drawings:
Fig. 1 is a block diagram of the keyword query transformation and distribution system based on probabilistic schema matching according to the invention.
Fig. 2 is a flow chart of the keyword query transformation module of the invention.
Fig. 3 is a flow chart of the query distribution module of the invention.
Fig. 4 is a schematic diagram of matching optimization based on data type according to the invention.
Fig. 5 is a schematic diagram of the coverage relationships between the candidate value sets $V_i$ and $V_j$ in an embodiment of the invention.
Detailed description
The invention is described in further detail below with reference to the accompanying drawings.
Embodiment 1: For a fuller understanding of the invention and its advantages, the invention is described in further detail below with reference to the drawings and a specific embodiment.
The system of the invention is first described with reference to Fig. 1. The invention provides a keyword query transformation and distribution system based on probabilistic schema matching, comprising a keyword query interface, an integrated query interface, Web database query interfaces, a keyword query transformation module and a query distribution module.
The user submits a query request, that is, the query keywords, at the keyword query interface;
The keyword query transformation module uses the instance (keyword) information of the user's query, which lacks schema information (attribute names), together with the instance (candidate value) information of each concept of the integrated query interface to find the schema matching relationship between the keyword query schema and the integrated query interface schema. It establishes the correspondence between the user's query keywords and the related concepts of the integrated structured query interface, and transforms the user's keyword query into a query on the structured integrated query interface. The keyword query transformation module is described in more detail below with reference to Fig. 2.
The query distribution module uses the schema information (concepts and the corresponding sets of label names) and instance (candidate value) information of the integrated interface together with the schema (attribute name) and instance (candidate value) information of the Web database query interfaces to find the matching relationship between the integrated query interface schema and each Web database query interface schema. It establishes the schema correspondence between each concept of the integrated query interface and each attribute of the Web database query interfaces, and distributes the structured query on the integrated interface as queries on the individual Web database query interfaces. The query distribution module is described in more detail below with reference to Fig. 3.
The keyword query distribution method based on probabilistic schema matching according to the invention is described in detail below with reference to Figs. 2, 3, 4 and 5.
As shown in Figs. 2 and 3, the keyword query transformation module and the query distribution module each comprise a data-type analysis submodule, a probability-based schema matching submodule, and a query transformation (or distribution) submodule.
The data-type analysis submodule determines the data type of each concept of the structured integrated interface and of each attribute of each Web database query interface. Having the same data type is a precondition for a possible match.
The probability-based schema matching submodule computes the matching probability of each possible match, using a different computation method for each data type.
The query transformation (distribution) submodule sorts the matching probabilities in descending order and distributes the query on the integrated query interface to the Web database query interfaces.
As shown in figure 4, data type analysis submodule is occurrence to be grouped by data type:Discounting for number According to type, by the keyword { Key of inquiry1,Key2,…,KeykConcept { C with integrated query interface1,C2,…,CjOne by one Match somebody with somebody, then possible matching result quantity isWherein j is the concept number of complex query interface, and k is the pass that user submits to Key word number, because the concept of generally complicated integrated query interface is typically at 50 or so, amount of calculation is than larger.
If keywords and concepts are matched one by one only within the same data type, the number of possible matching results is determined by the per-type counts instead, where k1, k2 and k3 are the numbers of text-type, numeric-type and time-type keywords respectively, and j1, j2 and j3 are the numbers of text-type, numeric-type and time-type concepts of the integrated query interface respectively. Because this number does not exceed the number obtained without grouping, the matching items are first grouped by data type.
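The grouping step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each keyword and concept already carries a data type label ("text", "numeric" or "time"), and the keyword and concept names are hypothetical.

```python
from collections import defaultdict

def group_by_type(items):
    """Group (name, data_type) pairs into {data_type: [names]}."""
    groups = defaultdict(list)
    for name, data_type in items:
        groups[data_type].append(name)
    return groups

def candidate_pairs(keywords, concepts):
    """Return only the keyword/concept pairs that share a data type."""
    keyword_groups = group_by_type(keywords)
    concept_groups = group_by_type(concepts)
    pairs = []
    for data_type, names in keyword_groups.items():
        for keyword in names:
            for concept in concept_groups.get(data_type, []):
                pairs.append((keyword, concept))
    return pairs

# Hypothetical query keywords and integrated-interface concepts.
keywords = [("Beijing", "text"), ("300", "numeric"), ("2016-12-09", "time")]
concepts = [("city", "text"), ("title", "text"), ("price", "numeric"),
            ("year", "numeric"), ("date", "time")]

pairs = candidate_pairs(keywords, concepts)
print(len(pairs), "typed candidate pairs instead of", len(keywords) * len(concepts))
```

Only pairs within the same type group are passed on to the probability calculation, which is the source of the reduction in computation.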
As shown in Fig. 5, the probability-based pattern matching submodule first prunes the possible matches and then calculates the matching probability of the remaining possible matches.
Pruning method for pattern matching between a 2-dimensional query space and a 2-dimensional concept space: assume two concepts Ci and Cj (i ≠ j) on the integrated interface, whose candidate value sets are Vi and Vj respectively (both numeric). The user query contains the numeric keywords Key1 and Key2, and each of Key1 and Key2 must match one of Ci and Cj. A match between A and B is denoted {A, B}. Vi and Vj then fall into one of the following five cases, as shown in Fig. 5:
(1) No coverage:
Vi ∩ Vj = ∅, i.e. there is no coverage relation between the candidate values of concept Ci and concept Cj, as in Fig. 5(a). If Key1 ≠ Key2 and Key1, Key2 ∈ (Vi or Vj), the matching relation with concepts Ci and Cj can be determined exactly from the keyword values as one of the following:
1) If Key1 ∈ Vi and Key2 ∈ Vj, the matching result is {Key1, Ci}, {Key2, Cj};
2) If Key1 ∈ Vj and Key2 ∈ Vi, the matching result is {Key1, Cj}, {Key2, Ci};
(2) Loose partial coverage:
Vi ∩ Vj ≠ ∅, i.e. the candidate values of concept Ci and concept Cj cover each other; there is no constraint relation between Vi and Vj, or the constraint relation is hard to capture, but the degree of coverage is small, as in Fig. 5(b). In general, for keywords Key1 and Key2:
1) If Key1 ∉ (Vi ∩ Vj) and Key2 ∉ (Vi ∩ Vj), matching proceeds as in case (1);
2) If Key1 ∈ (Vi ∩ Vj) and Key2 ∈ (Vj − Vi), then by the mutual exclusivity of matches the matching result is {Key1, Ci}, {Key2, Cj}; and vice versa;
3) If Key1 ∈ (Vi ∩ Vj) and Key2 ∈ (Vi ∩ Vj), the matching relation cannot be determined, and the similarity of the two must be calculated further.
(3) Loose coverage:
Vi ∩ Vj ≠ ∅, i.e. the candidate values of concept Ci and concept Cj cover each other; there is no constraint relation between Vi and Vj, or the constraint relation is hard to capture, and the degree of coverage is large, as in Fig. 5(c). The matching relation cannot be determined, and the similarity of the two must be calculated further.
(4) Single-constraint coverage:
Vi ∩ Vj ≠ ∅, i.e. the candidate values of concept Ci and concept Cj cover each other, and there is a simple constraint relation between Vi and Vj; for example, for every pair of values <Vi, Vj> of concept Ci and concept Cj, Vi is always less than Vj, as in Fig. 5(d). If Key1 < Key2, the corresponding match is {Key1, Ci}, {Key2, Cj}. In general, for any pair of values <Vi, Vj> of concept Ci and concept Cj:
1) If Vi < Vj and Key1, Key2 ∈ Vi, Vj, then for Key1 < Key2 the matching result is {Key1, Ci}, {Key2, Cj};
2) If Vi > Vj and Key1, Key2 ∈ Vi, Vj, then for Key1 < Key2 the matching result is {Key1, Cj}, {Key2, Ci};
Other cases follow by analogy.
(5) Complex-constraint coverage:
Vi ∩ Vj ≠ ∅, i.e. the candidate values of concept Ci and concept Cj cover each other, and there is a complex constraint relation between Vi and Vj; that is, for a pair of values <Vi, Vj> of concept Ci and concept Cj, the constraint relation satisfied by Vi and Vj is not a single one, but Vi and Vj satisfy single-constraint coverage (case 4) piecewise, i.e. on each segment, as in Fig. 5(e). The values are first segmented, and the matching relation between keywords and concepts is then determined segment by segment using the method of case (4). In general, for any pair of values <Vi, Vj> of concept Ci and concept Cj:
1) If <Vi, Vj> ∈ Segment1, Vi < Vj and Key1 < Key2, the matching result is {Key1, Ci}, {Key2, Cj};
2) If <Vi, Vj> ∈ Segment2, Vi > Vj and Key1 < Key2, the matching result is {Key1, Cj}, {Key2, Ci};
Other cases follow by analogy.
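The pruning cases above can be illustrated with a short sketch. It is one possible reading, not the patent's implementation: cases (1) and (2) are expressed as a feasibility check over the two one-to-one assignments, case (4) as an ordering rule, and cases (3) and (5) are not covered. The function names and the labels "Ci" and "Cj" are hypothetical.

```python
def resolve_by_coverage(key1, key2, vi, vj):
    """Cases (1) and (2): use the candidate value sets Vi and Vj.

    Returns (concept for key1, concept for key2) when exactly one one-to-one
    assignment is consistent with the candidate values, otherwise None
    (the similarity then has to be computed).
    """
    options = []
    if key1 in vi and key2 in vj:
        options.append(("Ci", "Cj"))
    if key1 in vj and key2 in vi:
        options.append(("Cj", "Ci"))
    return options[0] if len(options) == 1 else None

def resolve_by_order(key1, key2, ci_less_than_cj=True):
    """Case (4): values of Ci are always below (or always above) values of Cj."""
    if key1 == key2:
        return None  # the ordering constraint cannot separate equal keywords
    key1_is_smaller = key1 < key2
    if key1_is_smaller == ci_less_than_cj:
        return ("Ci", "Cj")
    return ("Cj", "Ci")

# Case (1): disjoint candidate values.
print(resolve_by_coverage(5, 1, {1, 2}, {5, 6}))
# Case (2): small overlap {3, 4}; key2 = 6 lies only in Vj.
print(resolve_by_coverage(3, 6, {1, 2, 3, 4}, {3, 4, 5, 6}))
# Case (2), sub-case 3: both keywords fall in the overlap, so no decision.
print(resolve_by_coverage(3, 4, {1, 2, 3, 4}, {3, 4, 5, 6}))
# Case (4): Vi > Vj and Key1 < Key2.
print(resolve_by_order(10, 20, ci_less_than_cj=False))
```

When either function returns None, the pair would be passed on to the matching probability calculation rather than being resolved by pruning.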
Extension of the pruning method to an n-dimensional query space and an m-dimensional concept space
The pruning method for pattern matching between a 2-dimensional query space and a 2-dimensional concept space was described above. It is extended below to an n-dimensional query space and an m-dimensional concept space.
The dimension of the query space directly affects the matching result. If the query contains only one keyword Key1 and the other conditions are unchanged (Vi < Vj and Key1 ∈ Vi, Vj), the matching result is no longer unique: the matches {Key1, Ci} and {Key1, Cj} each have a probability of 50%. On the other hand, the higher the dimension of the concept space, the more candidate concept subspaces there are, and hence the more possible matching results. For example, given a 2-dimensional query Q = {Key1, Key2} and 3-dimensional concepts {C1, C2, C3}, the candidate concept subspaces are {C1, C2}, {C2, C3} and {C1, C3}; compared with the 2-dimensional concepts {C1, C2}, there are more candidate concept subspaces and correspondingly more possible matching results.
Therefore both the dimension of the query space and the dimension of the concept space affect the final matching result. For an n-dimensional query space and an m-dimensional concept space with n ≤ m, the number of possible matches depends on both n and m.
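To make the growth in possible matches concrete, the sketch below enumerates them by brute force. It assumes that each keyword is matched to a distinct concept, in which case the count is m!/(m − n)!; this counting rule is an assumption consistent with the 2-keyword, 3-concept example above, not a formula taken from the text.

```python
from itertools import permutations
from math import perm

def enumerate_matchings(keywords, concepts):
    """List every one-to-one assignment of keywords to distinct concepts."""
    return [list(zip(keywords, chosen))
            for chosen in permutations(concepts, len(keywords))]

# 2-dimensional query against a 3-dimensional concept space.
matchings = enumerate_matchings(["Key1", "Key2"], ["C1", "C2", "C3"])
for matching in matchings:
    print(matching)
print(len(matchings), "matchings; perm(3, 2) =", perm(3, 2))
```

With a single keyword and two concepts the same enumeration yields two matchings, which agrees with the 50% split described above when nothing else distinguishes them.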
Pruning the pattern matching greatly reduces the number of possible matches that must be calculated and greatly improves the efficiency of query conversion and distribution.
The significance of this method is that it takes uncertain pattern matches into account, calculates matching probabilities, retains as many reasonable matching results as possible and ranks them by matching degree to obtain the most reasonable pattern matching result, so that the user's keyword query is correctly and efficiently converted to the structured integrated query interface and correctly distributed to each Web database query interface.

Claims (10)

1. A keyword query conversion and distribution system based on conceptual pattern matching, characterized in that: the keyword query conversion and distribution system comprises a keyword query interface, an integrated query interface, Web database query interfaces, a keyword query conversion module and a query distribution module; a user submits a keyword query on the keyword query interface, the keyword query conversion module converts the user's keyword query into a query on the integrated query interface, and the query distribution module then distributes the query to each Web database query interface so as to query each Web database.
2. The keyword query conversion and distribution system based on conceptual pattern matching according to claim 1, characterized in that: the keyword query interface is used by the user to submit a query request, i.e. to submit query keywords;
the integrated query interface is a structured query interface extracted and integrated from the Web database query interfaces;
the Web database query interfaces are database query interfaces of the same domain crawled from the Web;
the keyword query conversion module converts the query submitted by the user on the keyword query interface to the structured integrated query interface;
the query distribution module distributes the query on the integrated query interface to the Web database query interfaces.
3. The keyword query conversion and distribution system based on conceptual pattern matching according to claim 1, characterized in that: the keyword query conversion module further comprises a conversion data type analysis submodule, a probability-based conversion pattern matching submodule and a keyword conversion submodule; the data type analysis submodule first identifies keywords and integrated-interface concepts having the same data type as potential matching pairs; the probability-based pattern matching submodule then calculates the matching probability of each possible matching pair; finally the keyword conversion submodule converts the user's keyword query to the integrated query interface.
4. The keyword query conversion and distribution system based on conceptual pattern matching according to claim 3, characterized in that: the conversion data type analysis submodule analyzes the data type of each keyword of the user query and of each concept of the structured integrated interface; the probability-based conversion pattern matching submodule calculates the matching probability of each possible match, different data types using different calculation methods; the keyword conversion submodule sorts the matching probabilities in descending order, selects the optimal pattern matching result, and converts the query submitted by the user on the keyword query interface to the structured integrated query interface.
5. The keyword query conversion and distribution system based on conceptual pattern matching according to claim 1, characterized in that: the query distribution module further comprises a distribution data type analysis submodule, a probability-based distribution pattern matching submodule and a query distribution submodule; the data type analysis submodule first identifies integrated-interface concepts and Web database query interface attributes having the same data type as potential matching pairs; the probability-based pattern matching submodule then calculates the matching probability of each possible matching pair; finally the query distribution submodule distributes the query on the integrated interface to the attributes of each Web database query interface.
6. The keyword query conversion and distribution system based on conceptual pattern matching according to claim 5, characterized in that: the distribution data type analysis submodule analyzes the data type of each concept of the structured integrated interface and of each attribute of each Web database query interface; the probability-based distribution pattern matching submodule calculates the matching probability of each possible match, different data types using different calculation methods; the query distribution submodule sorts the matching probabilities in descending order, selects the optimal pattern matching result, and distributes the query on the integrated query interface to the Web database query interfaces.
7. A distribution method of the keyword query conversion and distribution system based on conceptual pattern matching according to claim 1, characterized in that the keyword query conversion and distribution method comprises the following steps:
Step A: converting the user's keyword query into a query on the structured integrated query interface; using the keyword information of the user query and the candidate value information of each concept of the integrated query interface, the pattern matching relationship between the keyword query pattern and the integrated query interface pattern is found, and the correspondence between the user's query keywords and the related concepts on the integrated structured query interface is established, so that the user's keyword query is converted into a query on the structured integrated query interface;
Step B: distributing the structured query on the integrated interface as queries on each Web database query interface; using the pattern information and candidate value information of the integrated interface and the pattern information and candidate value information of the Web database query interfaces, the matching relationship between the integrated query interface pattern and each Web database query interface pattern is found, and the pattern correspondence between each concept of the integrated query interface and each attribute of the Web database query interfaces is established, so that the structured query on the integrated interface is distributed to each Web database query interface; the pattern of the integrated interface consists of the concepts and their corresponding label-name sets.
8. The keyword query conversion and distribution method based on conceptual pattern matching according to claim 7, characterized in that: step A and step B each comprise sub-steps, the sub-steps of step A and step B being respectively the same, specifically:
Step A1 or step B1: optimization based on data type matching. Whether the data types are the same is the prerequisite for attempting a match between a keyword and a concept of the integrated query interface, or between a concept of the integrated query interface and an attribute of a Web database query interface; that is, a keyword is matched only with concepts of the same data type, and a concept is matched only with attributes of the same data type. Matching optimization based on data type places matching items of the same type into one group for matching probability calculation; for items of different types no matching probability is calculated. The data types include text type, numeric type and time type;
Step A2 or step B2: for matching items of different data types, different probability calculation methods are used to obtain the optimal matching result based on conceptual pattern matching;
Step A3 or step B3: the user's keyword query is matched to the integrated query interface, and is then further distributed from the integrated query interface to each Web database query interface.
9. The keyword query conversion and distribution method based on conceptual pattern matching according to claim 8, characterized in that: step A2 and step B2 each comprise sub-steps, the sub-steps of step A2 and step B2 being respectively the same, specifically:
Step A21 or step B21: for matching of character-type data, the matching probability is obtained with a calculation method based on string similarity; many mature string similarity calculation methods exist, such as Levenshtein distance, affine gap distance, Jaro distance and Q-gram distance, and the similarity calculation result is taken as the matching probability;
Step A22 or step B22: for matching of numeric data, the pattern matching is pruned based on the coverage of the numeric instances, namely (1) no coverage; (2) loose partial coverage; (3) loose coverage; (4) single-constraint coverage; (5) complex-constraint coverage, to obtain the possible matching pairs;
Step A23 or step B23: for the possible numeric matching pairs, the matching probabilities are calculated and sorted in descending order; the pair with the highest probability is selected as the first matching pair; the matching probabilities of all pairs containing either member of this matching pair are deleted; the pair with the highest probability among the remaining ones is selected as the second matching pair; and so on, until all matching items are found.
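Steps A21 and A23 can be sketched as follows. This is an illustrative sketch under stated assumptions: the Levenshtein distance is turned into a similarity by dividing by the longer string length (a common normalization, not one specified in the claim), and the pair names and probabilities in the example are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def string_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest

def greedy_select(scored_pairs):
    """Pick pairs in descending probability, dropping any pair that reuses
    a left or right item that has already been matched."""
    chosen, used_left, used_right = [], set(), set()
    for left, right, prob in sorted(scored_pairs, key=lambda t: t[2], reverse=True):
        if left in used_left or right in used_right:
            continue
        chosen.append((left, right, prob))
        used_left.add(left)
        used_right.add(right)
    return chosen

scored = [("Key1", "price", 0.9), ("Key1", "year", 0.8),
          ("Key2", "price", 0.7), ("Key2", "year", 0.4)]
print(greedy_select(scored))
print(string_similarity("author", "authors"))
```

The greedy selection is a direct reading of step A23; it yields a one-to-one assignment but is not guaranteed to maximize the total probability over all pairs.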
10. The keyword query conversion and distribution method based on conceptual pattern matching according to claim 9, characterized in that: step A23 and step B23 each comprise sub-steps, the sub-steps of step A23 and step B23 being the same, specifically:
If the given data are two numeric values m and n, the matching probability is:
P(m, n) = 1 - \frac{|m - n|}{\max(m, n)}
If the given data are two sets of discrete numeric data S1 and S2, with S1 = {n1, n2, n3, …} and S2 = {m1, m2, m3, …}, the number of values they have in common reflects their degree of similarity, and the matching probability is:
P(S_1, S_2) = \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|}
If the given data are two sets of numeric ranges R1 and R2, with R1 = {s1, s2, s3, …} and R2 = {t1, t2, t3, …}, the degree of overlap between the two reflects their degree of similarity, and the matching probability is:
P(R_1, R_2) = \frac{(s_1 \cup s_2 \cup \dots) \cap (t_1 \cup t_2 \cup \dots)}{(s_1 \cup s_2 \cup \dots) \cup (t_1 \cup t_2 \cup \dots)}
where 0 ≤ P ≤ 1, and the larger the calculated value, the higher the matching degree of the two.
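The three matching probabilities can be sketched in a few lines. The sketch makes assumptions that the claim does not state: single values are taken to be non-negative, each range is represented as a (low, high) pair, and the overlap of two range sets is measured by total interval length.

```python
def value_probability(m, n):
    """P(m, n) = 1 - |m - n| / max(m, n), assuming non-negative values."""
    if m == n:
        return 1.0
    return 1 - abs(m - n) / max(m, n)

def set_probability(s1, s2):
    """P(S1, S2) = |S1 ∩ S2| / |S1 ∪ S2| for discrete value sets."""
    union = s1 | s2
    if not union:
        return 0.0  # assumed convention for two empty sets
    return len(s1 & s2) / len(union)

def total_length(intervals):
    """Total length covered by a list of (low, high) ranges, overlaps merged."""
    merged = []
    for low, high in sorted(intervals):
        if merged and low <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return sum(high - low for low, high in merged)

def range_probability(r1, r2):
    """Length of the overlap of two range sets divided by the length of their union."""
    union_len = total_length(list(r1) + list(r2))
    if union_len == 0:
        return 0.0
    # |A ∩ B| = |A| + |B| - |A ∪ B|
    overlap_len = total_length(r1) + total_length(r2) - union_len
    return overlap_len / union_len

print(value_probability(80, 100))
print(set_probability({1, 2, 3}, {2, 3, 4}))
print(range_probability([(0, 10), (20, 30)], [(5, 25)]))
```

All three functions return values between 0 and 1 for inputs that satisfy the stated assumptions, so their results can be ranked together with the other matching probabilities.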
Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216853A (en) * 2008-01-11 2008-07-09 孟小峰 Intelligent web enquiry interface system and its method
CN103425697A (en) * 2012-05-24 2013-12-04 中兴通讯股份有限公司 Search method and search system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966178A (en) * 2021-03-05 2021-06-15 北京百度网讯科技有限公司 Consultation result distribution method, device, equipment and storage medium
CN112966178B (en) * 2021-03-05 2024-01-23 北京百度网讯科技有限公司 Consultation result distribution method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20170322