CN106528875A - Probability mode matching-based keyword query transformation and distribution system and method - Google Patents
- Publication number
- CN106528875A (application CN201611128587.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
Abstract
The invention provides a keyword query transformation and distribution system and method based on probabilistic schema matching. The system comprises a keyword query interface, an integrated query interface, Web database query interfaces, a keyword query transformation module and a query distribution module. A user submits a keyword query at the keyword query interface; the keyword query transformation module transforms it into a query on the integrated query interface; the query distribution module then distributes that query to the individual Web database query interfaces so that each Web database is queried. The advantages are that the user is given an interface for submitting keyword queries, which is simple and fast; that matching efficiency, rationality and accuracy are improved; and that the whole process is automatic: the user only submits a keyword query, and the system transforms it onto the integrated interface and distributes it to the Web database query interfaces.
Description
Technical field
The present invention relates to a keyword query transformation and distribution system and method, and in particular to a keyword query transformation and distribution system and method based on probabilistic schema matching.
Background art
Transforming a user's keyword query involves two steps: the keyword query is first transformed onto a structured integrated query interface, and is then distributed to the individual Web database query interfaces. Both steps depend on correct schema matching. The first step performs schema matching using instance information; the second performs schema matching using both schema information and instance information.
Research on automatic schema matching has made considerable progress over the past ten years. Several prototype systems, such as SEMINT, Cupid, DIKE and COMA, have achieved new breakthroughs in automatic or semi-automatic schema matching. In all of these systems, however, the matching method seeks a single definite matching result and ignores the uncertainty of schema matching. It is worth noting that the schema matching involved here is inherently uncertain.
On the one hand, the keyword query submitted by a user usually contains only attribute values and no attribute names. Because the attribute names are missing, the semantics of each value are hard to determine, and mapping the query onto the corresponding structured integrated query interface may yield several matching results, each reasonable to some degree.
On the other hand, the matching between the schema of the integrated interface and the schemas of the individual Web database query interfaces is also often inexact. First, the Web databases are designed by different organizations or individuals at different times and places and are highly autonomous, so their content and form are complex and diverse, which makes accurate schema matching more challenging. Second, Web databases change constantly: it has been observed that a Web database query interface changes in some way about every three months on average, so existing schema matchings between the integrated interface and the Web database query interfaces frequently become invalid. Furthermore, inaccuracies in automatic schema extraction add further uncertainty to the subsequent schema matching. The matching between the integrated interface schema and the Web database query interface schemas is therefore subject to considerable uncertainty.
Summary of the invention
The object of the invention is to provide a keyword query transformation and distribution system and method based on probabilistic schema matching, solving the problem of how to transform and distribute a user's keyword query effectively and accurately in the presence of uncertainty.
The object of the invention is achieved as follows. To transform and distribute user keyword queries, the invention comprises a keyword query transformation and distribution system and a keyword query transformation and distribution method.
The keyword query transformation and distribution system comprises: a keyword query interface, an integrated query interface, Web database query interfaces, a keyword query transformation module and a query distribution module. A user submits a keyword query at the keyword query interface; the keyword query transformation module transforms the user's keyword query into a query on the integrated query interface; the query distribution module then distributes the query to each Web database query interface, so that each Web database is queried.
The keyword query interface is where the user submits a query request, that is, the query keywords;
The integrated query interface is a structured query interface extracted and integrated from the individual Web database query interfaces;
The Web database query interfaces are database query interfaces of the same domain crawled from the Web;
The keyword query transformation module transforms the query submitted by the user at the keyword query interface onto the structured integrated query interface;
The query distribution module distributes the query on the integrated query interface to the Web database query interfaces.
The keyword query transformation module further comprises a transformation data type analysis submodule, a probability-based transformation schema matching submodule and a keyword transformation submodule. The data type analysis submodule first identifies keywords and integrated-interface concepts that share the same data type and treats them as potential matching pairs; the probability-based schema matching submodule then calculates the matching probability of each possible pair; finally, the keyword transformation submodule transforms the user's keyword query onto the integrated query interface.
The transformation data type analysis submodule analyzes the data type of each keyword in the user's query and of each concept of the structured integrated interface. The probability-based transformation schema matching submodule calculates the matching probability of each possible match, using a different calculation method for each data type. The keyword transformation submodule sorts the matching probabilities in descending order, selects the optimal schema matching result, and transforms the query submitted by the user at the keyword query interface onto the structured integrated query interface.
The query distribution module further comprises a distribution data type analysis submodule, a probability-based distribution schema matching submodule and a query distribution submodule. The data type analysis submodule first identifies concepts of the integrated interface and attributes of each Web database query interface that share the same data type and treats them as potential matching pairs; the probability-based schema matching submodule then calculates the matching probability of each possible pair; finally, the query distribution submodule distributes the query on the integrated interface to the attributes of each Web database query interface.
The distribution data type analysis submodule analyzes the data type of each concept of the structured integrated interface and of each attribute of each Web database query interface. The probability-based distribution schema matching submodule calculates the matching probability of each possible match, using a different calculation method for each data type. The query distribution submodule sorts the matching probabilities in descending order, selects the optimal schema matching result, and distributes the query on the integrated query interface to the Web database query interfaces.
The keyword query transformation and distribution method comprises the following steps:
Step A: transform the user's keyword query into a query on the structured integrated query interface. Using the keyword information of the user's query and the candidate value information of each concept of the integrated query interface, find the schema matching relation between the keyword query schema and the integrated query interface schema, and establish the correspondence between the user's query keywords and the related concepts of the integrated structured query interface, thereby transforming the user's keyword query into a query on the structured integrated query interface;
Step B: distribute the structured query on the integrated interface as queries on the individual Web database query interfaces. Using the schema information and candidate value information of the integrated interface together with the schema information and candidate value information of the Web database query interfaces, find the matching relation between the integrated query interface schema and each Web database query interface schema, and establish the schema correspondence between each concept of the integrated query interface and each attribute of the Web database query interfaces, thereby distributing the structured query on the integrated interface to each Web database query interface. The schema of the integrated interface is the set of concepts and their corresponding label names.
Steps A and B each comprise sub-steps; the sub-steps of step A and of step B correspond one to one, as follows:
Step A1 or B1: optimization based on data type matching. In practice, having the same data type is a precondition for a match between a keyword and a concept of the integrated query interface, or between a concept of the integrated query interface and an attribute of a Web database query interface: a keyword only matches a concept of the same data type, and a concept only matches an attribute of the same data type. Matching optimization based on data type places match candidates of the same type in one group and calculates matching probabilities only within a group; no matching probability is calculated otherwise. The data types are text, numeric and time.
Step A2 or B2: when matching candidates of different data types, use different probability calculation methods to obtain the optimal matching result based on probabilistic schema matching.
Step A3 or B3: match the user's keyword query onto the integrated query interface, and then distribute the query on the integrated query interface to each Web database query interface.
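The grouping described in step A1 can be illustrated with a minimal Python sketch. The three type names (text, numeric, time) come from the text; the function names `data_type` and `candidate_pairs`, the regular expressions used to detect a type, and the sample concepts are assumptions made only for illustration.

```python
import re
from itertools import product

# Hypothetical type detector: the three categories come from the text,
# but these regular expressions are illustrative assumptions.
_NUMERIC = re.compile(r"^-?\d+(\.\d+)?$")
_TIME = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def data_type(value: str) -> str:
    """Classify a value as 'time', 'numeric' or 'text'."""
    value = value.strip()
    if _TIME.match(value):
        return "time"
    if _NUMERIC.match(value):
        return "numeric"
    return "text"


def candidate_pairs(keywords, concept_types):
    """Pair each keyword only with concepts of the same data type.

    keywords: list of keyword strings from the user query.
    concept_types: dict mapping concept name -> data type.
    """
    return [
        (kw, concept)
        for kw, (concept, ctype) in product(keywords, concept_types.items())
        if data_type(kw) == ctype
    ]


# Made-up integrated interface with four concepts.
concepts = {"title": "text", "author": "text",
            "price": "numeric", "published": "time"}
pairs = candidate_pairs(["Tolkien", "25", "2001-06-15"], concepts)
```

In this made-up example only 4 of the 12 keyword and concept combinations remain as candidates, so matching probabilities need to be computed for those 4 alone.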
Steps A2 and B2 each comprise sub-steps; the sub-steps of step A2 and of step B2 correspond one to one, as follows:
Step A21 or B21: for character data, the matching probability is obtained by a calculation method based on string similarity. Many mature string similarity measures exist, such as Levenshtein distance, affine gap distance, Jaro distance and Q-gram distance; the similarity result is used as the matching probability;
Step A22 or B22: for numeric data, the schema matches are pruned according to the coverage relation of the numeric instances, namely (1) no coverage; (2) loose partial coverage; (3) loose coverage; (4) single-constraint coverage; (5) complex-constraint coverage, to obtain the possible matching pairs;
Step A23 or B23: for the possible numeric matching pairs, calculate the matching probabilities and sort them in descending order. Select the pair with the highest probability as the first matching pair and delete every matching probability that involves either member of that pair; from the remaining matching probabilities select the highest pair as the second matching pair; and so on, until all matches have been found.
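Steps A21 and A23 can be sketched in Python as follows. The text names Levenshtein distance as one usable measure but does not say how a distance becomes a probability, so normalizing by the longer string's length is an assumption; the function names and the sample scores are likewise hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical strings.

    Assumption: normalize by the length of the longer string.
    """
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def greedy_match(scores):
    """Select pairs in descending probability, each member used at most once.

    scores: dict mapping (left, right) -> matching probability.
    Skipping a pair whose member is already used has the same effect as
    deleting every probability that involves a chosen pair's members.
    """
    chosen, used_left, used_right = [], set(), set()
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (left, right), p in ranked:
        if left not in used_left and right not in used_right:
            chosen.append((left, right, p))
            used_left.add(left)
            used_right.add(right)
    return chosen


# Made-up probabilities for two keywords and two concepts.
scores = {("k1", "c1"): 0.9, ("k1", "c2"): 0.8,
          ("k2", "c1"): 0.7, ("k2", "c2"): 0.4}
matches = greedy_match(scores)
```

With these made-up scores the selection first takes (k1, c1) and then, after discarding every pair that involves k1 or c1, takes (k2, c2).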
Steps A23 and B23 each comprise the same sub-steps, as follows:
If the given data are two numeric values m and n, the matching probability is: [formula not reproduced in source]
If the given data are two sets of discrete numeric data S1 and S2, with S1 = {n1, n2, n3, …} and S2 = {m1, m2, m3, …}, the number of values the two sets share reflects their similarity, and the matching probability is: [formula not reproduced in source]
If the given data are two sets of numeric ranges R1 and R2, with R1 = {s1, s2, s3, …} and R2 = {t1, t2, t3, …}, the degree to which the two overlap reflects their similarity, and the matching probability is: [formula not reproduced in source]
Here 0 ≤ P ≤ 1, and the larger the calculated value, the better the two match.
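The formulas for the three numeric cases are not reproduced in this text, so the Python sketch below does not reconstruct them. It substitutes simple stand-in measures that satisfy the stated properties (a value between 0 and 1 that grows with similarity): a relative-difference score for two values, the Jaccard ratio for discrete sets, and the ratio of overlap length to union length for ranges. All three measures and all function names are assumptions.

```python
def value_probability(m: float, n: float) -> float:
    """Stand-in score for two single values (assumed formula)."""
    if m == n:
        return 1.0
    # |m - n| <= |m| + |n|, so the result stays within [0, 1].
    return 1.0 - abs(m - n) / (abs(m) + abs(n))


def set_probability(s1: set, s2: set) -> float:
    """Stand-in score for discrete sets: shared values over all values."""
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


def _merge(intervals):
    """Merge overlapping (low, high) intervals into disjoint ones."""
    merged = []
    for low, high in sorted(intervals):
        if merged and low <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], high)
        else:
            merged.append([low, high])
    return merged


def _length(intervals):
    """Total length of a list of disjoint intervals."""
    return sum(high - low for low, high in intervals)


def range_probability(r1, r2) -> float:
    """Stand-in score for range sets: overlap length over union length."""
    len1, len2 = _length(_merge(r1)), _length(_merge(r2))
    union = _length(_merge(list(r1) + list(r2)))
    overlap = len1 + len2 - union  # inclusion-exclusion
    return overlap / union if union > 0 else 0.0


p_values = value_probability(10, 30)
p_sets = set_probability({1, 2, 3}, {2, 3, 4})
p_ranges = range_probability([(0, 10)], [(5, 15)])
```

Any measure with the same range and monotonic behaviour could be swapped in; the choice only affects the ranking used by the later selection step.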
Beneficial effects: with the above scheme, the invention can automatically and efficiently transform the keyword query submitted by a user onto the structured integrated interface and then distribute it to each Web database query interface with high accuracy, which is a basis for obtaining Web database information automatically.
The user can submit simple keywords as query conditions, whereas existing systems generally provide only a complex integrated query interface on which the user must enter query conditions according to the interface labels, a rather tedious process. Probabilistic schema matching helps to find more reasonable matching pairs: existing methods usually decide outright whether two items match, whereas the invention calculates the matching probability of each possible schema match. The whole system runs automatically, the matching is processed efficiently, and the matching results are more reasonable.
This solves the problem of how to transform and distribute user keyword queries effectively and accurately in the presence of uncertainty, and achieves the object of the invention.
Advantages: 1. The user is given an interface for submitting keyword queries, which is simpler and faster than submitting a structured query on an integrated query interface as in existing systems. 2. Matching optimization based on data type restricts schema matching to pairs with the same data type, which improves matching efficiency over the prior art. 3. Different data types use different probability calculation methods, and a matching method for numeric data is proposed; the prior art offers no good matching method for numeric data, and the present method improves the rationality and accuracy of matching. 4. A pruning method is proposed for numeric matching, which further improves matching accuracy. 5. The whole system runs automatically: the user only submits a keyword query, and the system transforms it onto the integrated interface and then distributes it to each Web database query interface. Existing systems only provide the transformation from the integrated interface to the Web database query interfaces and do not address the step from keyword query to integrated query interface.
Description of the drawings:
Fig. 1 is a block diagram of the keyword query transformation and distribution system based on probabilistic schema matching according to the invention.
Fig. 2 is a flow chart of the keyword query transformation module of the invention.
Fig. 3 is a flow chart of the query distribution module of the invention.
Fig. 4 is a schematic diagram of the matching optimization based on data type according to the invention.
Fig. 5 is a schematic diagram of the covering relations between the candidate value sets Vi and Vj in an embodiment of the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawings.
Embodiment 1: for a fuller understanding of the invention and its advantages, the invention is described in further detail below with reference to the drawings and specific embodiments.
The system of the invention is first described with reference to Fig. 1. The invention provides a keyword query transformation and distribution system based on probabilistic schema matching, comprising: a keyword query interface, an integrated query interface, Web database query interfaces, a keyword query transformation module and a query distribution module.
The user submits a query request, that is, the query keywords, at the keyword query interface;
The keyword query transformation module uses the instance (keyword) information of the user's query, which lacks schema information (attribute names), together with the instance (candidate value) information of each concept of the integrated query interface to find the schema matching relation between the keyword query schema and the integrated query interface schema, establishes the correspondence between the user's query keywords and the related concepts of the integrated structured query interface, and transforms the user's keyword query into a query on the structured integrated query interface. The keyword query transformation module is described in more detail below with reference to Fig. 2.
The query distribution module uses the schema (set of concepts and corresponding label names) information and instance (candidate value) information of the integrated interface together with the schema (attribute name) information and instance (candidate value) information of the Web database query interfaces to find the matching relation between the integrated query interface schema and each Web database query interface schema, establishes the schema correspondence between each concept of the integrated query interface and each attribute of the Web database query interfaces, and distributes the structured query on the integrated interface as queries on each Web database query interface. The query distribution module is described in more detail below with reference to Fig. 3.
The keyword query distribution method based on probabilistic schema matching according to the invention is described in detail below with reference to Figs. 2, 3, 4 and 5.
As shown in Figs. 2 and 3, the keyword query transformation module and the query distribution module each comprise a data type analysis submodule, a probability-based schema matching submodule, and a query transformation (or distribution) submodule.
The data type analysis submodule analyzes the data type of each concept of the structured integrated interface and of each attribute of each Web database query interface. Having the same data type is a precondition for a possible match.
The probability-based schema matching submodule calculates the matching probability of each possible match, using a different calculation method for each data type.
The query transformation (distribution) submodule sorts the matching probabilities in descending order and distributes the query on the integrated query interface to the Web database query interfaces.
As shown in Fig. 4, the data type analysis submodule groups the match candidates by data type. If data types are ignored and the query keywords {Key1, Key2, …, Keyk} are matched one by one against the concepts {C1, C2, …, Cj} of the integrated query interface, the number of possible matching results is [formula not reproduced in source], where j is the number of concepts of the complex query interface and k is the number of keywords submitted by the user. Because a complex integrated query interface typically has around 50 concepts, the amount of computation is large.
If matching is performed only between keywords and concepts of the same data type, the number of possible matching results is:
[formula not reproduced in source]
where k1, k2 and k3 are the numbers of text, numeric and time keywords respectively, and j1, j2 and j3 are the numbers of text, numeric and time concepts of the integrated query interface respectively. Because [formula not reproduced in source], the match candidates are first grouped by data type.
As shown in Fig. 5, the probability-based schema matching submodule first prunes the possible matches and then calculates the matching probability of each remaining match.
Pruning for schema matching between a 2-dimensional query space and a 2-dimensional concept space: suppose there are two concepts Ci and Cj (i ≠ j) on the integrated interface, whose candidate value sets are Vi and Vj respectively (both numeric), and the user's query contains numeric keywords Key1 and Key2, each of which must match one of Ci and Cj. Let {A, B} denote that A matches B. Vi and Vj then fall into one of the following five cases, as shown in Fig. 5:
(1) No coverage:
Vi ∩ Vj = ∅, i.e. there is no coverage relation between the candidate values of concept Ci and concept Cj, as in Figure 5(a). Given keywords Key1 ≠ Key2 with Key1, Key2 ∈ (Vi ∪ Vj), the matching relation with concepts Ci and Cj can be determined exactly from the keyword values as one of the following:
1) If Key1 ∈ Vi and Key2 ∈ Vj, the matching result is {Key1, Ci}, {Key2, Cj};
2) If Key1 ∈ Vj and Key2 ∈ Vi, the matching result is {Key1, Cj}, {Key2, Ci}.
(2) Loose partial coverage:
Vi ∩ Vj ≠ ∅, i.e. the candidate values of concept Ci and concept Cj cover each other; there is no constraint between Vi and Vj, or the constraint is difficult to capture, but the degree of coverage is relatively small, as in Figure 5(b). In general, for keywords Key1 and Key2:
1) If Key1 ∉ (Vi ∩ Vj) and Key2 ∉ (Vi ∩ Vj), the matching process is the same as in case (1);
2) If Key1 ∈ (Vi ∩ Vj) and Key2 ∈ (Vj − Vi), then by the exclusivity of matching the matching result is {Key1, Ci}, {Key2, Cj}; and vice versa;
3) If Key1 ∈ (Vi ∩ Vj) and Key2 ∈ (Vi ∩ Vj), the matching relation cannot be determined, and the similarity of the two must be calculated further.
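Cases (1) and (2) can both be read as a feasibility check: of the two possible one-to-one assignments, keep those in which each keyword lies in the candidate value set of its concept, and accept the result only if exactly one assignment survives. The sketch below follows that reading. Modeling the candidate values as Python sets, the labels "Ci" and "Cj", and the function name `prune_by_coverage` are assumptions for illustration.

```python
def prune_by_coverage(key1, key2, vi, vj):
    """Return the unique feasible assignment, or None if it is undetermined.

    vi and vj are the candidate value sets of concepts Ci and Cj.
    None is returned both when two assignments remain feasible (similarity
    must then be calculated) and when no assignment is feasible.
    """
    feasible = []
    if key1 in vi and key2 in vj:
        feasible.append([(key1, "Ci"), (key2, "Cj")])
    if key1 in vj and key2 in vi:
        feasible.append([(key1, "Cj"), (key2, "Ci")])
    return feasible[0] if len(feasible) == 1 else None


# Hypothetical candidate values; 300 and 350 lie in the overlap.
vi = {100, 200, 300, 350}
vj = {300, 350, 400, 500}

decided = prune_by_coverage(100, 400, vi, vj)      # neither keyword in the overlap
exclusive = prune_by_coverage(300, 400, vi, vj)    # 400 can only belong to Cj
ambiguous = prune_by_coverage(300, 350, vi, vj)    # both keywords in the overlap
```

Only the ambiguous outcome needs the more expensive probability calculation; the other two are settled by membership alone.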
(3) Loose coverage:
Vi ∩ Vj ≠ ∅, i.e. the candidate values of concept Ci and concept Cj cover each other; there is no constraint between Vi and Vj, or the constraint is difficult to capture, and the degree of coverage is relatively large, as in Figure 5(c). The matching relation then cannot be determined, and the similarity of the two must be calculated further.
(4) Single-constraint coverage:
Vi ∩ Vj ≠ ∅, i.e. the candidate values of concept Ci and concept Cj cover each other, and there is a simple constraint between Vi and Vj; for example, for every pair of values <Vi, Vj> of concepts Ci and Cj, Vi is always less than Vj, as in Figure 5(d). If Key1 < Key2, the corresponding match is {Key1, Ci}, {Key2, Cj}. In general, for any pair of values <Vi, Vj> of concept Ci and concept Cj:
1) If Vi < Vj and Key1, Key2 ∈ Vi, Vj, then when Key1 < Key2 the matching result is {Key1, Ci}, {Key2, Cj};
2) If Vi > Vj and Key1, Key2 ∈ Vi, Vj, then when Key1 < Key2 the matching result is {Key1, Cj}, {Key2, Ci}.
Other cases follow by analogy.
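The single-constraint rule reduces to ordering the two keywords. A minimal sketch follows, assuming the constraint is given as a boolean flag (`ci_is_smaller`, a hypothetical parameter) that records whether values of Ci are always smaller than values of Cj. A typical instance would be a minimum-price and a maximum-price field.

```python
def match_single_constraint(key1, key2, ci_is_smaller=True):
    """Assign two numeric keywords to Ci and Cj under an ordering constraint.

    ci_is_smaller=True models Vi < Vj for every value pair;
    ci_is_smaller=False models Vi > Vj.
    """
    if key1 == key2:
        # A strict ordering constraint cannot separate equal keywords.
        raise ValueError("keywords must differ under a strict ordering constraint")
    low, high = sorted((key1, key2))
    first, second = ("Ci", "Cj") if ci_is_smaller else ("Cj", "Ci")
    return [(low, first), (high, second)]
```

For example, with keywords 300 and 100 and Vi < Vj, the smaller keyword 100 is assigned to Ci and 300 to Cj.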
(5) Complex-constraint coverage:
Vi ∩ Vj ≠ ∅, i.e. the candidate values of concept Ci and concept Cj cover each other, and there is a complex constraint between Vi and Vj; that is, for a pair of values <Vi, Vj> of concepts Ci and Cj, the constraint satisfied by Vi and Vj is not a single one, but Vi and Vj satisfy single-constraint coverage (case 4) piecewise, i.e. on each segment, as in Figure 5(e). The values can first be segmented, and the matching relation between keywords and concepts is then determined segment by segment using the method of case (4). In general, for any pair of values <Vi, Vj> of concept Ci and concept Cj:
1) If <Vi, Vj> ∈ Segment1, Vi < Vj and Key1 < Key2, the matching result is {Key1, Ci}, {Key2, Cj};
2) If <Vi, Vj> ∈ Segment2, Vi > Vj and Key1 < Key2, the matching result is {Key1, Cj}, {Key2, Ci}.
Other cases follow by analogy.
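The piecewise rule can be sketched as a lookup of the segment that contains both keywords, followed by the ordering rule of case (4). How segments are delimited is not specified in the text; the sketch assumes, purely for illustration, that each segment is a half-open numeric interval with a fixed ordering direction, and that keywords falling in different segments are left undetermined. `Segment` and `match_segmented` are hypothetical names.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    low: float           # inclusive lower bound
    high: float          # exclusive upper bound
    ci_is_smaller: bool  # True: Vi < Vj on this segment; False: Vi > Vj


def match_segmented(key1, key2, segments):
    """Apply the single-constraint rule inside the segment holding both keywords."""
    for seg in segments:
        if seg.low <= key1 < seg.high and seg.low <= key2 < seg.high:
            if key1 == key2:
                return None  # equal keywords cannot be ordered
            low, high = sorted((key1, key2))
            first, second = ("Ci", "Cj") if seg.ci_is_smaller else ("Cj", "Ci")
            return [(low, first), (high, second)]
    return None  # keywords lie in different segments or in none


# Hypothetical segmentation: Vi < Vj below 1000, Vi > Vj from 1000 upward.
segments = [Segment(0, 1000, True), Segment(1000, 5000, False)]
```

With these segments, keywords 200 and 800 are matched as in case (4) with Vi < Vj, keywords 3000 and 1500 are matched with the reversed ordering, and keywords 200 and 3000 remain undetermined.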
Extension of the pruning method to an n-dimensional query space and an m-dimensional concept space
The pruning method for pattern matching between a 2-dimensional query space and a 2-dimensional concept space was described above. It is extended below to an n-dimensional query space and an m-dimensional concept space.
The dimension of the query space directly affects the matching result. If the query contains only one keyword Key1 and the other conditions are unchanged, with Vi < Vj and Key1 ∈ Vi, Vj, the matching result is no longer unique: the matches {Key1, Ci} and {Key1, Cj} each have a probability of 50%. On the other hand, the higher the dimension of the concept space, the more candidate concept subspaces there are, and hence the more possible matching results. For example, given a 2-dimensional query Q = {Key1, Key2} and 3-dimensional concepts {C1, C2, C3}, the candidate concept subspaces are {C1, C2}, {C2, C3} and {C1, C3}. Compared with the 2-dimensional concepts {C1, C2}, the number of candidate concept subspaces increases, and the number of possible matching results increases accordingly.
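The two observations above can be reproduced with a short sketch: enumerating the candidate concept subspaces for an n-dimensional query, and assigning equal probability to each concept when a single keyword cannot be disambiguated. The uniform split is taken from the 50% example in the text; the function names are hypothetical.

```python
from itertools import combinations


def candidate_subspaces(concepts, n):
    """All n-concept subsets that an n-keyword query could be matched against."""
    return list(combinations(concepts, n))


def uniform_probabilities(keyword, concepts):
    """Equal probability for every concept when nothing distinguishes them."""
    share = 1.0 / len(concepts)
    return {(keyword, concept): share for concept in concepts}


subspaces = candidate_subspaces(["C1", "C2", "C3"], 2)
single = uniform_probabilities("Key1", ["Ci", "Cj"])
```

Here `subspaces` holds the three pairs of concepts listed in the text, and `single` assigns 0.5 to each of the two possible matches of Key1.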
Therefore both the dimension of the query space and the dimension of the concept space affect the final matching result. For an n-dimensional query space and an m-dimensional concept space with n ≤ m, the number of possible matches is [formula omitted in source].
Pruning the pattern matches eliminates a large share of the possible-match calculations and greatly improves the efficiency of query conversion and distribution.
The significance of this method is that it takes uncertain pattern matches into account, calculates the matching probabilities, retains reasonable matching results as far as possible and sorts them by matching degree to obtain the most reasonable pattern matching result, so that the user's keyword query is converted correctly and efficiently onto the structured integrated query interface and is distributed correctly to each Web database query interface.
Claims (10)
1. A keyword query conversion and distribution system based on conceptual schema matching, characterized in that the keyword query conversion and distribution system comprises: a keyword query interface, an integrated query interface, Web database query interfaces, a keyword query conversion module and a query distribution module; a user submits a keyword query at the keyword query interface, the keyword query conversion module converts the user's keyword query into a query on the integrated query interface, and the query distribution module further distributes the query to each Web database query interface so as to query each Web database.
2. The keyword query conversion and distribution system based on conceptual schema matching according to claim 1, characterized in that: the keyword query interface is used by the user to submit a query request, i.e. to submit query keywords;
the integrated query interface is a structured query interface extracted and integrated from the Web database query interfaces;
the Web database query interfaces are database query interfaces of the same domain crawled from the Web;
the keyword query conversion module is used to convert the query submitted by the user at the keyword query interface onto the structured integrated query interface;
the query distribution module is used to distribute the query on the integrated query interface to the Web database query interfaces.
3. The keyword query conversion and distribution system based on conceptual schema matching according to claim 1, characterized in that the keyword query conversion module further comprises: a conversion data type analysis submodule, a probability-based conversion pattern matching submodule and a keyword conversion submodule; first, the data type analysis submodule identifies the keywords and the concepts of the integrated interface that have the same data type as potential matching pairs; then the probability-based pattern matching submodule calculates the matching probability of each possible matching pair; finally, the keyword conversion submodule converts the user's keyword query onto the integrated query interface.
4. The keyword query conversion and distribution system based on conceptual schema matching according to claim 3, characterized in that: the conversion data type analysis submodule is used to analyze the data type of each keyword of the user query and of each concept of the structured integrated interface; the probability-based conversion pattern matching submodule is used to calculate the matching probability of each possible match, with different data types using different calculation methods; the keyword conversion submodule sorts the matching probabilities in descending order, selects the optimal pattern matching result, and converts the query submitted by the user at the keyword query interface onto the structured integrated query interface.
5. The keyword query conversion and distribution system based on conceptual schema matching according to claim 1, characterized in that the query distribution module further comprises: a distribution data type analysis submodule, a probability-based distribution pattern matching submodule and a query distribution submodule; first, the data type analysis submodule identifies the concepts of the integrated interface and the attributes of each Web database query interface that have the same data type as potential matching pairs; then the probability-based pattern matching submodule calculates the matching probability of each possible matching pair; finally, the query distribution submodule distributes the query of the integrated interface to the attributes of each Web database query interface.
6. The keyword query conversion and distribution system based on conceptual schema matching according to claim 5, characterized in that: the distribution data type analysis submodule is used to analyze the data type of each concept of the structured integrated interface and of each attribute of each Web database query interface; the probability-based distribution pattern matching submodule is used to calculate the matching probability of each possible match, with different data types using different calculation methods; the query distribution submodule sorts the matching probabilities in descending order, selects the optimal pattern matching result, and distributes the query on the integrated query interface to the Web database query interfaces.
7. A conversion and distribution method of the keyword query conversion and distribution system based on conceptual schema matching according to claim 1, characterized in that the keyword query conversion and distribution method comprises the following steps:
Step A: converting the user's keyword query into a query on the structured integrated query interface; using the keyword information of the user query and the candidate value information of each concept of the integrated query interface, the pattern matching relation between the keyword query pattern and the integrated query interface pattern is found, and the correspondence between the user's query keywords and the related concepts of the integrated structured query interface is established, so that the user's keyword query is converted into a query on the structured integrated query interface;
Step B: distributing the structured query of the integrated interface as queries on each Web database query interface; using the pattern information and candidate value information of the integrated interface and the pattern information and candidate value information of the Web database query interfaces, the matching relation between the integrated query interface pattern and the pattern of each Web database query interface is found, and the pattern correspondence between each concept of the integrated query interface and each attribute of the Web database query interfaces is established, so that the structured query of the integrated interface is distributed to each Web database query interface; the pattern of the integrated interface is the set of concepts and their corresponding label names.
8. The keyword query conversion and distribution method based on conceptual schema matching according to claim 7, characterized in that step A and step B each comprise extension steps, the extension steps of step A and of step B being respectively identical, specifically as follows:
Step A1 or step B1: optimization of matching based on data type. Whether the data types are identical is in fact the prerequisite for a match between a keyword and a concept of the integrated query interface, or between a concept of the integrated query interface and an attribute of a Web database query interface; that is, a keyword matches only concepts of the same data type, and a concept likewise matches only attributes of the same data type. Matching optimization based on data type places match candidates of the same type in one group and calculates matching probabilities within the group; no matching probability is calculated otherwise. The data types include text type, numeric type and time type;
Step A2 or step B2: when matching candidates of different data types, different probability calculation methods are used to obtain the optimal matching result based on conceptual schema matching;
Step A3 or step B3: the user's keyword query is matched onto the integrated query interface, and is then further distributed from the integrated query interface to each Web database query interface.
9. The keyword query conversion and distribution method based on conceptual schema matching according to claim 8, characterized in that step A2 and step B2 each comprise extension steps, the extension steps of step A2 and of step B2 being respectively identical, specifically as follows:
Step A21 or step B21: for matching of character-type data, the matching probability is obtained by a calculation method based on string similarity; many mature string similarity calculation methods exist, such as Levenshtein distance, affine gap distance, Jaro distance and Q-gram distance, and the similarity calculation result is used as the matching probability;
Step A22 or step B22: for matching of numeric-type data, pattern matching is pruned according to the coverage of the numeric instances, namely (1) no coverage, (2) loose partial coverage, (3) loose coverage, (4) single-constraint coverage and (5) complex-constraint coverage, to obtain the possible matching pairs;
Step A23 or step B23: for the possible numeric-type matching pairs, the matching probabilities are calculated and sorted in descending order; the pair with the highest probability is selected as the first matching pair, every matching probability involving either member of that pair is deleted, the pair with the highest remaining probability is selected as the second matching pair, and so on, until all matches have been found.
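The selection procedure of step A23 or step B23 is a greedy one-to-one assignment. A minimal sketch follows, assuming the matching probabilities are supplied as a dictionary keyed by (keyword, concept) pairs; the function name `greedy_select` and the example values are hypothetical.

```python
def greedy_select(probabilities):
    """Pick matching pairs in descending probability, using each side at most once.

    probabilities maps (left, right) -> matching probability, where left is a
    keyword (or concept) and right is a concept (or attribute).
    """
    chosen = []
    used_left, used_right = set(), set()
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    for (left, right), prob in ranked:
        # Skipping here is equivalent to deleting every probability that
        # involves a member of an already selected pair.
        if left in used_left or right in used_right:
            continue
        chosen.append((left, right, prob))
        used_left.add(left)
        used_right.add(right)
    return chosen


# Hypothetical probabilities for two numeric keywords and two concepts.
example = {
    ("300", "min_price"): 0.9,
    ("300", "max_price"): 0.6,
    ("800", "min_price"): 0.7,
    ("800", "max_price"): 0.8,
}
selected = greedy_select(example)
```

In this example the pair ("300", "min_price") is chosen first, which removes both remaining entries for "300" and for "min_price", leaving ("800", "max_price") as the second pair.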
10. The keyword query conversion and distribution method based on conceptual schema matching according to claim 9, characterized in that step A23 and step B23 each comprise extension steps, the extension steps of step A23 and of step B23 being identical, specifically as follows:
If the given data are two numeric values m and n, the matching probability is: [formula omitted in source]
If the given data are two sets of discrete numeric data S1 and S2, with S1 = {n1, n2, n3, …} and S2 = {m1, m2, m3, …}, the number of values they share reflects their degree of similarity, and the matching probability is: [formula omitted in source]
If the given data are two sets of range-type numeric data R1 and R2, with R1 = {s1, s2, s3, …} and R2 = {t1, t2, t3, …}, their degree of overlap reflects their degree of similarity, and the matching probability is: [formula omitted in source]
where 0 ≤ P ≤ 1, and the larger the calculated value, the higher the degree of matching between the two.
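The three probability formulas of this claim are not reproduced in this text. The sketch below therefore does not show the claimed formulas; it uses common stand-in measures that satisfy the stated properties (a value in [0, 1] that grows with similarity, with shared values or overlap as the basis). All three functions, and the simplification of a range-type set to a single (low, high) interval, are assumptions for illustration only.

```python
def value_similarity(m: float, n: float) -> float:
    """Stand-in for two numeric values: relative closeness in [0, 1]."""
    if m == n:
        return 1.0
    # abs(m - n) <= abs(m) + abs(n), so the result stays within [0, 1].
    return 1.0 - abs(m - n) / (abs(m) + abs(n))


def set_similarity(s1, s2) -> float:
    """Stand-in for two discrete sets: shared values over all values (Jaccard)."""
    s1, s2 = set(s1), set(s2)
    if not s1 and not s2:
        return 1.0
    return len(s1 & s2) / len(s1 | s2)


def range_similarity(r1, r2) -> float:
    """Stand-in for two ranges given as (low, high): overlap over total span."""
    (a_low, a_high), (b_low, b_high) = r1, r2
    span = max(a_high, b_high) - min(a_low, b_low)
    if span == 0:
        return 1.0  # both ranges are the same single point
    overlap = max(0.0, min(a_high, b_high) - max(a_low, b_low))
    return overlap / span
```

Any measure used in practice would need to be checked against the formulas in the original patent document.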
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611128587.XA CN106528875A (en) | 2016-12-09 | 2016-12-09 | Probability mode matching-based keyword query transformation and distribution system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106528875A true CN106528875A (en) | 2017-03-22 |
Family
ID=58342921
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611128587.XA Pending CN106528875A (en) | 2016-12-09 | 2016-12-09 | Probability mode matching-based keyword query transformation and distribution system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106528875A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101216853A (en) * | 2008-01-11 | 2008-07-09 | 孟小峰 | Intelligent web enquiry interface system and its method |
CN103425697A (en) * | 2012-05-24 | 2013-12-04 | 中兴通讯股份有限公司 | Search method and search system |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112966178A (en) * | 2021-03-05 | 2021-06-15 | 北京百度网讯科技有限公司 | Consultation result distribution method, device, equipment and storage medium |
CN112966178B (en) * | 2021-03-05 | 2024-01-23 | 北京百度网讯科技有限公司 | Consultation result distribution method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170322 |