CN102073718A - System and method for explaining, erasing and modifying search result in probabilistic database - Google Patents
- Publication number
- CN102073718A
- Authority
- CN
- China
- Prior art keywords
- user
- event
- probable value
- query result
- explanation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention provides a system and a method for explaining and correcting query results in a probabilistic database. The method is as follows: the user submits a Structured Query Language query; the query is processed to obtain processed data; the processed data is matched against the tuple data in the database, and the related tuple data is retrieved to form the query result tuples; the probability value of each query result tuple is computed from the probability values of the retrieved tuple data, while the retrieved tuple data is ranked by priority and saved; the probability value of the query result is returned to the user together with an explanation of that probability value; the user decides, based on the returned result and its explanation, whether to accept the result; if not, the user modifies the explanation of the query result; and the query result is recomputed according to the user's modification and returned to the user. The system and the method give the user more accurate query results and reduce the time needed to recompute them.
Description
[technical field]
The present invention relates to the field of data management technology, and in particular to a system and method for explaining and correcting query results in a probabilistic database.
[background technology]
A key challenge in making the mobile Internet a mainstream technology is the complexity of managing the raw data collected from mobile pervasive environments and turning it into information suitable for decision support. A major source of this complexity is uncertainty, which may be ontological or epistemic. Ontological uncertainty arises mainly because it is not specified which types of entities exist. Epistemic uncertainty arises mainly from measurement noise, bias or insufficient samples, data lost through sensor or network failure, ambiguous situations, and differing interpretations. The present invention mainly uses probability theory to address epistemic uncertainty.
Considerable research has already been devoted to representing and querying uncertain data in mobile computing environments. The most representative line of work is the probabilistic database, which covers probabilistic data models, probabilistic queries, and the approximate evaluation of probabilistic queries. Many universities and research institutions have developed prototype probabilistic database systems, for example Trio at Stanford University, MYSTIQ at the University of Washington, and MayBMS at Cornell University.
In the database field, probabilistic models are the most widely adopted data models for representing uncertain data. Uncertainty can be modelled at three levels of granularity: relation, tuple, and attribute. Tuple-level uncertain data models are the most attractive, mainly because they strike a good balance between expressiveness (independence between probabilistic events) and computational efficiency. The present invention is based on a tuple-level probabilistic data model. A query submitted to a probabilistic database returns a set of probabilistic result tuples, each of which is derived from base probabilistic tuples stored in the database. Computing the probability value of a result tuple reduces to computing the probability value of an arbitrary event expression; depending on the event expression, the time complexity may be PTIME (polynomial time) or #P-hard. To handle the #P-hard case, several approximate evaluation strategies have been developed, including graph-based search algorithms, optimization algorithms based on query plan search, randomized algorithms based on Monte Carlo simulation, and algorithms based on probability intervals. In addition, probabilistic databases support many special types of query, such as top-k queries, skyline analysis, and aggregate queries.
Besides the large body of work on improving the query efficiency of probabilistic databases, improving the quality of query results by cleaning the uncertain data stored in the database has also received academic attention. Existing work includes correcting errors in stored data by means of integrity constraints, assigning probability values to conflicting data in the database, supporting consistent query answering over inconsistent probabilistic databases, recovering missing or unreliable sensor data through spatio-temporal aggregation, and eliminating data uncertainty by re-probing the sensor data sources.
Work on providing explanations for the results a system returns originated in the field of artificial intelligence. An explanation of a sentence P is defined as a set E of sentences from which P can be derived: if all the sentences in E hold, then P also holds. In information systems such as expert systems and recommender systems, explanations are commonly used for user understanding, education, acceptance, persuasion, system debugging, and correction. In the database field, explanations of query results usually concern only query plans and access paths, with the main purpose of tuning applications to make better use of indexes. Recently there has been research on explaining why particular query results are returned, but it focuses on traditional databases rather than probabilistic databases. In addition, in the traditional database field there is work on explaining ranking queries, which orders the attributes that can explain the rank of a tuple. The concepts of data provenance and lineage, which mainly trace which base data an item was derived from, have also received academic attention recently. Data provenance and lineage enable many important functions, such as explanation, verification, and recomputation. The Panda project at Stanford is devoted to seamlessly combining data-based and process-based provenance, and provides a general language for querying, analyzing, and using data provenance.
In the field of information retrieval, the ambiguity of user queries and the redundancy of the documents a system returns have motivated a great deal of research on user feedback techniques. During feedback, the user is usually asked to score the returned results according to certain criteria and personal preference; the system then updates its matching criteria and returns query results that better meet the user's needs. Researchers in pervasive computing have also proposed using user feedback in context-aware applications, mainly because some user-related context information cannot be sensed or inferred by technical means.
[summary of the invention]
The object of the present invention is to provide a system and method for explaining and correcting query results in a probabilistic database, so as to give the user more accurate query results.
The present invention provides a method for explaining and correcting query results in a probabilistic database. The probabilistic database stores data in tables; each tuple in a table is regarded as the description of a probabilistic event and comprises an event identifier and a probability value field, the latter giving the probability that the event described by the tuple occurs. The method comprises the following steps:
(1) the user submits a Structured Query Language (SQL) query;
(2) the submitted query is processed to obtain processed data;
(3) the processed data is matched against the tuple data in the database, and the associated tuple data is retrieved from the probabilistic database to form the query result tuples;
(4) the probability value of each query result tuple is computed from the probability values of the associated tuple data; at the same time the associated tuple data is ranked by priority and saved;
(5) the probability value of the query result is returned to the user together with an explanation of that probability value;
(6) based on the returned query result and its explanation, the user decides whether to accept the query result; if not, the user enters an instruction to call up the relevant explanation and modifies the explanation of the query result;
(7) the query result is recomputed according to the user's modification of the explanation and returned to the user;
wherein the event identifiers and probability values of the top k tuples from which the query result is mainly derived are called the explanation of the computed result.
The query result produced by each Structured Query Language query is itself a table; each tuple in it is derived from other tuples stored in the database, and its probability value is computed from the probability values of these base tuples.
The explanation presented to the user in step (5) is obtained as follows: the influence of each tuple's probability value on the probability value of the result tuple is measured by a partial derivative. The larger the rate of change of the result tuple's probability value with respect to the probability value of an uncertain tuple, the greater that uncertain tuple's influence on the result tuple.
The explanation is obtained with the following partial-derivative measure:
For any event e, let P(e) denote its probability value. For each row of the query result, let ee = F(e1, e2, ..., en) denote its event expression, meaning that ee is composed of the elementary events e1, e2, ..., en combined by the Boolean operators AND (∧), OR (∨), and NOT (~). The probability value of ee can be derived from the probability values of e1, e2, ..., en and is written P(ee) = f(P(e1), P(e2), ..., P(en)). The influence of each elementary event ei on the probability value of ee is measured by the rate of change of P(ee) with respect to P(ei), that is, by the partial derivative $\partial P(ee)/\partial P(ei)$.
The larger the value of this partial derivative, the greater the influence of the probability of that base tuple event on the probability value of the query result.
The partial derivatives underlying the explanation presented to the user in step (5) are obtained by a recursive algorithm, as follows:
For an event expression ee = F(e1, e2, ..., en) in which e1, e2, ..., en do not repeat, and given P(e1), P(e2), ..., P(en), P(ee) can be computed recursively as follows:
(1) for the event expression ei (1≤i≤n), P(ei) is known;
(2) for the event expression ~ee, where ee is an event expression, P(~ee) = 1 - P(ee);
(3) for the event expression ee1 ∧ ee2, where ee1 and ee2 are two event expressions, P(ee1 ∧ ee2) = P(ee1) * P(ee2);
(4) for the event expression ee1 ∨ ee2, where ee1 and ee2 are two event expressions, P(ee1 ∨ ee2) = P(ee1) + P(ee2) - P(ee1) * P(ee2).
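The partial derivatives of P(ee) with respect to P(e1), P(e2), ..., P(en) can be computed recursively as follows; each rule follows by differentiating the corresponding probability rule, using the fact that each ei occurs in only one sub-expression:
(1) for the event expression ei (1≤i≤n), $\partial P(ei)/\partial P(ei) = 1$;
(2) for the event expression ~ee, where ee is an event expression, $\partial P(\sim ee)/\partial P(ei) = -\,\partial P(ee)/\partial P(ei)$;
(3) for the event expression ee1 ∧ ee2, where ee1 and ee2 are two event expressions, $\partial P(ee1 \wedge ee2)/\partial P(ei) = P(ee2)\,\partial P(ee1)/\partial P(ei)$ if ei occurs in ee1, and $\partial P(ee1 \wedge ee2)/\partial P(ei) = P(ee1)\,\partial P(ee2)/\partial P(ei)$ if ei occurs in ee2;
(4) for the event expression ee1 ∨ ee2, where ee1 and ee2 are two event expressions, $\partial P(ee1 \vee ee2)/\partial P(ei) = (1 - P(ee2))\,\partial P(ee1)/\partial P(ei)$ if ei occurs in ee1, and $\partial P(ee1 \vee ee2)/\partial P(ei) = (1 - P(ee1))\,\partial P(ee2)/\partial P(ei)$ if ei occurs in ee2.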
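The recursion can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the class names `Leaf`, `Not`, `And`, `Or` and the function `prob_and_partials` are hypothetical, and the derivative branches are obtained by differentiating the probability rules for ~, ∧ and ∨ under the stated assumption that no event identifier repeats in the expression.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    event_id: str


@dataclass(frozen=True)
class Not:
    child: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, Not, And, Or]


def prob_and_partials(expr: Expr, p: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return (P(expr), {event_id: dP(expr)/dP(event_id)}).

    Assumes every event identifier occurs at most once in expr.
    """
    if isinstance(expr, Leaf):
        return p[expr.event_id], {expr.event_id: 1.0}
    if isinstance(expr, Not):
        q, d = prob_and_partials(expr.child, p)
        return 1.0 - q, {e: -v for e, v in d.items()}
    pl, dl = prob_and_partials(expr.left, p)
    pr, dr = prob_and_partials(expr.right, p)
    if isinstance(expr, And):
        # P = pl * pr, so each side is scaled by the other side's probability.
        value, scale_left, scale_right = pl * pr, pr, pl
    elif isinstance(expr, Or):
        # P = pl + pr - pl * pr, so each side is scaled by (1 - other side).
        value, scale_left, scale_right = pl + pr - pl * pr, 1.0 - pr, 1.0 - pl
    else:
        raise TypeError(f"unsupported node: {expr!r}")
    partials = {e: scale_left * v for e, v in dl.items()}
    partials.update({e: scale_right * v for e, v in dr.items()})
    return value, partials


if __name__ == "__main__":
    # Hypothetical expression (e1 AND e2) OR e3 with illustrative probabilities.
    ee = Or(And(Leaf("e1"), Leaf("e2")), Leaf("e3"))
    print(prob_and_partials(ee, {"e1": 0.8, "e2": 0.5, "e3": 0.3}))
```

For the illustrative input, P(ee) is approximately 0.58 and the partial derivatives are approximately 0.35, 0.56 and 0.6 for e1, e2 and e3, so e3 would rank first in an explanation.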
The explanation presented to the user in step (5) is computed by applying an algorithm of linear time complexity within the recursive computation, as follows:
1) the probability value P(ee(N)) and the partial derivatives $\partial P(ee(N))/\partial P(ei)$ are obtained for each event expression ee = F(e1, e2, ..., en);
2) each event expression ee = F(e1, e2, ..., en) is represented by a binary tree whose leaf nodes correspond to event identifiers and whose internal nodes correspond to the operators of the event expression. For each node N of the binary tree, LChild(N) denotes its left child, RChild(N) its right child, subTree(N) the subtree rooted at N, and ee(N) the event expression corresponding to subTree(N). Each node N has an additional field storing the probability value P(ee(N)). Let en1, en2, ..., enk be all the event identifiers appearing in ee(N), corresponding to all the leaf nodes of subTree(N); the node stores, in an additional list, the partial derivatives $\partial P(ee(N))/\partial P(en1), \ldots, \partial P(ee(N))/\partial P(enk)$, referred to below as the partial-derivative list of N;
3) the partial derivatives that make up the explanation are computed while computing the probability value P(ee) of the event expression ee = F(e1, e2, ..., en);
4) the binary tree corresponding to ee is traversed in post-order. When a node N is visited, its left child LChild(N) and right child RChild(N) have already been processed, so the probability value P(ee(LChild(N))) and partial-derivative list of the left child, and the probability value P(ee(RChild(N))) and partial-derivative list of the right child, are already available.
After the post-order traversal of the binary tree corresponding to ee, the partial derivatives $\partial P(ee)/\partial P(e1), \ldots, \partial P(ee)/\partial P(en)$ are obtained. The k largest values are then selected from them to give the explanation of the computed result.
During the traversal, for each node N only the k elements of largest absolute value are kept in its partial-derivative list and the remaining elements are discarded early; what is finally stored at the root node is exactly the k elements of largest absolute value among $\partial P(ee)/\partial P(e1), \ldots, \partial P(ee)/\partial P(en)$.
In step (7), the query result is recomputed according to the user's modification of the explanation as follows: intermediate results are saved in the binary tree, and only the nodes affected by the corrected tuples are recomputed.
The present invention also provides a system for explaining and correcting query results in a probabilistic database, comprising a user interface and a server, the server comprising a central processing unit, a storage unit, a calling and matching unit, and a computing unit. The user interface supports interaction between the user and the system, such as entering queries and displaying query results. The central processing unit processes the query statement sent by the user through the user interface and sends the processed query data to the calling and matching unit. The calling and matching unit matches the processed query data against the event data in the tables of the probabilistic database held in the storage unit and obtains all associated event data. The data in the probabilistic database is stored in tables; each row of a table is regarded as the description of a probabilistic event and comprises an event identifier and a probability value field. The calling and matching unit sends all the associated event data it obtains to the computing unit. The computing unit computes over the probability values in the event data sent by the calling and matching unit, obtains the query result, and feeds it back to the user interface for display to the user. At the same time, the computing unit processes the computed result, that is, the probability value, together with the probability value of each associated event datum, and ranks the explanations provided by the system by the degree to which their probability values influence the query result, yielding a relevance ranking table. The computing unit stores the relevance ranking table and the intermediate results. The user interface displays the query result to the user. When the user asks to view the top several explanations from which the result was obtained, the central processing unit, following the user's instruction, directs the computing unit to retrieve the corresponding top explanations and feed them back to the user interface. The user modifies the probability values of some explanations and returns them to the central processing unit through the user interface; the central processing unit passes the user-modified data to the computing unit; the computing unit determines by comparison which event data have had their probability values modified, recomputes the query result, and feeds it back to the user interface.
The computing unit computes the query result and its explanation in the manner described above.
When presenting a query result to the user, the present invention also presents an explanation of the result, namely which tuples the result was derived from. The user can then correct errors in the tuples from which the result was derived, and the system recomputes according to the user's corrections and returns a more accurate query result. Following the user's corrections to data uncertainty, the query engine must recompute the query result. The present invention provides an efficient technique for this: by saving intermediate results in the binary tree, only the nodes affected by the corrected tuples are recomputed, which significantly reduces the recomputation time compared with the time spent computing the query the first time. Because each node of the binary tree stores its previous result, after the probability values of some nodes are modified only the nodes on the paths from the corresponding modified leaf nodes to the root need to be recomputed. The time complexity of the algorithm is linear in the number of nodes modified by the user and also linear in the height of the tree representing the event expression. The proposed system and method for explaining and correcting database query results have the following advantages: 1. a probabilistic database can use the personal knowledge of a particular user to give that user more accurate query results; 2. query results are explained, and for queries with a safe query plan the additional time needed compared with not providing explanations is linear; 3. by saving the query plan, the time cost of recomputing query results is significantly reduced.
[description of drawings]
Fig. 1 is a schematic flow chart of the method of the present invention;
Fig. 2 is a functional block diagram of the system of the present invention;
Fig. 3 is an example of a probabilistic database storing three tables: Movie, FemaleLike and MaleLike;
Fig. 4.1 to Fig. 4.7 are schematic views of the interface flow, in an example of the present invention, from the user submitting an SQL query to the final display of the query result.
[embodiment]
To further set out the technical means adopted by the present invention to achieve its intended purpose, and their effects, the embodiments, features, and effects of the proposed system and method for explaining and correcting probabilistic database query results are described below with reference to the accompanying drawings and preferred embodiments.
Fig. 1 is the flow chart of the method of the present invention. In the proposed method for explaining and correcting probabilistic database query results, the probabilistic database stores data in tables; each tuple in a table is regarded as the description of a probabilistic event and comprises an event identifier and a probability value field, the latter giving the probability that the event described by the tuple occurs. The method comprises the following steps:
(1) the user submits an SQL (Structured Query Language) query;
(2) the submitted query is processed to obtain processed data;
(3) the processed data is matched against the tuple data in the database, and the associated tuple data is retrieved from the probabilistic database to form the query result tuples;
(4) the probability value of each query result tuple is computed from the probability values of the associated tuple data; at the same time the associated tuple data is ranked by priority and saved;
(5) the probability value of the query result is returned to the user together with an explanation of that probability value;
(6) based on the returned query result and its explanation, the user decides whether to accept the query result; if not, the user enters an instruction to call up the relevant explanation and modifies the explanation of the query result;
(7) the query result is recomputed according to the user's modification of the explanation and returned to the user;
wherein the event identifiers and probability values of the top k tuples from which the query result is mainly derived are called the explanation of the computed result.
Step (5) raises the question of how to interact with the user effectively: how to present an explanation of the query result so that the user can correct uncertainty without being unduly burdened. The way the user calls up the relevant explanation and the way it is presented are implemented as follows:
Like a traditional relational database, a probabilistic database stores data in tables. Each row of a table is regarded as the description of a probabilistic event and accordingly has an event identifier and a probability value field. The result of each query is also a table, each row of which is derived from corresponding rows of the tables stored in the database; this process is regarded as deriving a complex event from a number of elementary events. Each row of the query result table therefore has a corresponding event expression recording which elementary events it was derived from, and the probability value of the complex event can be computed from this event expression and the probability values of the elementary events. However, if the probability values of elementary events stored in the database are wrong, the probability value of the result event derived from them is wrong too. Therefore, if the system presents, along with the probability value of a result event, the elementary events from which it was derived and their probability values, the user can correct any wrong probability values, and the system can then recompute and return a more accurate result probability value.
In a probabilistic database, a query result is usually derived from a large number of uncertain tuples. Presenting all the uncertain tuples related to a result tuple for the user to correct is neither practical nor user-friendly. In other words, the explanation of a query result provided by the system must be concise enough, lying between no explanation and over-explanation, so that the user can quickly judge from personal knowledge whether the query result and the uncertain tuples it was derived from are correct. Moreover, the contents of the explanation must have a strong influence on the query result; that is, if those contents change, the query result changes considerably as well. Tuples whose probability values can change without much changing the probability value of the result tuple have little value as an explanation of that result tuple.
The present invention uses partial derivatives to obtain the top k events with the greatest influence on the probability value of the query result, as follows:
For any event e, let P(e) denote its probability value. For each row of the query result, let ee = F(e1, e2, ..., en) denote its event expression, meaning that ee is composed of the elementary events e1, e2, ..., en combined by the Boolean operators AND (∧), OR (∨), and NOT (~). The probability value of ee can be derived from the probability values of e1, e2, ..., en and is written P(ee) = f(P(e1), P(e2), ..., P(en)).
The present invention measures the influence of each tuple's probability value on the probability value of the result tuple by a partial derivative: the larger the rate of change of the result tuple's probability value with respect to the probability value of an uncertain tuple, the greater that uncertain tuple's influence on the result tuple. Given a complex event ee = F(e1, e2, ..., en) derived from the elementary events e1, e2, ..., en, with probability value P(ee) = f(P(e1), P(e2), ..., P(en)), the influence of each elementary event ei on the probability value of ee is measured by the rate of change of P(ee) with respect to P(ei). This corresponds to the mathematical notion of a partial derivative and is written $\partial P(ee)/\partial P(ei)$.
For an event expression ee = F(e1, e2, ..., en) with probability value P(ee) = f(P(e1), P(e2), ..., P(en)), the explanation of ee is defined as <ei1, P(ei1)>, <ei2, P(ei2)>, ..., <eik, P(eik)>, where k is specified by the user or the system and ei1, ei2, ..., eik are the k events with the greatest influence on the probability value of ee. The user can correct some of the probability values of ei1, ei2, ..., eik in the explanation, and the system then recomputes the probability value of ee from the corrected values and presents it to the user.
Further, the present invention computes the partial derivatives recursively to obtain the top k events with the greatest influence on the probability value of the query result, as follows:
The present invention is mainly aimed at queries with a safe plan and provides an explanation and correction mechanism for each tuple of the query result. For a query that has a safe plan, the event expression of a result tuple can be converted into an equivalent form in which no event identifier repeats. For an event expression ee = F(e1, e2, ..., en) in which e1, e2, ..., en do not repeat, and given P(e1), P(e2), ..., P(en), P(ee) can be computed recursively as follows:
(1) for the event expression ei (1≤i≤n), P(ei) is known;
(2) for the event expression ~ee, where ee is an event expression, P(~ee) = 1 - P(ee);
(3) for the event expression ee1 ∧ ee2, where ee1 and ee2 are two event expressions, P(ee1 ∧ ee2) = P(ee1) * P(ee2);
(4) for the event expression ee1 ∨ ee2, where ee1 and ee2 are two event expressions, P(ee1 ∨ ee2) = P(ee1) + P(ee2) - P(ee1) * P(ee2).
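As a worked illustration of rules (1) to (4), assume a hypothetical event expression ee = (e1 ∧ e2) ∨ e3 with illustrative probability values P(e1) = 0.8, P(e2) = 0.5 and P(e3) = 0.3. Neither the expression nor the values come from the original text; they are chosen only to show the recursion bottom-up.

```latex
\begin{align*}
P(e_1 \wedge e_2) &= P(e_1)\,P(e_2) = 0.8 \times 0.5 = 0.4,\\
P(ee) &= P(e_1 \wedge e_2) + P(e_3) - P(e_1 \wedge e_2)\,P(e_3)
       = 0.4 + 0.3 - 0.4 \times 0.3 = 0.58.
\end{align*}
```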
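The partial derivatives of P(ee) with respect to P(e1), P(e2), ..., P(en) can be computed recursively as follows; each rule follows by differentiating the corresponding probability rule, using the fact that each ei occurs in only one sub-expression:
(1) for the event expression ei (1≤i≤n), $\partial P(ei)/\partial P(ei) = 1$;
(2) for the event expression ~ee, where ee is an event expression, $\partial P(\sim ee)/\partial P(ei) = -\,\partial P(ee)/\partial P(ei)$;
(3) for the event expression ee1 ∧ ee2, where ee1 and ee2 are two event expressions, $\partial P(ee1 \wedge ee2)/\partial P(ei) = P(ee2)\,\partial P(ee1)/\partial P(ei)$ if ei occurs in ee1, and $\partial P(ee1 \wedge ee2)/\partial P(ei) = P(ee1)\,\partial P(ee2)/\partial P(ei)$ if ei occurs in ee2;
(4) for the event expression ee1 ∨ ee2, where ee1 and ee2 are two event expressions, $\partial P(ee1 \vee ee2)/\partial P(ei) = (1 - P(ee2))\,\partial P(ee1)/\partial P(ei)$ if ei occurs in ee1, and $\partial P(ee1 \vee ee2)/\partial P(ei) = (1 - P(ee1))\,\partial P(ee2)/\partial P(ei)$ if ei occurs in ee2.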
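To make the influence measure concrete, consider a hypothetical event expression ee = (e1 ∧ e2) ∨ e3 with illustrative values P(e1) = 0.8, P(e2) = 0.5 and P(e3) = 0.3 (assumptions, not taken from the original text). Writing P(ee) = P(e1)P(e2) + P(e3) - P(e1)P(e2)P(e3) and differentiating with respect to each elementary probability gives:

```latex
\begin{align*}
\frac{\partial P(ee)}{\partial P(e_1)} &= P(e_2)\,\bigl(1 - P(e_3)\bigr) = 0.5 \times 0.7 = 0.35,\\
\frac{\partial P(ee)}{\partial P(e_2)} &= P(e_1)\,\bigl(1 - P(e_3)\bigr) = 0.8 \times 0.7 = 0.56,\\
\frac{\partial P(ee)}{\partial P(e_3)} &= 1 - P(e_1)\,P(e_2) = 1 - 0.4 = 0.6.
\end{align*}
```

Under these assumed values, with k = 2 the explanation of ee would consist of <e3, 0.3> and <e2, 0.5>, since these two events have the largest partial derivatives.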
The present invention also provides an algorithm of linear time complexity which, within the recursive computation, finds the top k tuples with the greatest influence on the result tuple and presents them to the user as the explanation of the result tuple. It is described as follows:
Each event expression can be represented by a binary tree whose leaf nodes correspond to event identifiers and whose internal nodes correspond to the three operators ∧, ∨ and ~. For each node N of the binary tree, LChild(N) denotes its left child, RChild(N) its right child, subTree(N) the subtree rooted at N, and ee(N) the event expression corresponding to subTree(N). Each node N has an additional field storing P(ee(N)). Let en1, en2, ..., enk be all the event identifiers appearing in ee(N), corresponding to all the leaf nodes of subTree(N); the node stores, in an additional list, the partial derivatives $\partial P(ee(N))/\partial P(en1), \ldots, \partial P(ee(N))/\partial P(enk)$, referred to below as the partial-derivative list of N.
For an event expression ee = F(e1, e2, ..., en), its explanation can be computed while computing its probability value P(ee). The algorithm traverses the binary tree corresponding to ee in post-order. When a node N is visited, its left child LChild(N) and right child RChild(N) have already been processed, so P(ee(LChild(N))) and the partial-derivative list of the left child, and P(ee(RChild(N))) and the partial-derivative list of the right child, are already available; P(ee(N)) and the partial-derivative list of N can then be obtained with the computation rules given above.
After the post-order traversal of the binary tree corresponding to ee, the partial derivatives $\partial P(ee)/\partial P(e1), \ldots, \partial P(ee)/\partial P(en)$ are obtained. The k largest values are then selected from them to give the explanation of the computed result.
It can be shown that the time complexity of the above algorithm is quadratic in the length of the event expression ee, where the length of ee is the total number of event identifiers and operators it contains. The present invention optimizes the algorithm: during the traversal of the binary tree, for each node N only the k elements of largest absolute value need to be kept in the node's list of partial derivatives, the remaining elements being discarded early. It can be shown that what is finally stored at the root node is exactly the k elements of largest absolute value among $\partial P(ee)/\partial P(e1), \ldots, \partial P(ee)/\partial P(en)$, and that the time complexity of the algorithm is reduced to linear in the length of ee.
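The pruned post-order traversal can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the patented implementation: the `Node` class, its `op` strings, and the function `explain` are hypothetical names, a unary NOT node is assumed to keep its operand in `left`, and `heapq.nlargest` is used as one convenient way to keep the k entries of largest absolute value at each node.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class Node:
    """Binary-tree node for an event expression (hypothetical layout)."""

    def __init__(self, op: str, left: Optional["Node"] = None,
                 right: Optional["Node"] = None, event_id: Optional[str] = None):
        self.op = op                  # "leaf", "not", "and" or "or"
        self.left = left              # operand of "not" is stored here
        self.right = right
        self.event_id = event_id      # set only for leaves
        self.prob: Optional[float] = None
        self.top: List[Tuple[str, float]] = []   # (event_id, partial derivative)


def explain(node: Node, p: Dict[str, float], k: int) -> None:
    """Post-order traversal that fills node.prob and node.top for every node.

    node.top keeps at most k entries, those with the largest absolute
    partial derivative of P(ee(node)) among the events below node.
    """
    if node.op == "leaf":
        node.prob = p[node.event_id]
        node.top = [(node.event_id, 1.0)]
        return
    explain(node.left, p, k)
    if node.op == "not":
        node.prob = 1.0 - node.left.prob
        node.top = [(e, -d) for e, d in node.left.top]   # sign flips, order kept
        return
    explain(node.right, p, k)
    pl, pr = node.left.prob, node.right.prob
    if node.op == "and":
        node.prob, scale_left, scale_right = pl * pr, pr, pl
    elif node.op == "or":
        node.prob, scale_left, scale_right = pl + pr - pl * pr, 1.0 - pr, 1.0 - pl
    else:
        raise ValueError(f"unknown operator: {node.op}")
    merged = [(e, scale_left * d) for e, d in node.left.top]
    merged += [(e, scale_right * d) for e, d in node.right.top]
    # Each child contributes at most k entries, so this merge is O(k log k).
    node.top = heapq.nlargest(k, merged, key=lambda item: abs(item[1]))


if __name__ == "__main__":
    # Hypothetical expression (e1 AND e2) OR e3 with illustrative probabilities.
    root = Node("or",
                Node("and", Node("leaf", event_id="e1"), Node("leaf", event_id="e2")),
                Node("leaf", event_id="e3"))
    explain(root, {"e1": 0.8, "e2": 0.5, "e3": 0.3}, k=2)
    print(root.prob, root.top)
```

Early pruning is safe here because all derivatives coming from one child are multiplied by the same factor at the parent, which preserves their relative order by absolute value; an event outside a child's top k can therefore never enter the parent's top k.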
In the step of the present invention (7), after the user revises uncertainty, how to recomputate Query Result and not spend the too many time overhead of system? the present invention realizes in the following way:
Based on the user's corrections to the uncertainty of the data, the query engine must recompute the query result. The present invention provides an effective technique: intermediate calculation results are saved in the binary tree, and only the nodes related to the corrected tuples are recomputed. Compared with the time spent computing the query the first time, this significantly reduces the time spent on recomputation.
Because each node of the binary tree keeps its previous calculation result, after the probability values of some nodes are modified only the nodes on the paths from the corresponding modified leaf nodes to the root node need to be recomputed. The time complexity of the algorithm is linear in the number of nodes modified by the user and linear in the height of the tree that represents the event expression ee.
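The leaf-to-root recomputation can be sketched as below. The sketch only refreshes cached probability values, not the partial derivative lists, and every name in it (`Node`, `combine`, `build`, `update_leaf`, the `parent` link) is an assumption introduced for illustration rather than something taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    op: str                          # "leaf", "not", "and" or "or" (hypothetical labels)
    prob: float = 0.0                # cached P(ee(N))
    left: Optional["Node"] = None
    right: Optional["Node"] = None   # unused for "not"
    parent: Optional["Node"] = field(default=None, repr=False)


def combine(node):
    """Recompute P(ee(N)) from the cached values of the children."""
    if node.op == "not":
        return 1.0 - node.left.prob
    if node.op == "and":
        return node.left.prob * node.right.prob
    return node.left.prob + node.right.prob - node.left.prob * node.right.prob


def build(node, parent=None):
    """First evaluation: post-order pass that also records parent links."""
    node.parent = parent
    if node.op != "leaf":
        build(node.left, node)
        if node.right is not None:
            build(node.right, node)
        node.prob = combine(node)
    return node.prob


def update_leaf(leaf, new_prob):
    """Change one leaf and refresh only its ancestors.

    Returns the number of inner nodes that were recomputed.
    """
    leaf.prob = new_prob
    touched = 0
    node = leaf.parent
    while node is not None:
        node.prob = combine(node)    # siblings keep their cached values
        touched += 1
        node = node.parent
    return touched
```

Calling `update_leaf` once per modified leaf touches at most (tree height) nodes each time, which matches the stated bound of linear in the number of modified nodes and linear in the height of the tree.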
As shown in Fig. 2, the system provided by the present invention for explaining, erasing and modifying query results in a probabilistic database comprises a user interface 100 and a server 200. The server 200 comprises a central processing unit 201, a storage unit 202, a calling and matching unit 203, and a computing unit 204. The user interface 100 provides the interaction between the user and the system, for example accepting queries entered by the user and displaying query results. The central processing unit 201 processes the query statement that the user sends through the user interface 100 and sends the processed query data to the calling and matching unit 203. The calling and matching unit 203 matches the processed query data against the event data in the tables of the probabilistic database in the storage unit 202 and obtains all related event data. Data in the probabilistic database is stored in tables; each row of a table is treated as the description of one probabilistic event and includes an event identifier and a probability value field. The calling and matching unit 203 sends all the related event data it obtains to the computing unit 204. The computing unit 204 performs the calculation on the probability values in the event data sent by the calling and matching unit 203, obtains the query result, and returns it to the user interface for display to the user.
At the same time, the computing unit 204 processes the calculated result, that is, the probability value, together with the probability value of each related event data item, and ranks the explanations (event data) provided by the system by the degree to which their content (probability values) influences the query result, producing a relevance ranking table. In other words, if a change in the content of an explanation would change the query result considerably, that event data is ranked near the front. If a change in the probability value of some event does not change the probability value of the result much, an explanation based on that event is of little value. The computing unit 204 can obtain the relevance ranking table with the partial-derivative recursion described in the method above, using the algorithm with linear time complexity; the details are not repeated here. The computing unit 204 stores the resulting relevance ranking table as well as the intermediate calculation results.
The user interface 100 displays the query result to the user. When the user asks to view the top few relevance explanations from which the result was obtained, the central processing unit 201, following the user's instruction, directs the computing unit 204 to retrieve the corresponding top relevance explanations and return them to the user interface. The user modifies the probability values of some explanations and returns them to the central processing unit 201 through the user interface 100. The central processing unit 201 passes the user's modifications to the computing unit 204, which determines by comparison which event data probability values have been modified, recomputes the query result with the binary tree algorithm of step (7) of the method above, and returns it to the user interface.
In the present invention, the computing unit 204 saves intermediate calculation results in the binary tree and recomputes only the nodes related to the corrected tuples, which significantly reduces the time spent on recomputation compared with the time spent computing the query the first time. Because each node of the binary tree keeps its previous calculation result, after the probability values of some nodes are modified only the nodes on the paths from the corresponding modified leaf nodes to the root node need to be recomputed. The time complexity of the algorithm is linear in the number of nodes modified by the user and linear in the height of the tree that represents the event expression ee.
Fig. 3 gives an example of a probabilistic database. It contains three tables: Movie, FemaleLike (liked by women), and MaleLike (liked by men). The Movie table stores information about films; the event identifier of every row is T, indicating that the row expresses a certain event whose probability value is 1.0. Each row of the FemaleLike table describes the probabilistic event that a film is liked by women. For example, the first row describes the probabilistic event that the film with MovieID 1 is liked by women, with probability value 0.2 (for example, 20% of the women in a questionnaire said they liked the film). Similarly, each row of the MaleLike table describes the probabilistic event that a film is liked by men.
Fig. 4 is a schematic diagram of the user interface of the running system.
Suppose a user wants to estimate, for each release year, the probability that at least one film is liked by women but not by men. The user can submit the following SQL (Structured Query Language) query to the system, as shown in Fig. 4.1:
SELECT ReleaseYear
FROM (FemaleLike EXCEPT MaleLike) AS A, Movie
WHERE A.MovieID = Movie.MovieID
The query result that the system returns to the user is shown in Fig. 4.2. For example, the first row of the query result indicates that the probability that at least one film from 1948 is liked by women but not by men is 0.591.
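How a per-year probability of this kind can arise is sketched below under stated assumptions: events are independent, EXCEPT pairs FemaleLike and MaleLike rows by MovieID so that each surviving row carries the event FLike ∧ ~MLike, and projecting onto the release year combines those rows with ∨. The table contents are made-up values, not the data of Fig. 3, and the function name `year_probabilities` is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows (NOT the contents of Fig. 3).
movie = {1: 1948, 2: 1948, 3: 1950}        # MovieID -> ReleaseYear
female_like = {1: 0.2, 2: 0.6, 3: 0.7}     # MovieID -> P(liked by women)
male_like = {1: 0.7, 2: 0.5, 3: 0.1}       # MovieID -> P(liked by men)


def year_probabilities(movie, female_like, male_like):
    """P(at least one film of the year is liked by women but not by men)."""
    # Probability that no film of the year qualifies, built up as a product.
    none_qualifies = defaultdict(lambda: 1.0)
    for movie_id, p_female in female_like.items():
        p_male = male_like.get(movie_id, 0.0)   # no MaleLike row: never liked by men
        p_row = p_female * (1.0 - p_male)       # P(FLike AND NOT MLike)
        none_qualifies[movie[movie_id]] *= 1.0 - p_row
    # Complement of "no film qualifies" equals the repeated OR rule
    # P(a OR b) = P(a) + P(b) - P(a) * P(b).
    return {year: 1.0 - q for year, q in none_qualifies.items()}


result = year_probabilities(movie, female_like, male_like)
```

With these made-up inputs the two films of 1948 contribute 0.2 * 0.3 and 0.6 * 0.5, so the value for 1948 is 1 - 0.94 * 0.7.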
The user may doubt the answer and want to understand how the first row of the query result was derived, namely that the probability that at least one film from 1948 is liked by women but not by men is 0.591. The user right-clicks the first row and selects "Check Answer Explanation" from the shortcut menu that appears in order to view the explanation, as shown in Fig. 4.3.
The system pops up a dialog box prompting the user to enter the value of K, as shown in Fig. 4.4. Here K ≤ 6, because this result tuple is derived from 6 base tuples stored in the database; K is the number of base tuples with the greatest influence on the result tuple that the user wants to view. Because each tuple in a query result may be derived from thousands of tuples stored in the database, letting the user choose the value of K is essential.
The system returns the explanation of the query result, as shown in Fig. 4.5. The explanation gives the event identifiers and probability values of the top K tuples that most influence the row the user is interested in, together with the rate of change of the result tuple's probability value with respect to each of their probability values. The influence of each tuple on the probability value of the result tuple is measured by this rate of change: the larger the rate of change, the greater the influence. For example, the first row of the explanation indicates that the tuple with the greatest influence on the probability value of the query result has event identifier FLike3, which corresponds to the probabilistic event that the third film is liked by women, with probability value 0.7.
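To make the sign and size of such rates of change concrete, the following derivation assumes (the text does not state this form) that the result row for one year has the event expression $ee = \bigvee_j (F_j \wedge \lnot M_j)$, where $F_j$ and $M_j$ are the FemaleLike and MaleLike events of the $j$-th film of that year with probabilities $p_j$ and $q_j$, and that all events are independent. The symbols $F_j$, $M_j$, $p_j$, $q_j$ are introduced here for illustration only.

```latex
\begin{align*}
P(ee) &= 1 - \prod_{j}\bigl(1 - p_j(1 - q_j)\bigr),\\
\frac{\partial P(ee)}{\partial p_i}
  &= (1 - q_i)\prod_{j \neq i}\bigl(1 - p_j(1 - q_j)\bigr) \;\ge\; 0,\\
\frac{\partial P(ee)}{\partial q_i}
  &= -\,p_i\prod_{j \neq i}\bigl(1 - p_j(1 - q_j)\bigr) \;\le\; 0.
\end{align*}
```

Under these assumptions, raising a FemaleLike probability can only raise the result, while raising a MaleLike probability can only lower it, so comparing influences by absolute value treats both kinds of event fairly.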
Based on personal knowledge, the user may judge that the probability that the third film is liked by women should be 0.3 rather than 0.7, and that the probability that the first film is liked by men should be 0.4 rather than 0.7, and therefore modifies these probability values in the explanation of the query result, as shown in Fig. 4.6.
The user then clicks the "Apply" button, and the system returns a new calculation result based on the user's modifications: the probability that at least one film from 1948 is liked by women but not by men is now 0.338, as shown in Fig. 4.7.
The system and method proposed by the present invention for explaining, erasing and modifying database query results have the following advantages:
1. A probabilistic database can use a particular user's personal knowledge to give that user more accurate query results;
2. Query results are explained, and for queries that have a safe query plan the additional time complexity compared with not providing an explanation is linear;
3. By saving the query plan, the time needed to recompute a query result is significantly reduced.
In this specification, the present invention has been described with reference to specific embodiments. It is evident, however, that various modifications and changes can be made without departing from the spirit and scope of the invention. The specification and drawings are therefore to be regarded as illustrative rather than restrictive.
Claims (10)
1. A method for explaining, erasing and modifying query results in a probabilistic database, characterized in that the probabilistic database stores data in tables, each tuple in a table is treated as the description of one probabilistic event, each tuple includes an event identifier and a probability value field, and the probability value field represents the probability that the probabilistic event described by the tuple occurs, the method comprising the following steps:
(1) the user submits a Structured Query Language query;
(2) the query submitted by the user is processed to obtain processed data;
(3) the processed data is matched against the tuple data in the database, and the related tuple data is obtained and retrieved from the probabilistic database to form query result tuples;
(4) a calculation is performed on the probability values of the related tuple data obtained, yielding the probability value of each query result tuple; at the same time, the related tuple data obtained is ranked by priority and saved;
(5) the probability value of the query result is returned to the user, accompanied by an explanation of that probability value;
(6) the user decides, on the basis of the returned query result and its explanation, whether to accept the query result; if the user does not accept it, the user enters an instruction to call up the relevant explanation and modifies the explanation of the query result;
(7) the query result is recomputed according to the user's modifications to the explanation of the query result and returned to the user;
wherein the event identifiers and probability values of the top k tuples that most influence the query result are called the explanation of the calculation result.
2. The method for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 1, characterized in that the query result produced by each Structured Query Language query is also a table, each tuple in that table is derived from other tuples stored in the database, and the probability value of the result is calculated from the probability values of these base tuples.
3. The method for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 2, characterized in that the explanation of the query result presented to the user in step (5) is obtained as follows: the influence of the probability value of each tuple on the probability value of the result tuple is measured by a partial derivative; the larger the rate of change of the result tuple's probability value with respect to the probability value of an uncertain tuple, the greater the influence of that uncertain tuple on the result tuple.
4. The method for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 3, characterized in that the explanation for the user is obtained by the following partial-derivative measure:
For any event e, P(e) denotes its probability value. For each row of the query result, ee = F(e1, e2, ..., en) denotes its event expression, meaning that ee is formed by combining the basic events e1, e2, ..., en with the Boolean operators AND (∧), OR (∨), and NOT (~). The probability value of ee can be derived from the probability values of the basic events e1, e2, ..., en and is written P(ee) = f(P(e1), P(e2), ..., P(en)). The influence of each basic event ei on the probability value of ee is measured by the rate of change of P(ee) with respect to P(ei), that is, by the partial derivative ∂P(ee)/∂P(ei).
The larger the value of this partial derivative, the greater the influence of the probability of this base tuple's event on the probability value of the query result.
5. The method for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 4, characterized in that the explanation of the query result presented to the user in step (5) is obtained by computing the partial derivatives with a recursive algorithm, specifically as follows:
For an event expression ee = F(e1, e2, ..., en) in which e1, e2, ..., en do not repeat, and given P(e1), P(e2), ..., P(en), P(ee) can be computed recursively as follows:
(1) for the event expression ei (1 ≤ i ≤ n), P(ei) is known;
(2) for the event expression ~ee, where ee is an event expression, P(~ee) = 1 - P(ee);
(3) for the event expression ee1 ∧ ee2, where ee1 and ee2 are two event expressions, P(ee1 ∧ ee2) = P(ee1) * P(ee2);
(4) for the event expression ee1 ∨ ee2, where ee1 and ee2 are two event expressions, P(ee1 ∨ ee2) = P(ee1) + P(ee2) - P(ee1) * P(ee2).
The partial derivatives of P(ee) with respect to P(e1), P(e2), ..., P(en) can be computed recursively as follows:
(1) for the event expression ei (1 ≤ i ≤ n), ∂P(ei)/∂P(ei) = 1;
(2) for the event expression ~ee, where ee is an event expression, ∂P(~ee)/∂P(ei) = -∂P(ee)/∂P(ei);
(3) for the event expression ee1 ∧ ee2, where ee1 and ee2 are two event expressions, ∂P(ee1 ∧ ee2)/∂P(ei) = P(ee2) * ∂P(ee1)/∂P(ei) + P(ee1) * ∂P(ee2)/∂P(ei);
(4) for the event expression ee1 ∨ ee2, where ee1 and ee2 are two event expressions, ∂P(ee1 ∨ ee2)/∂P(ei) = (1 - P(ee2)) * ∂P(ee1)/∂P(ei) + (1 - P(ee1)) * ∂P(ee2)/∂P(ei).
6. The method for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 5, characterized in that the explanation of the query result presented to the user in step (5) is computed by applying an algorithm with linear time complexity to the recursive algorithm, specifically as follows:
1) the probability value P(ee(N)) and the partial derivatives are obtained for each event expression ee = F(e1, e2, ..., en);
2) each event expression ee = F(e1, e2, ..., en) is represented as a binary tree in which leaf nodes correspond to event identifiers and internal nodes correspond to the operators of the event expression; for each node N in the binary tree, LChild(N) denotes its left child, RChild(N) denotes its right child, subTree(N) denotes the subtree rooted at N, and ee(N) denotes the event expression corresponding to subTree(N); each node N is given an additional field that stores the probability value P(ee(N)); letting en1, en2, ..., enk be all the event identifiers that appear in ee(N), corresponding to all the leaf nodes of subTree(N), node N stores in an additional list its partial derivative list ∂P(ee(N))/∂P(en1), ∂P(ee(N))/∂P(en2), ..., ∂P(ee(N))/∂P(enk);
3) while the probability value P(ee) of the event expression ee = F(e1, e2, ..., en) is computed, the partial derivatives that make up its explanation are computed as well;
4) the method performs a post-order traversal of the binary tree corresponding to ee; when the traversal reaches a node N, its left child LChild(N) and right child RChild(N) have already been processed, so the probability value P(ee(LChild(N))) and the partial derivative list of the left child, and the probability value P(ee(RChild(N))) and the partial derivative list of the right child, have already been obtained.
7. The method for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 6, characterized in that, during the traversal of the binary tree, each node N keeps only the k elements of largest absolute value in its partial derivative list and discards the rest early, so that the elements finally kept at the root node are exactly the k elements of largest absolute value among ∂P(ee)/∂P(e1), ∂P(ee)/∂P(e2), ..., ∂P(ee)/∂P(en).
8. The method for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 7, characterized in that the recomputation of the query result in step (7), according to the user's modifications to the explanation of the query result, is performed as follows: intermediate calculation results are saved in the binary tree, and only the nodes related to the corrected tuples are recomputed.
9. A system for explaining, erasing and modifying query results in a probabilistic database, characterized in that it comprises a user interface and a server, wherein the server comprises a central processing unit, a storage unit, a calling and matching unit, and a computing unit; the user interface provides the interaction between the user and the system, including accepting queries entered by the user and displaying query results to the user; the central processing unit processes the query statement that the user sends through the user interface and sends the processed query data to the calling and matching unit; the calling and matching unit matches the processed query data against the event data in the tables of the probabilistic database in the storage unit and obtains all related event data; data in the probabilistic database is stored in tables, and each row of a table is treated as the description of one probabilistic event and includes an event identifier and a probability value field; the calling and matching unit sends all the related event data it obtains to the computing unit; the computing unit performs the calculation on the probability values in the event data sent by the calling and matching unit, obtains the query result, and returns it to the user interface for display to the user; at the same time, the computing unit processes the calculated result, that is, the probability value, together with the probability value of each related event data item, and ranks the explanations provided by the system by the degree to which their probability values influence the query result, obtaining a relevance ranking table; the computing unit stores the resulting relevance ranking table as well as the intermediate calculation results; the user interface displays the query result to the user; when the user asks to view the top few relevance explanations from which the result was obtained, the central processing unit, following the user's instruction, directs the computing unit to retrieve the corresponding top relevance explanations and return them to the user interface; the user modifies the probability values of some explanations and returns them to the central processing unit through the user interface; the central processing unit passes the user's modifications to the computing unit, and the computing unit determines by comparison which event data probability values have been modified, recomputes the query result, and returns it to the user interface.
10. The system for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 9, characterized in that the computing unit calculates the query result and the explanation in the manner described in claims 3 to 8 above.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110003411 CN102073718B (en) | 2011-01-10 | 2011-01-10 | System and method for explaining, erasing and modifying search result in probabilistic database |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102073718A (en) | 2011-05-25 |
CN102073718B (en) | 2013-01-30 |
Family
ID=44032257
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110003411 Active CN102073718B (en) | 2011-01-10 | 2011-01-10 | System and method for explaining, erasing and modifying search result in probabilistic database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102073718B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102880629A (en) * | 2012-06-20 | 2013-01-16 | 杜小勇 | Accelerating query method of probabilistic database |
CN103116575A (en) * | 2011-11-16 | 2013-05-22 | 富士通株式会社 | Translated text word order probability determination method and device based on gradation phrase model |
CN105117442A (en) * | 2015-08-12 | 2015-12-02 | 东北大学 | Probability based big data query method |
CN108121766A (en) * | 2017-11-27 | 2018-06-05 | 浙江大学 | Multipair many types of PSJ Aggregation Queries method based on tuple grade uncertainty models |
CN108121765A (en) * | 2017-11-27 | 2018-06-05 | 浙江大学 | One-to-one type PSJ Aggregation Query methods based on PME graph models |
CN113626678A (en) * | 2020-05-06 | 2021-11-09 | 北京大学 | Knowledge graph data mining and recommending method based on dynamic suboptimal minimum spanning tree |
CN114911851A (en) * | 2022-07-12 | 2022-08-16 | 北京信安世纪科技股份有限公司 | Data query method, device and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1639712A (en) * | 2002-03-07 | 2005-07-13 | 皇家飞利浦电子股份有限公司 | A method and apparatus for providing search results in response to an information search request |
CN101853272A (en) * | 2010-04-30 | 2010-10-06 | 华北电力大学(保定) | Search engine technology based on relevance feedback and clustering |
CN101876979A (en) * | 2009-04-28 | 2010-11-03 | 株式会社理光 | Query expansion method and equipment |
CN101923560A (en) * | 2010-04-03 | 2010-12-22 | 崔志明 | Sorting technique of deep web database only providing simple query interface |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103116575A (en) * | 2011-11-16 | 2013-05-22 | 富士通株式会社 | Translated text word order probability determination method and device based on gradation phrase model |
CN103116575B (en) * | 2011-11-16 | 2016-06-22 | 富士通株式会社 | Translation word order probability defining method and device based on gradation phrase model |
CN102880629A (en) * | 2012-06-20 | 2013-01-16 | 杜小勇 | Accelerating query method of probabilistic database |
CN102880629B (en) * | 2012-06-20 | 2015-06-03 | 杜小勇 | Accelerating query method of probabilistic database |
CN105117442A (en) * | 2015-08-12 | 2015-12-02 | 东北大学 | Probability based big data query method |
CN105117442B (en) * | 2015-08-12 | 2018-05-04 | 东北大学 | A kind of big data querying method based on probability |
CN108121766A (en) * | 2017-11-27 | 2018-06-05 | 浙江大学 | Multipair many types of PSJ Aggregation Queries method based on tuple grade uncertainty models |
CN108121765A (en) * | 2017-11-27 | 2018-06-05 | 浙江大学 | One-to-one type PSJ Aggregation Query methods based on PME graph models |
CN108121766B (en) * | 2017-11-27 | 2020-07-17 | 浙江大学 | Many-to-many PSJ aggregation query method based on tuple-level uncertainty model |
CN108121765B (en) * | 2017-11-27 | 2020-07-17 | 浙江大学 | PME graph model-based one-to-one PSJ aggregation query method |
CN113626678A (en) * | 2020-05-06 | 2021-11-09 | 北京大学 | Knowledge graph data mining and recommending method based on dynamic suboptimal minimum spanning tree |
CN114911851A (en) * | 2022-07-12 | 2022-08-16 | 北京信安世纪科技股份有限公司 | Data query method, device and storage medium |
CN114911851B (en) * | 2022-07-12 | 2022-10-14 | 北京信安世纪科技股份有限公司 | Data query method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN102073718B (en) | 2013-01-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |