CN102073718A - System and method for explaining, erasing and modifying search result in probabilistic database

Publication number
CN102073718A
Authority
CN
China
Prior art keywords
user
event
probability value
query result
explanation
Legal status
Granted
Application number
CN 201110003411
Other languages
Chinese (zh)
Other versions
CN102073718B (en)
Inventor
冯铃
陈建文
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Application filed by Tsinghua University
Priority to CN 201110003411
Publication of CN102073718A
Application granted
Publication of CN102073718B
Status: Active

Abstract

The invention provides a system and method for explaining and correcting query results in a probabilistic database. The method is as follows: a user submits a Structured Query Language (SQL) query; the query is processed to obtain new data; these data are matched against the tuple data in the database to obtain the related tuples, which make up a query result tuple; the probability value of the query result tuple is computed from the probability values of the related tuples, while the related tuples are ranked by priority and saved; the probability value of the query result is returned to the user together with an explanation of that probability value; based on the returned result and its explanation, the user decides whether to accept it; if not, the user can modify the explanation of the query result; and the query result is recomputed according to the user's modification and returned to the user. The system and method let the user obtain more accurate query results and reduce the time needed to recompute them.

Description

System and method for explaining and correcting query results in a probabilistic database
[Technical Field]
The present invention relates to the field of data management, and in particular to a system and method for explaining and correcting query results in a probabilistic database.
[Background Art]
A key challenge raised by the popularity of the mobile Internet is managing the raw data collected from mobile pervasive environments and turning it into information suitable for decision support. A major source of this complexity is uncertainty, which can be ontological or epistemic. Ontological uncertainty arises mainly because it is not described which types of entities exist, whereas epistemic uncertainty stems mainly from measurement noise, bias or insufficient samples, data lost through sensor or network failures, ambiguous situations, and differing interpretations. The present invention mainly uses probability theory to handle epistemic uncertainty.
A considerable body of research now addresses representing and querying uncertain data in mobile computing environments. The most representative line of work is probabilistic databases, covering probabilistic data models, probabilistic queries, and approximate evaluation of probabilistic queries. Many universities and research institutions have developed prototype probabilistic database systems, such as Trio at Stanford University, MYSTIQ at the University of Washington, and MayBMS at Cornell University.
In the database field, probabilistic models are the most widely adopted data models for representing uncertain data. Uncertainty can be modeled at three granularities: relation (table), tuple, and attribute. Tuple-level uncertain data models are the most attractive, mainly because they strike a good balance between expressiveness (independence between probabilistic events) and computational efficiency. The present invention builds on the tuple-level probabilistic data model. A query submitted to a probabilistic database returns a set of probabilistic result tuples, each derived from base probabilistic tuples stored in the database. Computing the probability of a result tuple reduces to computing the probability of an arbitrary event expression; depending on the expression, the time complexity can be PTIME (polynomial time) or #P-hard. To handle the #P-hard case, several approximation strategies have been developed, including graph-based search algorithms, optimization algorithms based on searching the space of query plans, randomized algorithms based on Monte Carlo simulation, and algorithms based on probability intervals. In addition, many specific kinds of queries are supported in probabilistic databases, such as top-K queries, skyline analysis, and aggregate queries.
Besides the large body of work on improving the query efficiency of probabilistic databases, improving the quality of query results by cleaning the uncertain data stored in the database has also attracted academic attention. Existing work includes correcting errors in stored data by means of integrity constraints, assigning probability values to conflicting data in the database, supporting consistent query answering over inconsistent probabilistic databases, recovering missing or unreliable sensor data through spatio-temporal aggregation, and eliminating data uncertainty by re-acquiring data from sensor sources.
Providing explanations for the results a system returns originated in the field of artificial intelligence. An explanation of a sentence P is defined as a set E of sentences from which P can be derived: if all sentences in E hold, then P also holds. In information systems such as expert systems and recommender systems, explanations are often used to help users understand, to educate, to gain acceptance, to persuade, and for system debugging and correction. In the database field, explanations of query results usually consider only the query plan and access paths, with the main aim of tuning applications to make better use of indexes. Recently there has been research on explaining the query results that are returned, but it mainly targets traditional rather than probabilistic databases. In traditional databases there is also work on explaining ranking (ordering) queries, which identifies the attributes that determine a tuple's rank. The concepts of data provenance and lineage, which trace the base data from which a given datum was derived, have also recently drawn academic attention. Provenance and lineage enable many important functions, such as explanation, verification, and recomputation. Stanford's Panda project aims to combine data-based and process-based provenance seamlessly and to provide a general language for querying, analyzing, and using provenance.
In information retrieval, because user queries are ambiguous and the documents a system returns are redundant, there is much work on user feedback techniques. During feedback, users are usually asked to score the returned results according to certain criteria and their personal preferences; the system then updates its matching criteria and returns results that better meet the user's needs. Researchers in ubiquitous computing have also proposed using user feedback in context-aware applications, mainly because user-related context information cannot be sensed or inferred by technical means.
[Summary of the Invention]
The purpose of the present invention is to provide a system and method for explaining and correcting query results in a probabilistic database, so as to give users more accurate query results.
The invention provides a method for explaining and correcting query results in a probabilistic database. The probabilistic database stores data in tables; each tuple in a table is regarded as the description of one probabilistic event and contains an event ID and a probability value field, the latter giving the probability that the event described by the tuple occurs. The method comprises the following steps:
(1) the user submits a Structured Query Language (SQL) query;
(2) the query submitted by the user is processed to obtain processed data;
(3) the processed data are matched against the tuple data in the database to obtain the associated tuples, which are retrieved from the probabilistic database to form the query result tuple;
(4) the probability value of the query result tuple is computed from the probability values of the associated tuples, while the associated tuples are ranked by priority and saved;
(5) the probability value of the query result is returned to the user, accompanied by an explanation of that probability value;
(6) based on the returned query result and its explanation, the user decides whether to accept the result; if not, the user enters an instruction to call up the relevant explanation and modifies the explanation of the query result;
(7) the query result is recomputed according to the user's modification of the explanation and returned to the user;
where the event IDs and probability values of the top k tuples with the greatest influence on the query result are called the explanation of the computed result.
The query result produced by each SQL query is also a table. Each tuple in it is derived from other tuples stored in the database, and the probability value of the result is computed from the probability values of these base tuples.
The explanation of the query result presented to the user in step (5) is obtained as follows: partial derivatives are used to measure the influence of each tuple's probability value on the probability value of the result tuple. The larger the rate of change of the result tuple's probability with respect to the probability of an uncertain tuple, the greater that uncertain tuple's influence on the result tuple.
The explanation is obtained with the following partial-derivative measure:
For any event e, let P(e) denote its probability value. For each row of the query result, let ee = F(e1, e2, ..., en) denote its event expression, meaning that ee is formed from the elementary events e1, e2, ..., en combined with the Boolean operators AND (∧), OR (∨) and NOT (~). The probability value of ee can be derived from the probability values of e1, e2, ..., en and is written P(ee) = f(P(e1), P(e2), ..., P(en)). The influence of each elementary event ei on the probability of ee is measured by the rate of change of P(ee) with respect to P(ei), i.e. by the partial derivative
$\frac{\partial P(ee)}{\partial P(e_i)}$
The larger the absolute value of this partial derivative, the greater the influence of this base tuple event's probability on the probability of the query result.
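As a worked illustration (a hypothetical expression with made-up probabilities, not an example from the patent), take ee = e1 ∧ (e2 ∨ e3) with independent events and P(e1) = 0.8, P(e2) = 0.5, P(e3) = 0.4. Writing P(ee) in closed form and differentiating term by term gives:

```latex
\begin{aligned}
P(ee) &= P(e_1)\,\bigl(P(e_2) + P(e_3) - P(e_2)P(e_3)\bigr) = 0.8 \times 0.7 = 0.56\\
\frac{\partial P(ee)}{\partial P(e_1)} &= P(e_2) + P(e_3) - P(e_2)P(e_3) = 0.7\\
\frac{\partial P(ee)}{\partial P(e_2)} &= P(e_1)\,\bigl(1 - P(e_3)\bigr) = 0.8 \times 0.6 = 0.48\\
\frac{\partial P(ee)}{\partial P(e_3)} &= P(e_1)\,\bigl(1 - P(e_2)\bigr) = 0.8 \times 0.5 = 0.4
\end{aligned}
```

Under this measure e1 has the largest influence on P(ee), followed by e2 and then e3.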
For the explanation presented to the user in step (5), the partial derivatives are obtained with a recursive algorithm, which yields the explanation shown. The algorithm is as follows.
For an event expression ee = F(e1, e2, ..., en) in which e1, e2, ..., en do not repeat, and given P(e1), P(e2), ..., P(en), P(ee) can be computed recursively:
(1) for the event expression ei (1 ≤ i ≤ n), P(ei) is known;
(2) for the event expression ~ee, where ee is an event expression, P(~ee) = 1 - P(ee);
(3) for the event expression ee1 ∧ ee2, where ee1 and ee2 are event expressions, P(ee1 ∧ ee2) = P(ee1) * P(ee2);
(4) for the event expression ee1 ∨ ee2, where ee1 and ee2 are event expressions, P(ee1 ∨ ee2) = P(ee1) + P(ee2) - P(ee1) * P(ee2).
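The recursive rules (1) to (4) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the classes Leaf, Not, And and Or, the function prob and the example probabilities are all hypothetical, and the code assumes, as the rules do, that the elementary events are independent and that no event ID repeats.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Hypothetical node types for an event expression tree (illustrative names only).
@dataclass(frozen=True)
class Leaf:
    event_id: str


@dataclass(frozen=True)
class Not:
    child: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, Not, And, Or]


def prob(expr: Expr, p: Dict[str, float]) -> float:
    """Compute P(expr) with rules (1)-(4); assumes independent, non-repeated event IDs."""
    if isinstance(expr, Leaf):
        return p[expr.event_id]               # rule (1): base probability is known
    if isinstance(expr, Not):
        return 1.0 - prob(expr.child, p)      # rule (2)
    a, b = prob(expr.left, p), prob(expr.right, p)
    if isinstance(expr, And):
        return a * b                          # rule (3)
    return a + b - a * b                      # rule (4): expr is an Or node


# Example: ee = e1 ∧ (e2 ∨ e3) with made-up probabilities.
ee = And(Leaf("e1"), Or(Leaf("e2"), Leaf("e3")))
p = {"e1": 0.8, "e2": 0.5, "e3": 0.4}
print(prob(ee, p))
```

Each call visits every node once, so one evaluation is linear in the size of the expression tree.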
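The partial derivatives of P(ee) with respect to P(e1), P(e2), ..., P(en) can be computed recursively:
(1) for the event expression ej (1 ≤ j ≤ n),
$\frac{\partial P(e_j)}{\partial P(e_i)} = \begin{cases} 1 & j = i \\ 0 & j \neq i \end{cases}$
(2) for the event expression ~ee, where ee is an event expression,
$\frac{\partial P(\sim ee)}{\partial P(e_i)} = \frac{\partial (1 - P(ee))}{\partial P(e_i)} = -\frac{\partial P(ee)}{\partial P(e_i)}$
(3) for the event expression ee1 ∧ ee2, where ee1 and ee2 are event expressions,
$\frac{\partial P(ee_1 \wedge ee_2)}{\partial P(e_i)} = P(ee_2)\,\frac{\partial P(ee_1)}{\partial P(e_i)} + P(ee_1)\,\frac{\partial P(ee_2)}{\partial P(e_i)}$
(4) for the event expression ee1 ∨ ee2, where ee1 and ee2 are event expressions,
$\frac{\partial P(ee_1 \vee ee_2)}{\partial P(e_i)} = \bigl(1 - P(ee_2)\bigr)\,\frac{\partial P(ee_1)}{\partial P(e_i)} + \bigl(1 - P(ee_1)\bigr)\,\frac{\partial P(ee_2)}{\partial P(e_i)}$
Because no event ID repeats, ei occurs in at most one of ee1 and ee2, so at most one of the two terms in (3) and (4) is nonzero.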
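The derivative rules can be sketched the same way: one recursive function returns P(ee) together with a dictionary of partial derivatives, multiplying each child's derivatives by the factor its rule prescribes. This is an illustrative sketch, not the patent's code; the node classes, the function name prob_and_grad and the probabilities are hypothetical, and independence plus non-repeated event IDs are assumed so that the two children's dictionaries never share a key.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    event_id: str


@dataclass(frozen=True)
class Not:
    child: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, Not, And, Or]


def prob_and_grad(expr: Expr, p: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return P(expr) and {e_i: dP(expr)/dP(e_i)}; event IDs not in the dict have derivative 0."""
    if isinstance(expr, Leaf):
        return p[expr.event_id], {expr.event_id: 1.0}       # rule (1)
    if isinstance(expr, Not):
        q, g = prob_and_grad(expr.child, p)
        return 1.0 - q, {e: -d for e, d in g.items()}      # rule (2)
    a, ga = prob_and_grad(expr.left, p)
    b, gb = prob_and_grad(expr.right, p)
    if isinstance(expr, And):                               # rule (3)
        ca, cb, q = b, a, a * b
    else:                                                   # rule (4): Or node
        ca, cb, q = 1.0 - b, 1.0 - a, a + b - a * b
    # Key sets are disjoint because no event ID repeats, so a plain merge is safe.
    grad = {e: ca * d for e, d in ga.items()}
    grad.update({e: cb * d for e, d in gb.items()})
    return q, grad


ee = And(Leaf("e1"), Or(Leaf("e2"), Leaf("e3")))
p = {"e1": 0.8, "e2": 0.5, "e3": 0.4}
q, grad = prob_and_grad(ee, p)
print(q, grad)
```

For this example expression the derivatives come out at approximately 0.7 for e1, 0.48 for e2 and 0.4 for e3.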
For the explanation presented to the user in step (5), an algorithm with linear time complexity is applied within the recursive computation, as follows:
1) For each event expression ee = F(e1, e2, ..., en), obtain its probability value P(ee) and its partial derivatives $\frac{\partial P(ee)}{\partial P(e_i)}$ (1 ≤ i ≤ n).
2) Represent each event expression ee = F(e1, e2, ..., en) as a binary tree whose leaf nodes correspond to event IDs and whose inner nodes correspond to the operators of the expression. For each node N in the tree, LChild(N) denotes its left child, RChild(N) its right child, subTree(N) the subtree rooted at N, and ee(N) the event expression corresponding to subTree(N). Each node N gets an additional field storing the probability value P(ee(N)). Let en1, en2, ..., enm be all the event IDs appearing in ee(N), corresponding to all leaf nodes of subTree(N); node N also stores its partial derivatives in an additional list
$\left[\frac{\partial P(ee(N))}{\partial P(e_{n_1})}, \frac{\partial P(ee(N))}{\partial P(e_{n_2})}, \ldots, \frac{\partial P(ee(N))}{\partial P(e_{n_m})}\right]$.
3) The partial derivatives that make up the explanation are computed in the same pass that computes the probability value P(ee) of ee = F(e1, e2, ..., en).
4) The method performs a postorder traversal of the binary tree corresponding to ee. When a node N is reached, its left child LChild(N) and right child RChild(N) (a NOT node has a single child) have already been processed, so the probability value P(ee(LChild(N))) and the partial-derivative list of the left child, and the probability value P(ee(RChild(N))) and the partial-derivative list of the right child, are already available; P(ee(N)) and its list then follow from the rules above.
After the postorder traversal of the binary tree corresponding to ee, the list
$\left[\frac{\partial P(ee)}{\partial P(e_1)}, \frac{\partial P(ee)}{\partial P(e_2)}, \ldots, \frac{\partial P(ee)}{\partial P(e_n)}\right]$
is available at the root. Selecting the k values of largest absolute value from this list yields the explanation of the computed result.
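A possible shape for the node layout and postorder pass in steps 2) to 4) is sketched below. It is a hypothetical illustration: the Node class, its op strings, the prob and derivs fields and the functions postorder, evaluate and explain are our own names, and the explanation is returned as (event ID, probability) pairs chosen by largest absolute derivative with the standard-library heapq.nlargest.

```python
import heapq
from typing import Dict, List, Optional, Tuple


class Node:
    """Binary-tree node of an event expression; op is 'leaf', 'not', 'and' or 'or' (hypothetical)."""

    def __init__(self, op: str, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None, event_id: Optional[str] = None):
        self.op, self.left, self.right, self.event_id = op, left, right, event_id
        self.prob = 0.0                            # additional field: P(ee(N))
        self.derivs: List[Tuple[str, float]] = []  # additional list: (e, dP(ee(N))/dP(e))


def postorder(root: Node) -> List[Node]:
    """Iterative postorder: every node appears after both of its children."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(c for c in (node.left, node.right) if c is not None)
    return order[::-1]  # reversed root-right-left preorder = left-right-root postorder


def evaluate(root: Node, p: Dict[str, float]) -> None:
    """Fill prob and derivs at every node in one postorder pass."""
    for n in postorder(root):
        if n.op == "leaf":
            n.prob, n.derivs = p[n.event_id], [(n.event_id, 1.0)]
        elif n.op == "not":  # the single child is stored in n.left
            n.prob = 1.0 - n.left.prob
            n.derivs = [(e, -d) for e, d in n.left.derivs]
        else:
            a, b = n.left.prob, n.right.prob
            if n.op == "and":
                n.prob, ca, cb = a * b, b, a
            else:  # "or"
                n.prob, ca, cb = a + b - a * b, 1.0 - b, 1.0 - a
            # Unpruned: whole child lists are copied at every level.
            n.derivs = ([(e, ca * d) for e, d in n.left.derivs]
                        + [(e, cb * d) for e, d in n.right.derivs])


def explain(root: Node, p: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """The k event IDs with largest |derivative| at the root, paired with their probabilities."""
    top = heapq.nlargest(k, root.derivs, key=lambda ed: abs(ed[1]))
    return [(e, p[e]) for e, _ in top]


def make_leaf(event_id: str) -> Node:
    return Node("leaf", event_id=event_id)


root = Node("and", make_leaf("e1"), Node("or", make_leaf("e2"), make_leaf("e3")))
p = {"e1": 0.8, "e2": 0.5, "e3": 0.4}
evaluate(root, p)
print(root.prob, explain(root, p, 2))
```

Because each inner node copies its children's full lists, this unpruned version can take time quadratic in the length of ee in the worst case, which is what keeping only the top k entries per node addresses.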
During the traversal of the binary tree, each node N keeps only the k elements of largest absolute value in its list of partial derivatives
$\left[\frac{\partial P(ee(N))}{\partial P(e)}\right]_{e \in ee(N)}$
and discards the remaining elements early; the list finally kept at the root node is then exactly the k elements of largest absolute value in
$\left[\frac{\partial P(ee)}{\partial P(e_1)}, \frac{\partial P(ee)}{\partial P(e_2)}, \ldots, \frac{\partial P(ee)}{\partial P(e_n)}\right]$.
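The pruning step can be sketched as the merge performed at each inner node: both children's lists are already cut to at most k entries, each is scaled by its rule's factor, and only the k largest absolute values survive. The function names merge_top_k and negate and the example numbers are hypothetical; the factors follow the derivative rules for AND and OR.

```python
import heapq
from typing import List, Tuple

Deriv = Tuple[str, float]  # (event ID, partial derivative)


def merge_top_k(left: List[Deriv], c_left: float,
                right: List[Deriv], c_right: float, k: int) -> List[Deriv]:
    """Combine two children's pruned lists at an inner node and keep the top k by |value|.

    c_left / c_right are the per-child factors from the derivative rules:
    AND -> (P(ee2), P(ee1)), OR -> (1 - P(ee2), 1 - P(ee1)).
    Each input has at most k entries, so the work per node is O(k log k).
    """
    scaled = [(e, c_left * d) for e, d in left] + [(e, c_right * d) for e, d in right]
    return heapq.nlargest(k, scaled, key=lambda ed: abs(ed[1]))


def negate(child: List[Deriv]) -> List[Deriv]:
    """NOT node: flipping the sign keeps the order by absolute value, so nothing is dropped."""
    return [(e, -d) for e, d in child]


# e1 ∧ (e2 ∨ e3) with k = 2 and made-up probabilities P(e1)=0.8, P(e2)=0.5, P(e3)=0.4.
k = 2
or_list = merge_top_k([("e2", 1.0)], 1 - 0.4, [("e3", 1.0)], 1 - 0.5, k)
# At the AND root: left factor is P(e2 ∨ e3) = 0.7, right factor is P(e1) = 0.8.
root_list = merge_top_k([("e1", 1.0)], 0.7, or_list, 0.8, k)
print(root_list)
```

With k = 2 the root keeps e1 and e2 and drops e3, and no node ever handles more than 2k entries.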
For step (7), in which the query result is recomputed according to the user's modification of the explanation, intermediate results are stored in the binary tree so that only the nodes related to the corrected tuple need to be recomputed.
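The incremental recomputation can be sketched with a parent pointer on each node: after the user corrects one base probability, only the nodes on the path from that leaf to the root are re-evaluated from their children's cached values. The Node layout and the functions combine, full_eval and correct_leaf are hypothetical; only probabilities are refreshed here, and the derivative lists on the same path could be refreshed the same way.

```python
from typing import Dict, Optional, Tuple


class Node:
    """Expression-tree node with a parent pointer and a cached probability (hypothetical layout)."""

    def __init__(self, op: str, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None, event_id: Optional[str] = None):
        self.op, self.left, self.right, self.event_id = op, left, right, event_id
        self.parent: Optional[Node] = None
        for child in (left, right):
            if child is not None:
                child.parent = self
        self.prob = 0.0  # cached intermediate result P(ee(N))


def combine(n: Node) -> float:
    """Recompute P(ee(N)) of an inner node from its children's cached values."""
    if n.op == "not":
        return 1.0 - n.left.prob
    a, b = n.left.prob, n.right.prob
    return a * b if n.op == "and" else a + b - a * b


def full_eval(n: Node, p: Dict[str, float]) -> float:
    """First-time evaluation: visits every node and caches its probability."""
    if n.op == "leaf":
        n.prob = p[n.event_id]
    else:
        for child in (n.left, n.right):
            if child is not None:
                full_eval(child, p)
        n.prob = combine(n)
    return n.prob


def correct_leaf(leaf: Node, new_prob: float) -> Tuple[float, int]:
    """Apply a user's correction, recompute only the leaf-to-root path.

    Returns the new root probability and the number of nodes recomputed.
    """
    leaf.prob = new_prob
    node, touched = leaf, 1
    while node.parent is not None:
        node = node.parent
        node.prob = combine(node)
        touched += 1
    return node.prob, touched


e2 = Node("leaf", event_id="e2")
root = Node("and", Node("leaf", event_id="e1"),
            Node("or", e2, Node("leaf", event_id="e3")))
print(full_eval(root, {"e1": 0.8, "e2": 0.5, "e3": 0.4}))
print(correct_leaf(e2, 0.9))
```

In this example, correcting P(e2) recomputes 3 nodes (the leaf, the OR node and the root) rather than all 5.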
The invention also provides a system for explaining and correcting query results in a probabilistic database, comprising a user interface and a server, where the server comprises a central processing unit, a storage unit, a retrieval and matching unit, and a computing unit. The user interface supports interaction between the user and the system, for example letting the user enter queries and displaying query results. The central processing unit processes the query statement the user submits through the user interface and sends the processed query data to the retrieval and matching unit. The retrieval and matching unit matches the processed query data against the event data stored in tables in the probabilistic database in the storage unit and obtains all associated event data. Data in the probabilistic database is stored in tables; each row of a table is regarded as the description of one probabilistic event and contains an event ID and a probability value field. The retrieval and matching unit sends all the associated event data it obtained to the computing unit. The computing unit computes over the probability values in the event data it receives, obtains the query result, and feeds the result back to the user interface for display. At the same time, based on this computation, the computing unit ranks the associated event data by the influence of their probability values on the query result (the explanation provided by the system), obtaining a ranked correlation list, and stores this list together with the intermediate computation results. The user interface displays the query result to the user. When the user asks to view the top-ranked explanations behind a result, the central processing unit, according to the user's instruction, directs the computing unit to retrieve the corresponding explanations and feeds them back to the user interface. The user modifies the probability values of some explanations and sends the modifications to the central processing unit through the user interface; the central processing unit passes the modified data to the computing unit, which determines by comparison which event data probability values were modified, recomputes the query result, and feeds it back to the user interface.
The computing unit computes query results and explanations in the manner described above.
When presenting a query result to the user, the invention also presents an explanation of that result, i.e. the tuples from which it was derived. The user can then correct errors in those tuples, and the system recomputes the result according to the corrections and returns a more accurate query result. Once the user has corrected the uncertainty of the data, the query engine must recompute the query result. The invention provides an effective technique for this: intermediate results are stored in the binary tree and only the nodes related to the corrected tuples are recomputed, which significantly reduces recomputation time compared with the time spent computing the query the first time. Because every node of the binary tree retains its earlier result, after the probability values of some nodes are modified, only the nodes on the paths from the modified leaf nodes to the root need to be recomputed. The time complexity of the algorithm is linear in the number of nodes modified by the user, and also linear in the height of the tree representing the event expression. The proposed system and method have the following advantages: 1. the probabilistic database can use a particular user's personal knowledge to give that user more accurate query results; 2. for queries with safe query plans, the extra time complexity of producing explanations, compared with producing none, is linear; 3. by preserving the query plan and its intermediate results, the time cost of recomputing query results is significantly reduced.
[Description of Drawings]
Fig. 1 is a schematic flowchart of the method of the invention;
Fig. 2 is a functional block diagram of the system of the invention;
Fig. 3 shows an example probabilistic database storing three tables: Movie, FemaleLike and MaleLike;
Figs. 4.1 to 4.7 are schematic interface diagrams of an example of the invention, showing the flow from the user submitting an SQL query to the final display of the query result.
[Detailed Description of Embodiments]
To further explain the technical means adopted and the effects achieved by the invention for its intended purpose, the specific embodiments, features and effects of the proposed system and method for explaining and correcting query results in a probabilistic database are described below with reference to the drawings and preferred embodiments.
Fig. 1 is the flowchart of the method of the invention. In the proposed method for explaining and correcting query results in a probabilistic database, the probabilistic database stores data in tables; each tuple in a table is regarded as the description of one probabilistic event and contains an event ID and a probability value field, the latter giving the probability that the event described by the tuple occurs. The method comprises the following steps:
(1) the user submits an SQL (Structured Query Language) query;
(2) the query submitted by the user is processed to obtain processed data;
(3) the processed data are matched against the tuple data in the database to obtain the associated tuples, which are retrieved from the probabilistic database to form the query result tuple;
(4) the probability value of the query result tuple is computed from the probability values of the associated tuples, while the associated tuples are ranked by priority and saved;
(5) the probability value of the query result is returned to the user, accompanied by an explanation of that probability value;
(6) based on the returned query result and its explanation, the user decides whether to accept the result; if not, the user enters an instruction to call up the relevant explanation and modifies the explanation of the query result;
(7) the query result is recomputed according to the user's modification of the explanation and returned to the user;
where the event IDs and probability values of the top k tuples with the greatest influence on the derived query result are called the explanation of the computed result.
In step (5) the invention must interact with the user effectively: the explanation of the query result has to be presented so that the user can correct uncertainty without being overburdened. The user enters an instruction to call up the relevant explanation, which is then presented as follows.
As in a traditional relational database, a probabilistic database stores data in tables. Each row of a table is regarded as the description of one probabilistic event and has an event ID and a probability value field. The result of each query is also a table, each row of which is derived from corresponding rows of tables stored in the database; this process is regarded as deriving a complex event from some elementary events. Each row of the query result table therefore has a corresponding event expression, which records the elementary events from which the event was derived; from this expression and the probabilities of the elementary events, the probability of the complex event can be computed. However, if the probability of an elementary event stored in the database is wrong, the probability of any result event derived from it is also wrong. Therefore, if the system presents, along with the probability of a result event, the elementary events from which it was derived and their probabilities, the user can correct some of the wrong probabilities; the system then recomputes and returns a more accurate result probability.
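To make the notion of an event expression concrete, the sketch below joins two small tuple-independent tables and attaches to each result row the expression recording which base events it was derived from. The tables USERS and GENRES, their columns and probabilities, and the function join_with_lineage are hypothetical (they are not the Movie, FemaleLike and MaleLike tables of Fig. 3); a plain join is used so that each result event is simply a conjunction of two independent base events.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical tuple-independent tables; each row starts with (event ID, probability).
USERS = [("r1", 0.9, "alice", "m1"), ("r2", 0.6, "bob", "m2")]     # (id, P, user, movie)
GENRES = [("s1", 0.7, "m1", "comedy"), ("s2", 0.5, "m2", "drama")]  # (id, P, movie, genre)


def join_with_lineage(left, right) -> List[Tuple[str, str, str, float]]:
    """Join on movie; each result row carries its event expression and its probability."""
    result = []
    for (er, pr, user, movie_l), (es, ps, movie_r, genre) in product(left, right):
        if movie_l == movie_r:
            lineage = f"{er} ∧ {es}"  # the result event occurs only if both base events occur
            # Independent base events: P(er ∧ es) = P(er) * P(es).
            result.append((user, genre, lineage, pr * ps))
    return result


for row in join_with_lineage(USERS, GENRES):
    print(row)
```

If one base probability here were wrong, for example P(r1), every result row whose expression mentions r1 would inherit the error, which is why the explanation exposes these base events to the user.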
In a probabilistic database, a query result is usually derived from a large number of uncertain tuples, so presenting every uncertain tuple related to a result tuple for the user to correct is neither practical nor user-friendly. The explanation the system provides must therefore be concise, lying between no explanation at all and over-explanation, so that the user can quickly judge, based on personal knowledge, whether the query result and the uncertain tuples it was derived from are correct. At the same time, the content of the explanation must strongly influence the query result: if that content changes, the query result should change substantially as well. If changing a tuple's probability barely changes the result tuple's probability, that tuple has little value as an explanation of the result tuple.
The invention uses partial derivatives to obtain the top k events with the greatest influence on the probability value of the query result, as follows:
For any event e, let P(e) denote its probability value. For each row of the query result, let ee = F(e1, e2, ..., en) denote its event expression, meaning that ee is formed from the elementary events e1, e2, ..., en combined with the Boolean operators AND (∧), OR (∨) and NOT (~). The probability value of ee can be derived from those of e1, e2, ..., en and is written P(ee) = f(P(e1), P(e2), ..., P(en)).
The invention uses partial derivatives to measure the influence of each tuple's probability value on the probability value of the result tuple: the larger the rate of change of the result tuple's probability with respect to the probability of an uncertain tuple, the greater that uncertain tuple's influence on the result tuple. Given a complex event ee = F(e1, e2, ..., en) derived from elementary events e1, e2, ..., en, with probability P(ee) = f(P(e1), P(e2), ..., P(en)), the influence of each elementary event ei on the probability of ee is measured by the rate of change of P(ee) with respect to P(ei), which corresponds to the mathematical notion of a partial derivative:
$\frac{\partial P(ee)}{\partial P(e_i)}$
For an event expression ee = F(e1, e2, ..., en) with probability P(ee) = f(P(e1), P(e2), ..., P(en)), the explanation of ee is defined as <ei1, P(ei1)>, <ei2, P(ei2)>, ..., <eik, P(eik)>, where k is specified by the user or the system and ei1, ei2, ..., eik are the k events with the greatest influence on the probability value of ee. The user can modify some of the probability values of ei1, ei2, ..., eik in the explanation, and the system recomputes the probability value of ee from the corrected values and presents it to the user.
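One way to see why the partial derivative suits this kind of explanation (an observation added for illustration, assuming independent events and an expression in which no event ID repeats): P(ee) is then affine in each individual P(ei), so the derivative predicts exactly how much a single user correction moves the result. With the hypothetical expression ee = e1 ∧ (e2 ∨ e3) and P(e1) = 0.8, P(e2) = 0.5, P(e3) = 0.4, so that P(ee) = 0.56 and ∂P(ee)/∂P(e2) = 0.8 × (1 - 0.4) = 0.48, correcting P(e2) from 0.5 to 0.9 gives:

```latex
\begin{aligned}
\Delta P(ee) &= \frac{\partial P(ee)}{\partial P(e_2)}\,\Delta P(e_2) = 0.48 \times 0.4 = 0.192\\
P'(ee) &= 0.56 + 0.192 = 0.752\\
\text{check: } P'(ee) &= 0.8 \times (0.9 + 0.4 - 0.9 \times 0.4) = 0.8 \times 0.94 = 0.752
\end{aligned}
```

The identity is exact only for one probability changed at a time; when several are corrected together, the cross terms must be included, which is what recomputing P(ee) handles.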
Further, the invention computes the partial derivatives recursively to obtain the top k events with the greatest influence on the probability value of the query result, as follows:
The invention mainly targets queries that have safe plans, and provides an explanation and correction mechanism for each tuple of the query result. For a query with a safe plan, the event expression of each result tuple can be converted into an equivalent form in which no event ID is repeated. For an event expression ee = F(e1, e2, ..., en) in which e1, e2, ..., en do not repeat, and given P(e1), P(e2), ..., P(en), P(ee) can be computed recursively:
(1) for the event expression ei (1 ≤ i ≤ n), P(ei) is known;
(2) for the event expression ~ee, where ee is an event expression, P(~ee) = 1 - P(ee);
(3) for the event expression ee1 ∧ ee2, where ee1 and ee2 are event expressions,
P(ee1 ∧ ee2) = P(ee1) * P(ee2);
(4) for the event expression ee1 ∨ ee2, where ee1 and ee2 are event expressions,
P(ee1 ∨ ee2) = P(ee1) + P(ee2) - P(ee1) * P(ee2).
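The restriction to expressions without repeated event IDs matters because rules (3) and (4) treat their two operands as independent, which fails when the operands share an event. A minimal illustration (added here, not from the patent) is ee = e1 ∧ e1:

```latex
P(e_1 \wedge e_1) \;\overset{\text{rule (3)}}{=}\; P(e_1)\,P(e_1) = P(e_1)^2,
\qquad\text{whereas in fact}\qquad
P(e_1 \wedge e_1) = P(e_1).
```

For P(e1) = 0.5 the rule would return 0.25 instead of 0.5. Converting the expression to an equivalent repetition-free form (here simply e1), which is possible for queries with safe plans, restores correctness.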
The partial derivatives of P(ee) with respect to P(e1), P(e2), ..., P(en) can be computed recursively:
(1) for the event expression ej (1 ≤ j ≤ n),
$\frac{\partial P(e_j)}{\partial P(e_i)} = \begin{cases} 1 & j = i \\ 0 & j \neq i \end{cases}$
(2) for the event expression ~ee, where ee is an event expression,
$\frac{\partial P(\sim ee)}{\partial P(e_i)} = \frac{\partial (1 - P(ee))}{\partial P(e_i)} = -\frac{\partial P(ee)}{\partial P(e_i)}$
(3) for the event expression ee1 ∧ ee2, where ee1 and ee2 are event expressions,
$\frac{\partial P(ee_1 \wedge ee_2)}{\partial P(e_i)} = P(ee_2)\,\frac{\partial P(ee_1)}{\partial P(e_i)} + P(ee_1)\,\frac{\partial P(ee_2)}{\partial P(e_i)}$
(4) for the event expression ee1 ∨ ee2, where ee1 and ee2 are event expressions,
$\frac{\partial P(ee_1 \vee ee_2)}{\partial P(e_i)} = \bigl(1 - P(ee_2)\bigr)\,\frac{\partial P(ee_1)}{\partial P(e_i)} + \bigl(1 - P(ee_1)\bigr)\,\frac{\partial P(ee_2)}{\partial P(e_i)}$
The invention also applies an algorithm with linear time complexity within the recursive computation to find the top k tuples with the greatest influence on the result tuple, and presents them to the user as the explanation of the result tuple. It is described in detail below.
Each event expression can be represented as a binary tree whose leaf nodes correspond to event IDs and whose inner nodes correspond to the three operators ∧, ∨ and ~. For each node N in the tree, LChild(N) denotes its left child, RChild(N) its right child, subTree(N) the subtree rooted at N, and ee(N) the event expression corresponding to subTree(N). Each node N gets an additional field storing P(ee(N)). Let en1, en2, ..., enm be all the event IDs appearing in ee(N), corresponding to all leaf nodes of subTree(N); node N also stores its partial derivatives in an additional list
$\left[\frac{\partial P(ee(N))}{\partial P(e_{n_1})}, \frac{\partial P(ee(N))}{\partial P(e_{n_2})}, \ldots, \frac{\partial P(ee(N))}{\partial P(e_{n_m})}\right]$
For an event expression ee = F(e1, e2, ..., en), its explanation can be computed while its probability P(ee) is being computed. The algorithm performs a postorder traversal of the binary tree corresponding to ee. When a node N is visited, its left child LChild(N) and right child RChild(N) have already been processed, so P(ee(LChild(N))) and the partial-derivative list of the left child, as well as P(ee(RChild(N))) and the partial-derivative list of the right child, are already available. P(ee(N)) and the partial-derivative list of N can then be obtained with the computation rules above.
After the postorder traversal of the binary tree corresponding to ee, the root holds the list
$$\left(\frac{\partial P(ee)}{\partial P(e_1)}, \frac{\partial P(ee)}{\partial P(e_2)}, \ldots, \frac{\partial P(ee)}{\partial P(e_n)}\right).$$
Selecting the k values of largest absolute value from this list then gives the explanation of the computed result.
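The traversal and top-k selection described above might look as follows. This is a hedged sketch: the `Node` class, its `prob` and `derivs` fields (standing in for the additional field and the partial-derivative list), and the functions `postorder`, `evaluate` and `explain` are hypothetical names. Each node is computed only from its children's stored values, so one postorder pass suffices; the root's list is then ranked by absolute value.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    op: str                          # "var", "not", "and" or "or"
    event_id: Optional[str] = None   # set only on leaf ("var") nodes
    left: Optional["Node"] = None    # LChild(N); the single child of a "not" node
    right: Optional["Node"] = None   # RChild(N)
    prob: float = 0.0                # additional field: P(ee(N))
    derivs: Dict[str, float] = field(default_factory=dict)  # partial-derivative list of N


def postorder(root: Node) -> List[Node]:
    """All nodes, each listed after its children (iterative, no recursion limit)."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(c for c in (node.left, node.right) if c is not None)
    return order[::-1]


def evaluate(root: Node, p: Dict[str, float]) -> None:
    """Fill prob and derivs on every node from its children's stored values."""
    for n in postorder(root):
        if n.op == "var":
            n.prob, n.derivs = p[n.event_id], {n.event_id: 1.0}
        elif n.op == "not":
            n.prob = 1.0 - n.left.prob
            n.derivs = {e: -d for e, d in n.left.derivs.items()}
        else:
            q1, q2 = n.left.prob, n.right.prob
            if n.op == "and":
                n.prob, c1, c2 = q1 * q2, q2, q1
            else:  # "or"
                n.prob, c1, c2 = q1 + q2 - q1 * q2, 1.0 - q2, 1.0 - q1
            n.derivs = {e: c1 * d for e, d in n.left.derivs.items()}
            n.derivs.update({e: c2 * d for e, d in n.right.derivs.items()})


def explain(root: Node, k: int) -> List[Tuple[str, float]]:
    """The k (event ID, dP(ee)/dP(e)) pairs of largest absolute value at the root."""
    return heapq.nlargest(k, root.derivs.items(), key=lambda kv: abs(kv[1]))


def leaf(event_id: str) -> Node:
    return Node("var", event_id=event_id)


# ee = (a AND NOT b) OR c, with hypothetical probabilities
root = Node("or", left=Node("and", left=leaf("a"), right=Node("not", left=leaf("b"))), right=leaf("c"))
evaluate(root, {"a": 0.6, "b": 0.3, "c": 0.2})
print([event_id for event_id, _ in explain(root, 2)])  # ['c', 'a']
```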
It can be shown that the time complexity of this algorithm is quadratic in the length of ee, where the length of an event expression ee is the total number of event IDs and operators it contains. The invention optimizes the algorithm as follows: while the binary tree is traversed, each node N keeps in its partial-derivative list only the k elements of largest absolute value and discards the rest early. This is possible because each rule multiplies all entries of a child's list by one common factor (P(ee2), 1 - P(ee2), or -1), which does not change their order by absolute value, and the two children of a node share no events. It can be shown that what finally remains at the root is exactly the k elements of largest absolute value in
$$\left(\frac{\partial P(ee)}{\partial P(e_1)}, \ldots, \frac{\partial P(ee)}{\partial P(e_n)}\right),$$
and the time complexity of the algorithm drops to linear in the length of ee (for a fixed k).
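The top-k pruning can be sketched by keeping at most k (event ID, derivative) pairs per node. All names here (`pruned`, `topk`, `random_expr`) are hypothetical. The check at the end compares the pruned run with an unpruned one (obtained by setting k to the number of events) on a random expression in which each event occurs once; since both runs apply the same factors to every surviving event, their top-k absolute values should match exactly.

```python
import heapq
import random
from typing import Dict, List, Tuple

Expr = Tuple                     # hypothetical: ("var", id) | ("not", x) | ("and", x, y) | ("or", x, y)
Pairs = List[Tuple[str, float]]  # (event ID, dP(ee(N))/dP(event)) pairs kept at a node


def topk(pairs: Pairs, k: int) -> Pairs:
    """The k pairs with the largest |derivative|."""
    return heapq.nlargest(k, pairs, key=lambda kv: abs(kv[1]))


def pruned(expr: Expr, p: Dict[str, float], k: int) -> Tuple[float, Pairs]:
    """Return P(ee) and at most k derivative pairs, pruning at every node."""
    kind = expr[0]
    if kind == "var":
        return p[expr[1]], [(expr[1], 1.0)]
    if kind == "not":
        q, g = pruned(expr[1], p, k)
        return 1.0 - q, [(e, -d) for e, d in g]  # negation keeps every |d|
    q1, g1 = pruned(expr[1], p, k)
    q2, g2 = pruned(expr[2], p, k)
    if kind == "and":
        q, c1, c2 = q1 * q2, q2, q1
    elif kind == "or":
        q, c1, c2 = q1 + q2 - q1 * q2, 1.0 - q2, 1.0 - q1
    else:
        raise ValueError(f"unknown operator {kind!r}")
    # Scaling a whole child list by one factor keeps its ranking by |d|, and the
    # children share no events, so N's top k lie among these <= 2k candidates.
    merged = [(e, c1 * d) for e, d in g1] + [(e, c2 * d) for e, d in g2]
    return q, topk(merged, k)


def random_expr(events: List[str], rng: random.Random) -> Expr:
    """Hypothetical test helper: random tree using each event exactly once."""
    if len(events) == 1:
        node: Expr = ("var", events[0])
    else:
        cut = rng.randint(1, len(events) - 1)
        op = rng.choice(["and", "or"])
        node = (op, random_expr(events[:cut], rng), random_expr(events[cut:], rng))
    return ("not", node) if rng.random() < 0.3 else node


rng = random.Random(0)
events = [f"e{i}" for i in range(50)]
p = {e: rng.random() for e in events}
ee = random_expr(events, rng)
_, full = pruned(ee, p, k=len(events))  # k >= n: nothing is ever discarded
_, fast = pruned(ee, p, k=5)            # pruned, linear-time variant
assert sorted(abs(d) for _, d in fast) == sorted(abs(d) for _, d in topk(full, 5))
```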
In step (7) of the invention, after the user corrects the uncertainty, how can the query result be recomputed without excessive time overhead? The invention achieves this as follows:
When the user corrects the uncertainty of the data, the query engine must recompute the query result. The invention uses an effective technique: intermediate results are saved in the binary tree, and only the nodes related to the corrected tuples are recomputed. Compared with the time spent computing the query the first time, this significantly reduces the recomputation time.
Because every node of the binary tree keeps its previous result, after the probabilities of some leaf nodes are modified, only the nodes on the paths from those modified leaves to the root need to be recomputed. The time complexity of the algorithm is linear in the number of nodes modified by the user, and linear in the height of the tree representing the event expression ee.
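The following sketch illustrates path-only recomputation. It assumes that each node caches P(ee(N)) and keeps a pointer to its parent; the parent pointer is an assumption, since the text only says that intermediate results are saved. Only probabilities are refreshed here. A node's partial-derivative list also depends only on subTree(N), so it could be refreshed along the same path in the same way. `Node`, `evaluate_all` and `apply_correction` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity comparison: nodes form a tree with back-pointers
class Node:
    op: str                               # "var", "not", "and" or "or"
    event_id: Optional[str] = None        # set only on leaf ("var") nodes
    children: List["Node"] = field(default_factory=list)  # [LChild, RChild]; one child for "not"
    parent: Optional["Node"] = field(default=None, repr=False)
    prob: float = 0.0                     # cached intermediate result P(ee(N))

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def recompute(n: Node, p: Dict[str, float]) -> None:
    """Recompute P(ee(n)) from its children's cached values (rules (1)-(4))."""
    if n.op == "var":
        n.prob = p[n.event_id]
    elif n.op == "not":
        n.prob = 1.0 - n.children[0].prob
    else:
        q1, q2 = n.children[0].prob, n.children[1].prob
        n.prob = q1 * q2 if n.op == "and" else q1 + q2 - q1 * q2


def evaluate_all(n: Node, p: Dict[str, float]) -> None:
    """First evaluation: a full postorder pass over the tree."""
    for child in n.children:
        evaluate_all(child, p)
    recompute(n, p)


def apply_correction(leaves: Dict[str, Node], p: Dict[str, float],
                     event_id: str, new_prob: float) -> int:
    """Change one event's probability and refresh only its leaf-to-root path."""
    p[event_id] = new_prob
    node, touched = leaves[event_id], 0
    while node is not None:
        recompute(node, p)
        node, touched = node.parent, touched + 1
    return touched


# ee = (a AND NOT b) OR c, with hypothetical probabilities
leaves = {e: Node("var", event_id=e) for e in ("a", "b", "c")}
root = Node("or", children=[
    Node("and", children=[leaves["a"], Node("not", children=[leaves["b"]])]),
    leaves["c"],
])
p = {"a": 0.6, "b": 0.3, "c": 0.2}
evaluate_all(root, p)
touched = apply_correction(leaves, p, "b", 0.9)  # user corrects P(b)
print(touched, round(root.prob, 6))              # 4 0.248  (4 of the 6 nodes)
```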
As shown in Fig. 2, the system provided by the invention for explaining, erasing and modifying query results of a probabilistic database comprises a user interface 100 and a server 200, where the server 200 comprises a central processing unit 201, a storage unit 202, a retrieval and matching unit 203 and a computing unit 204. The user interface 100 lets the user enter queries, displays query results to the user, and otherwise provides interaction between the user and the system. The central processing unit 201 processes the query statement the user sends through the user interface 100 and sends the processed query data to the retrieval and matching unit 203. The retrieval and matching unit 203 matches the processed query data against the event data stored in tables of the probabilistic database in the storage unit 202 and obtains all associated event data. The data in the probabilistic database are stored as tables; each row of a table is regarded as the description of a probabilistic event and comprises an event ID and a probability field. The retrieval and matching unit 203 sends all the associated event data it obtains to the computing unit 204. The computing unit 204 computes over the probabilities in the event data received from the retrieval and matching unit 203 to obtain the query result, which is fed back to the user interface and shown to the user. At the same time, the computing unit 204 processes the computed result probability together with the probability of each associated event, and ranks the explanations (event data) provided by the system by how strongly their content (probability values) influences the query result, yielding a relevance-ranked list. In other words, if changing the content of an explanation would change the query result considerably, that event is placed near the front; if changing the probability of an event would hardly change the result probability, explaining the result by that event is of little value. The computing unit 204 obtains the ranked list with the recursive partial-derivative method described above, using the linear-time algorithm; this is not repeated here. The computing unit 204 stores the ranked list and the intermediate results. The user interface 100 displays the query result to the user. When the user asks to view the top few relevance explanations for a result, the central processing unit 201, following the user's instruction, directs the computing unit 204 to retrieve the corresponding explanations and feed them back to the user interface. The user modifies the probabilities of some explanations and sends the changes to the central processing unit 201 through the user interface 100; the central processing unit 201 forwards the user's modifications to the computing unit 204, which determines by comparison which events' probabilities were modified, recomputes the query result with the binary-tree algorithm of step (7) of the method above, and feeds it back to the user interface.
In the invention, the computing unit 204 saves intermediate results in the binary tree and recomputes only the nodes related to the corrected tuples, which significantly reduces the recomputation time compared with the time spent computing the query the first time. Because every node of the binary tree keeps its previous result, after the probabilities of some leaf nodes are modified, only the nodes on the paths from those modified leaves to the root need to be recomputed. The time complexity is linear in the number of nodes modified by the user, and linear in the height of the tree representing the event expression ee.
Fig. 3 shows an example probabilistic database. It contains three tables: Movie, FemaleLike and MaleLike. Table Movie stores information about films; the event ID of every row is T, meaning that the row expresses a certain event whose probability is 1.0. Each row of FemaleLike describes the probabilistic event that a film is liked by women. For example, the first row describes the event that the film with MovieID 1 is liked by women, with probability 0.2 (for instance, 20% of the women in some survey said they like this film). Similarly, each row of MaleLike describes the probabilistic event that a film is liked by men.
Fig. 4 is a schematic of the user interface while the system is running.
Suppose a user wants to estimate, for each year, the probability that at least one film was liked by women but not by men. He can submit the following SQL (Structured Query Language) query to the system, as shown in Fig. 4.1:
SELECT ReleaseYear
FROM (FemaleLike EXCEPT MaleLike) AS A, Movie
WHERE A.MovieID = Movie.MovieID
The query result returned by the system is shown in Fig. 4.2. For example, the first row of the result shows that the probability that at least one film from 1948 was liked by women but not by men is 0.591.
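To make the example concrete, the sketch below builds the event expression of one ReleaseYear row. It assumes the usual lineage semantics for probabilistic SQL, which the text does not spell out: a tuple of FemaleLike EXCEPT MaleLike for film m carries FLike_m ∧ ~MLike_m, the join with Movie adds only the certain event T, and projecting onto ReleaseYear merges the films of one year with ∨. The FLike naming imitates Fig. 4.5 (MLike is assumed by analogy); all probabilities are hypothetical, so the output does not reproduce the 0.591 of Fig. 4.2.

```python
from functools import reduce
from typing import Dict, List, Tuple

Expr = Tuple  # hypothetical: ("var", id) | ("not", x) | ("and", x, y) | ("or", x, y)


def prob(expr: Expr, p: Dict[str, float]) -> float:
    """Rules (1)-(4); valid because no event ID repeats in the lineage."""
    kind = expr[0]
    if kind == "var":
        return p[expr[1]]
    if kind == "not":
        return 1.0 - prob(expr[1], p)
    a, b = prob(expr[1], p), prob(expr[2], p)
    return a * b if kind == "and" else a + b - a * b


def year_lineage(movie_ids: List[int]) -> Expr:
    """Lineage of one ReleaseYear row: OR over its films of (FLike_m AND NOT MLike_m).

    Movie rows carry the certain event T (probability 1.0), so the join adds nothing.
    """
    terms = [("and", ("var", f"FLike{m}"), ("not", ("var", f"MLike{m}"))) for m in movie_ids]
    return reduce(lambda x, y: ("or", x, y), terms)


# Hypothetical probabilities for three films released in the same year (not Fig. 3's data)
p = {"FLike1": 0.5, "MLike1": 0.3, "FLike2": 0.2, "MLike2": 0.1, "FLike3": 0.7, "MLike3": 0.6}
ee = year_lineage([1, 2, 3])

# Cross-check against the closed form 1 - prod(1 - P(FLike_m) * (1 - P(MLike_m)))
closed_form = 1.0
for m in (1, 2, 3):
    closed_form *= 1.0 - p[f"FLike{m}"] * (1.0 - p[f"MLike{m}"])
print(round(prob(ee, p), 6), round(1.0 - closed_form, 6))  # 0.61624 0.61624
```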
The user may doubt this answer and want to understand how the first row of the result was obtained, i.e., how the probability 0.591 that at least one film from 1948 was liked by women but not by men was derived. The user right-clicks the first row and selects "Check Answer Explanation" from the context menu to view the explanation, as shown in Fig. 4.3.
The system pops up a dialog box prompting the user to enter the value of K, as shown in Fig. 4.4. Here K ≤ 6, because this result tuple is derived from 6 base tuples stored in the database; the user chooses K, the number of base tuples with the greatest influence on the result tuple that he wants to see. Since each tuple of a query result may be derived from thousands of tuples stored in the database, letting the user choose K is necessary.
The system returns the explanation of the query result, as shown in Fig. 4.5. The explanation lists the event IDs and probabilities of the top K tuples with the greatest influence on the row the user is interested in, together with the rate of change of the result tuple's probability with respect to each of their probabilities. This rate of change measures each tuple's influence on the probability of the result tuple: the larger the rate of change, the greater the influence. For example, the first row of the explanation shows that the tuple with the greatest influence on the probability of the query result has event ID FLike3; it corresponds to the event that the 3rd film is liked by women, whose probability is 0.7.
Based on his own knowledge, the user may judge that the probability that the 3rd film is liked by women should be 0.3 rather than 0.7, and that the probability that the 1st film is liked by men should be 0.4 rather than 0.7. He therefore modifies these probabilities in the explanation of the query result, as shown in Fig. 4.6.
The user then clicks the "Apply" button, and the system returns a new result computed from the user's modifications: the probability that at least one film from 1948 was liked by women but not by men is now 0.338, as shown in Fig. 4.7.
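For an expression of this shape (an ∨ of independent terms FLike_m ∧ ~MLike_m), the rates of change shown in an explanation have a closed form. With a_j = P(FLike_j)(1 - P(MLike_j)), one gets ∂P/∂P(FLike_m) = (1 - P(MLike_m)) Π_{j≠m}(1 - a_j) and ∂P/∂P(MLike_m) = -P(FLike_m) Π_{j≠m}(1 - a_j). The sketch below uses this to list the top K events, applies two user corrections in the spirit of Fig. 4.6, and recomputes the result. The data and corrections are hypothetical, `year_prob` and `explanation` are hypothetical names, and the recomputation is done from scratch rather than with the incremental tree update of step (7).

```python
import heapq
from math import prod
from typing import Dict, List, Tuple


def year_prob(p: Dict[str, float], films: List[int]) -> float:
    """P(at least one film liked by women but not by men) = 1 - prod(1 - a_m)."""
    return 1.0 - prod(1.0 - p[f"FLike{m}"] * (1.0 - p[f"MLike{m}"]) for m in films)


def explanation(p: Dict[str, float], films: List[int], k: int) -> List[Tuple[str, float, float]]:
    """Top-k (event ID, probability, dP/dP(event)) rows, ranked by |dP/dP(event)|."""
    rows = []
    for m in films:
        f, g = p[f"FLike{m}"], p[f"MLike{m}"]
        rest = prod(1.0 - p[f"FLike{j}"] * (1.0 - p[f"MLike{j}"]) for j in films if j != m)
        rows.append((f"FLike{m}", f, (1.0 - g) * rest))  # raising P(FLike_m) raises P
        rows.append((f"MLike{m}", g, -f * rest))         # raising P(MLike_m) lowers P
    return heapq.nlargest(k, rows, key=lambda r: abs(r[2]))


films = [1, 2, 3]
# Hypothetical probabilities, not the data of Fig. 3
p = {"FLike1": 0.5, "MLike1": 0.3, "FLike2": 0.2, "MLike2": 0.1, "FLike3": 0.7, "MLike3": 0.6}
print(round(year_prob(p, films), 4))  # 0.6162
for event_id, prob_value, rate in explanation(p, films, k=3):
    print(event_id, prob_value, round(rate, 4))
p.update({"FLike2": 0.1, "MLike3": 0.8})  # hypothetical user corrections
print(round(year_prob(p, films), 4))  # 0.4913
```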
The system and method proposed by the invention for explaining, erasing and modifying database query results have the following advantages:
1. A probabilistic database can use a particular user's personal knowledge to give that user more accurate query results;
2. Query results are explained, and for queries with a safe query plan the additional time complexity, compared with giving no explanation, is linear;
3. By saving the query plan, the time cost of recomputing query results is significantly reduced.
In this specification, the invention has been described with reference to specific embodiments. However, various modifications and changes can obviously be made without departing from the spirit and scope of the invention; the specification and drawings are therefore to be regarded as illustrative rather than restrictive.

Claims (10)

1. A method for explaining, erasing and modifying query results in a probabilistic database, characterized in that the probabilistic database stores data as tables, each tuple of a table is regarded as the description of a probabilistic event and comprises an event ID and a probability field, the probability field representing the probability that the probabilistic event described by the tuple occurs, and the method comprises the following steps:
(1) the user submits a Structured Query Language (SQL) query;
(2) the query submitted by the user is processed to obtain processed data;
(3) the processed data are matched against the tuple data in the database to obtain the associated tuple data, and the associated tuple data are retrieved from the probabilistic database to form query result tuples;
(4) computation is performed on the probabilities of the associated tuple data obtained, giving the probability of each query result tuple; at the same time, the associated tuple data obtained are ranked by priority and saved;
(5) the query result probabilities are returned to the user, accompanied by an explanation of those probabilities;
(6) based on the returned query result and its explanation, the user decides whether to accept the query result; if not, the user enters an instruction to retrieve the relevant explanation and modifies the explanation of the query result;
(7) the query result is recomputed according to the user's modification of its explanation and returned to the user;
wherein the event IDs and probabilities of the top k tuples with the greatest influence on the query result are called the explanation of the computed result.
2. The method for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 1, characterized in that the query result produced by each SQL query is also a table, each tuple of which is derived from other tuples stored in the database, and the probability of a result is computed from the probabilities of these base tuples.
3. The method for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 2, characterized in that the explanation of the query result presented to the user in step (5) is obtained as follows: the influence of each tuple's probability on the probability of a result tuple is measured by a partial derivative; the larger the rate of change of the result tuple's probability with respect to the probability of an uncertain tuple, the greater that uncertain tuple's influence on the result tuple.
4. The method for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 3, characterized in that the explanation for the user is obtained by the following partial-derivative measure:
For any event e, its probability is denoted P(e). For each row of the query result, its event expression is denoted ee = F(e1, e2, ..., en), meaning that ee is formed by combining the elementary events e1, e2, ..., en with the Boolean operators AND (∧), OR (∨) and NOT (~). The probability of ee can be derived from the probabilities of the elementary events e1, e2, ..., en, expressed as P(ee) = f(P(e1), P(e2), ..., P(en)). The influence of each elementary event ei on the probability of ee is measured by the rate of change of P(ee) with respect to P(ei), i.e., by the partial derivative
$$\frac{\partial P(ee)}{\partial P(e_i)}.$$
The larger the value of this partial derivative, the greater the influence of the probability of this base-tuple event on the probability of the query result.
5. The method for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 4, characterized in that, for the explanation of the query result presented to the user in step (5), the partial derivatives are obtained by a recursive algorithm, which yields the explanation shown; the algorithm is as follows:
For an event expression ee = F(e1, e2, ..., en) in which e1, e2, ..., en do not repeat, and given P(e1), P(e2), ..., P(en), P(ee) can be computed recursively as follows:
(1) for an event expression ei (1 ≤ i ≤ n), P(ei) is known;
(2) for an event expression ~ee, where ee is an event expression, P(~ee) = 1 - P(ee);
(3) for an event expression ee1 ∧ ee2, where ee1 and ee2 are event expressions, P(ee1 ∧ ee2) = P(ee1) * P(ee2);
(4) for an event expression ee1 ∨ ee2, where ee1 and ee2 are event expressions, P(ee1 ∨ ee2) = P(ee1) + P(ee2) - P(ee1) * P(ee2).
The partial derivatives of P(ee) with respect to P(e1), P(e2), ..., P(en) can be computed recursively as follows:
(1) for an event expression ei (1 ≤ i ≤ n),
$$\frac{\partial P(e_i)}{\partial P(e_j)} = \begin{cases} 1, & j = i \\ 0, & j \neq i \end{cases}$$
(2) for an event expression ~ee, where ee is an event expression,
$$\frac{\partial P(\sim ee)}{\partial P(e_i)} = \frac{\partial \big(1 - P(ee)\big)}{\partial P(e_i)} = -\frac{\partial P(ee)}{\partial P(e_i)};$$
(3) for an event expression ee1 ∧ ee2, where ee1 and ee2 are event expressions,
$$\frac{\partial P(ee_1 \wedge ee_2)}{\partial P(e_i)} = P(ee_2)\,\frac{\partial P(ee_1)}{\partial P(e_i)} + P(ee_1)\,\frac{\partial P(ee_2)}{\partial P(e_i)};$$
(4) for an event expression ee1 ∨ ee2, where ee1 and ee2 are event expressions,
$$\frac{\partial P(ee_1 \vee ee_2)}{\partial P(e_i)} = \big(1 - P(ee_2)\big)\frac{\partial P(ee_1)}{\partial P(e_i)} + \big(1 - P(ee_1)\big)\frac{\partial P(ee_2)}{\partial P(e_i)}.$$
6. The method for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 5, characterized in that the explanation of the query result presented to the user in step (5) is computed by applying an algorithm of linear time complexity within the recursive algorithm, specifically as follows:
1) obtain, for each event expression ee = F(e1, e2, ..., en), the probability P(ee(N)) and the partial derivatives $\frac{\partial P(ee(N))}{\partial P(e_{n_j})}$;
2) each event expression ee = F(e1, e2, ..., en) is represented by a binary tree whose leaf nodes correspond to event IDs and whose internal nodes correspond to the operators of the event expression; for each node N of the binary tree, LChild(N) denotes its left child, RChild(N) its right child, subTree(N) the subtree rooted at N, and ee(N) the event expression corresponding to subTree(N); each node N has an additional field storing the probability P(ee(N)); letting e_{n1}, e_{n2}, ..., e_{nk} be all the event IDs appearing in ee(N), corresponding to all the leaf nodes of subTree(N), the node stores in an additional list the partial-derivative list
$$\left(\frac{\partial P(ee(N))}{\partial P(e_{n_1})}, \frac{\partial P(ee(N))}{\partial P(e_{n_2})}, \ldots, \frac{\partial P(ee(N))}{\partial P(e_{n_k})}\right);$$
3) the partial derivatives that make up the explanation are computed while the probability P(ee) of the event expression ee = F(e1, e2, ..., en) is being computed;
4) the method performs a postorder traversal of the binary tree corresponding to ee; when each node N is visited, its left child LChild(N) and right child RChild(N) have already been processed, so the probability P(ee(LChild(N))) and the partial-derivative list of the left child, and the probability P(ee(RChild(N))) and the partial-derivative list of the right child, have already been obtained;
after the postorder traversal of the binary tree corresponding to ee, the list
$$\left(\frac{\partial P(ee)}{\partial P(e_1)}, \ldots, \frac{\partial P(ee)}{\partial P(e_n)}\right)$$
is obtained; selecting the k largest values from this list then gives the explanation of the computed result.
7. The method for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 6, characterized in that, during the traversal of the binary tree, each node N keeps in its partial-derivative list only the k elements of largest absolute value and discards the remaining elements early, so that what is finally kept at the root node is exactly the k elements of largest absolute value in
$$\left(\frac{\partial P(ee)}{\partial P(e_1)}, \ldots, \frac{\partial P(ee)}{\partial P(e_n)}\right).$$
8. The method for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 7, characterized in that recomputing the query result according to the user's modification of its explanation in step (7) is performed as follows: intermediate results are saved in the binary tree, and only the nodes related to the corrected tuples are recomputed.
9. A system for explaining, erasing and modifying query results in a probabilistic database, characterized in that it comprises a user interface and a server, the server comprising a central processing unit, a storage unit, a retrieval and matching unit and a computing unit; the user interface lets the user enter queries, displays query results to the user, and otherwise provides interaction between the user and the system; the central processing unit processes the query statement sent by the user through the user interface and sends the processed query data to the retrieval and matching unit; the retrieval and matching unit matches the processed query data against the event data stored in tables of the probabilistic database in the storage unit and obtains all associated event data; the data in the probabilistic database are stored as tables, each row of which is regarded as the description of a probabilistic event and comprises an event ID and a probability field; the retrieval and matching unit sends all the associated event data obtained to the computing unit; the computing unit computes over the probabilities in the event data sent by the retrieval and matching unit to obtain the query result, and feeds the query result back to the user interface for display to the user; at the same time, the computing unit processes the computed result probability together with the probability of each associated event, and ranks the explanations provided by the system by how strongly their probabilities influence the query result, obtaining a relevance-ranked list; the computing unit stores the ranked list and the intermediate results; the user interface displays the query result to the user; when the user asks to view the top few relevance explanations for a result, the central processing unit, following the user's instruction, directs the computing unit to retrieve the corresponding explanations and feed them back to the user interface; the user modifies the probabilities of some explanations and sends the modifications to the central processing unit through the user interface, the central processing unit forwards the user's modifications to the computing unit, and the computing unit determines by comparison which events' probabilities have been modified, recomputes the query result and feeds it back to the user interface.
10. The system for explaining, erasing and modifying query results in a probabilistic database as claimed in claim 9, characterized in that the computing unit computes the query result and the explanation in the manner described in claims 3 to 8 above.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110003411 CN102073718B (en) 2011-01-10 2011-01-10 System and method for explaining, erasing and modifying search result in probabilistic database

Publications (2)

Publication Number Publication Date
CN102073718A true CN102073718A (en) 2011-05-25
CN102073718B CN102073718B (en) 2013-01-30

Family

ID=44032257

Country Status (1)

Country Link
CN (1) CN102073718B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant