CN103823885A - Data provenance dependence relation analysis model-based data dependence analysis method - Google Patents


Info

Publication number
CN103823885A
Authority
CN
China
Prior art keywords
data
dependence
dependency
origin
graph
Legal status
Pending
Application number
CN201410082707.1A
Other languages
Chinese (zh)
Inventor
许国艳
王志坚
杨莉
Current Assignee
Hohai University HHU
Original Assignee
Hohai University HHU
Application filed by Hohai University HHU
Priority to CN201410082707.1A
Publication of CN103823885A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data dependence analysis method. Based on OPM (the Open Provenance Model), the method establishes a data provenance dependence analysis model, defines data provenance dependences and data dependences, and analyzes data dependences using refinement and synthesis operations. The method comprises the following steps: (1) establishing the data provenance dependence analysis model and defining the data provenance dependences, the data dependences and the related operations in the model; (2) designing a data dependence analysis method based on this model; (3) designing rule-based refinement and synthesis algorithms for data dependence analysis. The model and method take into account both semantic completeness and storage-space requirements for data provenance, support provenance analysis at different levels of abstraction, satisfy the personalized provenance-tracing needs of different users, and have good prospects for practical application.

Description

Data dependence analysis method based on a data provenance dependence analysis model
Technical field
The present invention relates to the field of data management and, more specifically, to data provenance, workflows, dependences and semantics. It proposes a data dependence analysis method based on a data provenance dependence analysis model: it defines data provenance dependences and the data provenance dependence analysis model, and analyzes dependences using refinement and synthesis operations.
Background art
Data provenance (also called data lineage, data pedigree or data derivation) is the complete history of how data has been processed. It covers the source of the data and every subsequent process applied to it, that is, the whole process by which the data is produced and evolves over time.
Data provenance dependences are, in essence, the semantic information of data provenance. At the abstract level, data provenance is a kind of dependence: it describes how a data product was obtained, which data and processes were involved, and what role each of them played. Studying dependences requires a strong semantic foundation, so provenance information needs a clear formal description, and the semantics of tracing operations must be defined in order to strengthen provenance dependence analysis and inference.
At present, one of the main models for analyzing data provenance dependences is OPM (The Open Provenance Model). OPM is a community-driven provenance model that supports interoperability between provenance technologies. It is based on a directed acyclic graph that represents data products, the processes involved in computing them, and the causal dependences between them.
Building on annotated data provenance semantics, the present invention refines OPM and proposes a data dependence analysis method based on a data provenance dependence analysis model. It establishes the analysis model, analyzes dependences further with union, intersection, refinement and synthesis operations, and defines a set of rules from which construction, refinement and synthesis algorithms for the data dependence graph are derived. This meets the needs of different users to query data provenance at different levels of abstraction.
Summary of the invention
Object of the invention: to address the problem of analyzing data provenance dependences, the invention proposes a data dependence analysis method. It defines data provenance dependences and a data provenance dependence analysis model and, based on this model, designs a method for data dependence analysis comprising a series of rules and algorithms for constructing, refining and synthesizing the data dependence graph, thereby meeting provenance information needs at different levels of abstraction and for different categories of information.
Technical solution: a data dependence analysis method based on a data provenance dependence analysis model, comprising the following:
Concept of data provenance dependence:
Data provenance dependences are essentially the semantic information of data provenance and can be divided into data dependences, process dependences and control dependences. With reference to OPM, the invention defines data provenance dependences and their properties.
Definition 1. A data provenance dependence is defined as the 6-tuple DP_Dependency = (Data_Set, Process_Set, Data_Data_Dependency, Data_Process_Dependency, Process_Data_Dependency, Process_Process_Dependency), where
■ Data_Set is the set of data;
■ Process_Set is the set of processes;
■ Data_Data_Dependency: Data_Set → Data_Set is the mapping from data to data, called the data dependence;
■ Data_Process_Dependency: Data_Set → Process_Set is the mapping from data to processes, called the dependence of processes on data: the process depends on the data, i.e. the data is an input of the process;
■ Process_Data_Dependency: Process_Set → Data_Set is the mapping from processes to data, called the dependence of data on processes: the data depends on the process, i.e. the data is an output of the process;
■ Process_Process_Dependency: Process_Set → Process_Set is the mapping from processes to processes, called the process dependence;
■ the dependence of processes on data and the dependence of data on processes are collectively called control dependences.
Theorem 1. Data dependence is transitive.
If D1, D2, D3 ∈ Data_Set with D2: Data_Data_Dependency(D1) and D3: Data_Data_Dependency(D2), then D3: Data_Data_Dependency(D1) also holds; that is, data dependence is transitive.
Theorem 2. Control dependence is transitive.
Let D1, D2 ∈ Data_Set and P1, P2 ∈ Process_Set. Control dependence is transitive in each of the following cases:
■ if D1: Data_Process_Dependency(P1) and P2: Process_Process_Dependency(P1), then D1: Data_Process_Dependency(P2);
■ if D2: Data_Data_Dependency(D1) and D1: Data_Process_Dependency(P1), then D2: Data_Process_Dependency(P1);
■ if D1: Process_Data_Dependency(P2) and P2: Process_Process_Dependency(P1), then D1: Process_Data_Dependency(P1);
■ if D2: Data_Data_Dependency(D1) and D1: Process_Data_Dependency(P1), then D2: Process_Data_Dependency(P1).
Theorem 3. Process dependence is transitive.
If P1, P2, P3 ∈ Process_Set with P2: Process_Process_Dependency(P1) and P3: Process_Process_Dependency(P2), then P3: Process_Process_Dependency(P1) also holds; process dependence is therefore said to be transitive.
Theorem 4. Data dependences are neither reflexive nor symmetric.
By the definition of data provenance dependence, no data dependence, process dependence or control dependence can relate an element to itself, so these dependences are not reflexive. Likewise, none of them is symmetric.
If the data provenance dependences defined above are described as a graph, nodes represent data or processes and edges represent dependences. WasGeneratedBy and Used edges represent control dependences, WasDerivedFrom edges represent data dependences, and WasInformedBy edges represent process dependences.
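To make Definition 1 and Theorems 1 and 4 concrete, the following Python sketch stores the 6-tuple as sets of pairs and computes the transitive closure of a dependence relation. The class name `DPDependency`, the helper names and the convention that a pair (x, y) stands for "y: Relation(x)" are assumptions made for illustration; they are not part of the patent.

```python
from dataclasses import dataclass, field

# Hypothetical mapping of the four dependence kinds to the graph edge labels named above.
EDGE_LABELS = {
    "data_data": "WasDerivedFrom",       # data dependence
    "data_process": "Used",              # control dependence (data is an input of the process)
    "process_data": "WasGeneratedBy",    # control dependence (data is an output of the process)
    "process_process": "WasInformedBy",  # process dependence
}


@dataclass
class DPDependency:
    """Sketch of the 6-tuple of Definition 1.

    Every relation is a set of pairs (x, y) read as "y: Relation(x)",
    matching the notation used in Theorems 1 to 3.
    """
    data_set: set = field(default_factory=set)
    process_set: set = field(default_factory=set)
    data_data: set = field(default_factory=set)
    data_process: set = field(default_factory=set)
    process_data: set = field(default_factory=set)
    process_process: set = field(default_factory=set)


def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs` (cf. Theorems 1 and 3)."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def satisfies_theorem4(pairs):
    """True if no element depends on itself and no two elements depend on each other."""
    irreflexive = all(a != b for a, b in pairs)
    no_symmetric_pair = all((b, a) not in pairs for a, b in pairs)
    return irreflexive and no_symmetric_pair


dp = DPDependency(data_set={"D1", "D2", "D3"},
                  data_data={("D1", "D2"), ("D2", "D3")})
print(sorted(transitive_closure(dp.data_data)))  # [('D1', 'D2'), ('D1', 'D3'), ('D2', 'D3')]
print(satisfies_theorem4(dp.data_data))          # True
```

Storing each relation as a plain set of pairs keeps the union and intersection operations of the model trivial to express later.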
Data provenance dependence analysis model:
The data provenance dependence analysis model designed in the present invention serves users' queries about data provenance. Starting from annotated provenance information, it clearly describes how data products depend on other data and on processes, and it supports provenance applications for different users with different application topics and levels of abstraction. It comprises the following functional layers:
(1) Data provenance dependence layer: based on the annotated provenance information, it contains three classes of provenance dependence: data dependence, process dependence and control dependence;
(2) Dependence operation layer: built on the data provenance dependence layer, it applies concrete operations to the three kinds of dependence (data, process and control), mainly union, intersection, refinement and synthesis;
(3) Dependence view layer: using the refinement or synthesis operations of the dependence operation layer, or refinement and synthesis applied to the data or processes a user chooses to focus on, it provides users with dependence views at different levels of abstraction and for different categories of information, or views centred on the user's focus.
Data dependence analysis method based on the data provenance dependence analysis model:
For basic workflow structures and dependences, the method defines a set of rules and, on that basis, a rule-based method for data dependence analysis comprising the following three classes of methods:
(1) Construction of the data dependence graph: a construction algorithm for the data dependence graph, based on a set of formation rules;
(2) Refinement of the data dependence graph: a set of refinement rules based on basic workflow structures and dependences, from which a refinement algorithm for the data dependence graph is built. The algorithm compares nodes pairwise: for a data dependence graph Dep_Graph with n nodes, it refines the dependence between each pair of nodes. The worst case occurs when every pair of nodes in Dep_Graph is connected by an edge; the algorithm then performs (n−1) + (n−2) + ... + 2 + 1 = n(n−1)/2 comparisons, so its complexity is O(n²);
(3) Synthesis of the data dependence graph: a set of synthesis rules based on basic workflow structures and dependences, from which a synthesis algorithm for the data dependence graph is built. The algorithm uses breadth-first search (BFS) to find the neighbouring nodes of each node, tests whether they form a bipartite graph, and, according to the three possible outcomes, carries out complete-dependence synthesis, partial-dependence synthesis or synthesis processing for the remaining case. Its complexity is the sum of three parts: the BFS, the bipartiteness test and the synthesis processing. Finding the optimal synthesis of data dependences over a whole graph requires finding a maximum bipartite subgraph, which is NP-hard. If instead the algorithm searches only the neighbourhood of one node at a time and then decides how to process it, its complexity is polynomial.
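The synthesis step relies on a BFS-based bipartiteness test. A minimal Python sketch of that test, using the textbook BFS two-colouring technique, is shown below. The adjacency-dictionary input and the function name `bipartition` are assumptions for illustration, and the sketch does not attempt the NP-hard search for a maximum bipartite subgraph.

```python
from collections import deque


def bipartition(adjacency):
    """Split the nodes into two sides by BFS two-colouring.

    adjacency maps a node to the nodes it has edges to; edge direction is
    ignored for the test. Returns (side_0, side_1), or None if some edge
    joins two nodes of the same colour (the graph is not bipartite).
    """
    # Undirected view of the (directed) dependence edges.
    neighbours = {}
    for u, targets in adjacency.items():
        neighbours.setdefault(u, set())
        for v in targets:
            neighbours[u].add(v)
            neighbours.setdefault(v, set()).add(u)

    colour = {}
    for start in neighbours:              # one BFS per connected component
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return None           # odd cycle: not bipartite
    side_0 = {n for n, c in colour.items() if c == 0}
    side_1 = {n for n, c in colour.items() if c == 1}
    return side_0, side_1


# Two source items that both depend on two sink items form a bipartite block.
print(bipartition({"I1": ["O1", "O2"], "I2": ["O1", "O2"]}))
print(bipartition({"a": ["b"], "b": ["c"], "c": ["a"]}))  # None
```

Each node and edge is visited a constant number of times, so a single test runs in time linear in the size of the neighbourhood being examined.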
Beneficial effects: experiments show that the proposed data dependence analysis method, by analyzing data dependences and applying refinement or synthesis operations, provides different views of provenance tracing and meets users' needs for different provenance information. It offers more complete semantics, lower storage cost and lower algorithmic complexity, and meets the needs of different users to query data provenance at different levels of abstraction.
Description of the drawings
Fig. 1 shows the data provenance dependence analysis model;
Fig. 2 shows an example control dependence graph;
Fig. 3 shows the data dependence graph obtained from Fig. 2;
Fig. 4 compares storage costs.
Embodiments
The present invention is further illustrated below with reference to specific embodiments. These embodiments are intended only to illustrate the invention and not to limit its scope; after reading this disclosure, modifications of various equivalent forms made by those skilled in the art fall within the scope defined by the appended claims.
Data provenance dependence analysis model
Fig. 1 shows the data provenance dependence analysis model proposed by the invention. The concepts related to data dependence in this model are defined as follows.
Definition 2 (Data dependence). For given source data Source_Data, suppose there exists a dependence sequence s = ⟨Source_Data, D_1, D_2, ..., D_n, Sink_Data⟩ satisfying:
■ Source_Data, D_1, D_2, ..., D_n, Sink_Data ∈ Data_Set;
■ there exist e_0, e_1, ..., e_n ∈ Data_Data_Dependency with e_i = Data_Data_Dependency(e_{i−1}), 0 ≤ i ≤ n.
Then s is called a data dependence of Source_Data, and Source_Data is said to be derived from Sink_Data.
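Definition 2 can be read as a path search over data dependences. The sketch below assumes a hypothetical mapping `depends_on` from each data item to the items it directly depends on, and uses breadth-first search to return one (shortest) sequence ⟨Source_Data, D_1, ..., D_n, Sink_Data⟩, or None if Sink_Data cannot be reached. The function name is illustrative only.

```python
from collections import deque


def dependence_sequence(depends_on, source, sink):
    """Find s = <Source_Data, D1, ..., Dn, Sink_Data> as in Definition 2, or None.

    depends_on maps a data item to the items it directly depends on
    (a hypothetical encoding of Data_Data_Dependency).
    BFS returns a shortest such sequence.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == sink:
            sequence = []
            while current is not None:        # walk back to the source
                sequence.append(current)
                current = parent[current]
            return sequence[::-1]
        for nxt in depends_on.get(current, ()):
            if nxt not in parent:
                parent[nxt] = current
                queue.append(nxt)
    return None


deps = {"report": ["table"], "table": ["raw"]}
print(dependence_sequence(deps, "report", "raw"))  # ['report', 'table', 'raw']
```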
Definition 3 (Complete data dependence). For given data Source_Data and Sink_Data, where Source_Data is data-dependent on Sink_Data, Source_Data = {I_1, ..., I_n} and Sink_Data = {O_1, ..., O_m}: if O_1, ..., O_m → I_i for every i ≤ n, i.e. every item I_1, ..., I_n depends entirely on O_1, ..., O_m, then Source_Data is said to be completely data-dependent on Sink_Data, denoted [formula image BDA0000474308420000051].
Definition 4 (Partial data dependence). For given data Source_Data and Sink_Data, where Source_Data is data-dependent on Sink_Data, Source_Data = {I_1, ..., I_n} and Sink_Data = {O_1, ..., O_m}: if O_i → I_j for some i ≤ m, j ≤ n, i.e. some item of Source_Data depends on Sink_Data, then Source_Data is said to be partially data-dependent on Sink_Data, denoted [formula image BDA0000474308420000052].
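Definitions 3 and 4 distinguish dependences by how many item pairs are linked. The following sketch classifies a dependence from item-level pairs; the pair format (O, I) and the function name are assumptions. Because every complete dependence also satisfies Definition 4, the sketch reports the stronger label "complete" when it applies.

```python
def classify_dependence(source_items, sink_items, item_deps):
    """Classify how Source_Data depends on Sink_Data (sketch of Definitions 3 and 4).

    item_deps holds pairs (O, I): item I of Source_Data depends on item O of Sink_Data.
    Returns "complete" if every I depends on every O, "partial" if at least
    one such pair exists, and None if there is no dependence at all.
    """
    relevant = {(o, i) for (o, i) in item_deps if o in sink_items and i in source_items}
    if not relevant:
        return None
    if all((o, i) in relevant for o in sink_items for i in source_items):
        return "complete"   # Definition 3
    return "partial"        # Definition 4 (some, but not all, item pairs are linked)


source, sink = {"I1", "I2"}, {"O1"}
print(classify_dependence(source, sink, {("O1", "I1"), ("O1", "I2")}))  # complete
print(classify_dependence(source, sink, {("O1", "I1")}))                # partial
```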
Definition 5 (Union). Let DP_Dep1 and DP_Dep2 be two data provenance dependence graphs. The union of DP_Dep1 and DP_Dep2, written DP_Dep1 ∪ DP_Dep2, is defined component by component, where X(G) denotes component X of graph G:
■ Data_Set(DP_Dep1 ∪ DP_Dep2) = Data_Set(DP_Dep1) ∪ Data_Set(DP_Dep2)
■ Process_Set(DP_Dep1 ∪ DP_Dep2) = Process_Set(DP_Dep1) ∪ Process_Set(DP_Dep2)
■ Data_Data_Dependency(DP_Dep1 ∪ DP_Dep2) = Data_Data_Dependency(DP_Dep1) ∪ Data_Data_Dependency(DP_Dep2)
■ Data_Process_Dependency(DP_Dep1 ∪ DP_Dep2) = Data_Process_Dependency(DP_Dep1) ∪ Data_Process_Dependency(DP_Dep2)
■ Process_Data_Dependency(DP_Dep1 ∪ DP_Dep2) = Process_Data_Dependency(DP_Dep1) ∪ Process_Data_Dependency(DP_Dep2)
■ Process_Process_Dependency(DP_Dep1 ∪ DP_Dep2) = Process_Process_Dependency(DP_Dep1) ∪ Process_Process_Dependency(DP_Dep2)
Definition 6 (Intersection). Let DP_Dep1 and DP_Dep2 be two data provenance dependence graphs. The intersection of DP_Dep1 and DP_Dep2, written DP_Dep1 ∩ DP_Dep2, is defined component by component, where X(G) denotes component X of graph G:
■ Data_Set(DP_Dep1 ∩ DP_Dep2) = Data_Set(DP_Dep1) ∩ Data_Set(DP_Dep2)
■ Process_Set(DP_Dep1 ∩ DP_Dep2) = Process_Set(DP_Dep1) ∩ Process_Set(DP_Dep2)
■ Data_Data_Dependency(DP_Dep1 ∩ DP_Dep2) = Data_Data_Dependency(DP_Dep1) ∩ Data_Data_Dependency(DP_Dep2)
■ Data_Process_Dependency(DP_Dep1 ∩ DP_Dep2) = Data_Process_Dependency(DP_Dep1) ∩ Data_Process_Dependency(DP_Dep2)
■ Process_Data_Dependency(DP_Dep1 ∩ DP_Dep2) = Process_Data_Dependency(DP_Dep1) ∩ Process_Data_Dependency(DP_Dep2)
■ Process_Process_Dependency(DP_Dep1 ∩ DP_Dep2) = Process_Process_Dependency(DP_Dep1) ∩ Process_Process_Dependency(DP_Dep2)
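Since Definitions 5 and 6 act component by component, a graph can be modelled as a fixed-order tuple of six sets and combined field by field. The `DPGraph` container and function names below are hypothetical; the sketch only illustrates the component-wise operations.

```python
from typing import NamedTuple


class DPGraph(NamedTuple):
    """Hypothetical container for the six components of a provenance dependence graph."""
    data_set: frozenset = frozenset()
    process_set: frozenset = frozenset()
    data_data: frozenset = frozenset()
    data_process: frozenset = frozenset()
    process_data: frozenset = frozenset()
    process_process: frozenset = frozenset()


def dp_union(g1: DPGraph, g2: DPGraph) -> DPGraph:
    """Definition 5: union taken component by component."""
    return DPGraph(*(a | b for a, b in zip(g1, g2)))


def dp_intersection(g1: DPGraph, g2: DPGraph) -> DPGraph:
    """Definition 6: intersection taken component by component."""
    return DPGraph(*(a & b for a, b in zip(g1, g2)))


g1 = DPGraph(data_set=frozenset({"D1", "D2"}), process_set=frozenset({"P1"}),
             process_data=frozenset({("P1", "D2")}))
g2 = DPGraph(data_set=frozenset({"D2", "D3"}), process_set=frozenset({"P1"}))
print(sorted(dp_union(g1, g2).data_set))         # ['D1', 'D2', 'D3']
print(sorted(dp_intersection(g1, g2).data_set))  # ['D2']
```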
Definition 7 (Refinement). Let DP_Dep1 and DP_Dep2 be two data provenance dependence graphs. DP_Dep2 is defined as the refinement of DP_Dep1, denoted [formula image BDA0000474308420000061], if, component by component:
■ Data_Set(DP_Dep1) ⋐ Data_Set(DP_Dep2)
■ Process_Set(DP_Dep1) ⋐ Process_Set(DP_Dep2)
■ Data_Data_Dependency(DP_Dep1) ⋐ Data_Data_Dependency(DP_Dep2)
■ Data_Process_Dependency(DP_Dep1) ⋐ Data_Process_Dependency(DP_Dep2)
■ Process_Data_Dependency(DP_Dep1) ⋐ Process_Data_Dependency(DP_Dep2)
■ Process_Process_Dependency(DP_Dep1) ⋐ Process_Process_Dependency(DP_Dep2)
Definition 8 (Refinement of a data dependence graph). For given data Source_Data = {I_1, ..., I_n} and Sink_Data = {O_1, ..., O_m}, if Source_Data and Sink_Data satisfy the complete dependence dep1 and also the partial dependence dep2, then dep2 is a refinement of dep1, denoted [formula image BDA00004743084200000714].
Definition 9 (Synthesis). Let DP_Dep1 and DP_Dep2 be two data provenance dependence graphs. DP_Dep2 is defined as the synthesis of DP_Dep1, denoted [formula image BDA00004743084200000715], if, component by component:
■ Data_Set(DP_Dep1) ⋐ Data_Set(DP_Dep2)
■ Process_Set(DP_Dep1) ⋐ Process_Set(DP_Dep2)
■ Data_Data_Dependency(DP_Dep1) ⋐ Data_Data_Dependency(DP_Dep2)
■ Data_Process_Dependency(DP_Dep1) ⋐ Data_Process_Dependency(DP_Dep2)
■ Process_Data_Dependency(DP_Dep1) ⋐ Process_Data_Dependency(DP_Dep2)
■ Process_Process_Dependency(DP_Dep1) ⋐ Process_Process_Dependency(DP_Dep2)
Definition 10 (Synthesis of data dependences). Given a data dependence graph DGraph = (Node_Set, Edge_Set, Role_Set), the new dependence graph New_DG obtained by synthesizing its complete data dependences and partial data dependences is called a data dependence synthesis of DGraph, denoted [formula image BDA00004743084200000716].
Definition 11 (Synthesis of complete data dependences). Given a data dependence graph DGraph = (Node_Set, Edge_Set, Role_Set), suppose [formula image BDA0000474308420000079], child_DP = (N, E, R), [formula images BDA00004743084200000710 to BDA00004743084200000712], and N = Ns ∪ Nf, where Ns = {Ns_1, Ns_2, ..., Ns_i} is the set of edge start nodes, Nf = {Nf_1, Nf_2, ..., Nf_j} is the set of edge end nodes, and the graph (N, E) is a complete bipartite directed graph. If R is complete data dependence, the nodes of Ns are merged into a single node s, the nodes of Nf are merged into a single node f, and the edges of E are merged into a single edge e = ⟨s, f⟩. The resulting graph DGraph − child_DP + ({s, f}, e, role) is denoted New_DG and is called a complete data dependence synthesis of DGraph, where role is complete data dependence.
Definition 12 (Synthesis of partial data dependences). Given a data dependence graph DGraph = (Node_Set, Edge_Set, Role_Set), suppose [formula image BDA0000474308420000081], child_DP = (N, E, R), [formula images BDA0000474308420000082 to BDA0000474308420000085], and N = Ns ∪ Nf, where Ns = {Ns_1, Ns_2, ..., Ns_i} is the set of edge start nodes, Nf = {Nf_1, Nf_2, ..., Nf_j} is the set of edge end nodes, and the graph (N, E) is a complete bipartite directed graph. If R is partial data dependence, the nodes of Ns are merged into a single node s, the nodes of Nf are merged into a single node f, and the edges of E are merged into a single edge e = ⟨s, f⟩. The resulting graph DGraph − child_DP + ({s, f}, e, role) is denoted New_DG and is called a partial data dependence synthesis of DGraph, where role is partial data dependence.
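The merge described in Definitions 11 and 12 can be sketched as follows. This is a hedged illustration: the function name, the dictionary encoding of edges and two behaviours not fixed by the text are assumptions, namely that edges linking a merged node to the rest of the graph are redirected to the merged node, and that edges falling inside one merged side are dropped. The merged nodes are named by the sets they replace, mirroring Source_Data = {I_1, ..., I_n}.

```python
def synthesize_block(nodes, edges, ns, nf, role):
    """Merge a complete bipartite block Ns -> Nf into one edge <s, f>.

    nodes: set of node ids; edges: dict mapping (u, v) -> role label.
    ns, nf: sets of start and end nodes of the block; role: label of the new edge.
    Returns (new_nodes, new_edges), or None if the block is not complete bipartite.
    """
    block = {(u, v) for u in ns for v in nf}
    if not ns or not nf or ns & nf or not all(pair in edges for pair in block):
        return None
    s, f = frozenset(ns), frozenset(nf)   # merged nodes named by the sets they replace
    rename = {n: s for n in ns}
    rename.update({n: f for n in nf})
    new_nodes = (nodes - ns - nf) | {s, f}
    new_edges = {}
    for (u, v), r in edges.items():
        if (u, v) in block:
            continue                      # absorbed into the merged edge e = <s, f>
        u2, v2 = rename.get(u, u), rename.get(v, v)
        if u2 != v2:                      # assumption: edges inside one merged side are dropped
            new_edges[(u2, v2)] = r       # assumption: other edges are redirected to s or f
    new_edges[(s, f)] = role
    return new_nodes, new_edges


nodes = {"I1", "I2", "O1", "O2", "X"}
edges = {("I1", "O1"): "complete", ("I1", "O2"): "complete",
         ("I2", "O1"): "complete", ("I2", "O2"): "complete",
         ("X", "I1"): "partial"}
new_nodes, new_edges = synthesize_block(nodes, edges, {"I1", "I2"}, {"O1", "O2"}, "complete")
print(len(new_nodes), len(new_edges))  # 3 2
```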
Definition 13 (Parameter). In the execution of a process Process, an input Parameter is called a parameter rather than input data if either it is used in executing the process but produces no directly corresponding output data, or it is used in executing the process and produces corresponding output data that is only an intermediate product.
Data dependence analysis method based on the data provenance dependence analysis model
1. Construction of the data dependence graph
(1) Rules
Rule 1-1 (sequential structure): let data 1 be the data product whose provenance is to be traced. If data 2 is an input of process 1 and data 3 is an input of process 2, then data 1 is data-dependent on data 3.
Rule 1-2 (parallel structure): let data 1 be the data product whose provenance is to be traced. If data 2 is an input of both process 1 and process 2, and data 1 is an output of both process 1 and process 2, then data 1 is data-dependent on data 2.
Rule 1-3 (branch structure): let data 1 be the data product whose provenance is to be traced. If data 1 is an output of process 1 or process 2, and data 2 is an input of process 1 or process 2, then data 1 is data-dependent on data 2.
Rule 1-4 (loop structure): let data 1 be the data product whose provenance is to be traced. If data 1 is the result of repeatedly executing process 1, and data 2 is an input of process 1, then data 1 is data-dependent on data 2.
(2) Algorithm
Input: (Node_Set, Edge_Set, Role_Set), where
Node_Set = Data_Set ∪ Process_Set;
Edge_Set = Data_Data_Dependency ∪ Data_Process_Dependency ∪ Process_Data_Dependency ∪ Process_Process_Dependency;
Role_Set is the set of roles;
Data_Data_Dependency ⊆ Data_Set × Data_Set;
Data_Process_Dependency ⊆ Data_Set × Role_Set × Process_Set;
Process_Data_Dependency ⊆ Process_Set × Role_Set × Data_Set;
Process_Process_Dependency ⊆ Process_Set × Process_Set.
Output: data dependence graph
Algorithm:
// initialization
1. Initialize the data provenance graph, including Data_Set, Process_Set, Data_Data_Dependency, Process_Data_Dependency, Data_Process_Dependency, Process_Process_Dependency and Role_Set;
2. Initialize Source_Data, the data whose provenance is to be traced;
3. Initialize the data dependence graph [formula image BDA0000474308420000095], comprising DDEP_Data_Set and DDEP_Data_Data_Dependency;
4. Initialize the source data variable data = Source_Data;
// trace back
5. while (data exists)
6. do {
    Obtain the workflow that produced data;
    switch (type of workflow)
        case Sequence: process according to Rule 1-1; break;
        case AND_Split-join: process according to Rule 1-2; break;
        case OR_Split-join: process according to Rule 1-3; break;
        case Iteration: process according to Rule 1-4; break;
    Add the value of data to DDEP_Data_Set;
    Add the obtained data dependence to DDEP_Data_Data_Dependency;
    Obtain the new value of data;
}
7. Obtain DDEP = (DDEP_Data_Set, DDEP_Data_Data_Dependency).
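The backward-tracing loop of steps 5 and 6 can be sketched in Python as follows. This is a hedged illustration, not the patent's implementation: the dictionaries `generated_by` and `inputs_of` are hypothetical encodings of WasGeneratedBy and Used edges, and the four workflow cases are collapsed into one assumption (an output depends on every input of every process that produced it). Under that assumption the indirect dependence of Rule 1-1 follows from transitivity (Theorem 1) rather than being stored explicitly.

```python
from collections import deque


def build_data_dependence_graph(source, generated_by, inputs_of):
    """Trace provenance back from `source` and derive data dependences.

    generated_by: data item -> processes that output it (WasGeneratedBy edges)
    inputs_of:    process   -> data items it uses       (Used edges)
    Assumption: in all four workflow cases (Rules 1-1 to 1-4) an output is taken
    to depend on every input of every process that produced it.
    Returns (DDEP_Data_Set, DDEP_Data_Data_Dependency); each dependence is a
    pair (dependent data, data it depends on).
    """
    ddep_data, ddep_deps = {source}, set()
    queue = deque([source])
    while queue:                                    # "while (data exists)"
        data = queue.popleft()
        for process in generated_by.get(data, ()):  # producing workflow step(s)
            for inp in inputs_of.get(process, ()):
                if inp == data:
                    continue                        # iteration feeding back its own output
                ddep_deps.add((data, inp))
                if inp not in ddep_data:            # visit each data item once
                    ddep_data.add(inp)
                    queue.append(inp)
    return ddep_data, ddep_deps


generated_by = {"clean": ["P1"], "report": ["P2"]}
inputs_of = {"P1": ["raw"], "P2": ["clean", "param"]}
data_set, deps = build_data_dependence_graph("report", generated_by, inputs_of)
print(sorted(deps))  # [('clean', 'raw'), ('report', 'clean'), ('report', 'param')]
```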
2. Refinement of the data dependence graph
(1) Rules
Rule 2-1 (input data and parameters): for given data Source_Data = {I_1, ..., I_n} and Sink_Data = {O_1, ..., O_m} with [formula image BDA0000474308420000102], if some O_i (i ≤ m) is a parameter, replace [formula image BDA0000474308420000103] with: under the condition of parameter O_i, {I_1, ..., I_n} →_f {O_1, ..., O_{i−1}, O_{i+1}, ..., O_m}. If every O_i (i ≤ m) is input data, no processing is performed.
Rule 2-2 (complete data dependence): for given data Source_Data = {I_1, ..., I_n} and Sink_Data = {O_1, ..., O_m} with [formula image BDA0000474308420000105], if O_i → I_j holds (i ≤ m, j ≤ n), the dependence is replaced with [formula images BDA0000474308420000107 and BDA0000474308420000108]; if O_i → I_j does not hold (i ≤ m, j ≤ n), no replacement is made.
Rule 2-3 (partial data dependence): for given data Source_Data = {I_1, ..., I_n} and Sink_Data = {O_1, ..., O_m} with [formula image BDA0000474308420000109], no replacement is made.
(2) Algorithm
Input: Dependency_Graph = (Node_Set, Edge_Set, Role_Set), where
■ Node_Set = Data_Set;
■ Edge_Set = Data_Data_Dependency;
■ Role_Set = {complete data dependence, partial data dependence};
Data_Data_Dependency ⊆ Data_Set × Data_Set.
Output: the refined data dependence graph
Algorithm:
// initialization
1. Initialize the data dependence graph Data_Data_Dependency_Graph, including Data_Set, Data_Data_Dependency and Role_Set; initialize Source_Data, the data whose provenance is to be traced;
2. Initialize the data variables Source_Data = {I_1, ..., I_n} and Sink_Data = {O_1, ..., O_m}, i ≤ m, j ≤ n;
3. Initialize Dep = Data_Data_Dependency − {Source_Data → Sink_Data};
4. Initialize the variable role;
5. Initialize the variable i = 1;
// refinement
6. [formula image BDA0000474308420000111]
7. switch (type of role)
    case complete data dependence: process according to Rule 2-2; break;
    case partial data dependence: process according to Rule 2-3; break;
  }
  Dep = Dep − {Source_Data → Sink_Data};
  Take the next dependence from the dependence graph and store its data in the variables Source_Data and Sink_Data;
  Initialize the variable role;
  Initialize the variable i = 1;
while (Dep is not empty)
8. Obtain the refined data dependence graph.
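A hedged Python sketch of the refinement loop is given below. Because the replacement formulas of Rule 2-2 survive only as images, the sketch assumes that Rule 2-2 rewrites a complete set-level dependence as one item-level dependence per (sink item, source item) pair; it applies Rule 2-1 only to complete dependences, since that rule is stated with the complete-dependence arrow. The function name and input format are hypothetical.

```python
def refine(dependences, parameters):
    """Refine set-level dependences (sketch of Rules 2-1 to 2-3).

    dependences: iterable of (source_items, sink_items, role), items as frozensets,
                 role either "complete" or "partial".
    parameters:  items that are parameters in the sense of Definition 13.
    Returns a set of (depended_on, dependent, role) triples.
    """
    refined = set()
    for source_items, sink_items, role in dependences:  # "while (Dep is not empty)"
        if role == "complete":
            # Rule 2-1: parameters are not input data, so drop them from Sink_Data.
            inputs = sink_items - parameters
            # Rule 2-2 (assumed form): one item-level dependence O -> I per pair.
            for o in inputs:
                for i in source_items:
                    refined.add((o, i, "complete"))
        else:
            # Rule 2-3: a partial dependence is left unchanged.
            refined.add((sink_items, source_items, "partial"))
    return refined


deps = [(frozenset({"I1", "I2"}), frozenset({"O1", "alpha"}), "complete"),
        (frozenset({"I3"}), frozenset({"O2", "O3"}), "partial")]
refined = refine(deps, parameters={"alpha"})
print(len(refined))  # 3
```

The nested loop over item pairs is where the pairwise comparison cost discussed in the algorithm analysis arises.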
3. Synthesis of the data dependence graph
(1) Rules
Rule 3-1 (synthesis of complete data dependence): if the data sets {I_1, ..., I_n} and {O_1, ..., O_m} satisfy the complete dependence [formula image BDA0000474308420000112], i ≤ m, j ≤ n, then, writing Source_Data = {I_1, ..., I_n} and Sink_Data = {O_1, ..., O_m}, the original dependences are replaced by [formula image BDA0000474308420000113].
Rule 3-2 (synthesis of partial data dependence): if the data sets {I_1, ..., I_n} and {O_1, ..., O_m} satisfy the partial dependence [formula image BDA0000474308420000121], i ≤ m, j ≤ n, then, writing Source_Data = {I_1, ..., I_n} and Sink_Data = {O_1, ..., O_m}, the original dependences are replaced by [formula missing].
(2) Algorithm
Input: Dependency_Graph = (Node_Set, Edge_Set, Role_Set), where
■ Node_Set = Data_Set;
■ Edge_Set = Data_Data_Dependency;
■ Role_Set = {complete data dependence, partial data dependence};
Data_Data_Dependency ⊆ Data_Set × Data_Set.
Output: the synthesized data dependence graph
Algorithm:
// initialization
1. Initialize the data dependence graph Data_Data_Dependency_Graph, including Data_Set, Data_Data_Dependency and Role_Set;
2. Initialize the data variable Source_Data;
3. Initialize Dep = Data_Data_Dependency;
4. Initialize the variable role;
// synthesis
5. do {
    Take the nodes in the dependence graph that have a complete dependence with Source_Data and process them according to Rule 3-1;
    Take the nodes in the dependence graph that have a partial dependence with Source_Data and process them according to Rule 3-2;
    Take the next node and store it in Source_Data;
} while (the data dependence graph has not been fully traversed)
6. Obtain the synthesized data dependence graph.
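The effect of Rules 3-1 and 3-2 can be illustrated with the sketch below. Note that it is not the patent's BFS-based procedure: it assumes that the data set owning each item is already known (the hypothetical `owner` mapping), groups item-level edges by the pair of data sets they connect, and then applies Rule 3-1 when every item pair is linked and Rule 3-2 otherwise. Its running time is linear in the number of items and edges.

```python
from collections import defaultdict


def synthesize(item_edges, owner):
    """Collapse item-level edges into data-level edges (sketch of Rules 3-1 and 3-2).

    item_edges: set of (i, o) pairs: item i of some Source_Data depends on item o of some Sink_Data.
    owner:      maps every item to the name of the data set it belongs to (assumed known).
    Returns {(source_name, sink_name): "complete" or "partial"}.
    """
    members = defaultdict(set)
    for item, name in owner.items():
        members[name].add(item)

    # Group the item-level edges by the pair of data sets they connect.
    grouped = defaultdict(set)
    for i, o in item_edges:
        grouped[(owner[i], owner[o])].add((i, o))

    result = {}
    for (src, snk), pairs in grouped.items():
        # Rule 3-1 if every item of src depends on every item of snk, otherwise Rule 3-2.
        complete = len(pairs) == len(members[src]) * len(members[snk])
        result[(src, snk)] = "complete" if complete else "partial"
    return result


owner = {"I1": "Source_Data", "I2": "Source_Data",
         "O1": "Sink_Data", "O2": "Sink_Data", "J1": "Other"}
edges = {("I1", "O1"), ("I1", "O2"), ("I2", "O1"), ("I2", "O2"), ("J1", "O1")}
print(sorted(synthesize(edges, owner).items()))
# [(('Other', 'Sink_Data'), 'partial'), (('Source_Data', 'Sink_Data'), 'complete')]
```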
4. Algorithm analysis
The present invention refines OPM at the implementation level. By analyzing data dependences and applying refinement or synthesis operations, it provides different views of provenance tracing and meets users' needs for different provenance information. The method has the following three advantages:
(1) Semantic completeness: provenance is annotated on a per-process basis according to dependences, and the result of the annotation is the control dependences of the data provenance. Starting from these control dependences, the rule set yields the data dependence graph, so the semantics are complete. For example, applying the dependence rules defined here to the control dependence graph of Fig. 2 yields the data dependence graph of Fig. 3;
(2) Storage cost: because provenance is annotated per process (the annotation result is the control dependences) and the data dependence graph is constructed dynamically from the control dependences, the required storage space is relatively reduced. For the control dependence graph of Fig. 2, the storage cost of provenance dependence analysis is S = Data_Store + Process_Store + Data_Process_Store = 8 + 6 + 14 = 28, whereas OPM, being an abstract-level provenance model, requires S = Data_Store + Process_Store + Data_Process_Store + Data_Data_Store + Process_Process_Store = 8 + 6 + 14 + 8 + 6 = 42. The storage cost of the proposed method is therefore 28/42 = 2/3 of that of OPM, as shown in Fig. 4;
(3) Algorithmic complexity: both the refinement algorithm and the synthesis algorithm have polynomial complexity, specifically:
■ The refinement algorithm compares nodes pairwise: for a data dependence graph Dep_Graph with n nodes, it refines the dependence between each pair of nodes. The worst case occurs when every pair of nodes in Dep_Graph is connected by an edge; the number of comparisons is then (n−1) + (n−2) + ... + 2 + 1 = n(n−1)/2, so the complexity is O(n²).
■ The synthesis algorithm uses breadth-first search (BFS) to find the neighbouring nodes of each node, tests whether they form a bipartite graph, and, according to the three possible outcomes, carries out complete-dependence synthesis, partial-dependence synthesis or synthesis processing for the remaining case. Its complexity is the sum of three parts: the BFS, the bipartiteness test and the synthesis processing. Finding the optimal synthesis of data dependences over a whole graph requires finding a maximum bipartite subgraph, which is NP-hard. If instead the algorithm searches only the neighbourhood of one node at a time and then decides how to process it, its complexity is polynomial.
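The figures in items (2) and (3) can be checked with a few lines of Python. The variable and function names are hypothetical; the storage numbers are exactly those given for the Fig. 2 example, and the comparison count is the worst-case sum stated for the refinement algorithm.

```python
from fractions import Fraction

# Fig. 2 example, figures taken from item (2).
proposed_store = 8 + 6 + 14       # Data_Store + Process_Store + Data_Process_Store
opm_store = 8 + 6 + 14 + 8 + 6    # ... + Data_Data_Store + Process_Process_Store
ratio = Fraction(proposed_store, opm_store)
print(proposed_store, opm_store, ratio)  # 28 42 2/3


def pairwise_comparisons(n):
    """Worst-case pair comparisons in the refinement step: (n-1) + (n-2) + ... + 1."""
    return sum(range(1, n))


print(pairwise_comparisons(10), 10 * 9 // 2)  # 45 45
```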

Claims (3)

1. A data dependence analysis method based on a data provenance dependence analysis model, characterized in that it comprises the following steps:
1) establishing a data provenance dependence analysis model;
2) designing a data dependence analysis method;
The data provenance dependence analysis model established in step 1) is based on data provenance information. It clearly describes how data products depend on other data and on processes, meets the need for in-depth data provenance tracing analysis, and supports data provenance application functions for different users, application topics, and levels of abstraction, including data provenance query and reasoning; assessment of data credibility, data security, and data quality; data integration; and data provenance visualization. The model consists of three layers (data provenance dependences, dependence operations, and dependence views) and further comprises the following steps:
11) at the data provenance dependence layer, dependences are divided into three types: data dependence, process dependence, and control dependence;
12) at the dependence operation layer, the three types of dependences described in 11) are computed through intersection, union, refinement, and composition operations;
13) at the dependence view layer, dependence views at different abstraction levels or for different information categories are obtained from the computations in 12); refinement and composition operations can also be applied to a focus specified by the user to obtain a dependence view based on that focus;
The data dependence analysis method of step 2) is based on the data provenance dependence analysis model and, using process-oriented annotated data provenance information, comprises three classes of methods: construction, refinement, and composition of the data dependence graph:
21) construction of the data dependence graph: from the perspective of how basic processes are composed, corresponding rules are designed for constructing the data dependence graph, and a specific construction algorithm is given;
22) refinement of the data dependence graph: from the perspectives of basic process composition and of complete and partial dependence, corresponding rules are designed for refining the data dependence graph, and a specific refinement algorithm is given;
23) composition of the data dependence graph: from the perspectives of basic process composition and of complete and partial dependence, corresponding rules are designed for composing the data dependence graph, and a specific composition algorithm is given.
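The first two layers of the model (three dependence types, and intersection and union at the operation layer) together with the two kinds of views in 13) can be sketched as plain set operations. This is an illustrative assumption, not the claimed implementation: `DepKind`, `Dependence`, `intersect`, `unite`, `view_by_kind`, and `focus_view` are hypothetical names, a dependence is modeled as a typed edge, and refinement and composition are omitted because their rules are not given here.

```python
from dataclasses import dataclass
from enum import Enum


class DepKind(Enum):
    """The three dependence types of the provenance dependence layer."""
    DATA = "data"
    PROCESS = "process"
    CONTROL = "control"


@dataclass(frozen=True)
class Dependence:
    """One typed edge: `source` depends on `target` (hypothetical fields)."""
    kind: DepKind
    source: str
    target: str


def intersect(a: set, b: set) -> set:
    """Operation layer: dependences present in both sets."""
    return a & b


def unite(a: set, b: set) -> set:
    """Operation layer: dependences present in either set."""
    return a | b


def view_by_kind(deps: set, *kinds: DepKind) -> set:
    """View layer: restrict to selected information categories."""
    return {d for d in deps if d.kind in kinds}


def focus_view(deps: set, focus: set) -> set:
    """View layer: keep only dependences touching a user-specified focus item."""
    return {d for d in deps if d.source in focus or d.target in focus}


if __name__ == "__main__":
    run1 = {Dependence(DepKind.DATA, "report", "table_a"),
            Dependence(DepKind.PROCESS, "report", "aggregate_step")}
    run2 = {Dependence(DepKind.DATA, "report", "table_a"),
            Dependence(DepKind.CONTROL, "aggregate_step", "scheduler")}
    print(intersect(run1, run2))
    print(len(unite(run1, run2)))
    print(focus_view(unite(run1, run2), {"scheduler"}))
```

Making `Dependence` a frozen dataclass keeps instances hashable, so the built-in set operators serve directly as the intersection and union operations.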
2. The data dependence analysis method based on a data provenance dependence analysis model according to claim 1, characterized in that the data dependence graph refinement algorithm designed in step 22) compares nodes pairwise: for a data dependence graph Dep_Graph with n nodes, it refines the dependence between each pair of nodes.
3. The data dependence analysis method based on a data provenance dependence analysis model according to claim 1, characterized in that the data dependence graph composition algorithm designed in step 23) uses breadth-first search (BFS) to find the neighboring nodes of each node, determines whether they form a bipartite graph, and, depending on the outcome, performs complete-dependence composition, partial-dependence composition, or composition processing.
CN201410082707.1A 2014-03-07 2014-03-07 Data provenance dependence relation analysis model-based data dependence analysis method Pending CN103823885A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410082707.1A CN103823885A (en) 2014-03-07 2014-03-07 Data provenance dependence relation analysis model-based data dependence analysis method

Publications (1)

Publication Number Publication Date
CN103823885A 2014-05-28

Family

ID=50758949

Country Status (1)

Country Link
CN (1) CN103823885A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361073A (en) * 2014-11-12 2015-02-18 河海大学 User view-oriented process dependency relationship analysis method
CN104503969A (en) * 2014-10-28 2015-04-08 河海大学 Method for calculating Web composite service credibility on the basis of How origin
CN105912595A (en) * 2016-04-01 2016-08-31 华南理工大学 Data origin collection method of relational databases
CN106610999A (en) * 2015-10-26 2017-05-03 北大方正集团有限公司 Query processing method and device
CN106713313A (en) * 2016-12-22 2017-05-24 河海大学 Access control method based on origin graph abstraction
CN106909696A (en) * 2017-03-27 2017-06-30 浙江工业大学 College data combined view automatic generation method based on data service dependency graph
CN107016065A (en) * 2017-03-16 2017-08-04 陕西科技大学 Effective origin filtering method with customizable dependency semantics
CN107239483A (en) * 2017-04-14 2017-10-10 浙江工业大学 Cross-domain elevator data combined view automatic generation method based on data service
CN110348817A (en) * 2019-07-17 2019-10-18 桂林电子科技大学 Semantic workflow parallelization reconstruction method

Citations (3)

Publication number Priority date Publication date Assignee Title
CN102117302A (en) * 2009-12-31 2011-07-06 南京理工大学 Data origin tracking method on sensor data stream complex query results
CN103440553A (en) * 2013-08-28 2013-12-11 复旦大学 Workflow matching and finding system, based on provenance, facing proteomic data analysis
CN103473400A (en) * 2013-08-27 2013-12-25 北京航空航天大学 Software FMEA (failure mode and effects analysis) method based on level dependency modeling

Non-Patent Citations (3)

Title
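Ni Jing et al.: "Comparison of data provenance description languages in the linked data environment" (关联数据环境下数据溯源描述语言的比较), Digital Library (《数字图书馆》) *
Liu Tong: "Research on secure provenance based on OPM" (基于OPM的安全起源研究), China Master's Theses Full-text Database (《中国优秀硕士学位论文全文数据库》) *
Ming Hua et al.: "A survey of data provenance techniques" (数据溯源技术综述), Journal of Chinese Computer Systems (《小型微型计算机系统》) *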

Cited By (14)

Publication number Priority date Publication date Assignee Title
CN104503969B (en) * 2014-10-28 2018-02-06 河海大学 A kind of Web Composite service confidence level computational methods based on How origins
CN104503969A (en) * 2014-10-28 2015-04-08 河海大学 Method for calculating Web composite service credibility on the basis of How origin
CN104361073A (en) * 2014-11-12 2015-02-18 河海大学 User view-oriented process dependency relationship analysis method
CN106610999A (en) * 2015-10-26 2017-05-03 北大方正集团有限公司 Query processing method and device
CN105912595A (en) * 2016-04-01 2016-08-31 华南理工大学 Data origin collection method of relational databases
CN106713313A (en) * 2016-12-22 2017-05-24 河海大学 Access control method based on origin graph abstraction
CN106713313B (en) * 2016-12-22 2020-05-05 河海大学 Access control method based on origin graph abstraction
CN107016065A (en) * 2017-03-16 2017-08-04 陕西科技大学 Effective origin filtering method with customizable dependency semantics
CN106909696B (en) * 2017-03-27 2020-01-14 浙江工业大学 College data combined view automatic generation method based on data service dependency graph
CN106909696A (en) * 2017-03-27 2017-06-30 浙江工业大学 College data combined view automatic generation method based on data service dependency graph
CN107239483A (en) * 2017-04-14 2017-10-10 浙江工业大学 Cross-domain elevator data combined view automatic generation method based on data service
CN107239483B (en) * 2017-04-14 2020-06-09 浙江工业大学 Cross-domain elevator data combined view automatic generation method based on data service
CN110348817A (en) * 2019-07-17 2019-10-18 桂林电子科技大学 Semantic workflow parallelization reconstruction method
CN110348817B (en) * 2019-07-17 2021-06-18 桂林电子科技大学 Semantic workflow parallelization reconstruction method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140528