CN113204789B - Origin filtering framework based on filtering primitive - Google Patents

Info

Publication number
CN113204789B
Authority
CN
China
Prior art keywords
filtering
graph
origin
provenance
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110570577.6A
Other languages
Chinese (zh)
Other versions
CN113204789A (en
Inventor
孙连山
陈秀婷
侯涛
张永斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi University of Science and Technology
Original Assignee
Shaanxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi University of Science and Technology
Priority to CN202110570577.6A
Publication of CN113204789A
Application granted
Publication of CN113204789B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Complex Calculations (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An origin filtering framework based on filtering primitives, comprising the following steps. Step 1: define specific primitives as the minimal operations for transforming the provenance graph. Step 2: formally define the structural constraints and filtering constraints of the provenance graph. Step 3: check the provenance graph to be filtered. Step 4: declare the origin filtering requirement, namely the sensitive information to be filtered out of the origin graph; sensitive elements to be filtered are represented by nodes in the origin graph, and direct or indirect dependencies to be filtered are represented by node pairs in the origin graph. Step 5: construct a filtering strategy space according to the filtering requirement. Step 6: update the provenance graph to obtain a provenance filtering view, quantitatively evaluate the view according to an existing provenance filtering view evaluation model, and generate a provenance filtering report. The application addresses the problem that a single filtering mechanism can filter only one specific type of sensitive element, cannot satisfy multiple user filtering requirements at once, and therefore generalizes poorly.

Description

Origin filtering framework based on filtering primitive
Technical Field
The application relates to the technical field of origin filtering for realizing safe sharing of origin data, in particular to an origin filtering framework based on filtering primitives.
Background
Data provenance (referred to below simply as origin), also called data tracing or data lineage, records the source of data, the processing the data has undergone, and related information such as time, place, tool, method, strategy, organization and personnel; it is metadata describing the history of data evolution. The origin is an important basis for evaluating the authenticity, credibility and reproducibility of data. To enable the publishing and sharing of origin data across organizations, the standard origin data models OPM and PROV-DM were published in succession, the latter by the W3C, with origin information represented by a labeled directed acyclic graph, as shown in FIG. 1. The core structure of the origin graph comprises three kinds of origin elements (entities, activities and agents) and the various dependency relations among them.
Origin data records the history of data evolution and may imply various kinds of sensitive information, such as business and shipping details or distributor and sales identity information. Directly publishing and sharing the original origin information can leak this sensitive information and cause serious consequences. The original origin also often contains redundant detail that some users do not need. Origin filtering (provenance sanitization), also known as origin revision or origin abstraction, is a new technique that transforms the origin graph before the origin data is published and shared, selectively filtering out sensitive or redundant information to produce a safe, usable origin filtering view (sanitized provenance view). The ultimate purpose of origin filtering is to remove all sensitive origin information from the original origin graph while preserving as much of the traceability utility of the filtered view as possible, on the premise that origin safety is guaranteed. Traceability utility is the degree to which the filtered view meets the tracing needs of the user.
Existing provenance filtering techniques can filter specific nodes and edges in the provenance graph, or even subgraphs, but still suffer from the following disadvantages. Firstly, a filtering mechanism can only filter a certain type of sensitive elements, and cannot simultaneously realize multiple filtering requirements of users, so that the universality is poor. Secondly, the existing filtering mechanism can only acquire a single filtering view, does not allow a user to select a personalized filtering scheme according to the safety risk and the utility requirement in a specific environment, and has poor customization.
Disclosure of Invention
To overcome the above technical problems, the application aims to provide an origin filtering framework based on filtering primitives that addresses the poor generality and poor customizability of conventional origin filtering techniques.
In order to achieve the above purpose, the technical scheme adopted by the application is as follows:
an origin filtering framework based on filtering primitives, comprising the following steps:
step 1: defining specific primitives as minimal operations for transforming the provenance graph;
step 2: formally defining structural constraints and filtering constraints of the provenance graph;
step 3: checking the provenance graph to be filtered to ensure that the provenance graph is a directed acyclic graph conforming to a standardized provenance model and model constraints thereof;
step 4: declaring an origin filtering requirement, namely sensitive information to be filtered in an origin graph; the sensitive elements to be filtered in the filtering requirement are represented by nodes in the origin graph, and the direct or indirect dependency relationship to be filtered in the filtering requirement is represented by node pairs in the origin graph;
step 5: constructing a filtering strategy space according to the filtering requirement;
step 6: updating the provenance graph to obtain the provenance filter view, quantitatively evaluating the utility and safety of the provenance filter view according to the existing provenance filter view evaluation model, and generating the provenance filter report.
The specific operations of step 1 are as follows:
(1) delEdge(pg, <u, v>): delete the edge e = <u, v>;
(2) delVertex(pg, v): delete the node v;
(3) addEdge(pg, <u, v>): add the edge e = <u, v>;
(4) addNullVertex(pg, v): add an empty node v;
(5) anonyVertex(pg, v): anonymize the node v.
The specific content of step 2 is as follows:
model constraint 1: isolated nodes are not allowed in the provenance graph;
model constraint 2: an entity cannot be generated by more than two activities;
model constraint 3: when a direct dependency can be inferred from other direct dependencies, the corresponding edge should be omitted from the provenance graph;
model constraint 4: loops are not allowed in the origin graph; a topological sorting algorithm can be used to detect whether the origin graph contains a loop;
filter constraint 1: the filtered view should not introduce spurious dependencies;
filter constraint 2: the filtered sensitive elements should not be restored.
In step 4, the following should be observed when declaring the filtering requirement:
(1) When the sensitive information is a node, the identifier or some of the attributes of the node must be specified; when the node is an activity, the start and end times (startTime and endTime) of the activity may also be given;
(2) When the sensitive information is a dependency, whether direct or indirect, the identifiers or some of the attributes of the two origin elements that the dependency connects must be specified.
Constructing the filtering strategy space specifically comprises the following steps:
step 5.1: according to the filtering requirement, select appropriate filtering primitives from delEdge, delVertex and anonyVertex, and construct the sensitive information filtering primitive set P according to the filtering rules;
step 5.2: according to the traceability requirement, select appropriate recovery primitives from addEdge and addNullVertex, and construct, according to the recovery rules, the recovery primitive set Q used to restore connected paths that were accidentally interrupted in the origin graph;
step 5.3: construct the filtering strategy space $P^m \times Q^n$ and check whether each filtering strategy satisfies the given constraints, forming the legal filtering strategy space S, where m means that no more than m filtering primitives are used and n means that no more than n recovery primitives are used, so that each filtering strategy consists of no more than m+n primitives; the values of m and n are determined by the filtering rules and the recovery rules;
step 5.4: check whether each filtering strategy in the filtering strategy space S satisfies the model constraints and the filtering constraints.
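Steps 5.3 and 5.4 can be illustrated with a minimal Python sketch. It reads "no more than m filtering primitives and no more than n recovery primitives" as enumerating subsets of P and Q of bounded size, which is one possible interpretation rather than the construction fixed by the application. The function names, the tuple encoding of primitives and the `is_legal` check are all assumptions made for illustration.

```python
from itertools import combinations, product


def subsets_up_to(items, k, minimum=0):
    """All subsets of `items` with between `minimum` and `k` members."""
    for size in range(minimum, k + 1):
        yield from combinations(items, size)


def policy_space(P, Q, m, n, is_legal):
    """Pair every filtering subset with every recovery subset, keep legal ones."""
    candidates = product(subsets_up_to(P, m, minimum=1), subsets_up_to(Q, n))
    return [f + r for f, r in candidates if is_legal(f + r)]


# Primitives are encoded as (name, argument) tuples; this encoding is hypothetical.
P = [("delEdge", ("ac4", "en5")), ("anonyVertex", "ac6")]
Q = [("addEdge", ("en4", "en5"))]


def is_legal(policy):
    # Illustrative stand-in for filter constraint 2: never re-add a deleted edge.
    deleted = {arg for op, arg in policy if op == "delEdge"}
    return not any(op == "addEdge" and arg in deleted for op, arg in policy)


S = policy_space(P, Q, m=2, n=1, is_legal=is_legal)
```

Every strategy in `S` contains at most m+n primitives, matching the bound stated in step 5.3.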
The filtering rules of the step 5.1 are shown in the table:
filtering rules
The recovery rule of the step 5.2 is as follows:
recovery rules
The application has the beneficial effects that:
1. The application solves the problem that a single filtering mechanism can filter only one specific type of sensitive element, cannot satisfy multiple user filtering requirements at once, and generalizes poorly. It defines filtering primitives, designs for each piece of sensitive information a filtering strategy assembled from those primitives, and thereby forms a filtering strategy space. Construction of the space is divided into three stages (filtering, recovery and constraint detection), yielding a primitive-based origin filtering framework;
2. The application solves the problem that existing filtering mechanisms can obtain only a single filtered view. A complete filtering strategy space is constructed through the filtering primitive framework; each filtering strategy applies a different transformation to each piece of sensitive information, so the generated filtered views all meet the filtering requirement but differ in utility and safety. When publishing a filtered view for a particular scenario, the user can choose from the filtering strategy space according to preference, so customizability is high.
Drawings
FIG. 1 is a core structure diagram of a standard origin model PROV-DM to which the present application refers.
FIG. 2 is a flow chart of the provenance filtering framework of the present application.
FIG. 3 is an example of an original origin graph before filtering.
FIG. 4 shows the first filtered view obtained by processing the graph of FIG. 3 with the method of the present application under filtering requirement $R_1$.
FIG. 5 shows the second filtered view obtained by processing the graph of FIG. 3 with the method of the present application under filtering requirement $R_1$.
FIG. 6 shows one of the filtered views obtained by processing the graph of FIG. 3 with the method of the present application under filtering requirement $R_2$.
Detailed Description
The application is described in further detail below with reference to the accompanying drawings.
The present application assumes that the provenance graph to be filtered has traceable dependency semantics, i.e., if any node v can be traced back to another node u, then u is a partial cause of the occurrence of v.
With reference to the standard origin model PROV-DM, the concepts involved in the method of the application are defined as follows:
Definition 1, provenance graph ProvGraph = (V, E):
Node set $V = \{v_1, v_2, \dots, v_n\}$: in the origin graph, each node v represents an origin element. Origin elements are divided into three types, Entity (EN), Activity (AC) and Agent (AG), which respectively represent the intermediate products, the operations performed, and the people or organizations that carry out those operations during the evolution of the data;
a mapping $V \to \{EN, AC, AG\}$ assigns each node $v \in V$ its type;
Edge set $E = \{ e_i = \langle u, v \rangle \mid u \in V;\ v \in V;\ i = 1, 2, \dots, m \}$: in the provenance graph, an edge $e = \langle u, v \rangle$ is a directed arc from v to u representing the direct dependency of v on u, i.e., v is a direct consequence of u, and u is a direct cause of v.
Definition 2, subsets of origin elements: entity set (Entity), activity set (Activity) and agent set (Agent):
the set of origin elements V can be divided by element type into three mutually disjoint subsets, the entity set (Entity), the activity set (Activity) and the agent set (Agent), so that $V = Entity \cup Activity \cup Agent$.
Entity set: an entity node e describes the state of a data object over a period of time or at a point in time; e has a unique identifier (id) and a series of attributes;
Activity set: an activity node a represents a process, or a series of operations, that acts on entity nodes over a period of time and causes their state to change;
Agent set: an agent node ag represents a person or organization that bears some responsibility for the occurrence of an activity, the existence of an entity, or the activity of another agent.
The dependency relationships between different types of origin elements have different semantics, and edges in the origin graph can be divided into different subclasses according to the types of the interdependent origin elements.
Definition 3, direct dependency subsets: Usage, Generation, Association, Attribution, Derivation, Communication, Delegation:
the edges in the provenance graph represent causal relationships between two adjacent nodes. The edge set E can be divided into seven mutually disjoint subsets, $E = Usage \cup Generation \cup Association \cup Attribution \cup Derivation \cup Communication \cup Delegation$, where:
Usage = $\{\langle en, ac \rangle \mid en \in Entity,\ ac \in Activity\}$, indicating that activity ac uses entity en;
Generation = $\{\langle ac, en \rangle \mid ac \in Activity,\ en \in Entity\}$, indicating that entity en is generated by activity ac;
Association = $\{\langle ag, ac \rangle \mid ag \in Agent,\ ac \in Activity\}$, indicating that agent ag bears responsibility for activity ac;
Attribution = $\{\langle ag, en \rangle \mid ag \in Agent,\ en \in Entity\}$, indicating that agent ag bears responsibility for entity en;
Derivation = $\{\langle en_1, en_2 \rangle \mid en_1, en_2 \in Entity\}$, indicating that entity $en_2$ is derived from entity $en_1$;
Communication = $\{\langle ac_1, ac_2 \rangle \mid ac_1, ac_2 \in Activity\}$, indicating that activity $ac_2$ uses an entity generated by activity $ac_1$;
Delegation = $\{\langle ag_1, ag_2 \rangle \mid ag_1, ag_2 \in Agent\}$, indicating that agent $ag_2$ acts on behalf of agent $ag_1$; $ag_1$ is called the superior agent of $ag_2$.
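Because the seven subsets of Definition 3 are distinguished purely by the types of the two endpoints, classifying an edge reduces to a table lookup. The following minimal Python sketch assumes edges are stored as (cause, effect) pairs and reads Generation as an (activity, entity) pair; the names `EDGE_SUBSETS` and `classify_edge` are hypothetical.

```python
# Endpoint types (type of u, type of v) of an edge <u, v> -> dependency subset.
EDGE_SUBSETS = {
    ("EN", "AC"): "Usage",
    ("AC", "EN"): "Generation",
    ("AG", "AC"): "Association",
    ("AG", "EN"): "Attribution",
    ("EN", "EN"): "Derivation",
    ("AC", "AC"): "Communication",
    ("AG", "AG"): "Delegation",
}


def classify_edge(node_types, edge):
    """Return the subset of edge <u, v>, or None if the type pair is not defined."""
    u, v = edge
    return EDGE_SUBSETS.get((node_types[u], node_types[v]))


node_types = {"en1": "EN", "ac1": "AC", "ag1": "AG"}
usage = classify_edge(node_types, ("en1", "ac1"))
undefined = classify_edge(node_types, ("en1", "ag1"))
```

Type pairs that Definition 3 does not list, such as (entity, agent), return `None`, which a validity check could treat as a violation of the model.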
Definition 4, indirect origin dependency:
indirect dependency $P(u, v) = \{ v_i \mid v_0 = u;\ v_k = v;\ \langle v_i, v_{i+1} \rangle \in E;\ i = 0, 1, \dots, k-1;\ k > 2 \}$ is a sequence of nodes in the provenance graph that represents the indirect dependency of node v on u, i.e., v is an indirect consequence of u, and u is an indirect cause of v;
the connected path set $PathSet(u, v) = \{ P(u, v) \mid u \in V,\ v \in V \}$ is the set of all connected paths between the node pair u and v, and represents the indirect dependency semantics of node v on node u.
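The connected path set can be enumerated with a depth-first walk over the acyclic graph. The sketch below is illustrative only: the function name `path_set` and the (cause, effect) edge encoding are assumptions, and it returns every path from u to v, including a single-edge path, so a caller interested only in indirect dependencies would still need to filter by path length.

```python
def path_set(edges, u, v):
    """Enumerate every directed path from u to v in an acyclic edge set."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, []).append(b)

    paths = []

    def walk(node, trail):
        if node == v:
            paths.append(trail)
            return
        for nxt in successors.get(node, []):
            walk(nxt, trail + [nxt])

    walk(u, [u])
    return paths


edges = [("en1", "ac1"), ("ac1", "en2"), ("en1", "en2")]
paths = path_set(edges, "en1", "en2")
```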
In the provenance graph ProvGraph, if $en_1, en_2, en_3 \in Entity$, $ac_1, ac_2, ac_3 \in Activity$ and $ag_1, ag_2, ag_3 \in Agent$, the following theorems express the transitivity and inferability of the relevant origin dependencies:
Theorem 1: the Attribution relationship is inferable:
(1) The agent $ag_1$ responsible for activity $ac_1$ is also responsible for the entity $en_1$ generated by that activity; that is, if $\langle ag_1, ac_1 \rangle \in E$ and $\langle ac_1, en_1 \rangle \in E$, then $\langle ag_1, en_1 \rangle$ holds;
(2) The superior agent $ag_1$ of agent $ag_2$ is also responsible for the entity $en_1$ for which $ag_2$ is responsible; that is, if $\langle ag_1, ag_2 \rangle \in E$ and $\langle ag_2, en_1 \rangle \in E$, then $\langle ag_1, en_1 \rangle$ holds.
Theorem 2: the Association relationship is inferable:
The superior agent $ag_1$ of agent $ag_2$ is also responsible for the activity $ac_1$ for which $ag_2$ is responsible; that is, if $\langle ag_1, ag_2 \rangle \in E$ and $\langle ag_2, ac_1 \rangle \in E$, then $\langle ag_1, ac_1 \rangle$ holds.
Theorem 3: the Communication relationship is inferable:
If an entity $en_1$ generated by activity $ac_1$ is a necessary condition for activity $ac_2$ to start and go on to generate other entities, then a communication relationship exists between the two activities; that is, if $\langle ac_1, en_1 \rangle \in E$ and $\langle en_1, ac_2 \rangle \in E$, then $\langle ac_1, ac_2 \rangle$ holds.
Theorem 4: the Communication relationship is transitive:
If a communication relationship exists between activities $ac_1$ and $ac_2$, and between activities $ac_2$ and $ac_3$, then a communication relationship exists between $ac_1$ and $ac_3$; that is, if $\langle ac_1, ac_2 \rangle \in E$ and $\langle ac_2, ac_3 \rangle \in E$, then $\langle ac_1, ac_3 \rangle$ holds.
Theorem 5: the Delegation relationship is transitive:
If $ag_1$ is the superior agent of $ag_2$ and $ag_2$ is the superior agent of $ag_3$, then $ag_1$ can be said to be a superior agent of $ag_3$; that is, if $\langle ag_1, ag_2 \rangle \in E$ and $\langle ag_2, ag_3 \rangle \in E$, then $\langle ag_1, ag_3 \rangle$ holds.
Theorem 6: the Derivation relationship is transitive:
If entity $en_1$ is derived from entity $en_2$, and $en_2$ is derived from entity $en_3$, then to a certain extent $en_1$ is derived from $en_3$; that is, if $\langle en_3, en_2 \rangle \in E$ and $\langle en_2, en_1 \rangle \in E$, then $\langle en_3, en_1 \rangle$ holds.
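Theorems 1 to 6 all have the same shape: two edges that share a middle node imply a third edge when the three node types match a fixed pattern. A minimal Python sketch of one round of this inference follows; the rule table and the function name `inferable_edges` are illustrative assumptions. Intersecting the inferred edges with the existing edge set gives the edges that model constraint 3 would treat as redundant.

```python
# (type of a, type of b, type of c) for which <a, b> and <b, c> imply <a, c>.
RULES = {
    ("AG", "AC", "EN"),  # Theorem 1 (1)
    ("AG", "AG", "EN"),  # Theorem 1 (2)
    ("AG", "AG", "AC"),  # Theorem 2
    ("AC", "EN", "AC"),  # Theorem 3
    ("AC", "AC", "AC"),  # Theorem 4
    ("AG", "AG", "AG"),  # Theorem 5
    ("EN", "EN", "EN"),  # Theorem 6
}


def inferable_edges(node_types, edges):
    """One round of inference: edges implied by a pair of existing edges."""
    inferred = set()
    for a, b in edges:
        for b2, c in edges:
            if b == b2 and (node_types[a], node_types[b], node_types[c]) in RULES:
                inferred.add((a, c))
    return inferred


node_types = {"ag1": "AG", "ac1": "AC", "en1": "EN", "ac2": "AC"}
edges = {("ag1", "ac1"), ("ac1", "en1"), ("en1", "ac2"), ("ag1", "en1")}
inferred = inferable_edges(node_types, edges)
redundant = inferred & edges  # already present although inferable
```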
To express the filtering operation accurately, the basic operation of origin filtering is defined below.
Definition 5, basic operations of origin filtering:
(1) GetPre(v): obtain the cause node set of node v; if $u \in GetPre(v)$, then there exists an edge $e = \langle u, v \rangle \in E$, i.e., u is one of the causes of v;
(2) GetSub(v): obtain the result node set of node v; if $u \in GetSub(v)$, then there exists an edge $e = \langle v, u \rangle \in E$, i.e., u is one of the results of v.
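With edges stored as (cause, effect) pairs, the two basic operations of Definition 5 are one-line set comprehensions. The sketch below is a minimal illustration; the snake_case names and the explicit `edges` argument are assumptions, since the definition writes the operations with the node as the only argument.

```python
def get_pre(edges, v):
    """GetPre(v): direct causes of v, i.e. all u with an edge <u, v>."""
    return {u for (u, w) in edges if w == v}


def get_sub(edges, v):
    """GetSub(v): direct results of v, i.e. all u with an edge <v, u>."""
    return {w for (u, w) in edges if u == v}


edges = {("en1", "ac1"), ("ac1", "en2"), ("ac1", "en3")}
causes = get_pre(edges, "ac1")
results = get_sub(edges, "ac1")
```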
Definition 6, uncertain communication edge:
given an origin graph $PG = (V, E, ph)$, $m \in V$ and $n \in \Delta_{PG}(m)$, where m and n are both activity nodes and [...], $\langle m, n \rangle$ is an uncertain communication edge that may be introduced into the filtered view.
Definition 7, uncertain derivation edge:
given an origin graph $PG = (V, E, ph)$, $m \in V$ and $n \in \Delta_{PG}(m)$, where m and n are both entity nodes and [...], $\langle m, n \rangle$ is an uncertain derivation edge that may be introduced into the filtered view.
Definition 8, uncertain usage edge:
given an origin graph $PG = (V, E, ph)$, $m \in V$ and $n \in \Delta_{PG}(m)$, where m is an activity node, n is an entity node and [...], $\langle m, n \rangle$ is an uncertain usage edge that may be introduced into the filtered view.
Definition 9, uncertain generation edge:
given an origin graph $PG = (V, E, ph)$, $m \in V$ and $n \in \Delta_{PG}(m)$, where m is an entity node, n is an activity node and [...], $\langle m, n \rangle$ is an uncertain generation edge that may be introduced into the filtered view.
The application relates to an origin filtering framework based on filtering primitives. It takes as input a directed acyclic origin graph that conforms to a standard origin model and its constraints, together with a filtering requirement consisting of the nodes and node pairs to be filtered. It processes the graph according to the filtering rules, repair rules and constraint detection rules of the proposed framework, and finally evaluates the utility of the filtered view with an origin filtering view evaluation model to generate an origin filtering report.
Step 1: specific primitives are defined as the minimal operations for transforming the provenance graph, as follows:
(1) delEdge(pg, <u, v>): delete the edge e = <u, v>;
(2) delVertex(pg, v): delete the node v;
(3) addEdge(pg, <u, v>): add the edge e = <u, v>;
(4) addNullVertex(pg, v): add an empty node v;
(5) anonyVertex(pg, v): anonymize the node v.
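The five primitives can be sketched in Python on a toy graph structure. This is only an illustration under stated assumptions: the `ProvGraph` class, the rule that deleting a node also removes its incident edges, the extra `node_type` argument of the empty-node primitive and the `alias` argument of the anonymizing primitive are not specified by the application.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Toy provenance graph; an edge (u, v) means u is a direct cause of v."""
    nodes: dict = field(default_factory=dict)  # node id -> "EN", "AC" or "AG"
    edges: set = field(default_factory=set)    # set of (u, v) pairs


def del_edge(pg, edge):
    pg.edges.discard(edge)


def del_vertex(pg, v):
    # Assumption: deleting a node also drops its incident edges.
    pg.nodes.pop(v, None)
    pg.edges = {(a, b) for (a, b) in pg.edges if v not in (a, b)}


def add_edge(pg, edge):
    u, v = edge
    if u not in pg.nodes or v not in pg.nodes:
        raise KeyError("both endpoints must already exist")
    pg.edges.add(edge)


def add_null_vertex(pg, v, node_type):
    # An "empty" node carries a type but no identifying attributes.
    pg.nodes[v] = node_type


def anony_vertex(pg, v, alias):
    # Replace the identifier of v with an opaque alias, keeping type and edges.
    pg.nodes[alias] = pg.nodes.pop(v)
    pg.edges = {(alias if a == v else a, alias if b == v else b) for (a, b) in pg.edges}


pg = ProvGraph({"en1": "EN", "ac1": "AC", "en2": "EN"}, {("en1", "ac1"), ("ac1", "en2")})
anony_vertex(pg, "ac1", "anon1")
del_edge(pg, ("en1", "anon1"))
```

Keeping each primitive as a small function that mutates the graph makes a filtering strategy simply an ordered list of such calls.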
Step 2: the structural constraints and filtering constraints of the provenance graph are formally defined as follows:
model constraint 1: isolated nodes are not allowed in the provenance graph;
model constraint 2: an entity cannot be generated by more than two activities;
model constraint 3: when a direct dependency can be inferred from other direct dependencies, the corresponding edge should be omitted from the provenance graph;
model constraint 4: loops are not allowed in the provenance graph; a topological sorting algorithm can be used to detect whether the provenance graph contains a loop.
filter constraint 1: the filtered view should not introduce spurious dependencies;
filter constraint 2: the filtered sensitive elements should not be restored.
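Model constraints 1 and 4 lend themselves to direct checks. The sketch below uses Kahn's algorithm as the topological sort mentioned in model constraint 4; the function names and the representation of the graph as a node list plus a set of (cause, effect) pairs are assumptions made for illustration.

```python
from collections import deque


def has_isolated_node(nodes, edges):
    """Model constraint 1: every node must touch at least one edge."""
    touched = {n for edge in edges for n in edge}
    return any(n not in touched for n in nodes)


def has_cycle(nodes, edges):
    """Model constraint 4: Kahn's topological sort; unvisited nodes imply a loop."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for s in successors[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)
    return visited != len(indegree)


nodes = ["en1", "ac1", "en2"]
acyclic = {("en1", "ac1"), ("ac1", "en2")}
cyclic = acyclic | {("en2", "en1")}
```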
Step 3: the provenance data to be filtered is checked to ensure it is a directed acyclic graph conforming to the standardized provenance model and its model constraints.
Step 4: declaring an origin filtering requirement, namely sensitive information to be filtered in an origin graph; the sensitive elements to be filtered in the filtering requirement are represented by nodes in the origin graph, and the direct or indirect dependency relationship to be filtered in the filtering requirement is represented by node pairs in the origin graph.
The following should be observed when declaring the filtering requirement: (1) when the sensitive information is a node, the identifier or some of the attributes of the node must be specified, and when the node is an activity, the start and end times (startTime and endTime) of the activity may also be given; (2) when the sensitive information is a dependency, whether direct or indirect, the identifiers or some of the attributes of the two origin elements that the dependency connects must be specified.
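A filtering requirement of this kind can be written down as plain data. The following Python sketch shows one possible encoding; the class and field names are hypothetical, and the example values reproduce the sensitive information set of requirement R1 given later in the description (one node, one direct dependency and one indirect dependency).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SensitiveNode:
    identifier: str
    start_time: Optional[str] = None  # only meaningful for activity nodes
    end_time: Optional[str] = None


@dataclass(frozen=True)
class SensitiveDependency:
    cause: str     # identifier of the first origin element of the pair
    effect: str    # identifier of the second origin element of the pair
    direct: bool   # True for an edge <u, v>, False for a path P(u, v)


# Requirement R1: SI = {ac6, <ac4, en5>, P(en2, en4)}
R1 = [
    SensitiveNode("ac6"),
    SensitiveDependency("ac4", "en5", direct=True),
    SensitiveDependency("en2", "en4", direct=False),
]
```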
Step 5: according to the filtering requirement, the filtering strategy space is constructed as follows.
The method processes the objects to be filtered in the order of filtering, recovery and constraint detection. Note that each stage hands over to the next only after the transformation operations for all sensitive information have been executed; the specific flow is shown in FIG. 2.
First, appropriate filtering primitives are selected according to the filtering requirement. Each piece of sensitive information $el_i$ to be filtered is taken in turn and its element type is determined; different element types correspond to different filtering rules, which are listed in Table 1, where nv, nw and ns denote neighbor nodes of nodes v, w and s respectively. All filtering primitives sp corresponding to $el_i$ are added to the set $TempSP_i$, and a Cartesian product is then taken over the sets $TempSP_i$ of all the sensitive information in the sensitive information set, i.e., the filtering primitives are assembled, to obtain the filtering primitive set $SP_1$.
TABLE 1 Filter rules
To avoid over-filtering the origin graph, a neighbor node of a sensitive element may be deleted only if its distance from that element is within a given value r. The distance between a neighbor node and the sensitive element is the number of steps needed to reach the neighbor along edges of the origin graph, starting from the sensitive element.
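The distance bound r can be computed with a breadth-first search from the sensitive element. In the minimal sketch below, edge direction is ignored when counting steps; that choice, like the function name `neighbours_within`, is an assumption, since the text does not say whether steps follow edge direction.

```python
from collections import deque


def neighbours_within(edges, start, r):
    """Map each node reachable from `start` in at most r steps to its distance."""
    adjacent = {}
    for u, v in edges:
        adjacent.setdefault(u, set()).add(v)
        adjacent.setdefault(v, set()).add(u)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == r:
            continue  # do not expand beyond the allowed radius
        for nxt in adjacent.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    distance.pop(start)
    return distance


edges = [("en1", "ac1"), ("ac1", "en2"), ("en2", "ac2")]
near = neighbours_within(edges, "ac1", 1)
```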
Then, appropriate recovery primitives are selected according to the traceability requirement. Each filtering strategy $SP_{1i}$ in the set $SP_1$ obtained in the filtering stage is taken in turn and, taking the sensitive elements in the sensitive information set into account, recovery operations that combine the sensitive elements with the filtering operations are designed. On the basis of each filtering strategy $SP_{1i}$, which is a set of filtering primitives generated in the filtering stage, the recovery stage determines the recovery operations executable for each sensitive element, takes the Cartesian product of these recovery operations over all sensitive elements, and combines the result with $SP_{1i}$ to obtain the filtering strategies $SP_{2ij}$, each represented as a set of primitives. The primitive set $SP_2$ of the whole recovery stage is thus obtained. Different types of sensitive elements correspond to different recovery rules, which are listed in Table 2. There, node $sv \in SV$, where SV is the result node set of node v minus the deleted node set DV; node $pv \in PV$, where PV is the cause node set of node v minus DV, i.e., $PV = \Delta_{PG}(v) - DV$; node $sk \in SK$, where SK is the result node set of node k minus DV; node $pk \in PK$, where $PK = \Delta_{PG}(k) - DV$; node $sm \in SM$, where SM is the result node set of node m minus DV; node $pt \in PT$, where PT is the cause node set of the deleted element t on the sensitive path minus DV, i.e., $PT = \Delta_{PG}(t) - DV$. The added edges defined in the table are all uncertain edges constructed according to the endpoint types of the edge.
Table 2 recovery rules
Assume that $SP_1$ denotes the set of filtering primitives generated for all sensitive elements in the filtering stage. The recovery stage algorithm (Algorithm of Repair) is as follows:
Input: pg, SI, $SP_1$
Output: $SP_2$
Finally, after the filtering and recovery stages, a constraint detection stage removes the non-compliant filtering strategies that were generated, ensuring the syntactic validity of the origin filtering view; the filtering strategies that pass constraint detection form the final filtering strategy space.
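The three stages can be strung together as in the following Python sketch. Because Tables 1 and 2 are not reproduced here, the candidate operations per sensitive element, the recovery rule and the compliance check are hypothetical placeholders; only the overall shape (a Cartesian product over per-element candidates, then recovery, then constraint detection) follows the description.

```python
from itertools import product


def assemble(candidates_per_element):
    """Cartesian product: choose one candidate operation list per sensitive element."""
    return [[op for ops in choice for op in ops] for choice in product(*candidates_per_element)]


def build_policy_space(temp_sp, recovery_options, is_compliant):
    sp1 = assemble(temp_sp)                                  # filtering stage
    sp2 = [f + r for f in sp1 for r in recovery_options(f)]  # recovery stage
    return [p for p in sp2 if is_compliant(p)]               # constraint detection


# Hypothetical candidates for two sensitive elements: node ac6 and edge <ac4, en5>.
temp_sp = [
    [[("anonyVertex", "ac6")], [("delVertex", "ac6")]],
    [[("delEdge", ("ac4", "en5"))], [("delVertex", "en5")]],
]


def recovery_options(filtering):
    # Placeholder rule: either add nothing or reconnect en4 to en5.
    return [[], [("addEdge", ("en4", "en5"))]]


def is_compliant(policy):
    # Illustrative check only: an added edge must not touch a deleted node.
    deleted = {arg for op, arg in policy if op == "delVertex"}
    return not any(op == "addEdge" and set(arg) & deleted for op, arg in policy)


S = build_policy_space(temp_sp, recovery_options, is_compliant)
```

Of the eight candidate strategies produced by the first two stages, the two that add an edge to the deleted node en5 are discarded by the final stage.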
The provenance graph in the embodiment conforms to a standardized provenance model PROV-DM, as shown in FIG. 3. Two different filtering requirements are declared:
$R_1$: sensitive information set $SI = \{ac_6, \langle ac_4, en_5 \rangle, P(en_2, en_4)\}$
$R_2$: sensitive information set $SI = \{en_1, \langle en_1, ac_1 \rangle, P(ac_2, ac_4)\}$
Processing the provenance graph shown in FIG. 3 with the proposed provenance filtering framework to fulfill requirement $R_1$ yields the following partial origin filtering strategy space:
sscMap = {
1. [anonyVertex(pg, ac_6) + delEdge(pg, <ac_4, en_5>) + delEdge(pg, <ac_2, en_3>) + addEdge(pg, <en_4, en_5>) + addEdge(pg, <ac_2, ac_4>)],
2. [S_1 + addEdge(pg, <en_4, en_6>) + addEdge(pg, <ac_1, ac_3>)],
3. [S_1 + addEdge(pg, <ac_3, en_5>) + addEdge(pg, <en_0, en_3>)],
4. [S_1 + addEdge(pg, <ac_3, ac_5>) + addEdge(pg, <ac_1, ac_3>)],
5. [S_1 + addEdge(pg, <ac_3, en_6>) + addEdge(pg, <ac_1, en_3>)],
6. [delVertex(pg, ac_6) + delVertex(pg, en_5) + delEdge(pg, <ac_2, en_3>) + addEdge(pg, <en_7, en_3>) + addEdge(pg, <ac_4, ac_5>) + addEdge(pg, <ac_2, ac_4>)],
7. [S_2 + addEdge(pg, <ac_0, en_3>) + addEdge(pg, <ac_4, en_6>) + addEdge(pg, <ac_1, ac_3>)],
8. [delVertex(pg, ac_6) + delVertex(pg, en_7) + delVertex(pg, en_5) + delVertex(pg, ac_5) + delVertex(pg, ac_3) + addEdge(pg, <ac_0, en_3>) + addEdge(pg, <ac_4, en_6>) + addEdge(pg, <ac_1, en_4>)],
9. [S_3 + addEdge(pg, <ac_0, ac_3>) + addNullVertex(pg, acn_1) + addEdge(pg, <ac_4, acn_1>) + addEdge(pg, <acn_1, en_6>) + addEdge(pg, <ac_1, ac_4>)],
10. [delVertex(pg, ac_0) + delVertex(pg, en_7) + delVertex(pg, ac_6) + delVertex(pg, en_5) + delVertex(pg, ac_5) + delVertex(pg, en_6) + delVertex(pg, ac_3) + addEdge(pg, <en_0, en_3>) + addEdge(pg, <en_0, en_1>) + addNullVertex(pg, acn_1) + addEdge(pg, <ac_4, acn_1>) + addEdge(pg, <en_0, en_4>)],
11. [S_4 + addNullVertex(pg, acn_1) + addEdge(pg, <en_0, acn_1>) + addEdge(pg, <acn_1, en_3>) + addEdge(pg, <acn_1, en_1>) + addNullVertex(pg, acn_2) + addEdge(pg, <ac_4, acn_2>) + addEdge(pg, <ac_1, en_4>)]
}
In the filtering strategy space above, identical groups of filtering operations are abbreviated as S_i for convenience of description. For example, the filtering operations of strategy 1 are anonyVertex(pg, ac_6) + delEdge(pg, <ac_4, en_5>) + delEdge(pg, <en_2, ac_2>); strategy 2 uses the same filtering operations, abbreviated as S_1, and strategy 7 likewise shares the filtering operations of strategy 6. The filtered views obtained by executing strategy 1 and strategy 6 are shown in FIG. 4 and FIG. 5 respectively. One of the filtered views obtained from the filtering strategy space for requirement $R_2$ is shown in FIG. 6.
Step 6: updating the provenance graph to obtain the provenance filter view, quantitatively evaluating the utility and safety of the provenance filter view according to the existing provenance filter view evaluation model, and generating the provenance filter report.

Claims (2)

1. An origin filtering framework based on filtering primitives, comprising the following steps:
step 1: defining specific primitives as minimal operations for transforming the provenance graph;
step 2: formally defining structural constraints and filtering constraints of the provenance graph;
step 3: checking the provenance graph to be filtered to ensure that the provenance graph is a directed acyclic graph conforming to a standardized provenance model and model constraints thereof;
step 4: declaring an origin filtering requirement, namely sensitive information to be filtered in an origin graph; the sensitive elements to be filtered in the filtering requirement are represented by nodes in the origin graph, and the direct or indirect dependency relationship to be filtered in the filtering requirement is represented by node pairs in the origin graph;
step 5: constructing a filtering strategy space according to the filtering requirement;
step 6: updating the provenance graph to obtain a provenance filtering view, quantitatively evaluating the utility and the safety of the provenance filtering view according to the existing provenance filtering view evaluation model, and generating a provenance filtering report;
in step 4, when the filtering requirement is declared, the following should be observed:
(1) When the sensitive information is a node, the identifier or some of the attributes of the node must be specified, and when the node is of type activity, the start time and end time of the activity must also be given;
(2) When the sensitive information is a dependency relationship, whether direct or indirect, the identifiers or some of the attributes of the two provenance elements connected by the dependency relationship must be specified;
constructing the filtering strategy space in step 5 specifically comprises the following steps:
step 5.1: selecting appropriate filtering primitives (delEdge, delVertex, anonyVertex) according to the filtering requirement, and constructing a sensitive information filtering primitive set P according to the filtering rules;
step 5.2: selecting appropriate recovery primitives (addEdge, addNullVertex) according to the traceability requirement, and constructing, according to the recovery rules, a recovery primitive set Q for restoring the connectivity paths accidentally broken in the origin graph;
step 5.3: constructing a filtering policy space P^m × Q^n and checking whether each filtering strategy conforms to the given constraints, to form a legal filtering strategy space S, wherein m indicates that no more than m filtering primitives are adopted, n indicates that no more than n recovery primitives are adopted, each filtering strategy in the filtering strategy space is composed of no more than m+n primitives, and the values of m and n are determined by the filtering rules and the recovery rules;
step 5.4: checking whether each filtering strategy in the filtering strategy space S conforms to the model constraints and the filtering constraints;
the constraints defined in step 2 are as follows:
model constraint 1: isolated nodes are not allowed to appear in the provenance graph;
model constraint 2: an entity cannot be generated by more than two activities;
model constraint 3: when one direct dependency can be inferred from other direct dependencies, the corresponding edge should be omitted from the provenance graph;
model constraint 4: the occurrence of loops is not allowed in the origin graph, and a topological sorting algorithm can be utilized to detect whether loops exist in the origin graph;
filter constraint 1: filtering the view should not introduce spurious dependencies;
filter constraint 2: the filtered sensitive elements should not be restored.
2. The origin filtering framework based on filtering primitives as defined in claim 1, wherein the primitives defined in step 1 are specifically:
(1) delEdge(pg, <u, v>): delete the edge e = <u, v>;
(2) delVertex(pg, v): delete the node v;
(3) addEdge(pg, <u, v>): add the edge e = <u, v>;
(4) addNullVertex(pg, v): add an empty node v;
(5) anonyVertex(pg, v): anonymize the node v.
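Step 5.3 of claim 1 can be illustrated with a small enumeration. The sketch below is only one possible reading: it assumes that a policy uses between 1 and m distinct filtering primitives from P and between 0 and n distinct recovery primitives from Q, and it uses a toy legality check for filter constraint 2 (a recovery primitive must not re-add a deleted edge or attach an edge to a deleted node). The function names, the tuple encoding of primitives, and the sample sets P and Q are hypothetical.

```python
from itertools import combinations


def candidate_policies(P, Q, m, n):
    """Yield policies with 1..m filtering primitives followed by
    0..n recovery primitives (assumed reading of P^m x Q^n)."""
    for i in range(1, m + 1):
        for filtering in combinations(P, i):
            for j in range(0, n + 1):
                for recovery in combinations(Q, j):
                    yield list(filtering) + list(recovery)


def restores_nothing_filtered(policy):
    """Toy check for filter constraint 2: no addEdge may bring back a
    deleted edge or touch a deleted node."""
    deleted_edges = {arg for name, arg in policy if name == "delEdge"}
    deleted_nodes = {arg for name, arg in policy if name == "delVertex"}
    for name, arg in policy:
        if name == "addEdge":
            u, v = arg
            if arg in deleted_edges or u in deleted_nodes or v in deleted_nodes:
                return False
    return True


def legal_policy_space(P, Q, m, n, is_legal):
    return [p for p in candidate_policies(P, Q, m, n) if is_legal(p)]


# Hypothetical primitive sets, encoded as (primitive name, argument).
P = [("delVertex", "ac6"), ("delEdge", ("ac2", "en3"))]
Q = [("addEdge", ("en7", "en3")), ("addEdge", ("ac2", "en3"))]

candidates = list(candidate_policies(P, Q, m=2, n=1))
S = legal_policy_space(P, Q, m=2, n=1, is_legal=restores_nothing_filtered)
```

Passing the legality check in as a function keeps the enumeration separate from the constraints, so the model constraints and filter constraint 1 could be added to the same predicate without changing how candidates are generated.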
CN202110570577.6A 2021-05-25 2021-05-25 Origin filtering framework based on filtering primitive Active CN113204789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110570577.6A CN113204789B (en) 2021-05-25 2021-05-25 Origin filtering framework based on filtering primitive

Publications (2)

Publication Number Publication Date
CN113204789A CN113204789A (en) 2021-08-03
CN113204789B true CN113204789B (en) 2023-08-25

Family

ID=77023201

Country Status (1)

Country Link
CN (1) CN113204789B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107016065A (en) * 2017-03-16 2017-08-04 陕西科技大学 It is customizable to rely on semantic effective origin filter method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11601442B2 (en) * 2018-08-17 2023-03-07 The Research Foundation For The State University Of New York System and method associated with expedient detection and reconstruction of cyber events in a compact scenario representation using provenance tags and customizable policy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A high-utility data provenance filtering mechanism; Sun Lianshan et al.; Computer Engineering; Vol. 44 (No. 3); 144-150 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant