AU2020101787A4 - Data security for distributed streaming data collection system using provenance model - Google Patents
Data security for distributed streaming data collection system using provenance model
- Publication number
- AU2020101787A4
- Authority
- AU
- Australia
- Prior art keywords
- data
- activity
- model
- provenance
- security
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2358—Change logging, detection, and notification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Bioethics (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Many recent algorithms model datasets collected from different sources at varying timestamps. When a point of interest arises in the dataflow, the data at that point, or any related data, must be reproducible. This creates a need for data security in streaming data collection using a provenance model. Reproduction is possible when there is a record that delineates the origin of the information, its relationships, and the process by which it arrived at its present state. The input digital object is represented as an entity so that its flow can be tracked in any software approach. A database, a file format, or any other information format can be retrieved by query in MySQL, Oracle, etc. The activity to be performed on the entity, such as a job, task, or data activity, has to be allotted and completed. The agents that operate on the information are the data generator, user, and submitter. The relationships among entity, activity, and agent, acting as influencers, cover usage, production, invalidation, interlinking, assignment, and responsibility. With these records kept up to date, a big data provenance model can be carried out for data security.
Drawings
[Figure: provenance model for data security in a distributed streaming data collection system, with blocks for the entity (tasks, jobs, data, network object), the activity (job/task activity, data activity), the agent (data generator, user, submitter), and the relationship (entity, activity, and agent as influencers).]
Fig. 1 Block diagram of data security for streaming data collection using provenance model
Description
Field of the Invention.
The field of the invention relates to provenance models.
Most technology applications collect data from a variety of sources and cluster it into a number of blocks for software calibration, processing, analysis, etc. In all these cases, a huge volume of data is given as input and a huge volume of data is extracted from it. The transformed data has to be secured because it faces many threats. This invention therefore models data security for a distributed streaming data collection system using a provenance model.
Background of the invention.
In medicine, as in any other field, making predictions by modeling requires computation over large datasets. External storage space is needed for both the large input data and the computed data. Beyond the medical field, monitoring and updating the status of any observed infrastructure also requires different computational units using a software approach. There is always a threat to information that has been transformed and moved over a communication channel. This makes it important to implement a security system that monitors data collection using a provenance model.
Data transferred from the source does not always reach the destination. Data may be lost, which creates a need to repeat certain data; the place where this happens is known as the point of interest. This kind of data transfer calls for a provenance model: a model that delineates the origin of the data and the process by which its current state was reached. It emphasizes data provenance, which has to retain the previous records, that is, the history of the complete observation or its details.
Data provenance needs to capture the transformation of data at different levels in the computational unit. It makes parallel processing in any data flow easier to debug. It captures the output of every parallel process by tracking the inputs given along with the original data, so that the system can automatically correct itself and update the data obtained.
An earlier model was the Open Provenance Model (OPM). To deploy the OPM, the domain first has to be examined to determine whether it is dataflow, web, or medical. Next, data collection and attribution are carried out for the chosen domain. An abstract model is then produced. This process involves software querying together with embedding, and other technologies such as XML serialization are bound to it. The main drawback is obtaining the process automation software solution, PASS.
Next, the PROV-DM model was deployed. Its elements are the entity, which is a digital object; the activity, which acts on the entity; and the agent, which takes responsibility for activities. Examining the relationships requires observing the following: (1) entities and activities, along with their creation, usage, and end times; (2) derivations of digital objects; (3) agents taking responsibility for generated entities and activities; (4) support of provenance; (5) properties; and (6) collections that form a logical structure. To define big data, the PROV-DM model structure must be extended further.
Even the MapReduce method on OPM is not suitable for defining new relationships, especially for data processing and for any generic purpose.
So, data security for a distributed streaming data collection system must be deployed using a provenance model that has the characteristics of big data. These characteristics include a representation that supports highly variable data, the complete proceedings of data transformation, monitoring of privacy along with data security, and a multi-layer feature.
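As an illustration of the three PROV-DM elements and the relationships between them, the following is a minimal Python sketch. The class names, field names, and identifiers such as `stream:raw` are assumptions made for this example and do not come from the patent; the relation kinds (`used`, `wasGeneratedBy`, `wasDerivedFrom`, `wasAssociatedWith`) follow the W3C PROV vocabulary.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """A digital object with an identifier and optional attributes."""
    id: str
    attributes: Tuple[Tuple[str, str], ...] = ()


@dataclass(frozen=True)
class Activity:
    """Something that acts on entities over a period of time."""
    id: str
    started: Optional[datetime] = None
    ended: Optional[datetime] = None


@dataclass(frozen=True)
class Agent:
    """Whoever (or whatever) takes responsibility for an activity."""
    id: str


@dataclass(frozen=True)
class Relation:
    """A directed, typed link between two element identifiers."""
    kind: str      # e.g. "used", "wasGeneratedBy", "wasDerivedFrom"
    subject: str   # id of the element the statement is about
    target: str    # id of the element it points to


def relations_for(relations: List[Relation], subject_id: str) -> List[Relation]:
    """Return every relation whose subject is the given element."""
    return [r for r in relations if r.subject == subject_id]


# Hypothetical example records (all identifiers and times are illustrative).
raw = Entity("stream:raw", (("format", "stream"),))
report = Entity("file:report")
job = Activity("job:aggregate",
               datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 5))
submitter = Agent("agent:submitter")

relations = [
    Relation("used", job.id, raw.id),
    Relation("wasGeneratedBy", report.id, job.id),
    Relation("wasDerivedFrom", report.id, raw.id),
    Relation("wasAssociatedWith", job.id, submitter.id),
]
```

Calling `relations_for(relations, report.id)` returns the generation and derivation statements for the report, which is the kind of lookup needed to answer where a digital object came from.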
Objects of the Invention
The main object of the invention is to deploy data security for a distributed streaming data collection system using a provenance model having the characteristics of big data. The invention focuses on the security of the clusters of data input to the various blocks of computation and of the output from each computational block. It records the delineation of the origin of the information and the process by which the present state is obtained, so that it represents the provenance model of big data, maintains quality, and keeps track of the transformed data before it arrives at the cluster.
Summary of the Invention
This invention deploys data security for a distributed streaming data collection system using a provenance model. Most computation requires a huge volume of information as input, which must be transformed. The information is split into clusters and computed in smaller blocks. Sometimes some information may be lost. Lost information can be tracked only when there is a record delineating the information from its beginning and the relationships among the different processes; that record is the provenance model. The distributed streaming data collection using a provenance model has the entity as the digital object, the activity as the operation it performs, and the agent who generates and uses the data. The entity needs an identifier for the data structure, namely file, index, stream, or message. The activities are the tasks, jobs, and data operations done on the clusters of data. The data activity handles the stream data collection. The agents are the data generators, users, and submitters. The relationships are analyzed through the influencers in the entity, activity, and agent.
Detailed Description of the Invention
Fig. 1 shows the provenance model for data security in a distributed streaming data collection system. When data is collected from sources, or when outputs are collected from a software approach, the dataflow needs to be tracked for any point of interest in order to check for missing blocks of data in the clusters. In this provenance model, the delineation of the origin of the information, the relationships between pieces of information, and the proceedings by which the present state was reached are significant requirements for data security monitoring.
The information has three levels. The first is the entity, a digital object having an identifier with optional attributes. The entities have structures for tasks, jobs, data, and network objects. A job has many tasks to be carried out in the parallel processing of data. A network object, such as a favorite, bookmark, or website, is the origin of a source and the end point of a destination.
There are different data formats, and the most frequently used is the file type. One of the main file types is the distributed file system, and another is the local file type. There are also unstructured and semi-structured files. A semi-structured file system holds many records used for data security monitoring. Another type of information store that can hold clusters of data is the database. A database is an online collection of complete information that supports data transactions by responding to queries in software such as MySQL, MS SQL Server, etc. For analytical processing, databases are used in data warehousing. The most significant data format is the stream of data obtained from application observations. It consists of clusters of data represented as tuples that are transferred sequentially under a time constraint. The next level is the activity performed on the entities.
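The stream format described above, tuples transferred sequentially over time, can be sketched as follows, together with a check for missing blocks. This is a minimal sketch under one stated assumption: each tuple carries a per-stream sequence number (`seq`), which the patent does not specify. The names `StreamTuple` and `missing_sequence_numbers` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class StreamTuple:
    """One element of a data stream."""
    stream_id: str
    seq: int           # assumed: increases by 1 for each tuple sent
    timestamp: float   # seconds since the epoch
    payload: bytes


def missing_sequence_numbers(tuples: Iterable[StreamTuple]) -> List[int]:
    """Return sequence numbers absent between the lowest and highest seen.

    Tuples may arrive out of order; duplicates are ignored.
    """
    present = {t.seq for t in tuples}
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1)
            if n not in present]


# Illustrative data: tuples 3 and 6 never arrived.
received = [
    StreamTuple("sensor-a", seq, 1000.0 + seq, b"reading")
    for seq in (1, 2, 4, 5, 7)
]
gaps = missing_sequence_numbers(received)
```

Each gap marks a candidate point of interest: a place in the dataflow where data may need to be requested or reproduced again.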
When an activity is performed, the attributes of the entities are transformed, and the update has to be recorded. Data activities and job or task activities must be performed on the entities with a time parameter. Data must be collected, organized, and used. A job or task activity must be allotted and completed. The last level is the agent. An agent is someone who has a duty or control over the activity to be performed on the entity. It can be the data generator, the user, or the submitter. It can also be a software agent that handles the data. The relationships of entities are influenced by usage; those of activities are influenced by production, destruction, and the interlinks between activities. The agent is influenced by responsibility and assignment.
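One way to picture how these relationships support tracing data back to its origin is the sketch below. It records, for each activity, the responsible agent, the entities used, and the entities produced, and then walks backwards from an entity to its sources. The class `ProvenanceLog` and all identifiers are assumptions made for illustration, not an implementation taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ProvenanceLog:
    """Keeps usage, production, and responsibility records."""

    def __init__(self) -> None:
        self.generated_by: Dict[str, str] = {}               # entity -> activity
        self.used: Dict[str, List[str]] = defaultdict(list)  # activity -> entities
        self.responsible: Dict[str, str] = {}                # activity -> agent

    def record(self, activity: str, agent: str,
               inputs: List[str], outputs: List[str]) -> None:
        """Record one completed activity and what it consumed and produced."""
        self.responsible[activity] = agent
        self.used[activity].extend(inputs)
        for entity in outputs:
            self.generated_by[entity] = activity

    def lineage(self, entity: str) -> List[Tuple[str, str, str]]:
        """Walk back from an entity to its sources.

        Returns (entity, activity, agent) steps, most recent first.
        """
        steps: List[Tuple[str, str, str]] = []
        stack, visited = [entity], set()
        while stack:
            current = stack.pop()
            if current in visited:
                continue
            visited.add(current)
            activity = self.generated_by.get(current)
            if activity is None:
                continue  # a source entity: nothing recorded produced it
            steps.append((current, activity, self.responsible[activity]))
            stack.extend(self.used[activity])
        return steps


# Illustrative records.
log = ProvenanceLog()
log.record("collect", "data-generator", ["raw-stream"], ["batch-1"])
log.record("clean", "submitter", ["batch-1"], ["clean-1"])
trace = log.lineage("clean-1")
```

Here `trace` lists the cleaning step followed by the collection step, so an auditor can see which activity and which agent produced each intermediate result.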
Fig. 2 shows the data activity, a significant operation performed on entities that alters their attributes. The first data activity role is the collection of clusters of data: a sequence of data forming a distributed stream of information obtained from different sources. After collection, the data can be organized by performing operations such as creation, deletion, etc. The organization types are file, indexed, stream, and message operations. The data is then prepared by validation and standardization. Finally, it is analyzed and visualized by chart and report.
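The sequence of data activities in Fig. 2 (collect, organize, prepare, analyze) can be sketched as a small pipeline in which every stage appends a provenance record. This is only an illustration under stated assumptions: the record fields (`ts`, `value`), the choice of sorting as the organizing step, and the function name `run_pipeline` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


def run_pipeline(readings: List[Dict]) -> Tuple[Dict, List[Dict]]:
    """Run the data activities in order and log each one."""
    log: List[Dict] = []

    def step(name: str, func: Callable[[List[Dict]], List[Dict]],
             data: List[Dict]) -> List[Dict]:
        result = func(data)
        # Provenance record: which activity ran, on how much data, and when.
        log.append({
            "activity": name,
            "records_in": len(data),
            "records_out": len(result),
            "ended": datetime.now(timezone.utc).isoformat(),
        })
        return result

    collected = step("collect", list, readings)
    organized = step("organize",
                     lambda rows: sorted(rows, key=lambda r: r["ts"]),
                     collected)
    validated = step("validate",
                     lambda rows: [r for r in rows
                                   if r.get("value") is not None],
                     organized)
    standardized = step("standardize",
                        lambda rows: [{**r, "value": float(r["value"])}
                                      for r in rows],
                        validated)

    # Analysis: a simple summary that a chart or report could be built from.
    values = [r["value"] for r in standardized]
    summary = {
        "count": len(values),
        "mean": sum(values) / len(values) if values else None,
    }
    return summary, log


# Illustrative input: one reading has no value and is dropped by validation.
summary, activity_log = run_pipeline([
    {"ts": 2, "value": "4"},
    {"ts": 1, "value": 2},
    {"ts": 3, "value": None},
])
```

Comparing `records_in` with `records_out` for each logged activity shows where data was dropped, which is the information needed to locate a point of interest after the fact.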
Claims (5)
1. A high speed optic fiber connection to perform the software approach data collection and activity.
2. Highly configured computer to carry out the computation involved by software agent in assigning and executing a query.
3. MySQL or Oracle or any other query software to execute from a database.
4. Smart devices with display to monitor the status of query.
5. A large volume of storage space to execute the cluster data and store the update of big data.
Drawings
Fig. 1 Block diagram of data security for streaming data collection using provenance model
Fig. 2 Data Activity
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2020101787A AU2020101787A4 (en) | 2020-08-12 | 2020-08-12 | Data security for distributed streaming data collection system using provenance model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2020101787A AU2020101787A4 (en) | 2020-08-12 | 2020-08-12 | Data security for distributed streaming data collection system using provenance model |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2020101787A4 (en) | 2020-09-17 |
Family
ID=72432545
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU2020101787A Ceased AU2020101787A4 (en) | 2020-08-12 | 2020-08-12 | Data security for distributed streaming data collection system using provenance model |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2020101787A4 (en) |
-
2020
- 2020-08-12 AU AU2020101787A patent/AU2020101787A4/en not_active Ceased
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11269822B2 (en) | Generation of automated data migration model | |
US9275422B2 (en) | Distributed k-core view materialization and maintenance for graphs | |
CN107945086A (en) | A kind of big data resource management system applied to smart city | |
CN110300963A (en) | Data management system in large-scale data repository | |
CN110168523A (en) | Change monitoring to inquire across figure | |
CN106104533A (en) | Process the data set in large data storage vault | |
CN103853821A (en) | Method for constructing multiuser collaboration oriented data mining platform | |
CN1734451A (en) | System and method for automated data storage management | |
CN109213752A (en) | A kind of data cleansing conversion method based on CIM | |
US20170132284A1 (en) | Query hint management for a database management system | |
US20190050435A1 (en) | Object data association index system and methods for the construction and applications thereof | |
Al-Janabi | A proposed framework for analyzing crime data set using decision tree and simple k-means mining algorithms | |
Tu et al. | IoT streaming data integration from multiple sources | |
KR101552216B1 (en) | Integrated system for research productivity and operation managment based on big date technology, and method thereof | |
CN109213826A (en) | Data processing method and equipment | |
CN112528279A (en) | Method and device for establishing intrusion detection model | |
JP2014164618A (en) | Frequent pattern extraction device, frequent pattern extraction method, and program | |
AU2020101787A4 (en) | Data security for distributed streaming data collection system using provenance model | |
CN112347314B (en) | Data resource management system based on graph database | |
CN110062112A (en) | Data processing method, device, equipment and computer readable storage medium | |
CN103488693A (en) | Data processing device and data processing method | |
Sun et al. | A Novel CEP Model and Its Applications in Internet of Things Big Data Processing | |
CN111552847A (en) | Method and device for changing number of objects | |
Munir et al. | A temporal knowledge graph dataset for profiling | |
Englbrecht et al. | Supporting Process Mining with Recovered Residual Data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FGI | Letters patent sealed or granted (innovation patent) | ||
MK22 | Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry |