AU2020101787A4 - Data security for distributed streaming data collection system using provenance model - Google Patents

Data security for distributed streaming data collection system using provenance model

Info

Publication number
AU2020101787A4
Authority
AU
Australia
Prior art keywords
data
activity
model
provenance
security
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2020101787A
Inventor
Danish Ahamad
Mohammad Ahmad
Abdullah Shawan Alotaibi
Mohammad Nadeem Khalid
Nayyar Ahmed Khan
Sivaram Rajeyyagari
Khan Asif Rashid
Ahmed Masih Uddin Siddiqi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alotaibi Abdullah Shawan Dr
Rajeyyagari Sivaram Dr
Siddiqi Ahmed Masih Uddin Mr
Original Assignee
Alotaibi Abdullah Shawan Dr
Rajeyyagari Sivaram Dr
Siddiqi Ahmed Masih Uddin Mr
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-08-12
Filing date
2020-08-12
Publication date
2020-09-17
Application filed by Alotaibi Abdullah Shawan Dr, Rajeyyagari Sivaram Dr, Siddiqi Ahmed Masih Uddin Mr
Priority to AU2020101787A
Application granted
Publication of AU2020101787A4
Ceased (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING; G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/285 Clustering or classification (under G06F16/284 Relational databases)
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/2358 Change logging, detection, and notification (under G06F16/23 Updating)
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/258 Data format conversion from or to a database
    • G06F21/60 Protecting data (under G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

DATA SECURITY FOR DISTRIBUTED STREAMING DATA COLLECTION SYSTEM USING PROVENANCE MODEL
ABSTRACT
Recent technology involves many algorithms for modelling datasets collected from different sources at varying timestamps. If a point of interest exists in the dataflow, the data at that point of interest may need to be reproduced. This creates a need for data security for streaming data collection using a provenance model, which is possible only when there is a record delineating the origin of the information, its relationships, and the process by which it arrived at its present state. The input digital object is represented as an entity so that its flow can be tracked in any software approach. The database, file format or any other information format can be retrieved by querying MySQL, Oracle, etc. The activities to be performed on the entity, such as jobs, tasks and data activities, are then allotted and completed. The agents that operate on the information are the data generator, the user and the submitter. The relationships among entity, activity and agent are handled by influencers covering usage, production, invalidation, interlinking, assignment and responsibility. With these record updates, a big data provenance model can be applied for data security.
Fig. 1 Block diagram of data security for streaming data collection using provenance model

Description

Drawings
[Fig. 1 Block diagram of data security for streaming data collection using provenance model. Recoverable labels: provenance model for data security in a distributed streaming data collection system; entity as tasks, jobs, data and network object; activity as job/task activity and data activity; agent as data generator, user and submitter; influencers/relationships.]
DATA SECURITY FOR DISTRIBUTED STREAMING DATA COLLECTION SYSTEM USING PROVENANCE MODEL DESCRIPTION
Field of the Invention.
The field of the invention relates to provenance models.
In most applications, technological inventions involve collecting data from a variety of sources and clustering it into a number of blocks for software calibration, processing, analysis, etc. In all of these cases, a huge volume of data is given as input and a huge volume of data is extracted from it. The transformed data must be secured because it faces many threats. This invention therefore models data security for a distributed streaming data collection system using a provenance model.
Background of the invention.
In the medical field, or any other field, making predictions by modelling requires large datasets to undergo computation. An external data storage space is needed to store both the large input data and the computed data. Beyond the medical field, monitoring and updating the status of any infrastructure under observation also requires different computational units using a software approach. However, information that has been transformed and moved over a communication channel is always under threat. This makes it important to implement a security system that monitors the collection of data using a provenance model.
Data transformed at the source does not always reach the destination. Data may be lost, and certain data, known as the point of interest, then has to be reproduced. This kind of data transformation calls for a provenance model: a model that delineates the origin of the data and the process by which its current state was reached. It emphasizes data provenance, which must retain the previous records, that is, the history of the complete observation and its details.
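To make the point-of-interest idea concrete, here is a minimal Python sketch, assuming provenance is kept as an append-only history of (step name, output) pairs. The names `ProvenanceHistory`, `run_pipeline` and `replay_from` are hypothetical and not taken from the patent. Because every intermediate result is recorded, downstream data can be reproduced from the value stored at the point of interest rather than from the original source.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

# A pipeline step: (name, function). Hypothetical structure for illustration.
Step = Tuple[str, Callable[[Any], Any]]


@dataclass
class ProvenanceHistory:
    """Append-only record of (step name, output) pairs."""
    entries: List[Tuple[str, Any]] = field(default_factory=list)

    def record(self, step: str, value: Any) -> None:
        self.entries.append((step, value))

    def value_at(self, step: str) -> Any:
        # Latest value recorded at the point of interest.
        for name, value in reversed(self.entries):
            if name == step:
                return value
        raise KeyError(f"no provenance recorded for step {step!r}")


def run_pipeline(source: Any, steps: List[Step], history: ProvenanceHistory) -> Any:
    """Apply each step in order, recording the source and every intermediate result."""
    value = source
    history.record("source", value)
    for name, fn in steps:
        value = fn(value)
        history.record(name, value)
    return value


def replay_from(history: ProvenanceHistory, steps: List[Step], point_of_interest: str) -> Any:
    """Reproduce the final output from the value recorded at the point of interest."""
    value = history.value_at(point_of_interest)
    names = [name for name, _ in steps]
    # Re-run only the steps that come after the point of interest.
    start = 0 if point_of_interest == "source" else names.index(point_of_interest) + 1
    for _, fn in steps[start:]:
        value = fn(value)
    return value
```

For example, with a `clean` step that drops `None` values and a `double` step, running `[1, None, 3]` yields `[2, 6]`, and replaying from `clean` reproduces `[2, 6]` from the stored `[1, 3]`. The sketch assumes steps return new values instead of mutating their input; otherwise the stored history would change underneath it.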
Data provenance needs to capture the transformation of data at different levels of the computational unit. It makes debugging of parallel processing in any data flow easier. It captures the output of every parallel process by tracking the inputs that were given, along with the original data, so that the system can automatically correct itself and update the obtained data.
The earlier model was the Open Provenance Model (OPM). To deploy OPM, the domain must first be examined, for example dataflow, web or medical. Next, the data for the chosen domain is collected and attributed. An abstract model is then produced. This process involves software-based querying along with embedding, bound together with other technologies such as XML serialization. The main drawback lies in obtaining the process automation software solution, PASS.
Next, the PROV-DM model was deployed. Its elements are the entity, which is a digital object; the activity, which acts on the entity; and the agent, which takes responsibility for activities. Examining the relationships requires observing: (1) entities and activities, along with their creation, usage and end times; (2) derivations of digital objects; (3) agents taking responsibility for generated entities and activities; (4) support for provenance; (5) properties; and (6) collections that form a logical structure. To describe big data, the PROV-DM structure must be extended further.
Applying the MapReduce method to OPM is also unsuitable for defining new relationships, especially in data processing and for generic purposes.
Therefore, data security for a distributed streaming data collection system must be deployed using a provenance model that has the characteristics of big data. These characteristics include a representation that supports highly variable data, a complete record of data transformations, privacy monitoring along with data security, and a multi-layer feature.
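Capturing the output of each parallel process together with the input it received can be sketched in a few lines of Python, assuming the data is split into partitions processed by a thread pool. `Lineage`, `traced_parallel_map` and `trace_back` are hypothetical names, not from the patent. Each output is stored alongside the partition that produced it, so a suspicious output can be traced back to its input when debugging.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass(frozen=True)
class Lineage:
    task_id: int      # index of the parallel task / partition
    task_input: Any   # the partition the task received
    task_output: Any  # what the task produced from it


def traced_parallel_map(fn: Callable[[Any], Any], partitions: Sequence[Any],
                        max_workers: int = 4) -> List[Lineage]:
    """Run fn on every partition in a thread pool and keep input -> output lineage."""
    def run(indexed):
        task_id, part = indexed
        return Lineage(task_id, part, fn(part))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in submission order, so lineage stays aligned with partitions.
        return list(pool.map(run, enumerate(partitions)))


def trace_back(lineage: List[Lineage], is_suspicious: Callable[[Any], bool]) -> List[Lineage]:
    """Return the lineage records whose output is flagged, exposing the input that caused it."""
    return [rec for rec in lineage if is_suspicious(rec.task_output)]
```

For example, summing the partitions `[[1, 2], [3, -4], [5]]` and tracing negative outputs returns only the record for task 1, whose input was `[3, -4]`.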
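The XML serialization step mentioned for OPM can be illustrated with Python's standard `xml.etree.ElementTree`. This is a minimal sketch, assuming OPM's three node kinds (artifact, process, agent) and causal edges such as `used`, `wasGeneratedBy` and `wasControlledBy`, each written from effect to cause. The element and attribute layout (`opmGraph`, `causalDependencies`, `effect`, `cause`) is a simplified, hypothetical one and is not the official OPM XML schema.

```python
import xml.etree.ElementTree as ET


def serialize_opm(artifacts, processes, agents, edges):
    """Serialize a small OPM-style graph to an XML string.

    edges: list of (kind, effect_id, cause_id), e.g. ("used", "p1", "a1")
    meaning process p1 used artifact a1.
    """
    root = ET.Element("opmGraph")
    # One container element per node kind, holding one child per node id.
    for tag, ids in (("artifact", artifacts), ("process", processes), ("agent", agents)):
        group = ET.SubElement(root, tag + "s")
        for node_id in ids:
            ET.SubElement(group, tag, id=node_id)
    # Causal dependencies point from the effect to its cause.
    deps = ET.SubElement(root, "causalDependencies")
    for kind, effect, cause in edges:
        ET.SubElement(deps, kind, effect=effect, cause=cause)
    return ET.tostring(root, encoding="unicode")
```

Parsing the result back with `ET.fromstring` recovers the same nodes and edges, which is the property a serialized provenance record needs for later querying.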
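The PROV-DM core described above (entities, activities with start and end times, agents, and relations between them) can be held in a small in-memory structure. This is a minimal Python sketch; the class and method names are hypothetical, while the relation names (`used`, `wasGeneratedBy`, `wasAssociatedWith`, `wasAttributedTo`, `wasDerivedFrom`) follow the W3C PROV-DM vocabulary. It answers two typical provenance questions: what an entity was derived from, and which agents are responsible for it.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Set, Tuple

# Relation names taken from the W3C PROV-DM vocabulary (subset).
PROV_RELATIONS = {"used", "wasGeneratedBy", "wasAssociatedWith", "wasAttributedTo", "wasDerivedFrom"}


@dataclass
class Activity:
    id: str
    start: Optional[datetime] = None  # when the activity started
    end: Optional[datetime] = None    # when the activity ended


@dataclass
class ProvDocument:
    entities: Set[str] = field(default_factory=set)
    agents: Set[str] = field(default_factory=set)
    activities: Dict[str, Activity] = field(default_factory=dict)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)  # (relation, subject, object)

    def add_entity(self, eid: str) -> None:
        self.entities.add(eid)

    def add_agent(self, aid: str) -> None:
        self.agents.add(aid)

    def add_activity(self, act_id: str, start: Optional[datetime] = None,
                     end: Optional[datetime] = None) -> None:
        self.activities[act_id] = Activity(act_id, start, end)

    def relate(self, relation: str, subject: str, obj: str) -> None:
        if relation not in PROV_RELATIONS:
            raise ValueError(f"unsupported relation: {relation}")
        self.relations.append((relation, subject, obj))

    def lineage(self, eid: str) -> Set[str]:
        """Every entity that eid was transitively derived from."""
        seen, stack = set(), [eid]
        while stack:
            current = stack.pop()
            for rel, subj, obj in self.relations:
                if rel == "wasDerivedFrom" and subj == current and obj not in seen:
                    seen.add(obj)
                    stack.append(obj)
        return seen

    def responsible_agents(self, eid: str) -> Set[str]:
        """Agents attributed with eid, or associated with the activity that generated it."""
        direct = {o for r, s, o in self.relations if r == "wasAttributedTo" and s == eid}
        generators = {o for r, s, o in self.relations if r == "wasGeneratedBy" and s == eid}
        via = {o for r, s, o in self.relations if r == "wasAssociatedWith" and s in generators}
        return direct | via
```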
Objects of the Invention
The main object of the invention is to deploy data security for a distributed streaming data collection system using a provenance model that has the characteristics of big data. The invention focuses on securing the clusters of data input to the various computational blocks and the output from those blocks. It records the origin of the information and the process by which its present state was obtained, so that it represents a big data provenance model that maintains quality and keeps track of the transformed data before it arrives at the cluster.
Summary of the Invention
This invention deploys data security for a distributed streaming data collection system using a provenance model. Most computations require a huge volume of information as input, and that information must be transformed. The information is split into clusters and computed in smaller blocks. Some information may be lost along the way. Lost information can be tracked only when there is a record delineating the information from its origin and the relationships between the different processes, which is what a provenance model provides. The distributed streaming data collection using the provenance model has the entity as the digital object, the activity as the operation performed on it, and the agent as whoever generates and uses the data. The entity needs an identifier for its data structure, namely file, index, stream or message. The activities are the tasks, jobs and data operations performed on the clusters of data; the data activity handles the stream data collection. The agents are the data generators, users and submitters. The relationships are analyzed through the influencers among the entity, activity and agent.
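The three model elements and the data-structure identifier can be written down as types. This is a minimal Python sketch; the enum values mirror the categories named above (file, index, stream, message; task, job, data; data generator, user, submitter), while the `"<kind>:<name>"` identifier scheme and the functions `parse_identifier` are hypothetical choices, not something the patent specifies.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Tuple


class EntityKind(Enum):
    FILE = "file"
    INDEX = "index"
    STREAM = "stream"
    MESSAGE = "message"


class ActivityKind(Enum):
    TASK = "task"
    JOB = "job"
    DATA = "data"


class AgentRole(Enum):
    DATA_GENERATOR = "data_generator"
    USER = "user"
    SUBMITTER = "submitter"


@dataclass
class Entity:
    kind: EntityKind
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)  # optional attributes

    @property
    def identifier(self) -> str:
        # Hypothetical scheme "<kind>:<name>", e.g. "stream:sensor-42".
        return f"{self.kind.value}:{self.name}"


@dataclass
class Agent:
    name: str
    role: AgentRole


def parse_identifier(identifier: str) -> Tuple[EntityKind, str]:
    """Split an identifier back into its data-structure kind and name."""
    kind, _, name = identifier.partition(":")
    return EntityKind(kind), name
```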
Detailed Description of the Invention
Fig. 1 shows the provenance model for data security in a distributed streaming data collection system. When data is collected from sources, or outputs are collected from a software approach, the dataflow must be tracked for any point of interest so that any missing block of data in the clusters can be detected. In this provenance model, the origin of the information, the relationships between pieces of information, and the process by which the present state was reached are key requirements for data security monitoring. The information is organized at three levels. The first level is the entity, a digital object with an identifier and optional attributes. Entities have structures for tasks, jobs, data and network objects. A job has many tasks to be carried out in the parallel processing of data. A network object, such as a favourite, bookmark or website, is the origin of a source or the end of a destination. There are different data formats, and the most frequently represented is the file type. The main file types are the distributed file system and the local file. Files may also be unstructured or semi-structured; a semi-structured file holds many records used for data security monitoring. The database is another type of information store that can hold clusters of data. A database is a collection of complete information available online for data transactions, responding to queries through software such as MySQL or MS SQL Server. For analytical processing, databases are used in data warehousing. The most significant data format is the data stream obtained from application observations, which consists of clusters of data represented as tuples transferred sequentially under a time constraint. The next level is the activity performed on the entities.
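Checking a collected stream for missing blocks can be sketched as follows, assuming each stream tuple carries a sequence number and a timestamp. `StreamTuple`, `find_missing_blocks` and `late_tuples` are hypothetical names, and the `max_delay` threshold is an illustrative stand-in for the time constraint mentioned above. Gaps in the sequence mark candidate points of interest whose data must be reproduced.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class StreamTuple:
    seq: int          # position of the tuple in the source stream
    timestamp: float  # seconds since the stream started
    payload: str


def find_missing_blocks(tuples: Iterable[StreamTuple]) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges of absent sequence numbers."""
    seqs = sorted({t.seq for t in tuples})
    gaps = []
    for prev, cur in zip(seqs, seqs[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


def late_tuples(tuples: Iterable[StreamTuple], max_delay: float) -> List[StreamTuple]:
    """Tuples arriving more than max_delay seconds after their predecessor (time-constraint check)."""
    ordered = sorted(tuples, key=lambda t: t.seq)
    return [cur for prev, cur in zip(ordered, ordered[1:])
            if cur.timestamp - prev.timestamp > max_delay]
```

For a cluster containing sequence numbers 0, 1, 4, 5 and 7, `find_missing_blocks` reports the ranges `(2, 3)` and `(6, 6)`.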
When an activity is performed, the attributes of the entities are transformed, and the update must be recorded. Data activities and job or task activities are performed on the entities with a time parameter. Data must be collected, organized and used; a job or task activity must be allotted and completed. The last level is the agent, someone who has a duty or control over the activity performed on the entity. An agent can be the data generator, the user or the submitter, or it can be a software agent that handles the data. Entities are influenced through usage; activities are influenced through production, destruction and the interlinks between activities; and agents are influenced through responsibility and assignment.
Fig. 2 shows the data activity, a significant operation performed on the entities that alters their attributes. The first role of the data activity is collecting clusters of data: a distributed stream of information obtained from different sources. After collection, the data is organized through operations such as creation and deletion; the organization types are file, indexed, stream and message operations. The data is then prepared through validation and standardization. Finally, it is analyzed and visualized through charts and reports.
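Recording each attribute update together with its time, its responsible agent and the influenced relationships can be sketched as below. This assumes an activity is a function that returns a new version of the entity's attributes; `ProvenanceLog`, `perform` and the `"<entity>@v<n>"` versioning scheme are hypothetical, while the relation labels mirror the W3C PROV-DM terms `used`, `wasGeneratedBy`, `wasInvalidatedBy`, `wasInformedBy` and `wasAssociatedWith`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional, Tuple

Attributes = Dict[str, Any]


@dataclass
class ProvenanceLog:
    # Each record: (relation, subject, object, time of the activity).
    records: List[Tuple[str, str, str, datetime]] = field(default_factory=list)

    def add(self, relation: str, subject: str, obj: str, when: datetime) -> None:
        self.records.append((relation, subject, obj, when))

    def query(self, relation: str) -> List[Tuple[str, str]]:
        return [(s, o) for r, s, o, _ in self.records if r == relation]


def perform(log: ProvenanceLog, activity_id: str, agent: str, entity_id: str, version: int,
            attrs: Attributes, fn: Callable[[Attributes], Attributes],
            informed_by: Optional[str] = None, now: Optional[datetime] = None
            ) -> Tuple[str, Attributes]:
    """Apply an activity to an entity version and record the update; returns (new_ref, new_attrs)."""
    when = now or datetime.now(timezone.utc)
    old_ref = f"{entity_id}@v{version}"
    new_ref = f"{entity_id}@v{version + 1}"
    new_attrs = fn(dict(attrs))  # work on a copy so the old version stays intact
    log.add("wasAssociatedWith", activity_id, agent, when)  # agent responsibility
    log.add("used", activity_id, old_ref, when)              # usage of the entity
    log.add("wasInvalidatedBy", old_ref, activity_id, when)  # old version destroyed
    log.add("wasGeneratedBy", new_ref, activity_id, when)    # new version produced
    if informed_by:
        log.add("wasInformedBy", activity_id, informed_by, when)  # interlink between activities
    return new_ref, new_attrs
```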
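The data activity stages (collection, organization, preparation, analysis) can be sketched as a small pipeline that logs the record count after each stage. This assumes records are dictionaries with a `value` field; the function names and the concrete rules (deleting records from excluded sources, keeping numeric values only, min-max scaling to [0, 1], per-source means) are hypothetical choices for illustration. The returned dictionary stands in for the chart and report step.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def collect(streams: Dict[str, List[Record]]) -> List[Record]:
    # Collection: merge distributed streams, tagging each record with its source.
    return [dict(rec, source=name) for name, recs in streams.items() for rec in recs]


def organize(records: List[Record], drop_sources: Iterable[str] = ()) -> List[Record]:
    # Organization: a deletion operation removing records from excluded sources.
    dropped = set(drop_sources)
    return [r for r in records if r["source"] not in dropped]


def prepare(records: List[Record]) -> List[Record]:
    # Validation: keep numeric values only. Standardization: min-max scale to [0, 1].
    valid = [r for r in records if isinstance(r.get("value"), (int, float))]
    if not valid:
        return []
    lo = min(r["value"] for r in valid)
    hi = max(r["value"] for r in valid)
    span = (hi - lo) or 1
    return [dict(r, value=(r["value"] - lo) / span) for r in valid]


def analyze(records: List[Record]) -> Dict[str, float]:
    # Analysis: per-source mean of the standardized values, as a simple report.
    totals: Dict[str, Tuple[float, int]] = {}
    for r in records:
        s, n = totals.get(r["source"], (0.0, 0))
        totals[r["source"]] = (s + r["value"], n + 1)
    return {src: s / n for src, (s, n) in totals.items()}


def run_data_activity(streams, drop_sources=()):
    """Run all stages; return the report and a (stage, record count) log."""
    log = []
    records = collect(streams)
    log.append(("collect", len(records)))
    records = organize(records, drop_sources)
    log.append(("organize", len(records)))
    records = prepare(records)
    log.append(("prepare", len(records)))
    return analyze(records), log
```

The per-stage counts in the log show where records were removed, which is the kind of trace a provenance record would need to explain a change in the data.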

Claims (5)

DATA SECURITY FOR DISTRIBUTED STREAMING DATA COLLECTION SYSTEM USING PROVENANCE MODEL CLAIMS: We Claim:
1. A high-speed optical fiber connection to perform the software-based data collection and activity.
2. A highly configured computer to carry out the computation performed by the software agent in assigning and executing a query.
3. MySQL, Oracle or any other query software to execute queries on a database.
4. Smart devices with a display to monitor the status of a query.
5. A large volume of storage space to execute on the cluster data and store the updates of big data.
DATA SECURITY FOR DISTRIBUTED STREAMING DATA COLLECTION SYSTEM USING PROVENANCE MODEL (2020101787, 12 Aug 2020)
Drawings
Fig. 1 Block diagram of data security for streaming data collection using provenance model
Fig. 2 Data Activity

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2020101787A AU2020101787A4 (en) 2020-08-12 2020-08-12 Data security for distributed streaming data collection system using provenance model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2020101787A AU2020101787A4 (en) 2020-08-12 2020-08-12 Data security for distributed streaming data collection system using provenance model

Publications (1)

Publication Number Publication Date
AU2020101787A4 (en) 2020-09-17

Family

ID=72432545

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2020101787A Ceased AU2020101787A4 (en) 2020-08-12 2020-08-12 Data security for distributed streaming data collection system using provenance model

Country Status (1)

Country Link
AU (1) AU2020101787A4 (en)

Similar Documents

Publication Publication Date Title
US11269822B2 (en) Generation of automated data migration model
US9275422B2 (en) Distributed k-core view materialization and maintenance for graphs
CN107945086A (en) A kind of big data resource management system applied to smart city
CN110300963A (en) Data management system in large-scale data repository
CN110168523A (en) Change monitoring to inquire across figure
CN106104533A (en) Process the data set in large data storage vault
CN103853821A (en) Method for constructing multiuser collaboration oriented data mining platform
CN1734451A (en) System and method for automated data storage management
CN109213752A (en) A kind of data cleansing conversion method based on CIM
US20170132284A1 (en) Query hint management for a database management system
US20190050435A1 (en) Object data association index system and methods for the construction and applications thereof
Al-Janabi A proposed framework for analyzing crime data set using decision tree and simple k-means mining algorithms
Tu et al. IoT streaming data integration from multiple sources
KR101552216B1 (en) Integrated system for research productivity and operation managment based on big date technology, and method thereof
CN109213826A (en) Data processing method and equipment
CN112528279A (en) Method and device for establishing intrusion detection model
JP2014164618A (en) Frequent pattern extraction device, frequent pattern extraction method, and program
AU2020101787A4 (en) Data security for distributed streaming data collection system using provenance model
CN112347314B (en) Data resource management system based on graph database
CN110062112A (en) Data processing method, device, equipment and computer readable storage medium
CN103488693A (en) Data processing device and data processing method
Sun et al. A Novel CEP Model and Its Applications in Internet of Things Big Data Processing
CN111552847A (en) Method and device for changing number of objects
Munir et al. A temporal knowledge graph dataset for profiling
Englbrecht et al. Supporting Process Mining with Recovered Residual Data

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry