CN112134834B - Data lake system architecture based on block chain - Google Patents

Info

Publication number
CN112134834B
CN112134834B CN202010423876.2A CN202010423876A CN112134834B CN 112134834 B CN112134834 B CN 112134834B CN 202010423876 A CN202010423876 A CN 202010423876A CN 112134834 B CN112134834 B CN 112134834B
Authority
CN
China
Prior art keywords
data
block chain
bdl
blockchain
lake
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010423876.2A
Other languages
Chinese (zh)
Other versions
CN112134834A (en)
Inventor
蔡维德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianmin Qingdao International Sandbox Research Institute Co ltd
Beijing Tiande Technology Co ltd
Original Assignee
Tianmin Qingdao International Sandbox Research Institute Co ltd
Beijing Tiande Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianmin Qingdao International Sandbox Research Institute Co ltd, Beijing Tiande Technology Co ltd
Priority to CN202010423876.2A
Publication of CN112134834A
Application granted
Publication of CN112134834B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/04: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04: Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/10: Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/50: Network services
    • H04L67/56: Provisioning of proxy services
    • H04L67/565: Conversion or adaptation of application format or content
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/50: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using hash chains, e.g. blockchains or hash trees

Abstract

The invention provides a blockchain-based data lake (BDL, Blockchain Data Lake) system architecture that allows blockchains to be applied to more complex business scenarios and is particularly suited to big data analysis. The system architecture includes: (1) a realizable abstract blockchain node, comprising data acquisition and data sending modules; (2) a blockchain data pipeline, which provides functions such as sending and receiving blockchain data and converting and processing blockchain data; (3) the blockchain data lake BDL, comprising a blockchain database, a data analysis component, a data security and access component, and a BDL chain.

Description

Data lake system architecture based on block chain
Technical Field
The invention belongs to the technical fields of blockchain technology and data processing, and in particular relates to the design of a data lake system for data collaboration among multiple heterogeneous or homogeneous blockchains.
Background
Blockchain data is stored incrementally and consists mainly of block data and transaction data. At present, block data or transaction data is queried mainly by traversal, by hash, or by block height. In this design, the application stores the hash value corresponding to the data written to the chain, and queries are performed using the hash value that the application provides. These query modes are limited and are particularly ill-suited to big data analysis, so blockchains cannot be applied to more complex business scenarios, especially scenarios that require cross-chain data fusion, such as blockchain supervision. As blockchain technology develops, its application scenarios keep expanding, and the demand for blockchain data fusion and data collaboration keeps growing. For example: a user holds a large amount of funds in bank A, while other users related to that user show large abnormal fund flows in other banks such as bank B, bank C and bank D; solving this fund supervision problem requires inter-bank supervision that fuses and coordinates data across the banks.
Therefore, the invention provides a blockchain-based data lake architecture that interconnects originally isolated blockchains and their data, supports complex queries, data mining and data analysis, and greatly improves the efficiency of data utilization.
Disclosure of Invention
The invention provides a blockchain-based data lake system architecture that can connect various homogeneous or heterogeneous blockchains, enables blockchain data to be fused and used collaboratively, supports complex queries, data mining and data analysis, and greatly improves the efficiency of data utilization. The architecture proposed by the present invention comprises the following parts:
(1) realizable abstract blockchain node: each blockchain system implements this abstract node in order to collect data and exchange it with the blockchain data lake BDL (Blockchain Data Lake); the node comprises the following modules:
(1a) newly added blockchain data acquisition module: periodically acquires the data of newly added blocks;
(1b) newly added blockchain data sending module: periodically sends the data to the blockchain data pipeline;
(2) blockchain data pipeline: the connection between a blockchain and the BDL, covering the sending and receiving of blockchain data and its conversion and processing; the data pipeline comprises the following modules:
(2a) blockchain node and BDL connection module: establishes the communication channel between the blockchain and the BDL;
(2b) blockchain data receiving module: receives the data sent by the abstract blockchain node;
(2c) blockchain data conversion and processing module: formats and encrypts the data sent by the abstract blockchain node;
(2d) blockchain data secure transmission module: transmits the formatted and encrypted data to the BDL;
(3) the blockchain data lake BDL comprises the following modules or components:
(3a) blockchain database: supports the storage and fast retrieval of massive blockchain data;
(3b) data analysis component: supports a variety of data analysis tools, including but not limited to SQL, Hive, Impala and Spark;
(3c) data security and access component: responsible for blockchain access authorization and data lake access control;
(3d) BDL chain: used to put the key data of the BDL on chain and to let the BDL verify its own data.
Further, the blockchain system in (1) may include homogeneous blockchains and may also include heterogeneous blockchains.
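As an illustration of how one abstract node can serve both homogeneous and heterogeneous chains, the following is a minimal Python sketch of part (1) and module (1a). The class and method names (`AbstractBlockchainNode`, `latest_height`, `get_block`, `NewBlockCollector`) are hypothetical and do not come from the patent; tracking the last collected block height is one assumed way to find "newly added" blocks, and the periodic timer is left out.

```python
from abc import ABC, abstractmethod


class AbstractBlockchainNode(ABC):
    """Hypothetical interface that every connected chain implements in its own way."""

    @abstractmethod
    def latest_height(self) -> int:
        """Return the height of the newest block on the chain."""

    @abstractmethod
    def get_block(self, height: int) -> dict:
        """Return the block at the given height as a plain dictionary."""


class InMemoryChainNode(AbstractBlockchainNode):
    """Toy implementation backed by a list, standing in for a real chain client."""

    def __init__(self, blocks: list) -> None:
        self._blocks = blocks

    def latest_height(self) -> int:
        return len(self._blocks) - 1

    def get_block(self, height: int) -> dict:
        return self._blocks[height]


class NewBlockCollector:
    """Module (1a): remembers the last collected height and returns only newer blocks."""

    def __init__(self, node: AbstractBlockchainNode) -> None:
        self._node = node
        self._last_height = -1  # nothing collected yet

    def collect(self) -> list:
        latest = self._node.latest_height()
        new_blocks = [
            self._node.get_block(h)
            for h in range(self._last_height + 1, latest + 1)
        ]
        self._last_height = latest
        return new_blocks


chain = [{"height": 0, "txs": ["a"]}, {"height": 1, "txs": ["b"]}]
collector = NewBlockCollector(InMemoryChainNode(chain))
first_batch = collector.collect()    # the two blocks that already exist
chain.append({"height": 2, "txs": ["c"]})
second_batch = collector.collect()   # only the block added since the last call
```

A heterogeneous chain would supply its own subclass of the abstract node, while the collector and everything downstream stay unchanged.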
Further, the blockchain database in (3a) adopts distributed storage technology, is highly scalable and easy to maintain, and supports disaster recovery and automatic backup. In specific embodiments, it can support various data retrieval modes such as SQL, YARN, MapReduce, key and filter. The data tables of the blockchain database can adapt to business requirements, and their fields can be adjusted automatically.
Further, the blockchain access authorization function in (3c) registers, for each blockchain, a legal identity that can access the BDL; an unauthorized blockchain cannot access the BDL. The abstract blockchain node addresses of each chain must be declared and registered in the BDL in advance, and a node that has not been registered cannot access the BDL even if its chain is authorized.
Further, the BDL chain in (3d) is used to ensure the correctness and tamper resistance of the BDL's own key information, to verify the key data and hashes generated during BDL analysis, and to provide a data verification function.
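The two-level access rule (an authorized chain identity plus a pre-registered node address) can be sketched as follows. This is a minimal illustration under assumptions: the `AccessRegistry` class, its method names, and the use of a random UUID as the "unique identity" are hypothetical and are not specified by the patent.

```python
import uuid


class AccessRegistry:
    """Hypothetical registry combining chain authorization with node-address registration."""

    def __init__(self) -> None:
        self._chains = {}      # identity -> chain name
        self._addresses = {}   # identity -> set of registered node addresses

    def register_chain(self, chain_name: str) -> str:
        """Authorize a chain and hand back its unique identity (assumed: a random UUID)."""
        identity = uuid.uuid4().hex
        self._chains[identity] = chain_name
        self._addresses[identity] = set()
        return identity

    def register_node(self, identity: str, address: str) -> None:
        """Register a node address; only an already authorized chain may do so."""
        if identity not in self._chains:
            raise PermissionError("chain is not authorized")
        self._addresses[identity].add(address)

    def may_access(self, identity: str, address: str) -> bool:
        """Access requires both a known identity and a registered node address."""
        return address in self._addresses.get(identity, set())


registry = AccessRegistry()
bank_a = registry.register_chain("bank-a-chain")
registry.register_node(bank_a, "10.0.0.5")

allowed = registry.may_access(bank_a, "10.0.0.5")             # authorized chain, registered node
unregistered = registry.may_access(bank_a, "10.0.0.99")       # authorized chain, unknown node
forged = registry.may_access("not-an-identity", "10.0.0.5")   # unknown identity
```

Keeping the address set per identity means a stolen identity alone is not enough: the request must also come from an address that the chain registered beforehand.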
The BDL system processing method provided by the invention comprises the following steps:
(1) the blockchain to be connected first registers in the BDL to obtain authorization and a unique identity;
(2) the connected blockchain implements the abstract blockchain node, which is deployed in that blockchain and is responsible for periodically acquiring newly added blockchain data;
(3) after the authorization in (1) is obtained, the address of the node deployed in (2) must be registered in the BDL;
(4) the connected blockchain deploys a blockchain data pipeline (hereinafter, the pipeline) in its own server environment, and uses the identity obtained in (1) to connect the node in (2) with the BDL;
(4a) the pipeline obtains the blockchain data acquired in (2) and verifies the validity of the data;
(4b) the pipeline formats the data obtained in (4a) and then encrypts it;
(4c) the pipeline sends the data encrypted in (4b) to the BDL;
(5) the BDL writes the received block chain data into a data table;
(5a) after receiving the data sent in (4c), the BDL verifies the address registered in (3) to ensure that the source of the data is legitimate; if verification fails, the data is discarded directly;
(5b) the BDL verifies the authorization obtained in (1); if verification fails, the data is discarded directly;
(5c) after both verifications in (5a) and (5b) pass, the BDL decrypts and reformats the data sent in (4c) and then writes it to disk;
(6) the BDL extracts key information of the received data and stores the key information in a BDL chain;
(7) BDL clients analyze the data using data mining tools, and a client can verify the authenticity of the data using the key information on the BDL chain;
(8) the BDL exposes a standard data query interface so that business systems can easily connect to the BDL;
(9) a business system calls the BDL interface to obtain data for subsequent business processing.
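The pipeline and BDL steps (4a) to (5c) above can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: the JSON envelope, the field names, and the `pipeline_prepare` and `DataLake` names are assumptions, and the encryption in (4b) and decryption in (5c) are deliberately omitted because the patent does not name a cipher.

```python
import json

REQUIRED_FIELDS = {"hash", "prev_hash", "txs"}  # assumed minimal block layout


def pipeline_prepare(identity: str, node_address: str, block: dict) -> bytes:
    """Steps (4a)-(4b): validate the block, then format it (encryption omitted here)."""
    missing = REQUIRED_FIELDS - set(block)
    if missing:
        raise ValueError(f"invalid block, missing fields: {sorted(missing)}")
    envelope = {"identity": identity, "node_address": node_address, "block": block}
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


class DataLake:
    """Toy BDL: a list stands in for the data table written to disk."""

    def __init__(self, identities: set, addresses: set) -> None:
        self._identities = identities
        self._addresses = addresses
        self.table = []

    def receive(self, payload: bytes) -> bool:
        """Steps (5a)-(5c): return True if the data is stored, False if it is discarded."""
        envelope = json.loads(payload.decode("utf-8"))
        if envelope["node_address"] not in self._addresses:   # (5a) source address check
            return False
        if envelope["identity"] not in self._identities:      # (5b) authorization check
            return False
        self.table.append(envelope["block"])                  # (5c) write the data
        return True


lake = DataLake(identities={"bank-a"}, addresses={"10.0.0.5"})
block = {"hash": "h1", "prev_hash": "h0", "txs": ["tx-1"]}

stored = lake.receive(pipeline_prepare("bank-a", "10.0.0.5", block))
discarded = lake.receive(pipeline_prepare("bank-a", "10.0.0.99", block))
```

In this sketch a failed check simply returns `False` and leaves the table untouched, which mirrors the "discard directly" behaviour of steps (5a) and (5b).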
Further, the evidence storage and verification functions provided by the BDL chain in steps (6) and (7) work as follows: the BDL receives block data uploaded by a blockchain and processes each transaction in the block. For each transaction, it performs a combined hash operation over the hash of the block containing the transaction (Hash), the transaction data (Tx), and the hash of the preceding block (PreHash) to obtain newHash, which can be expressed as newHash = Hash(Hash + Tx + PreHash); newHash is stored on the BDL chain as the key information value of the transaction. When data authenticity needs to be verified during data analysis or data query, the hash value is recomputed from the block hash, the previous block hash and the transaction data of the transaction being verified according to the formula newHash = Hash(Hash + Tx + PreHash); if the resulting hash exists on the BDL chain, the verified transaction has not been tampered with.
The blockchain-based data lake (BDL) system architecture mainly addresses the low utilization efficiency of blockchain data and the need for data fusion and collaboration among multiple heterogeneous or homogeneous blockchains, and greatly broadens the application scenarios of blockchain as a new technology.
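The formula newHash = Hash(Hash + Tx + PreHash) can be illustrated with a short Python sketch. Two points are assumptions, since the patent fixes neither: SHA-256 is used as the hash function, and "+" is read as string concatenation. The `BdlChain` class is a hypothetical stand-in that models the BDL chain as a set of stored values.

```python
import hashlib


def new_hash(block_hash: str, tx: str, prev_hash: str) -> str:
    """newHash = Hash(Hash + Tx + PreHash), with SHA-256 and concatenation assumed."""
    material = (block_hash + tx + prev_hash).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


class BdlChain:
    """Toy stand-in for the BDL chain: a set of stored newHash values."""

    def __init__(self) -> None:
        self._stored = set()

    def record(self, block_hash: str, tx: str, prev_hash: str) -> str:
        """Evidence storage: compute newHash for a transaction and keep it."""
        value = new_hash(block_hash, tx, prev_hash)
        self._stored.add(value)
        return value

    def verify(self, block_hash: str, tx: str, prev_hash: str) -> bool:
        """Verification: recompute newHash and check that it exists on the chain."""
        return new_hash(block_hash, tx, prev_hash) in self._stored


bdl_chain = BdlChain()
bdl_chain.record("blockhash-2", "A pays B 100", "blockhash-1")

untampered = bdl_chain.verify("blockhash-2", "A pays B 100", "blockhash-1")
tampered = bdl_chain.verify("blockhash-2", "A pays B 900", "blockhash-1")
```

Because the transaction data is one of the hash inputs, any change to it yields a different newHash that is not present on the BDL chain, so verification fails.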
Drawings
FIG. 1 is an architecture design diagram of the blockchain-based data lake system according to the present invention;
FIG. 2 is a schematic diagram of the processing flow of the blockchain-based data lake system according to the present invention.
Detailed description of the preferred embodiments
In the following description, numerous technical details are set forth in order to provide a better understanding of the present application, but it will be apparent to those of ordinary skill in the art that the present invention is not limited to these technical details and that various changes and modifications can be made based on the following embodiments.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
A typical application scenario of the present invention is the supervision of commercial banks' funds by a central bank. Each commercial bank may have its own blockchain system, and the central bank requires the blockchain of each bank to be brought under supervision. This requires connecting each bank's blockchain to the BDL to achieve data fusion and data collaboration, following the corresponding steps of the present invention, as shown in FIG. 2:
Each commercial bank's blockchain first registers in the central bank's BDL to obtain access authorization. After obtaining authorization, the bank implements the BDL abstract blockchain node and deploys it in its own blockchain. After the server deployment is complete, the IP address of the server must be registered in the central bank's BDL, which prevents illegal access with stolen authorization; before registration, the BDL verifies whether access authorization has been obtained.
Once the server IP is registered, the onboarding of each bank's blockchain is complete. The abstract node periodically acquires the newly added block data of the bank's blockchain and sends it to the BDL data pipeline deployed by that bank. The pipeline formats and encrypts the raw data and sends it to the central bank's BDL. After receiving the data, the central bank's BDL checks the legitimacy of the data source by verifying whether access authorization has been obtained and whether the IP is legal, which ensures the authenticity of the data. Once verified, the data is written to disk and is read when clients analyze data. While writing to disk, the BDL computes the key information of each transaction according to newHash = Hash(Hash + Tx + PreHash) and stores newHash on the BDL chain.
The foregoing is directed to embodiments of the present invention, and it is understood that various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention.

Claims (4)

1. A data lake system based on a block chain is characterized by comprising the following parts:
(1) realizable abstract blockchain node: each blockchain system collects data and exchanges it with the blockchain data lake BDL by implementing the realizable abstract blockchain node, which includes the following modules:
(1a) newly added blockchain data acquisition module: used for periodically acquiring the data of newly added blocks;
(1b) newly added blockchain data sending module: used for periodically sending data to the blockchain data pipeline;
(2) blockchain data pipeline: a connection between a blockchain and the blockchain data lake BDL, the blockchain data pipeline including the following modules:
(2a) blockchain node and blockchain data lake BDL connection module: used for establishing a communication channel between the blockchain and the blockchain data lake BDL;
(2b) blockchain data receiving module: used for receiving data from the realizable abstract blockchain node;
(2c) blockchain data conversion and processing module: used for formatting and encrypting the data sent by the realizable abstract blockchain node;
(2d) blockchain data secure transmission module: used for transmitting the formatted and encrypted data to the blockchain data lake BDL;
(3) the blockchain data lake BDL comprises the following modules or components:
(3a) blockchain database: used for supporting storage and fast retrieval of massive blockchain data;
(3b) data analysis component: used for supporting a plurality of data analysis tools;
(3c) data security and access component: used for access authorization of blockchains and access control of the blockchain data lake BDL;
(3d) blockchain data lake BDL chain: used for putting key data of the blockchain data lake BDL on chain and providing verification of the blockchain data lake BDL's own data;
the processing flow of the system is as follows:
step 1, the blockchain to be connected first registers in the blockchain data lake BDL to obtain authorization and a unique identity;
step 2, the connected blockchain implements the realizable abstract blockchain node, which is deployed in that blockchain and is responsible for periodically acquiring newly added blockchain data;
step 3, after the authorization is obtained, the address of the node deployed in step 2 is registered in the blockchain data lake BDL;
step 4, deploying a blockchain data pipeline in the server environment of the accessed blockchain, and connecting the realizable abstract blockchain node in the step 2 and the blockchain data lake BDL by using the obtained unique identity, including:
step 41, the block chain data pipeline acquires the newly added block chain data acquired in the step 2, and verifies the validity of the block chain data;
step 42, the block chain data pipeline formats the block chain data collected in step 41, and then performs encryption processing;
step 43, the blockchain data pipeline sends the encrypted blockchain data to the blockchain data lake BDL;
step 5, the block chain data lake BDL writes the received block chain data into a data table, including:
step 51, after receiving the blockchain data sent in step 43, the blockchain data lake BDL verifies the node address registered in step 3 to ensure that the source of the blockchain data is legitimate, and if the verification fails, the blockchain data is discarded directly;
step 52, the blockchain data lake BDL verifies the authorization obtained in step 1, and if the verification fails, the blockchain data is discarded directly;
step 53, after the verifications in steps 51 and 52 pass, the blockchain data lake BDL decrypts and reformats the blockchain data sent in step 43 and then writes the data to disk;
step 6, the block chain data lake BDL extracts key information of the received block chain data and stores the key information in a chain of the block chain data lake BDL;
step 7, the client of the blockchain data lake BDL analyzes the blockchain data using a data mining tool;
step 8, the blockchain data lake BDL exposes a standard data query interface so that a business system can easily connect to the blockchain data lake BDL;
step 9, the business system calls the data query interface of the blockchain data lake BDL to obtain the blockchain data and performs subsequent business processing.
2. The blockchain-based data lake system of claim 1, wherein the blockchain accessible to the blockchain-based data lake system is a homogeneous blockchain or a heterogeneous blockchain.
3. The blockchain-based data lake system of claim 1, wherein each blockchain registers a legal identity that can access the blockchain data lake BDL, and unauthorized blockchains cannot access the blockchain data lake BDL; the abstract blockchain node address of each blockchain must be declared and registered in the blockchain data lake BDL in advance, and a node that has not been declared and registered cannot access the blockchain data lake BDL even if the authorization has been obtained.
4. The blockchain-based data lake system according to claim 3, wherein: the chain of the blockchain data lake BDL provides an evidence storage and verification function; the blockchain data lake BDL receives block data uploaded by a blockchain and processes each transaction in the block; the hash of the block containing the transaction (Hash), the transaction data of the transaction (Tx) and the hash of the preceding block (PreHash) are subjected to a combined hash operation to obtain newHash, which can be expressed as newHash = Hash(Hash + Tx + PreHash), and newHash is stored on the chain of the blockchain data lake BDL as the key information value of the transaction; when data authenticity needs to be verified during data analysis or data query, the hash value is recomputed from the block hash of the verified transaction, the previous block hash and the verified transaction data Tx according to the formula newHash = Hash(Hash + Tx + PreHash), and if the recomputed hash value exists on the chain of the blockchain data lake BDL, the verified transaction has not been tampered with.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010423876.2A CN112134834B (en) 2020-05-19 2020-05-19 Data lake system architecture based on block chain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010423876.2A CN112134834B (en) 2020-05-19 2020-05-19 Data lake system architecture based on block chain

Publications (2)

Publication Number Publication Date
CN112134834A (en) 2020-12-25
CN112134834B (en) 2021-05-25

Family

ID=73851796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010423876.2A Active CN112134834B (en) 2020-05-19 2020-05-19 Data lake system architecture based on block chain

Country Status (1)

Country Link
CN (1) CN112134834B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112734545B (en) * 2020-12-31 2024-02-02 中国工商银行股份有限公司 Block chain data sharing method, device and system
CN113114744B (en) * 2021-03-30 2022-04-26 清华大学 Block chain system supporting cross-chain transaction under data lake architecture
CN114723422B (en) * 2021-10-15 2023-06-09 北京天德科技有限公司 Block chain-based large transaction and settlement system
CN114168685B (en) * 2021-12-15 2023-07-18 北京天德科技有限公司 Novel database architecture based on blockchain system and operation method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241753A (en) * 2018-08-09 2019-01-18 南京简诺特智能科技有限公司 A kind of data sharing method and system based on block chain
CN111125787A (en) * 2019-12-27 2020-05-08 上海共链信息科技有限公司 Gas inspection data cochain system based on block chain and use method thereof

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180144378A1 (en) * 2015-01-18 2018-05-24 Alejandro Evaristo Perez Method, system, and apparatus for managing focus groups
US10944546B2 (en) * 2017-07-07 2021-03-09 Microsoft Technology Licensing, Llc Blockchain object interface
CN110060162B (en) * 2019-03-29 2023-10-27 创新先进技术有限公司 Data authorization and query method and device based on block chain
CN110069932B (en) * 2019-05-08 2023-02-21 山东浪潮科学研究院有限公司 Data lake fusion data security analysis method based on intelligent contract
CN110727737B (en) * 2019-10-29 2022-10-18 南京邮电大学 Intelligent medical data storage method based on multilevel block chain system architecture

Also Published As

Publication number Publication date
CN112134834A (en) 2020-12-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant