CN112100265A - Multi-source data processing method and device for big data architecture and blockchain

Multi-source data processing method and device for big data architecture and blockchain

Info

Publication number
CN112100265A
CN112100265A
Authority
CN
China
Prior art keywords
data
data stream
block chain
stream
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010978288.5A
Other languages
Chinese (zh)
Inventor
孙圣力
赖凯庭
李青山
司华友
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Boya Blockchain Research Institute Co ltd
Boya Chain Beijing Technology Co ltd
Peking University
Original Assignee
Nanjing Boya Blockchain Research Institute Co ltd
Boya Chain Beijing Technology Co ltd
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Boya Blockchain Research Institute Co ltd, Boya Chain Beijing Technology Co ltd, Peking University
Priority to CN202010978288.5A
Publication of CN112100265A
Legal status: Pending (Current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/258Data format conversion from or to a database
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24568Data stream processing; Continuous queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The invention provides a multi-source data processing method, apparatus and system oriented to a big data architecture and a blockchain, wherein the method comprises the following steps: collecting data from multiple data sources and converting the collected data into data streams with a unified format; caching the data streams by category and providing a data stream output interface; acquiring data streams through the data stream output interface and invoking big data open-source algorithms to consume them; and acquiring data streams through the data stream output interface and transferring them onto the blockchain. The invention provides a unified, lightweight data processing platform that can serve a variety of actual business scenarios: it collects data from different data sources and converts the collected data into data streams with a unified format, which various data query and analysis tools can read quickly. In addition, the data streams cached by category can be transferred onto the blockchain quickly and conveniently, meeting the needs of blockchain applications.

Description

Multi-source data processing method and device for big data architecture and blockchain
Technical Field
The invention relates to the field of communication technology, and in particular to a multi-source data processing method and device oriented to a big data architecture and a blockchain.
Background
In recent years, with the rapid development of science and technology and the advance of informatization, data ranging from the background caches of individual applications on mobile terminals to the user-access and running-state logs stored on server clusters is being generated and accumulated at the petabyte level at all times. Growth in data volume brings growth in data value: large volumes of data play a vital role in fields such as user behavior analysis and system security alerting, and with the support of various big data analysis technologies, masses of data that were previously discarded or ignored have begun to show new value.
On the other hand, in early enterprise development and production environments, data formats were not standardized, data storage was haphazard, and centralized storage means were lacking, which creates difficulties for big data processing today. Large amounts of data are scattered across unorganized databases with inconsistent formats, and developers must repeatedly build data pipelines and clean data on servers or local hosts before they can use it, greatly increasing development difficulty, development time, and labor. How to collect and process scattered, non-uniform data with complex sources is therefore a hard problem for data managers and developers.
To address this problem, many large enterprises at home and abroad choose to build a data warehouse or data middle platform, storing enterprise data centrally in a unified format to serve as a single data source during actual development. However, a data warehouse or data middle platform takes a long time to develop, carries high labor costs, requires difficult cluster construction and a complex architecture, and needs large volumes of actual business data to support it; most small and medium-sized enterprises lack the conditions to build one. In view of this, a unified, lightweight data platform applicable to a variety of actual business scenarios is the more practical technical solution.
The growth in data volume also brings another problem: data security. Traditional databases run on a single-node server or a cluster of several servers, where data maintenance is costly and security is weak. Blockchain technology is a distributed ledger technology in which transaction records are chained together by cryptographic principles and confirmed among nodes through a consensus mechanism, guaranteeing that the records are tamper-proof, public, and transparent. This offers a new idea for encrypting important data: putting it on chain and storing it encrypted under consensus, which can achieve better performance and security guarantees than traditional database encryption.
However, the process of putting data on chain faces the same data conversion problem. Because a blockchain database server usually opens only a specific port and requires data to be sent in a specific HTTP request format, which does not correspond directly to the format in which data is stored in the database, converting between the stored data format and the request format required for communicating with the blockchain server is likewise an urgent problem to be solved.
Disclosure of Invention
To solve the above technical problems, a first aspect of the present invention provides a multi-source data processing method oriented to a big data architecture and a blockchain, which can collect heterogeneous data from different data sources and convert the collected data into data streams with a unified format. The specific technical scheme of the invention is as follows:
A multi-source data processing method oriented to a big data architecture and a blockchain comprises the following steps:
collecting data from multiple data sources and converting the collected data into data streams with a unified format;
caching the data streams by category and providing a data stream output interface;
acquiring data streams through the data stream output interface and consuming the acquired data streams; and/or
acquiring data streams through the data stream output interface and transferring the acquired data streams onto the blockchain.
In some embodiments, the multiple data sources include at least a relational database and a non-relational database, and the data streams are JSON-format data streams.
In some embodiments, acquiring the data stream from the data caching and transmission module and transferring the data onto the blockchain comprises: parsing the data stream into data fields; extracting a target data field and encapsulating the extracted target data field into a message; and transferring the message encapsulating the target data field onto the blockchain.
A second aspect of the present invention provides a multi-source data processing apparatus oriented to a big data architecture and a blockchain, comprising:
a data collection module for collecting data from multiple data sources and converting the collected data into data streams with a unified format;
a data caching and transmission module for caching the data streams by category and providing a data stream output interface;
a data consumption module for acquiring data streams through the data stream output interface and consuming the acquired data streams; and/or
a blockchain uplink module for acquiring data streams through the data stream output interface and transferring the acquired data streams onto the blockchain.
In some embodiments, the multiple data sources include at least a relational database and a non-relational database; the data collection module comprises multiple data collection components that can run in parallel and are connected to the data sources via JDBC interfaces, including a Kafka component, a Logstash component, a Canal component, and a Maxwell component; and the data streams are JSON-format data streams.
In some embodiments, the data caching and transmission module comprises the Kafka open-source platform, and the data streams are cached by category in Topics of the Kafka open-source platform.
In some embodiments, the data consumption module comprises the data query tools Hive and Impala and the data analysis tools Spark and Storm.
In some embodiments, the blockchain uplink module comprises:
a parsing submodule that parses the data stream into data fields;
an encapsulation submodule that extracts a target data field and encapsulates the extracted target data field into a message;
and an uplink submodule that transfers the message encapsulating the target data field onto the blockchain.
In some embodiments, the blockchain is a pre-arranged private chain, consortium chain, or public chain.
The invention provides a unified, lightweight data processing platform that can serve a variety of actual business scenarios: it collects heterogeneous data from different data sources and converts the collected data into data streams with a unified format for categorized storage, which various data query and analysis tools can read quickly. In addition, the categorized data streams can be transferred onto the blockchain quickly and conveniently.
Drawings
FIG. 1 is a schematic flowchart of a multi-source data processing method oriented to a big data architecture and a blockchain according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a multi-source data processing method oriented to a big data architecture and a blockchain according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a multi-source data processing apparatus oriented to a big data architecture and a blockchain according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a multi-source data processing apparatus oriented to a big data architecture and a blockchain according to an embodiment of the present invention;
FIG. 5 is an example of an environment that may be used to implement embodiments of the present invention;
FIG. 6 is a flowchart of an application example of the multi-source data processing method oriented to a big data architecture and a blockchain according to an embodiment of the present invention;
FIG. 7 is a flowchart of another application example of the multi-source data processing method oriented to a big data architecture and a blockchain according to an embodiment of the present invention.
Detailed Description
To make the above objects, features, and advantages of the present invention more comprehensible, embodiments are described in further detail below with reference to the accompanying figures.
Although the present invention presents the method operation steps or apparatus structures shown in the following embodiments or figures, the method or apparatus may include more or fewer operation steps or module units based on conventional or non-inventive labor. For steps or structures with no logically necessary causal relationship, the execution order of the steps or the module structure of the apparatus is not limited to the order or structure shown in the embodiments or figures of the present invention. When applied in an actual device or end product, the described methods or module structures may be executed sequentially or in parallel according to the embodiments or the methods or module structures shown in the figures.
Collecting and processing data that is scattered, non-uniform, and drawn from complex sources generally requires building a data warehouse or data middle platform, which takes a long time to develop, carries high labor costs, involves difficult cluster construction and a complex architecture, and needs large volumes of actual business data to support it.
Aiming at these deficiencies of prior-art multi-source data collection and processing, the invention provides a multi-source data processing method oriented to a big data architecture and a blockchain, which can collect heterogeneous data from different data sources and convert the collected data into data streams with a unified format.
FIG. 1 illustrates a multi-source data processing method oriented to a big data architecture and a blockchain according to an embodiment of the present invention. For convenience of description, only the parts relevant to the embodiment are shown, detailed as follows:
s101, data acquisition is carried out on various data sources, and the acquired data are converted into data streams with a uniform format.
As shown in FIG. 5, the data sources include traditional relational databases and cloud non-relational databases deployed at the data source layer: the traditional relational databases include MySQL, SQLite, Oracle, Access, and the like, and the cloud non-relational databases include MongoDB, Redis, Hadoop, HBase, and the like.
In the implementation process, data collection pipelines are deployed: data stored in different target databases can be collected through a written JDBC interface containing the URL of the target database, and the collected heterogeneous data is converted into data streams with a unified format.
Optionally, as shown in FIG. 5, the deployed data collection pipeline layer includes the Kafka, Logstash, Canal, and Maxwell components, wherein the Kafka component collects and ingests source data from the databases, the Logstash component collects and transports database logs, and the Canal and Maxwell components parse database logs to read and output data. After processing by these components, heterogeneous data stored in different databases is collected and output as data streams in a unified JSON format.
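To illustrate the collection step, the following minimal sketch reads rows over JDBC using a target-database URL and emits each row as a unified JSON document. The database URL, credentials, table, and columns are placeholder assumptions; in the invention this role is played by the pipeline components named above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class JdbcToJson {
    public static void main(String[] args) throws Exception {
        // Assumed target-database URL; each deployed pipeline carries its own.
        String url = "jdbc:mysql://localhost:3306/appdb?user=reader&password=secret";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rows = stmt.executeQuery("SELECT id, name FROM orders")) {
            ResultSetMetaData meta = rows.getMetaData();
            while (rows.next()) {
                // Convert each heterogeneous row into one unified JSON document.
                // (Naive quoting, no escaping -- illustration only.)
                StringBuilder json = new StringBuilder("{");
                for (int i = 1; i <= meta.getColumnCount(); i++) {
                    if (i > 1) json.append(",");
                    json.append("\"").append(meta.getColumnLabel(i)).append("\":\"")
                        .append(rows.getString(i)).append("\"");
                }
                System.out.println(json.append("}"));
            }
        }
    }
}
```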
S102, cache the data streams by category and provide a data stream output interface.
Data streams in JSON format are classified and cached by topic into Topics of the Kafka open-source platform. Optionally, a data stream's topic encodes the data's source and destination, which are defined by the specific application service and are not limited here.
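To make this concrete, the following minimal sketch publishes one JSON record to a Kafka Topic using the standard Kafka Java client. The broker address, topic naming scheme, and record contents are illustrative assumptions, not values prescribed by the invention.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class TopicClassifier {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Hypothetical theme: the topic name encodes data source and destination.
        String topic = "mysql-orders.to.blockchain";
        String json = "{\"source\":\"mysql.orders\",\"id\":42,\"amount\":19.9}";

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each JSON record of the data stream is cached under its theme's Topic.
            producer.send(new ProducerRecord<>(topic, json));
        }
    }
}
```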
S103, acquire data streams through the data stream output interface and consume the acquired data streams.
This step is executed by big data processing engines, which query and acquire the required JSON-format data streams from the Kafka open-source platform and analyze and compute over them to realize the corresponding application services. Optionally, as shown in FIG. 5, the big data processing engines include the data query tools Hive and Impala and the data analysis tools Spark and Storm.
S104, acquire data streams through the data stream output interface and transfer the acquired data streams onto the blockchain.
This step is a blockchain processing step that transfers the data stream onto a blockchain, which is a pre-arranged private chain, consortium chain, or public chain.
Optionally, as shown in FIG. 2, step S104 specifically includes:
S1041, parsing the data stream into data fields;
S1042, extracting a target data field and encapsulating the extracted target data field into a message;
S1043, transferring the message encapsulating the target data field onto the blockchain.
Specifically, after the JSON-format data stream is acquired, a written parsing program analyzes it, identifying and extracting the target data fields that need to go on chain. The extracted fields are then encapsulated in the data field of a request message, and the message is transferred to the blockchain through its port in POST form.
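A minimal sketch of steps S1041 through S1043, assuming Jackson for JSON parsing and Java's built-in HTTP client; the chain endpoint and the marker field name are hypothetical, since the invention does not fix them.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChainUplink {
    private static final ObjectMapper MAPPER = new ObjectMapper();
    // Hypothetical endpoint: the description only says the chain server opens a specific port.
    private static final String CHAIN_URL = "http://chain-server:7050/uplink";

    public static String uplink(String jsonRecord) throws Exception {
        // S1041: parse the JSON data stream record into fields.
        JsonNode fields = MAPPER.readTree(jsonRecord);

        // S1042: extract the marked target field ("important" is an assumed marker
        // name) and encapsulate it in the data field of a request message.
        ObjectNode message = MAPPER.createObjectNode();
        message.set("data", fields.get("important"));

        // S1043: transfer the message to the blockchain port in POST form.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(CHAIN_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(MAPPER.writeValueAsString(message)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body(); // the return message carries the status code
    }
}
```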
It should be noted that in practical application, the multi-source data processing method of this embodiment may execute step S103, step S104, or both. If both are executed, they may run in parallel, or one may be selectively executed before the other.
The present invention also provides a multi-source data processing apparatus oriented to a big data architecture and a blockchain. As shown in FIG. 3, the processing apparatus includes a data collection module 201, a data caching and transmission module 202, a data consumption module 203, and a blockchain uplink module 204, wherein:
the data acquisition module 201 is configured to perform data acquisition on multiple data sources and convert the acquired data into a data stream with a uniform format.
As mentioned in the above embodiments, the data source is generally divided into a traditional relational database and a cloud non-relational database, as shown in fig. 5, the traditional relational database includes MySQL, SQLite, Oracle, access, etc., and the cloud non-relational database includes mongoOB, Redis, Hadoop, Menbase, etc.
Optionally, as shown in fig. 5, the data acquisition module 201 includes a plurality of data acquisition components that can run in parallel, and the data acquisition components are connected to the databases through JDBC interfaces. Optionally, the data acquisition component includes a Kafka component, a Logstash component, a Canal component, and a Maxwell component deployed at the data pipeline layer. Wherein: the Kafka component is used for collecting and inputting source data in the database, the Logstach component is used for collecting and conveying database logs, and the Canal component and the Maxwell component are used for analyzing the database logs to read and output the data. After the processing of the components, heterogeneous data stored in different databases are collected and output as a data stream in a uniform JSON format.
The data caching and transmission module 202 is configured to cache data streams by category and provide a data stream output interface.
Optionally, as shown in FIG. 5, the data caching and transmission module 202 comprises the Kafka open-source platform deployed at the data pipeline layer. The JSON data streams collected and output by the data collection module 201 are classified and cached in Topics of the Kafka open-source platform. Optionally, a data stream's topic encodes the data's source and destination, which are defined by the specific application service and are not limited here.
The data consumption module 203 is configured to acquire data streams through the data stream output interface and invoke big data open-source algorithms to compute over and analyze the acquired data streams. Optionally, as shown in FIG. 5, the data consumption module includes the data query tools Hive and Impala and the data analysis tools Spark and Storm deployed at the data consumption layer. These big data processing engines query and acquire the required JSON-format data streams from the Kafka open-source platform and analyze and compute over them to realize the corresponding application services.
The blockchain uplink module 204 is configured to acquire data streams through the data stream output interface and transfer the acquired data streams onto the blockchain.
As shown in FIG. 4 and FIG. 5, optionally, the blockchain uplink module 204 includes:
a parsing submodule 2041 that parses the data stream into data fields;
an encapsulation submodule 2042 that extracts a target data field and encapsulates the extracted target data field into a message;
an uplink submodule 2043 that transfers the message encapsulating the target data field onto the blockchain.
As shown in FIG. 5, these functional modules are deployed within the data consumption layer.
The invention provides a unified, lightweight data processing platform that can serve a variety of actual business scenarios: it collects heterogeneous data from different data sources and converts the collected data into data streams with a unified format for categorized storage, allowing various data query and analysis tools to read and consume them quickly. In addition, the categorized data streams can be transferred onto the blockchain quickly and conveniently, meeting the application requirements of business scenarios with high demands on security and tamper-resistance.
To show the implementation of the present invention more clearly, it is described in more detail below from two perspectives: big data application and blockchain application.
FIG. 6 shows a specific implementation flow of the present invention in a big data application. For convenience of description, only part of the flow is described below; for the rest, refer to the related description above.
As shown in FIG. 5 and FIG. 6, this example uses the Kafka middleware to read database data, and there are three reading approaches, described as follows:
the first way is by Kafka-connect-JDBC. Kafka-connect-JDBC is a third-party Kafka plug-in sourced by the confluent platform, supports copying of tables using various JDBC data types, dynamically synchronizes the state of the database, and supports addition and deletion operations on the database. It has three main modes: bulk import mode, increment mode, and Timestamp & increment combined with auto-increment mode.
The data acquisition plug-in is very simple to deploy, can be realized by adding the URL of the target database in the configuration file, supports various database sources to input data, and is easy to expand. The plug-in will output the data to the console under topic of Kafka in JSON format according to the mode selected in the configuration file, facilitating the subsequent multi-component consumption.
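As an illustration of such a deployment, the sketch below registers a JDBC source connector through Kafka Connect's REST interface (default port 8083). The connector class and configuration keys are the documented kafka-connect-jdbc ones; the connection URL, credentials, column names, topic prefix, and Connect host are placeholder assumptions.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterJdbcSource {
    public static void main(String[] args) throws Exception {
        // Connector config as described above: the JDBC URL of the target database
        // plus the chosen mode. Hosts, credentials, and column names are placeholders.
        String body = """
            {
              "name": "mysql-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:mysql://localhost:3306/appdb?user=reader&password=secret",
                "mode": "timestamp+incrementing",
                "timestamp.column.name": "updated_at",
                "incrementing.column.name": "id",
                "topic.prefix": "appdb-",
                "tasks.max": "1"
              }
            }""";

        // Kafka Connect exposes a REST interface for adding connectors.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```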
The second approach is realized by dedicated data pipeline components, the main technology choices being Canal and Maxwell, wherein:
Canal is an open-source data pipeline based on parsing a database's incremental logs. In use, the component emulates the MySQL slave interaction protocol, masquerading as a MySQL slave and sending the dump protocol to the MySQL master. After receiving the dump request, the MySQL master begins pushing binary logs to the slave; Canal then receives and parses the binary logs, thereby synchronizing the MySQL database. Finally, Kafka caches and outputs Canal's data, completing the transfer of data from MySQL to Kafka.
Maxwell's advantage is that it can convert MySQL data directly into JSON-format output, making it simpler to use; Kafka can then read the MySQL data directly.
After data has been cached to a Kafka Topic through these three parallel collection channels, the data collection and reading work is finished, and Kafka then outputs the JSON-format data streams to the data consumption layer through the data output interface.
The data consumption layer mainly comprises two parts. One is a data query module composed of Hive and Impala, which completes database operations, including tasks such as insertion, deletion, modification, and query, through SQL-like statements. The other is a data computation and processing part composed of Spark and Storm, which can handle tasks such as data computation for actual scenarios, supports training models such as machine learning models within Spark and Storm, and writes labeling results that need to be returned to the database back into specific database fields.
How the data consumption layer reads the data under a specific Kafka Topic is described in detail below.
Spark already provides sufficiently rich interfaces and components for stream and batch processing of large volumes of data. In the present invention, a direct-connect mode is used when Spark interfaces with Kafka, unlike the traditional Receiver mode that calls high-level APIs: the direct mode has no Receiver layer and, based on Spark Streaming, periodically obtains the latest offsets of each partition under a specific Kafka topic, then processes each incoming batch of data according to the configured maxRatePerPartition, thereby reading Kafka data with Spark.
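A minimal sketch of this direct-connect reading, using the Java API of the spark-streaming-kafka-0-10 integration; the broker address, group id, topic name, and rate cap are illustrative assumptions.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class KafkaDirectStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("json-stream-consumer")
                // Rate cap per partition per second, as described above (value assumed).
                .set("spark.streaming.kafka.maxRatePerPartition", "1000");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "spark-consumer");
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct mode: no Receiver; offsets are tracked per partition of the topic.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("appdb-orders"), kafkaParams));

        stream.map(ConsumerRecord::value).print(); // each value is one JSON document

        jssc.start();
        jssc.awaitTermination();
    }
}
```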
Storm provides the storm-kafka module for reading data from Kafka. The specific construction comprises two steps: first, configure the mapping between Kafka broker hosts and partitions using the BrokerHosts interface, which supports two modes, one managed via ZooKeeper and the other connecting directly to open ports; second, configure Kafka-related output information, such as the amount of data fetched per unit time and the port access timeout, using KafkaConfig.
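The two construction steps can be sketched with the classic storm-kafka API as follows; the ZooKeeper address, topic, and tuning values are assumptions for illustration.

```java
import org.apache.storm.kafka.BrokerHosts;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.TopologyBuilder;

public class KafkaSpoutTopology {
    public static void main(String[] args) {
        // Step 1: map Kafka broker hosts and partitions. ZkHosts manages the
        // mapping via ZooKeeper; StaticHosts would connect to open ports directly.
        BrokerHosts hosts = new ZkHosts("localhost:2181"); // assumed ZooKeeper address

        // Step 2: Kafka-related configuration (topic, ZK root, consumer id,
        // plus tunables such as fetch size and socket timeout).
        SpoutConfig config = new SpoutConfig(hosts, "appdb-orders", "/kafka-spout", "json-reader");
        config.scheme = new SchemeAsMultiScheme(new StringScheme());
        config.socketTimeoutMs = 10_000;   // port access timeout
        config.fetchSizeBytes = 1_048_576; // data volume fetched per request

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(config), 1);
        // Downstream bolts consuming the JSON stream would be attached here.
    }
}
```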
Hive cannot synchronize Kafka data directly, but with the emergence of actual scenarios such as log processing, data communication between Hive and Kafka has received increasing attention. The present invention mainly considers two schemes, the Camus component and Gobblin (the former was merged into the latter as a subset in 2015; their implementations are essentially the same): Kafka data is extracted into HDFS by executing MapReduce tasks, and the transfer from HDFS into Hive is then completed through shell scripts. This scheme realizes a relatively simple data pipeline scenario and achieves respectable extraction rates and capacity in actual business scenarios.
Impala is a big data real-time query and analysis engine built on Hive. It directly uses Hive's metadata database, meaning Impala's metadata is stored in Hive's Metastore, and it is compatible with Hive's SQL-like statement analysis; therefore Impala stays synchronized with Kafka as long as Hive is operated accordingly.
FIG. 7 shows a specific implementation flow of the present invention in a blockchain application. For convenience of description, only part of the flow is described below; for the rest, refer to the related description above.
In this example, during the data uplink process, the manager needs to manually mark important data fields with an identifier to indicate the data on which the uplink operation should be performed.
In the present invention, for security, a simple private chain is built on the Hyperledger Fabric distributed-ledger architecture, with ports opened for data uplink, contract certificate return, and the like for the example to use; the system can subsequently interface with other external public chains simply by opening and connecting the corresponding ports.
After the JSON-format data stream output by the data transmission layer is obtained, a program parses the JSON, identifies the marker fields, and determines the data that needs to go on chain. The data is then encapsulated in the data field of a request message, a data uplink request is sent to the port in POST form, and a return message is received. It should be noted that the manager may actively query the status to confirm whether the certificate was stored successfully, and may resend the data uplink request if failure information is returned. A status code of 0 returned by the server indicates a successful uplink; -1 indicates a failed uplink, with three failure causes: -3 indicates an illegal transaction, requiring the manager to re-verify identity and other information; -2 indicates a wrong hash value, requiring the manager to re-check data integrity; and 4000 requires the manager to check whether the error occurred in the storage system.
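The status-code handling just described can be summarized in a small helper method; the codes and their meanings are exactly those listed above, while the method itself is only an illustrative sketch.

```java
public class UplinkStatus {
    /** Interprets the uplink status code returned by the blockchain server. */
    public static String interpret(int statusCode) {
        switch (statusCode) {
            case 0:    return "uplink succeeded; request the contract certificate";
            case -1:   return "uplink failed; resend the data uplink request";
            case -3:   return "illegal transaction; re-verify identity and related information";
            case -2:   return "hash value error; re-check data integrity";
            case 4000: return "storage system error; check the storage system";
            default:   return "unknown status code: " + statusCode;
        }
    }
}
```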
As described above, after receiving status code 0 from the server, indicating that the uplink completed successfully, a service is started to return the contract certificate to the data manager. The specific implementation is as follows: the client sends a GET message to the server port requesting transaction details and, upon receiving the response, returns the transaction ID to the data manager as the data certificate of the uplink. Holding this certificate, the manager can send a POST request to the blockchain server and query the uplink status via the hash value generated from the ID, comparing it against the original data in the database to ensure the data was transmitted correctly. This completes the whole process of transferring the data onto the blockchain, chaining it, and returning the certificate, and thereby the handoff from the data transfer module to the blockchain module.
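A sketch of the certificate-return request, again with Java's built-in HTTP client; the transaction-details URL and the "txId" response field are assumptions, as the description specifies only a GET for transaction details that yields the transaction ID.

```java
import com.fasterxml.jackson.databind.ObjectMapper;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CertificateClient {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // Hypothetical query endpoint and response field: the description specifies a
    // GET for transaction details and a transaction ID in the reply, but no paths.
    static String fetchCertificate(HttpClient client, String txDetailsUrl) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(txDetailsUrl))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        // The transaction ID serves as the data certificate returned to the manager.
        return MAPPER.readTree(response.body()).path("txId").asText();
    }
}
```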
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions of the present invention as defined in the appended claims.
The invention has been described above with a certain degree of particularity. Those of ordinary skill in the art will understand that the description of the embodiments is merely exemplary, and all changes that come within the true spirit and scope of the invention are intended to be protected. The scope of the invention is defined by the appended claims rather than by the foregoing description of the embodiments.

Claims (9)

1. A multi-source data processing method oriented to a big data architecture and a blockchain, characterized by comprising the following steps:
collecting data from multiple data sources and converting the collected data into data streams with a unified format;
caching the data streams by category and providing a data stream output interface;
acquiring data streams through the data stream output interface and consuming the acquired data streams; and/or
acquiring data streams through the data stream output interface and transferring the acquired data streams onto the blockchain.
2. The multi-source data processing method of claim 1, wherein the multiple data sources comprise at least a relational database and a non-relational database, and the data streams are JSON-format data streams.
3. The multi-source data processing method of claim 1, wherein acquiring the data stream from the data caching and transmission module and transferring the data onto the blockchain comprises:
parsing the data stream into data fields;
extracting a target data field and encapsulating the extracted target data field into a message;
and transferring the message encapsulating the target data field onto the blockchain.
4. A multi-source data processing apparatus oriented to a big data architecture and a blockchain, the processing apparatus comprising:
a data collection module for collecting data from multiple data sources and converting the collected data into data streams with a unified format;
a data caching and transmission module for caching the data streams by category and providing a data stream output interface;
a data consumption module for acquiring data streams through the data stream output interface and consuming the acquired data streams; and/or
a blockchain uplink module for acquiring data streams through the data stream output interface and transferring the acquired data streams onto the blockchain.
5. The multi-source data processing apparatus of claim 4, wherein the multiple data sources comprise at least a relational database and a non-relational database; the data collection module comprises multiple data collection components capable of running in parallel and connected to the data sources via JDBC interfaces, including a Kafka component, a Logstash component, a Canal component, and a Maxwell component; and the data streams are JSON-format data streams.
6. The multi-source data processing apparatus of claim 5, wherein the data caching and transmission module comprises the Kafka open-source platform, and the data streams are cached by category within Topics of the Kafka open-source platform.
7. The multi-source data processing apparatus of claim 4, wherein the data consumption module comprises the data query tools Hive and Impala and the data analysis tools Spark and Storm.
8. The multi-source data processing apparatus of claim 4, wherein the blockchain uplink module comprises:
a parsing submodule that parses the data stream into data fields;
an encapsulation submodule that extracts a target data field and encapsulates the extracted target data field into a message;
and an uplink submodule that transfers the message encapsulating the target data field onto the blockchain.
9. The multi-source data processing method of claim 1, wherein the blockchain is a pre-arranged private chain, consortium chain, or public chain.
CN202010978288.5A 2020-09-17 2020-09-17 Multi-source data processing method and device for big data architecture and blockchain Pending CN112100265A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010978288.5A CN112100265A (en) 2020-09-17 2020-09-17 Multi-source data processing method and device for big data architecture and blockchain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010978288.5A CN112100265A (en) 2020-09-17 2020-09-17 Multi-source data processing method and device for big data architecture and blockchain

Publications (1)

Publication Number Publication Date
CN112100265A 2020-12-18

Family

ID=73759722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010978288.5A Pending CN112100265A (en) Multi-source data processing method and device for big data architecture and blockchain

Country Status (1)

Country Link
CN (1) CN112100265A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684377A (en) * 2018-12-13 2019-04-26 深圳市思迪信息技术股份有限公司 General big data handles development platform and its data processing method in real time
CN109829009A (en) * 2018-12-28 2019-05-31 北京邮电大学 Configurable isomeric data real-time synchronization and visual system and method
CN110457929A (en) * 2019-08-16 2019-11-15 重庆华医康道科技有限公司 The sharing method and system of isomery HIS big data real-time encryption and decryption compression cochain
CN110597894A (en) * 2019-08-26 2019-12-20 重庆华医康道科技有限公司 Real-time inquiry system for organization mechanism data
CN110795257A (en) * 2019-09-19 2020-02-14 平安科技(深圳)有限公司 Method, device and equipment for processing multi-cluster operation records and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang Chao et al., "In-Depth Practice of Flink and Kylin", China Machine Press, 1 August 2020, pages 178-181 *
Huang Yuan et al., "Big Data Technology and Applications", China Machine Press, 1 May 2020, pages 79-81 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699170A (en) * 2020-12-31 2021-04-23 上海竞动科技有限公司 Query method and system based on multi-source data structure block chain
CN112699170B (en) * 2020-12-31 2022-10-21 上海竞动科技有限公司 Query method and system based on multi-source data structure block chain
CN112800064A (en) * 2021-02-05 2021-05-14 成都延华西部健康医疗信息产业研究院有限公司 Real-time big data application development method and system based on Confluent community open source edition
CN113032379A (en) * 2021-03-16 2021-06-25 广东电网有限责任公司广州供电局 Distribution network operation and inspection-oriented multi-source data acquisition method
CN113641739A (en) * 2021-07-05 2021-11-12 南京联创信息科技有限公司 Spark-based intelligent data conversion method
CN113360936A (en) * 2021-08-09 2021-09-07 湖南和信安华区块链科技有限公司 Data analysis system based on block chain
CN114417408A (en) * 2022-01-18 2022-04-29 百度在线网络技术(北京)有限公司 Data processing method, device, equipment and storage medium
CN114528346A (en) * 2022-01-27 2022-05-24 中科大数据研究院 Method for sharing transaction of multi-source heterogeneous data assets by depending on block chain
CN114528346B (en) * 2022-01-27 2023-01-13 中科大数据研究院 Method for sharing transaction of multi-source heterogeneous data assets by depending on block chain
CN114741447A (en) * 2022-03-28 2022-07-12 国网北京市电力公司 Distributed energy station data processing method and device

Similar Documents

Publication Publication Date Title
CN112100265A (en) Multi-source data processing method and device for big data architecture and blockchain
CN109492040B (en) System suitable for processing mass short message data in data center
CN109063196B (en) Data processing method and device, electronic equipment and computer readable storage medium
CN103209087B (en) Distributed information log statistical processing methods and system
CN108681569B (en) Automatic data analysis system and method thereof
CN111400326B (en) Smart city data management system and method thereof
CN105404701A (en) Peer-to-peer network-based heterogeneous database synchronization method
Grover et al. Data Ingestion in AsterixDB.
CN106815338A (en) A kind of real-time storage of big data, treatment and inquiry system
CN110009201B (en) Electric power data link system and method based on block chain technology
CN111885439B (en) Optical network integrated management and duty management system
CN101808051B (en) Application integration gateway and control method thereof
CN104156798A (en) System data real-time push framework adopting enterprise authority source and method
CN110096545A (en) One kind being based on big data platform data processing domain architecting method
CN108924228B (en) Industrial internet optimization system based on edge calculation
CN113886055A (en) Intelligent model training resource scheduling method based on container cloud technology
CN102090039A (en) A method of performing data mediation, and an associated computer program product, data mediation device and information system
Maske et al. A real time processing and streaming of wireless network data using storm
CN116389475A (en) Kafka-based industrial enterprise real-time ubiquitous interconnection method
CN112667586B (en) Method, system, equipment and medium for synchronizing data based on stream processing
CN104333578A (en) Distributed data exchange system and method
Li et al. Research on Artificial Intelligence Industrial Big Data Platform for Industrial Internet Applications
CN112101894A (en) Coal dressing intelligent system
CN107330089B (en) Cross-network structured data collection system
CN104298718A (en) SOA based distributed drawing-document system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination