CN109829009B

CN109829009B - Configurable real-time synchronization and visualization system and method for heterogeneous data

Info

Publication number: CN109829009B
Application number: CN201811621636.2A
Authority: CN
Inventors: 鄂海红; 宋美娜; 刘行行
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2018-12-28
Filing date: 2018-12-28
Publication date: 2021-05-25
Anticipated expiration: 2038-12-28
Also published as: CN109829009A

Abstract

The invention discloses a configurable real-time synchronization and visualization system and method for heterogeneous data. The system comprises: a meta-bin subsystem module for storing historical and real-time incremental metadata information; a historical data batch processing synchronization subsystem module for acquiring and processing the historical metadata information and storing it in a preset data warehouse in a preset format; a real-time synchronization subsystem module for acquiring the real-time incremental metadata information, processing it to complete data type conversion, and storing it in the preset data warehouse; a visualization large-screen subsystem module for configuration management of the data in the preset data warehouse, the user-defined statistical indexes, and the visualization large screen; and a visualization module for displaying the data on the large screen for a user to view and manage. Through user-defined statistical indexes, the system combines the real-time data stream in the big data warehouse Hive with statistical analysis tasks and binds the indexes to a real-time large screen, providing a solution that runs from the online business system to statistical analysis of the data.

Description

Configurable real-time synchronization and visualization system and method for heterogeneous data

Technical Field

The invention relates to the technical field of big data, in particular to a configurable real-time synchronization and visualization system and method for heterogeneous data.

Background

Solutions for data synchronization between relational databases and the big data warehouse Hive are embodied, on the one hand, in open source products: 1. Sqoop is widely used to import offline data from a relational database into a Hadoop big data platform; 2. Kettle can sort the original full data by a timestamp field and achieve real-time incremental synchronization by using an intermediate table that records the timestamp of each update. On the other hand, they appear in peer research: 1. message middleware is used to mask the differences between two heterogeneous databases. For example, a method for real-time synchronization of heterogeneous databases based on message middleware has been disclosed, comprising: a data acquisition module, which deploys different data collectors according to the type of data source in order to acquire data; a data model module, which processes the acquired data and packages it into a unified data model using Protobuf (Protocol Buffers); a persistence module, which sends the unified data model packaged by the acquisition module to the message middleware for persistence; and a data processing module, which pulls data from the message middleware through the API (application programming interface) of a message processing framework and performs business processing according to business rules.

Although open source synchronization tools provide powerful data synchronization functions, they generally need to be deployed separately and are difficult to integrate with big data statistical analysis platforms. In addition, Kettle sorts the original full data by a timestamp field and achieves real-time incremental synchronization by using an intermediate table that records the timestamp of each update; the obvious disadvantages of this approach are: 1. it performs IO operations on the relational data source, which affects online service performance; 2. data synchronization is performed periodically, so timeliness is poor. Sqoop is mainly used for synchronizing offline data; its drawbacks are that synchronization is operated from the command line, the barrier to use is high, and usability is poor. In peer research, a message system can be used to mask the heterogeneity of heterogeneous databases, but that work does not address ease of use or ease of integration with specific statistical analysis tasks.

To meet the need to analyze massive data, a common industry solution is to build a big data warehouse around Hive (a tool for statistical analysis that maps structured data files to data tables) and to perform statistical analysis with HiveQL. Data generated by an online Web business system is generally stored in a relational database, so the data in the relational database must be imported into the big data warehouse Hive. For users who need real-time statistical analysis of the latest full data generated by online business, the incremental data generated by that business must also be synchronized into Hive in real time.

Disclosure of Invention

The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.

Therefore, one objective of the present invention is to provide a configurable real-time synchronization and visualization system for heterogeneous data. Through customized statistical indexes, the system combines the real-time data stream in the big data warehouse Hive with statistical analysis tasks, and it binds the customized statistical indexes to a real-time large screen by configuration, thereby giving statistical analysts a convenient solution that runs from the online business system to big data statistical analysis.

Another objective of the present invention is to provide a configurable real-time synchronization and visualization method for heterogeneous data.

To achieve the above object, an embodiment of an aspect of the present invention provides a configurable system for real-time synchronization and visualization of heterogeneous data, including: a meta-bin subsystem module for storing meta-data information, wherein the meta-data information comprises: historical metadata information and real-time incremental metadata information; the historical data batch processing synchronization subsystem module is used for acquiring the historical metadata information, carrying out data batch processing and storing the batch processed data in a preset data warehouse in a preset format; the real-time synchronization subsystem module is used for acquiring the real-time incremental metadata information for processing, completing the conversion of the specified data types, and storing the processed data in the preset data warehouse; the visual large-screen subsystem module is used for carrying out configuration management on the data in the preset data warehouse, the self-defined statistical indexes and the visual large screen; and the visualization module is used for displaying the data after the configuration management of the visualization large-screen subsystem module on the visualization large screen for the user to check and manage.

The configurable real-time synchronization and visualization system for heterogeneous data provided by the embodiment of the invention obtains the real-time incremental data of online services by using Canal to parse Binlog logs. This requires no IO operations on the relational data source and has no influence on the performance of the online services. The system combines a Spark-based batch processing synchronization task with a Storm-based real-time synchronization task: through a task switch, a configured task performs batch processing synchronization and then real-time synchronization, so that the full data in the production system is synchronized into the big data warehouse Hive, which greatly facilitates use of the system.

In addition, the system for real-time synchronization and visualization of configurable heterogeneous data according to the above embodiment of the present invention may further have the following additional technical features:

Further, in an embodiment of the present invention, the customized statistical indexes of the preset data warehouse are defined through a graphical interface.

Further, in an embodiment of the present invention, the historical data batch processing synchronization subsystem module is further configured to: when the batch processed data is saved to the preset data warehouse, save the table of the preset data warehouse in ORC format so as to support the insertion of real-time data.

Further, in an embodiment of the present invention, the visualization large-screen subsystem module is further configured to: bind the customized statistical indexes to the visualization large screen, provide a configuration management interface for updating or previewing the statistical indexes corresponding to the large screen, and view the related metadata information of the large screen.

Further, in an embodiment of the present invention, the visualization module is further configured to: view metadata information of a real-time data synchronization task, including the task name, the creation time, the execution state, and the real-time/non-real-time task switch.

In order to achieve the above object, another embodiment of the present invention provides a configurable method for real-time synchronization and visualization of heterogeneous data, including: S1, acquiring historical metadata information, carrying out data batch processing, and storing the batch processed data in a preset data warehouse in a preset format; S2, acquiring real-time incremental metadata information for processing, completing conversion of specified data types, and storing the processed data in the preset data warehouse; S3, configuring and managing the data in the preset data warehouse, the customized statistical indexes and the visualization large screen; and S4, displaying the data after configuration management in S3 on the visualization large screen for the user to view and manage.

The configurable real-time synchronization and visualization method for heterogeneous data provided by the embodiment of the invention obtains the real-time incremental data of online services by using Canal to parse Binlog logs. This requires no IO operations on the relational data source and has no influence on the performance of the online services. The method combines a Spark-based batch processing synchronization task with a Storm-based real-time synchronization task: through a task switch, a configured task performs batch processing synchronization and then real-time synchronization, so that the full data in the production system is synchronized into the big data warehouse Hive, which greatly facilitates use of the system.

In addition, the configurable real-time synchronization and visualization method for heterogeneous data according to the above embodiment of the present invention may further have the following additional technical features:

Further, in an embodiment of the present invention, S2 further includes: when the batch processed data is saved to the preset data warehouse, saving the table of the preset data warehouse in ORC format so as to support the insertion of real-time data.

Further, in an embodiment of the present invention, S3 further includes: binding the customized statistical indexes to the visualization large screen, providing a configuration management interface for updating or previewing the statistical indexes corresponding to the large screen, and viewing the related metadata information of the large screen.

Further, in an embodiment of the present invention, S4 further includes: viewing metadata information of a real-time data synchronization task, including the task name, the creation time, the execution state, and the real-time/non-real-time task switch.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic structural diagram of a configurable real-time synchronization and visualization system for heterogeneous data according to an embodiment of the present invention;

FIG. 2 is an overall architecture diagram of a configurable real-time heterogeneous data synchronization and visualization system according to one embodiment of the present invention;

FIG. 3 is a block diagram of a system for configurable real-time synchronization and visualization of heterogeneous data according to one embodiment of the present invention;

FIG. 4 is a system architecture diagram of a historical data batching synchronization subsystem module according to one embodiment of the present invention;

FIG. 5 is a diagram of a real-time synchronization subsystem module functional architecture according to one embodiment of the present invention;

FIG. 6 is a diagram of a visual large screen subsystem module functional architecture according to one embodiment of the present invention;

FIG. 7 is a diagram of a Web platform visualization module functional architecture, according to one embodiment of the present invention;

FIG. 8 is a flow chart of a configurable real-time synchronization and visualization method for heterogeneous data according to an embodiment of the invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting the invention.

The following describes a configurable real-time heterogeneous data synchronization and visualization system and method according to an embodiment of the present invention with reference to the accompanying drawings, and first, a configurable real-time heterogeneous data synchronization and visualization system according to an embodiment of the present invention will be described with reference to the accompanying drawings.

Fig. 1 is a schematic structural diagram of a configurable real-time synchronization and visualization system for heterogeneous data according to an embodiment of the present invention.

As shown in fig. 1, the configurable real-time synchronization and visualization system for heterogeneous data includes: the system comprises a meta-bin subsystem module 100, a historical data batch processing synchronization subsystem module 200, a real-time synchronization subsystem module 300, a visualization large-screen subsystem module 400 and a visualization module 500.

The meta-bin subsystem module 100 is configured to store metadata information, where the metadata information includes: historical metadata information and real-time incremental metadata information.

The historical data batch processing synchronization subsystem module 200 is configured to obtain historical metadata information, perform data batch processing, and store the batch processed data in a preset data warehouse in a preset format.

The real-time synchronization subsystem module 300 is configured to obtain real-time incremental metadata information for processing, complete conversion of a specified data type, and store the processed data in a preset data warehouse.

The visualization large-screen subsystem module 400 is used for configuration management of the data in the preset data warehouse, the customized statistical indexes, and the visualization large screen.

The visualization module 500 is used for displaying the data after the configuration management of the visualization large-screen subsystem module on the visualization large screen for the user to view and manage.

Through user-defined statistical indexes, the system combines the real-time data stream in the big data warehouse Hive with statistical analysis tasks. The system can also bind the user-defined statistical indexes to a real-time large screen by configuration, giving statistical analysts a convenient one-stop solution that runs from the online business system to big data statistical analysis.

Further, the overall architecture design is shown in fig. 2, and as can be seen from fig. 2, the configurable heterogeneous data real-time synchronization and visualization system comprises an interaction layer, a logical abstraction layer, a framework layer and a base layer. As shown in fig. 3, the system can be specifically divided into five modules, namely a meta-bin subsystem module, a historical data batch processing synchronization subsystem module, a real-time synchronization subsystem module, a visualization large-screen subsystem module and a Web platform visualization module.

The Web platform is built on Spark and Storm distributed clusters. Spark serves as the computing engine for synchronizing the full volume of historical data, and the historical data batch processing synchronization subsystem is developed on top of Spark so that it can synchronize massive historical data in batches. Storm is the main real-time synchronization component and is further developed so that it can handle the real-time data processing tasks of specific business scenarios. In the real-time synchronization subsystem, Canal is used to parse incremental data from the relational database MySQL, Kafka is used as message middleware to buffer peak data, and Hive is the heterogeneous data destination. In the visualization large-screen subsystem, Quartz runs timed statistical tasks and updates the statistical indexes in real time to produce the real-time visualization large screen. Externally, the whole platform uses Vue to implement the visual Web platform and the Echarts plug-in to draw the visualization large screen.

The system emphasizes two functions. First, convenient data synchronization: by means of a task switch, a single data synchronization task can complete both the batch processing of historical data and the real-time synchronization of incremental data, so as to obtain the latest full data of a relational database table in the online business system. Second, convenient binding of the data synchronization task to big data statistical analysis: the data in the destination Hive table can be previewed in a graphical interface, a user-defined statistical index task can be added to the current task while previewing, and the statistical indexes can be bound to a specific large screen by configuration. A Quartz timed task is added to the system for each started task to calculate and update the statistical value of each statistical index at regular intervals. When the visualization large screen is drawn, the statistical index values bound to the current large screen are fetched by periodic polling, so that the large screen is drawn and displayed in real time.
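The task switch described above can be pictured with a minimal sketch. It assumes the switch is a boolean flag on the task that decides whether real-time synchronization follows the batch stage; the names `SyncTask`, `realtime_enabled` and `run_sync_task` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SyncTask:
    """A synchronization task; every field name here is hypothetical."""
    name: str
    realtime_enabled: bool  # assumed form of the task switch
    events: List[str] = field(default_factory=list)


def run_sync_task(
    task: SyncTask,
    run_batch: Callable[[SyncTask], None],
    start_realtime: Callable[[SyncTask], None],
) -> List[str]:
    """Run batch synchronization first, then real-time sync if switched on."""
    run_batch(task)  # historical data (Spark-based in the described system)
    task.events.append("batch_done")
    if task.realtime_enabled:
        start_realtime(task)  # incremental data (Storm-based in the described system)
        task.events.append("realtime_started")
    return task.events
```

Passing the two stages in as callables keeps the ordering logic separate from the Spark and Storm jobs themselves, which are outside the scope of this sketch.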

Further, in the embodiment of the present invention, real-time synchronization and visualization of heterogeneous data are realized through the modules and interactions among the modules, and the composition of the system modules is described in detail below.

1) Meta-warehouse subsystem module

The meta-bin subsystem module 100 stores the metadata information of the system and uses MySQL as the underlying storage. It mainly stores the information required by the other modules: data source configuration information, synchronization filtering rule configuration information, data destination configuration information, the creation time, creator and state of each task, the custom statistical index information of the real-time synchronization module, the configuration information of the configurable large screen, and so on. It is an auxiliary module for the other subsystem modules.

2) Historical data batch processing synchronization subsystem module

Further, in an embodiment of the present invention, the historical data batch processing synchronization subsystem module is further configured to: when the batch processed data is saved to the preset data warehouse, save the table of the preset data warehouse in ORC format so as to support the insertion of real-time data. In the embodiment of the present invention, the preset data warehouse is Hive, and user-defined statistical indexes for Hive are defined through a graphical interface.

FIG. 4 shows the system architecture of the historical data batch processing synchronization subsystem module. The bottom layer is built on Spark and makes full use of Spark's computing capability and fault tolerance. Spark is wrapped in a secondary encapsulation to build an execution processing engine module, which mainly comprises three sub-modules: a data source reading module, a data filtering processing module, and a data loading module.

2-1) data source reading module

The data source reading module is developed as a secondary encapsulation of the interfaces that Spark provides for various data sources; it wraps Spark's interface for reading MySQL and its interface for reading Hive. The task configuration data stored in the meta-bin subsystem module is read through a Pandas DataFrame, kept resident in memory, and provided to the data filtering processing module. To keep the platform extensible, a different data reading interface can be selected according to the data source type given in the data source configuration information.
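Selecting a reading interface by data source type can be sketched as a small registry. This is an illustration under stated assumptions: the configuration keys `type` and `table`, the registry, and the placeholder readers are hypothetical, and the real module would call Spark's readers rather than return sample rows.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Reader = Callable[[dict], List[Row]]

# Maps a data source type to its reader (hypothetical registry).
_READERS: Dict[str, Reader] = {}


def register_reader(source_type: str) -> Callable[[Reader], Reader]:
    """Decorator that registers a reader under a data source type."""
    def decorator(func: Reader) -> Reader:
        _READERS[source_type.lower()] = func
        return func
    return decorator


@register_reader("mysql")
def read_mysql(config: dict) -> List[Row]:
    # Placeholder: a real reader would load the table through Spark.
    return [{"source": "mysql", "table": config["table"]}]


@register_reader("hive")
def read_hive(config: dict) -> List[Row]:
    # Placeholder: a real reader would load the table through Spark.
    return [{"source": "hive", "table": config["table"]}]


def read_source(config: dict) -> List[Row]:
    """Dispatch to the reader that matches the configured source type."""
    reader = _READERS.get(config["type"].lower())
    if reader is None:
        raise ValueError(f"unsupported data source type: {config['type']}")
    return reader(config)
```

With this shape, supporting a new data source only requires registering one more reader function, which matches the extensibility goal stated above.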

2-2) data filtering processing module

The data filtering processing module is developed on Spark's abstract data structure, the DataFrame. A Spark DataFrame carries detailed structural information and exists in memory in the form of a database table. The read data is filtered with the DataFrame filter interface, fields are ignored with the drop interface, and selected columns are encrypted by combining the withColumn function with a udf (user-defined function). The encryption key is stored persistently through the meta-bin subsystem module.
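The three operations (row filtering, field ignoring, column encryption) can be illustrated without a Spark cluster by a plain-Python stand-in over lists of dictionaries. This is not the Spark DataFrame API. The XOR cipher is a reversible placeholder chosen only so that the sketch is self-contained; the patent does not name the encryption algorithm, and this one is not secure.

```python
import base64
from itertools import cycle
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, object]


def xor_encrypt(value: str, key: bytes) -> str:
    """Placeholder cipher (NOT secure): XOR with a repeating key, then base64."""
    if not key:
        raise ValueError("key must not be empty")
    mixed = bytes(b ^ k for b, k in zip(value.encode("utf-8"), cycle(key)))
    return base64.b64encode(mixed).decode("ascii")


def xor_decrypt(token: str, key: bytes) -> str:
    """Inverse of xor_encrypt, included to show the step is reversible."""
    mixed = base64.b64decode(token)
    return bytes(b ^ k for b, k in zip(mixed, cycle(key))).decode("utf-8")


def process_rows(
    rows: Iterable[Row],
    keep: Callable[[Row], bool],
    ignored: Set[str],
    encrypted: Set[str],
    key: bytes,
) -> List[Row]:
    """Filter rows, drop ignored fields, and encrypt the selected columns."""
    result: List[Row] = []
    for row in rows:
        if not keep(row):  # plays the role of the filter step
            continue
        out = {c: v for c, v in row.items() if c not in ignored}  # drop step
        for column in encrypted:  # plays the role of withColumn + udf
            if column in out:
                out[column] = xor_encrypt(str(out[column]), key)
        result.append(out)
    return result
```

In the described system the key would be persisted through the meta-bin subsystem module; here it is simply passed in as an argument.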

2-3) data loading module

Like the data source reading module, the data loading module is developed as a secondary encapsulation of the interfaces that Spark provides for various data sources. The difference is that the data source reading module reads the data source to be synchronized, whereas the data loading module saves the result data set produced by the data filtering processing module to the synchronization destination. The module also applies special handling when saving data to Hive: the Hive table must be stored in ORC format, because ORC tables support the insertion of real-time data.
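As an illustration of the ORC requirement, the following sketch builds a HiveQL statement that creates the destination table with `STORED AS ORC`. The function name and arguments are hypothetical, and identifiers are assumed to be trusted (no escaping of backticks is attempted).

```python
from typing import List, Tuple


def build_orc_table_ddl(database: str, table: str, columns: List[Tuple[str, str]]) -> str:
    """Build a HiveQL CREATE TABLE statement for an ORC-format table.

    columns is a list of (column_name, hive_type) pairs, for example
    [("id", "BIGINT"), ("name", "STRING")].
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_sql = ", ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS `{database}`.`{table}` "
        f"({column_sql}) STORED AS ORC"
    )
```

Depending on the Hive version and configuration, row-level inserts may need further table properties or settings; the source does not specify them, so they are left out here.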

3) Real-time synchronization subsystem module

FIG. 5 shows the system architecture of the real-time synchronization subsystem module. The bottom layer is developed on Canal, Storm and Kafka. It makes full use of the strengths of each big data component and builds on their original APIs (Application Programming Interfaces) to implement a log parsing module, an asynchronous transmission module, a data processing module and a data warehouse module.

3-1) Log parsing Module

The log parsing module mainly uses Canal to capture real-time incremental data from the Binlog and sends it to the downstream asynchronous transmission module. The specific execution flow is as follows:

a. Configure the MySQL data source. Enable the Binlog function of MySQL, add a database user for Canal in MySQL and grant it replication privileges, and configure the user name and password added above, together with the database information to be monitored, in the Canal instance file, so that Canal disguises itself as a MySQL slave. Then configure the Destinations: each Destination represents a thread that monitors a specific table, which enables parallelization, and the database name and table name are used to name the Destination so that Destination names are unique.

b. Start the Canal Server to monitor the data source.

c. In the Web project, connect a Canal client to the Canal Server, and use the database name and table name in the data source configuration information to specify the corresponding Destination and obtain a Connector.

d. Check whether downstream Kafka already has a Topic whose name is formed from the destination IP, destination database and destination table obtained from the data destination configuration information; if it does not exist, create it with the Kafka client API.

e. Read the task metadata to obtain the database name and table name to be read, subscribe to them with the Canal Connector, and then start polling. From this point on, Canal monitors the subscribed table through the specified Destination.

f. The monitored incremental data arrives encapsulated in a communication protocol structure defined with Protobuf. Parse it, keep only insert-type incremental data, and encapsulate the data type, field name and data value of each piece of incremental data into a custom data structure, KafkaTuple.

g. Fill KafkaTuple with the task's filtering rules and data destination configuration information obtained from the meta-bin subsystem.

h. Serialize KafkaTuple into JSON format with Gson and send it to the corresponding Topic using the Kafka Producer API.
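Steps d and f to h can be sketched in a few lines. Everything below is an assumption made for illustration: the patent does not give the fields of KafkaTuple, the separator used in the Topic name, or the shape of a parsed Binlog event, and the `json` module stands in for Gson.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class KafkaTuple:
    """Hypothetical layout of the custom structure named in step f."""
    column_types: Dict[str, str]
    column_values: Dict[str, str]
    filter_rules: Dict[str, object] = field(default_factory=dict)
    destination: Dict[str, str] = field(default_factory=dict)


def topic_name(dest_ip: str, dest_db: str, dest_table: str) -> str:
    """Step d: derive the Topic name (the underscore separator is assumed)."""
    return f"{dest_ip}_{dest_db}_{dest_table}"


def to_kafka_tuple(
    event: dict,
    filter_rules: Dict[str, object],
    destination: Dict[str, str],
) -> Optional[KafkaTuple]:
    """Steps f and g: keep insert events only and attach task metadata.

    The event is assumed to look like
    {"event_type": "INSERT", "columns": [{"name", "type", "value"}, ...]}.
    """
    if event.get("event_type") != "INSERT":
        return None  # update and delete events are discarded
    columns = event["columns"]
    return KafkaTuple(
        column_types={c["name"]: c["type"] for c in columns},
        column_values={c["name"]: c["value"] for c in columns},
        filter_rules=filter_rules,
        destination=destination,
    )


def serialize(kafka_tuple: KafkaTuple) -> str:
    """Step h: serialize to JSON (json stands in for Gson here)."""
    return json.dumps(asdict(kafka_tuple), sort_keys=True)
```

The serialized string is what would be handed to the Kafka producer for the Topic returned by `topic_name`.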

3-2) asynchronous transmission module

The asynchronous transmission module caches messages sent from upstream and performs peak shaving when peak data arrives. Its main work is a secondary encapsulation of the Kafka API that provides an easy-to-use interface to the log parsing module; it mainly wraps an interface for checking whether a specified Topic exists and an interface for sending messages to a specified Topic.
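The two wrapped interfaces and the peak-shaving effect can be shown with an in-memory stand-in for the message middleware. This is not the Kafka client API; the class and method names are hypothetical and exist only to make the buffering behavior visible.

```python
from collections import deque
from typing import Deque, Dict, List


class InMemoryBroker:
    """In-memory stand-in for the message middleware (hypothetical API)."""

    def __init__(self) -> None:
        self._topics: Dict[str, Deque[str]] = {}

    def topic_exists(self, topic: str) -> bool:
        """First wrapped interface: check whether a Topic exists."""
        return topic in self._topics

    def create_topic(self, topic: str) -> None:
        """Create the Topic if it is missing; existing Topics are kept."""
        self._topics.setdefault(topic, deque())

    def send(self, topic: str, message: str) -> None:
        """Second wrapped interface: append a message to a Topic."""
        if not self.topic_exists(topic):
            raise KeyError(f"unknown topic: {topic}")
        self._topics[topic].append(message)

    def poll(self, topic: str, max_messages: int) -> List[str]:
        """Consumer side: take at most max_messages, oldest first."""
        queue = self._topics[topic]
        batch: List[str] = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch
```

A burst of messages sent by the producer stays in the queue and is drained by the consumer in fixed-size batches, which is the buffering role the text assigns to the middleware.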

3-3) data processing module

This module is implemented on Storm. It provides a secondary encapsulation of the Storm API and makes full use of Storm's low latency, high fault tolerance and distributed real-time parallel computation. It mainly completes the integration with the upstream asynchronous transmission module and the processing of the filtering business. For data integration with the upstream asynchronous transmission module, the Storm Kafka spout interface is wrapped, and in order to prevent repeated reading of information in the Topic of the upstream Kafka, Kafka. The filtering business processing implements the field filtering function with three Bolts: OmitColumBolt, data Filter Bolt and data encryption Bolt. The Storm framework provides many Bolt interfaces; the Bolts in this module are implemented by inheriting the abstract class BaseRichBolt.
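The chaining of the three Bolts can be pictured with a plain-Python sketch in which each stage either passes a tuple on or drops it. This is not Storm's API: the `Bolt` base class, `run_chain` and the injected `encrypt` callable are hypothetical, and the class names are only modeled on the Bolts named above.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]


class Bolt:
    """Minimal stand-in for a processing stage; returns None to drop a tuple."""

    def execute(self, record: Record) -> Optional[Record]:
        raise NotImplementedError


class OmitColumnBolt(Bolt):
    """Removes the configured columns from every tuple."""

    def __init__(self, omitted: Iterable[str]) -> None:
        self._omitted = set(omitted)

    def execute(self, record: Record) -> Optional[Record]:
        return {k: v for k, v in record.items() if k not in self._omitted}


class DataFilterBolt(Bolt):
    """Drops tuples that do not satisfy the predicate."""

    def __init__(self, predicate: Callable[[Record], bool]) -> None:
        self._predicate = predicate

    def execute(self, record: Record) -> Optional[Record]:
        return record if self._predicate(record) else None


class DataEncryptionBolt(Bolt):
    """Applies an injected encryption function to the configured columns."""

    def __init__(self, columns: Iterable[str], encrypt: Callable[[str], str]) -> None:
        self._columns = set(columns)
        self._encrypt = encrypt

    def execute(self, record: Record) -> Optional[Record]:
        return {
            k: (self._encrypt(v) if k in self._columns else v)
            for k, v in record.items()
        }


def run_chain(bolts: List[Bolt], record: Record) -> Optional[Record]:
    """Pass one tuple through the Bolts in order; stop as soon as it is dropped."""
    current: Optional[Record] = record
    for bolt in bolts:
        current = bolt.execute(current)
        if current is None:
            return None
    return current
```

Keeping each rule in its own stage means a task that configures no encryption, for example, can simply leave that stage out of the chain.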

3-4) data warehouse Module

The data warehouse module mainly completes the integration of Storm, Hive and Mysql, and since the data types of the incremental data acquired from the Canal are all String types, the conversion work from the String types to the specified data types is completed in the data warehouse module.
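The String-to-type conversion can be sketched as a lookup from a declared type name to a converter. The set of type names, the accepted boolean spellings and the timestamp format are assumptions for illustration; the patent does not list the supported types.

```python
from datetime import datetime
from decimal import Decimal
from typing import Callable, Dict, Optional

# Converters keyed by lower-case type name (an assumed, partial list).
_CONVERTERS: Dict[str, Callable[[str], object]] = {
    "int": int,
    "bigint": int,
    "double": float,
    "decimal": Decimal,
    "boolean": lambda s: s.strip().lower() in ("1", "true"),
    "string": str,
    "timestamp": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
}


def convert_row(
    values: Dict[str, Optional[str]],
    column_types: Dict[str, str],
) -> Dict[str, object]:
    """Convert every String value to the type declared for its column."""
    converted: Dict[str, object] = {}
    for name, raw in values.items():
        if raw is None:  # NULL stays NULL regardless of the declared type
            converted[name] = None
            continue
        type_name = column_types[name].lower()
        converter = _CONVERTERS.get(type_name)
        if converter is None:
            raise ValueError(f"unsupported type for column {name}: {type_name}")
        converted[name] = converter(raw)
    return converted
```

The column types would come from the type information carried alongside each incremental record, so the conversion needs no extra lookup against the source database.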

4) Visual large-screen subsystem module

FIG. 6 shows the system architecture of the visualization large-screen subsystem module, which uses the meta-bin subsystem module for metadata management. According to actual business requirements, a custom statistical index module, a large-screen configuration module and a visualization large-screen module are implemented.

4-1) self-defined statistical index module

The custom statistical index module is developed directly on top of the Storm-based real-time synchronization subsystem module. A custom statistical index serves that subsystem and is directly associated with a specific real-time synchronization task: when a piece of data is added to the data destination of the real-time synchronization task, the change is dynamically reflected in the custom statistical index. This provides a check on the real-time synchronization task and, at the same time, a visual display of the data synchronized in the first stage. The custom statistical index submodule is developed on Quartz. A system user adds a custom statistical index by supplying a statistical index name and the corresponding statistical SQL for a target data set. To guarantee the uniqueness of statistical indexes, the name is checked for duplicates. The custom statistical index is stored persistently in the meta-bin subsystem module, after which a Quartz timed task periodically executes the statistical SQL of the index and updates the statistical result. The SQL engine can also be selected dynamically according to the data destination type of the synchronization task, which makes it easy to integrate a new SQL execution engine and to extend the system.
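A minimal sketch of the index registry follows. It assumes SQLite (from the Python standard library) as a stand-in SQL engine and an explicit `refresh_all` call in place of the Quartz timed task; the class and method names are hypothetical.

```python
import sqlite3
from typing import Dict


class StatIndexRegistry:
    """Holds named statistical SQL statements and their latest values."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection
        self._sql_by_name: Dict[str, str] = {}
        self.values: Dict[str, object] = {}

    def add_index(self, name: str, sql: str) -> None:
        """Register an index; a duplicate name is rejected."""
        if name in self._sql_by_name:
            raise ValueError(f"statistical index already exists: {name}")
        self._sql_by_name[name] = sql

    def refresh_all(self) -> Dict[str, object]:
        """Run every statistical SQL once (a scheduler would call this periodically)."""
        for name, sql in self._sql_by_name.items():
            row = self._conn.execute(sql).fetchone()
            self.values[name] = row[0] if row else None
        return dict(self.values)
```

Because the registry only depends on an object that can execute SQL and return rows, swapping in a different engine for a different data destination type would not change the registration or refresh logic.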

4-2) visual large screen configuration module

Further, in an embodiment of the present invention, the customized statistical indicator and the large visual screen are bound together to provide a configuration management interface to update or preview the statistical indicator corresponding to the large visual screen, and view the relevant metadata information of the large visual screen.

Specifically, the visual large-screen configuration module is mainly responsible for binding some self-defined statistical indexes and the visual large-screen together, providing an easy-to-use configuration management interface, updating or previewing the statistical indexes corresponding to the large-screen, checking related metadata information of the large-screen, and searching for a specific large-screen in a fuzzy query mode.

4-3) visual large screen module

The visual large-screen module is mainly responsible for the final visual large-screen display. Before a large screen can be displayed, it must be configured in advance; in particular, its real-time attribute must be set. If the visual large screen is real-time, the request interface is polled to obtain the latest statistical index values of the current large screen; if it is not real-time, the interface is requested only once to obtain the current latest statistical index values. Finally, the Echart plug-in is used for dynamic drawing to generate the final visual large screen.
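The difference between the real-time and non-real-time cases is only in how often the interface is requested. The sketch below shows that control flow in Python for illustration; in the described system this logic would presumably run in the browser next to the charting code. The function name `index_values`, its parameters, and the default polling interval are all assumptions.

```python
import time
from typing import Callable, Dict, Iterator, Optional


def index_values(
    fetch: Callable[[], Dict[str, float]],
    realtime: bool,
    interval_s: float = 5.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict[str, float]]:
    """Yield statistical index values for one large screen.

    A non-real-time screen requests the interface exactly once; a real-time
    screen keeps polling it, waiting interval_s seconds between requests.
    """
    yield fetch()
    if not realtime:
        return
    polls = 1
    while max_polls is None or polls < max_polls:
        sleep(interval_s)
        yield fetch()
        polls += 1
```

Each yielded dictionary would be handed to the drawing step to redraw the screen; `max_polls` exists only so the sketch can be exercised without running forever.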

5) Visualization module

It can be understood that the visualization module is developed on a Web platform. As shown in fig. 7, the functional architecture of the Web-platform visualization module mainly comprises three sub-modules: a visual task management module, a visual large-screen management module, and an access right control module.

5-1) visual task management module

The visual task management module corresponds to the functions of the Spark-based historical data batch processing synchronization subsystem module and the Storm-based real-time synchronization subsystem module. It allows data source configuration, filtering rule configuration, data destination configuration, and the like to be performed on synchronization tasks in a visual manner. Task metadata such as the task name, creation time, execution state, and real-time/non-real-time task switch can also be viewed.
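One way to picture what such a task configuration holds is the record below. It is only a sketch: the class `SyncTask`, its field names, the status string, and the representation of filtering rules as Python predicates are all assumptions made for illustration, since the description does not specify the configuration format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class SyncTask:
    name: str
    source: str                      # data source configuration
    destination: str                 # data destination configuration
    filters: List[Callable[[Row], bool]] = field(default_factory=list)
    realtime: bool = False           # real-time / non-real-time task switch
    status: str = "CREATED"          # execution state (hypothetical values)
    created_at: datetime = field(default_factory=datetime.now)

    def accepts(self, row: Row) -> bool:
        # A row is synchronized only if every filtering rule accepts it.
        return all(rule(row) for rule in self.filters)

    def metadata(self) -> Dict[str, Any]:
        # The metadata a task list view would display.
        return {
            "name": self.name,
            "created_at": self.created_at.isoformat(),
            "status": self.status,
            "realtime": self.realtime,
        }
```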

5-2) visual large screen management module

The visual large-screen management module corresponds to the capabilities provided by the visual large-screen subsystem module based on Quartz timed tasks. Through the Web interface, a user can add custom statistical indexes, view the statistical values corresponding to those indexes in real time, manage large-screen metadata information, and bind the corresponding statistical indexes to a large screen. Finally, a large screen can be displayed by selecting its configured large-screen metadata information.

5-3) Access rights control Module

The access right control module grants different usage rights according to the different positions of platform users and provides different functions according to those usage rights.
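A simple way to realize this kind of position-based access is a lookup from role to permitted functions. The roles and permission names in the sketch below are invented purely for illustration; the description does not list the actual positions or functions.

```python
from typing import Dict, FrozenSet, List

# Hypothetical roles and permissions, invented for this example.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "admin": frozenset({"task:view", "task:edit", "screen:view", "screen:edit"}),
    "analyst": frozenset({"task:view", "screen:view", "screen:edit"}),
    "viewer": frozenset({"screen:view"}),
}


def is_allowed(role: str, permission: str) -> bool:
    # Unknown roles receive no permissions at all.
    return permission in ROLE_PERMISSIONS.get(role, frozenset())


def visible_functions(role: str) -> List[str]:
    # The functions the platform would offer to a user holding this role.
    return sorted(ROLE_PERMISSIONS.get(role, frozenset()))
```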

According to the configurable real-time synchronization and visualization system for heterogeneous data of the embodiment of the invention, real-time incremental data of online services is obtained by using Canal to parse Binlog logs. This requires no IO operation and does not affect the performance of the online services. A Spark-based batch processing synchronization task and a Storm-based real-time synchronization task are combined, and a task switch allows batch synchronization and then real-time synchronization to be performed in sequence on a single configured task, so that the full data of the production system is synchronized into the big data warehouse Hive. This greatly simplifies use of the system for its users.

Next, a method for real-time synchronization and visualization of configurable heterogeneous data according to an embodiment of the present invention is described with reference to the drawings.

As shown in fig. 8, the configurable real-time synchronization and visualization method for heterogeneous data includes the following steps:

S1, acquiring historical metadata information, carrying out data batch processing, and storing the batch processed data in a preset data warehouse in a preset format;

S2, acquiring real-time incremental metadata information for processing, completing conversion of specified data types, and storing the processed data in the preset data warehouse;

S3, configuring and managing the data in the preset data warehouse, the self-defined statistical indexes and the visual large screen; and

S4, displaying the data after configuration management in S3 on the visual large screen for a user to view and manage.
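The ordering of S1 and S2 on a single task, controlled by a task switch, can be sketched as below. This is an assumption-laden illustration: `SyncJob`, `CONVERTERS`, and the column names are hypothetical, a Python list stands in for the preset data warehouse, and the per-column converters merely stand in for the specified data type conversion of S2.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]

# Hypothetical per-column converters standing in for the S2 data type conversion.
CONVERTERS: Dict[str, Callable[[Any], Any]] = {"id": int, "amount": float}


def convert(row: Row) -> Row:
    # Columns without a registered converter are passed through unchanged.
    return {key: CONVERTERS.get(key, lambda v: v)(value) for key, value in row.items()}


class SyncJob:
    def __init__(self) -> None:
        self.realtime = False          # task switch: False = batch, True = real-time
        self.warehouse: List[Row] = []  # stand-in for the preset data warehouse

    def run_batch(self, history: Iterable[Row]) -> None:
        # S1: load historical data (assumed to be typed already), then flip the switch.
        self.warehouse.extend(history)
        self.realtime = True

    def on_increment(self, row: Row) -> None:
        # S2: incremental rows are accepted only after the batch load has finished.
        if not self.realtime:
            raise RuntimeError("batch synchronization has not finished")
        self.warehouse.append(convert(row))
```

Refusing increments until the batch load completes is one possible design choice for keeping the two phases in sequence; a real system might buffer them instead.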

Further, in an embodiment of the present invention, the custom statistical indexes for the preset data warehouse are defined through a graphical interface.

Further, in an embodiment of the present invention, S2 further includes: when the batch processed data is saved to the preset data warehouse, the tables of the preset data warehouse are stored in ORC format so as to support the insertion of real-time data.
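Assuming the preset data warehouse is Hive, as named elsewhere in this description, an ORC-backed table could be declared with a statement like the one generated below. The helper `orc_table_ddl` and its parameters are hypothetical. The bucketing clause and the `transactional` table property are optional extras that Hive's transactional tables use; the description itself only states that the table is stored in ORC format.

```python
from typing import Dict, Optional


def orc_table_ddl(
    table: str,
    columns: Dict[str, str],
    bucket_column: Optional[str] = None,
    buckets: int = 4,
    transactional: bool = False,
) -> str:
    """Build a HiveQL CREATE TABLE statement for an ORC-backed table."""
    cols = ", ".join(f"`{name}` {hive_type}" for name, hive_type in columns.items())
    parts = [f"CREATE TABLE IF NOT EXISTS `{table}` ({cols})"]
    if bucket_column:
        # Optional bucketing clause, placed before the storage clause.
        parts.append(f"CLUSTERED BY (`{bucket_column}`) INTO {buckets} BUCKETS")
    parts.append("STORED AS ORC")
    if transactional:
        # Assumption: row-level inserts rely on Hive transactional tables.
        parts.append("TBLPROPERTIES ('transactional'='true')")
    return " ".join(parts)
```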

Further, in an embodiment of the present invention, S3 further includes: and binding the customized statistical indexes and the visual large screen together to provide a configuration management interface to update or preview the statistical indexes corresponding to the visual large screen and view the related metadata information of the visual large screen.

It should be noted that the foregoing explanation of the system embodiment for real-time synchronization and visualization of configurable heterogeneous data also applies to the method of the embodiment, and is not repeated herein.

According to the configurable real-time synchronization and visualization method for heterogeneous data of the embodiment of the invention, real-time incremental data of online services is obtained by using Canal to parse Binlog logs. This requires no IO operation and does not affect the performance of the online services. A Spark-based batch processing synchronization task and a Storm-based real-time synchronization task are combined, and a task switch allows batch synchronization and then real-time synchronization to be performed in sequence on a single configured task, so that the full data of the production system is synchronized into the big data warehouse Hive. This greatly simplifies use of the system for its users.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A configurable real-time synchronization and visualization system for heterogeneous data, comprising:

a meta-bin subsystem module for storing meta-data information, wherein the meta-data information comprises: historical metadata information and real-time incremental metadata information;

the historical data batch processing synchronization subsystem module is used for acquiring the historical metadata information, carrying out data batch processing and storing the batch processed data in a preset data warehouse in a preset format;

the real-time synchronization subsystem module is used for acquiring the real-time incremental metadata information for processing, completing the conversion of the specified data types, and storing the processed data in the preset data warehouse;

the visual large-screen subsystem module is used for carrying out configuration management on the data in the preset data warehouse, the self-defined statistical indexes and the visual large screen;

the visualization module is used for displaying the data after the configuration management of the visualization large-screen subsystem module on the visualization large screen for the user to check and manage;

the visual large-screen subsystem module comprises a user-defined statistical index module, a visual large-screen configuration module and a visual large-screen module;

the user-defined statistical index module is used for dynamically reflecting the user-defined statistical index when a piece of data is added to the data destination of the real-time synchronous task, verifying the real-time synchronous data task and visually displaying the data of the real-time synchronous task;

the visual large-screen configuration module is used for binding the customized statistical indexes and the visual large-screen together to provide a configuration management interface to update or preview the statistical indexes corresponding to the visual large-screen and view the related metadata information of the visual large-screen;

the visual large-screen module is used for configuring the real-time attribute of a large screen, polling the request interface to obtain the latest statistical index value of the current large screen when the visual large screen is real-time, requesting the interface once to obtain the current latest statistical index value when the visual large screen is not real-time, and finally dynamically drawing by using an Echart plug-in to generate the final visual large screen.

2. The configurable system for real-time synchronization and visualization of heterogeneous data according to claim 1,

and the preset data warehouse completes the self-defined statistical indexes in a graphical interface mode.

3. The configurable real-time synchronization and visualization system of heterogeneous data according to claim 1, wherein the historical data batch synchronization subsystem module is further configured to:

and when the batched data is saved to the preset data warehouse, saving the table format of the preset data warehouse into an ORC form so as to support the insertion operation of the real-time data.

4. The configurable real-time synchronization and visualization system of heterogeneous data according to claim 1, wherein the visualization module is further configured to:

and checking the task name, the creation time, the execution state and the metadata information of the real-time and non-real-time task switch of the data real-time synchronization task.

5. A configurable real-time synchronization and visualization method for heterogeneous data, comprising:

s1, acquiring historical metadata information, carrying out data batch processing, and storing the batch processed data in a preset data warehouse in a preset format;

s2, acquiring real-time incremental metadata information for processing, completing conversion of specified data types, and storing the processed data in the preset data warehouse;

s3, configuring and managing the data in the preset data warehouse, the customized statistical indexes and the visual large screen; s3 further includes:

when a piece of data is added to the data destination of the real-time synchronization task, dynamically reflecting the added data in the self-defined statistical index, verifying the real-time synchronization data task, and visually displaying the data of the real-time synchronization task; binding the customized statistical index and the visual large screen together to provide a configuration management interface to update or preview the statistical index corresponding to the visual large screen; configuring the real-time attribute of the large screen, polling a request interface to obtain the latest statistical index value of the current large screen when the visual large screen is real-time, requesting the interface once to obtain the current latest statistical index value when the visual large screen is not real-time, and finally using an Echart plug-in to carry out dynamic drawing to generate the final visual large screen;

and S4, displaying the data after configuration management in the S3 on the visualization large screen for the user to view and manage.

6. The configurable method for real-time synchronization and visualization of heterogeneous data according to claim 5,

7. The configurable real-time synchronization and visualization method of heterogeneous data according to claim 5, wherein the S2 further comprises:

8. The configurable real-time synchronization and visualization method of heterogeneous data according to claim 5, wherein the S4 further comprises: