CN112925767A - Multi-data-source dynamic data synchronization management method and system based on internet supervision - Google Patents

Info

Publication number
CN112925767A
Authority
CN
China
Prior art keywords
data
flow
source
internet
supervision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110234138.8A
Other languages
Chinese (zh)
Inventor
侯居永
栾丽丽
张雷
陈兆亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202110234138.8A
Publication of CN112925767A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Quality & Reliability (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-data-source dynamic data synchronization management method and system based on internet supervision, belonging to the field of Internet Plus supervision. The technical problem to be solved is how to help users quickly build big data processing and analysis flows and build a data center quickly at low cost. The technical solution is as follows: the method fuses various structured, semi-structured and unstructured data, provides a one-stop data development environment, visual flow design, rich data types and intelligent task monitoring, and enables users to quickly build big data processing and analysis flows and a data center at low cost. The method comprises the following steps: data source management: managing data connection services; data flow design: defining each data processing flow as a data flow job, and managing data processing flows through data flow jobs; template management: migrating and reusing flows.

Description

Multi-data-source dynamic data synchronization management method and system based on internet supervision
Technical Field
The invention relates to the field of Internet Plus supervision, and in particular to a multi-data-source dynamic data synchronization management method and system based on internet supervision.
Background
A new generation of information technology is rapidly changing the way society produces and lives. Data has become a core asset of organizations and enterprises, the digital economy is driving a new round of global change, and the digital transformation of enterprises has become a trend of the big data era.
The internet, big data and artificial intelligence are being deeply integrated with the real economy, driving convergent innovation across industries. In this era of convergent innovation, maximizing the value of big data by fully exploiting the association, intersection and fusion of data has become the key to digital transformation in every industry. Against this background, data is tending to fuse across fields, industries and regions; multi-source data such as organizational data, Internet of Things data and scientific research data is tending to fuse; and hypermedia data comprising structured, semi-structured and unstructured data is tending to fuse. Fusing multi-source heterogeneous hypermedia data, whose main characteristics are large scale, multiple heterogeneous sources, cross-domain, cross-media, cross-language and dynamic evolution, has become a key problem that vertical industries and ecosystem enterprises urgently need to solve in carrying out their digital transformation strategies.
In a conventional data warehouse system, the data model is defined before data is loaded and stored, and only structured, processed data can be stored in the warehouse.
Therefore, how to help users quickly build big data processing and analysis flows and build a data center quickly at low cost is a problem that currently needs to be solved.
Disclosure of Invention
The technical task of the invention is to provide a multi-data-source dynamic data synchronization management method and system based on internet supervision, so as to solve the problem of how to help users quickly build big data processing and analysis flows and build a data center quickly at low cost.
The technical task of the invention is achieved as follows. The multi-data-source dynamic data synchronization management method based on internet supervision fuses various structured, semi-structured and unstructured data, provides a one-stop data development environment, visual flow design, rich data types and intelligent task monitoring, and enables users to quickly build big data processing and analysis flows and a data center at low cost. The method comprises the following steps:
data source management: managing data connection services;
data management: data flow design, data flow debugging, data flow monitoring, and data flow operation and maintenance, wherein each data processing flow is defined as a data flow job, and data processing flows are managed through data flow jobs;
template management: migrating and reusing flows, and providing functions for uploading, deleting and downloading data flow templates.
Preferably, the data source management is specifically as follows:
the user defines data source connections in one place, so that they can be referenced directly when designing a data processing flow;
data source connections use a connection pool, which avoids occupying a large number of connections on the data source; the data source connection types include the following:
① JDBC connection types, covering MySQL, Oracle, MSSQL, DB2 and other databases that support JDBC;
② FTP connection type;
③ SFTP connection type;
④ HDFS connection type;
⑤ HBase connection type;
⑥ Hive connection type;
⑦ Elasticsearch connection type;
⑧ Kafka connection type;
⑨ Excel, CSV and other connection types.
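To illustrate the centrally defined, pooled data source connections described above, the following minimal Python sketch keeps a fixed number of open connections per named data source and lends them out on demand. The names ConnectionPool, DATA_SOURCES and run_query are hypothetical, and an in-memory SQLite database stands in for a JDBC source; the patent does not specify an implementation.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: exactly `size` connections are opened up front and reused."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every connection is lent out, so the data source
        # never sees more than `size` connections from this pool.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse

    def idle_count(self):
        return self._idle.qsize()


# Hypothetical registry: a flow refers to a data source by name only.
DATA_SOURCES = {
    "demo_db": ConnectionPool(
        lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2
    ),
}


def run_query(source_name, sql):
    """Borrow a pooled connection from the named data source and run one query."""
    with DATA_SOURCES[source_name].connection() as conn:
        return conn.execute(sql).fetchall()
```

A flow node would then call run_query("demo_db", ...) without ever opening a connection of its own, which is one way to realize the "define once, reference directly" idea.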
Preferably, the data flow design is specifically as follows:
flow grouping: functions for adding, deleting and modifying groups and for starting and stopping flows are provided; data flow jobs are classified hierarchically by group, which reduces the difficulty of managing, operating and maintaining data processing flows;
flow tree display: all jobs created by the current user are displayed as a tree, and job names are shown in different colors to indicate run status: green means the job is running normally, red means warnings were raised during the run, and black means the job has not run;
visual job flow design;
data access: a plurality of data access processors are provided for acquiring various multi-source heterogeneous data, offering wide data source adaptation, high-performance data acquisition and flexible scheduling modes to meet various data acquisition requirements;
data loading: a plurality of data loading processors are provided for importing data into various data storage services;
data cleaning: a plurality of data cleaning processors are provided for checking and cleaning the acquired data;
data conversion: a plurality of data conversion processors are provided for converting the acquired data;
custom processor: a processor implementing a specific function is written in Java code and loaded into the flow job to implement more complex functions, such as splitting data or checking whether flow data exists in a data table.
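The custom processor idea can be sketched as a small plug-in interface. The patent states that custom processors are written in Java; Python is used here only to keep the illustration short, and the Processor, SplitField and run_flow names are assumptions made for this sketch rather than part of the disclosed system.

```python
from abc import ABC, abstractmethod


class Processor(ABC):
    """Hypothetical plug-in contract: one input record yields zero or more output records."""

    @abstractmethod
    def process(self, record):
        ...


class SplitField(Processor):
    """Custom processor for data splitting: one record per separated value."""

    def __init__(self, field, sep):
        self.field = field
        self.sep = sep

    def process(self, record):
        parts = str(record[self.field]).split(self.sep)
        return [{**record, self.field: part.strip()} for part in parts]


def run_flow(processors, records):
    """Pass every record through each processor in order."""
    for processor in processors:
        records = [out for rec in records for out in processor.process(rec)]
    return records
```

Because each processor only sees records, a user-written processor can be loaded into a flow job next to the built-in ones without the flow engine knowing its internals.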
Preferably, the visual job flow design is specifically as follows:
each data flow design manages an independent canvas, on which one or more flow nodes are defined to form one or more data flows;
the canvas toolbar provides rich data processing types, and flow nodes are defined by dragging and then connected to one another;
scheduling rules and attributes are configured for flow nodes, flows or nodes are started and stopped, and the running state of flows is debugged and monitored;
auxiliary functions for aligning and highlighting flow nodes are provided;
flow definition, starting and stopping, debugging, monitoring, and operation and maintenance are all carried out visually through the interface.
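A canvas of connected flow nodes is naturally modeled as a directed graph. The following sketch, under the assumption that a flow is stored as a mapping from each node to its upstream nodes, derives a valid execution order and rejects flows whose connections form a cycle. The node names and the execution_order and is_valid_flow helpers are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical flow: each node maps to the set of nodes it depends on.
FLOW = {
    "clean": {"read_mysql"},
    "convert": {"clean"},
    "load_hdfs": {"convert"},
    "load_kafka": {"convert"},
}


def execution_order(flow):
    """Return the nodes so that every node appears after all of its upstream nodes."""
    return list(TopologicalSorter(flow).static_order())


def is_valid_flow(flow):
    """A flow drawn on the canvas is runnable only if its connections contain no cycle."""
    try:
        execution_order(flow)
        return True
    except CycleError:
        return False
```

Validating the graph when the user connects nodes, rather than at run time, is one possible design choice; the patent itself does not say when such checks happen.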
Preferably, the data sources supported by the data access include the following:
① collecting data over JDBC from MySQL, Oracle, DB2 and other databases that support JDBC;
② collecting Oracle data through Oracle logs, capturing all INSERT, UPDATE and DELETE operations on the database;
③ collecting MySQL data through MySQL logs, capturing all INSERT, UPDATE and DELETE operations on the database;
④ collecting FTP/SFTP file data;
⑤ collecting HDFS file data;
⑥ collecting HBase data;
⑦ collecting Hive data;
⑧ consuming Kafka data.
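Log-based collection of INSERT, UPDATE and DELETE operations amounts to replaying change events against a target. The sketch below assumes a simplified event shape (op, key, row) invented for illustration; real Oracle or MySQL log formats are far richer and are not reproduced here.

```python
def apply_change(replica, event):
    """Apply one captured change event to an in-memory replica keyed by primary key."""
    op, key = event["op"], event["key"]
    if op == "INSERT":
        replica[key] = dict(event["row"])
    elif op == "UPDATE":
        # Merge only the changed columns into the existing row.
        replica[key] = {**replica.get(key, {}), **event["row"]}
    elif op == "DELETE":
        replica.pop(key, None)
    else:
        raise ValueError(f"unsupported operation: {op}")
    return replica


def replay(events):
    """Replay events in log order to rebuild the current state of the table."""
    replica = {}
    for event in events:
        apply_change(replica, event)
    return replica


# Hypothetical events, in the order they would appear in the database log.
EVENTS = [
    {"op": "INSERT", "key": 1, "row": {"name": "a", "qty": 1}},
    {"op": "INSERT", "key": 2, "row": {"name": "b", "qty": 5}},
    {"op": "UPDATE", "key": 1, "row": {"qty": 3}},
    {"op": "DELETE", "key": 2},
]
```

Replaying EVENTS leaves a single row for key 1 with the updated quantity, which is the state a synchronized target should hold.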
More preferably, the data storage services include the following:
① importing data into MySQL, Oracle, DB2 and other databases that support JDBC;
② importing data into FTP/SFTP;
③ importing data into HDFS;
④ importing data into HBase;
⑤ importing data into Hive;
⑥ importing data into Elasticsearch;
⑦ importing data into Kafka.
More preferably, the data cleansing types include the following:
① null and non-null checks;
② prefix and suffix checks;
③ data length check;
④ numerical range check;
⑤ enumeration value check;
⑥ regular expression check;
the data conversion types include the following:
① data mapping;
② character set conversion;
③ data format conversion;
④ data splitting;
⑤ data merging;
⑥ date format conversion;
⑦ character string replacement;
⑧ null value replacement;
⑨ dictionary value replacement.
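The cleansing checks listed above can be expressed as a small rule table. This is a minimal sketch assuming records are plain dictionaries; the CHECKS table, the rule tuple layout and the validate function are hypothetical names chosen for illustration.

```python
import re

# One entry per cleansing type; each returns True when the value passes.
CHECKS = {
    "not_null": lambda v: v is not None and v != "",
    "prefix": lambda v, p: str(v).startswith(p),
    "suffix": lambda v, s: str(v).endswith(s),
    "length": lambda v, lo, hi: lo <= len(str(v)) <= hi,
    "range": lambda v, lo, hi: lo <= v <= hi,
    "enum": lambda v, allowed: v in allowed,
    "regex": lambda v, pattern: re.fullmatch(pattern, str(v)) is not None,
}


def validate(record, rules):
    """Return the (field, check) pairs that failed; an empty list means the record is clean."""
    failures = []
    for field, check, *args in rules:
        value = record.get(field)
        if check != "not_null" and value is None:
            failures.append((field, check))  # no other check can pass on a missing value
            continue
        if not CHECKS[check](value, *args):
            failures.append((field, check))
    return failures


# Hypothetical rule set for one flow node.
RULES = [
    ("id", "not_null"),
    ("code", "prefix", "CN"),
    ("code", "length", 4, 16),
    ("age", "range", 0, 150),
    ("status", "enum", {"open", "closed"}),
    ("phone", "regex", r"\d{11}"),
]
```

Returning the list of failed checks, rather than a single boolean, lets a flow decide whether to drop, repair or route a bad record.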
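Several of the conversion types above reduce to one-line transformations. The helper names below are assumptions made for this sketch, and the sample field names and dictionaries are invented; only Python standard library calls are used.

```python
from datetime import datetime


def map_fields(record, mapping):
    """Data mapping: rename source fields to target fields, leaving others unchanged."""
    return {mapping.get(key, key): value for key, value in record.items()}


def convert_charset(raw, src, dst):
    """Character set conversion on raw bytes, e.g. GBK to UTF-8."""
    return raw.decode(src).encode(dst)


def convert_date(text, src_fmt, dst_fmt):
    """Date format conversion using strptime/strftime patterns."""
    return datetime.strptime(text, src_fmt).strftime(dst_fmt)


def replace_null(value, default):
    """Null value replacement: only None is replaced, falsy values such as 0 are kept."""
    return default if value is None else value


def replace_dict_value(value, dictionary):
    """Dictionary value replacement: translate coded values, pass unknown codes through."""
    return dictionary.get(value, value)
```

For example, convert_date("2021/03/03", "%Y/%m/%d", "%Y-%m-%d") returns "2021-03-03".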
A multi-data-source dynamic data synchronization management system based on internet supervision comprises:
the data source management unit, used for managing data connection services;
the data management unit, used for managing data, defining each data processing flow as a data flow job and managing data processing flows through data flow jobs; the data management unit comprises a data flow design subunit, a data flow debugging subunit, a data flow monitoring subunit and a data flow operation and maintenance subunit;
the template management unit, used for migrating and reusing flows and providing functions for uploading, deleting and downloading data flow templates.
Preferably, the data flow design subunit comprises:
the flow grouping module, used for providing functions for adding, deleting and modifying groups and for starting and stopping flows, and for classifying data flow jobs hierarchically by group so as to reduce the difficulty of managing, operating and maintaining data processing flows;
the tree display module, used for displaying all jobs created by the current user as a tree, with job names shown in different colors to indicate run status: green means the job is running normally, red means warnings were raised during the run, and black means the job has not run;
the visual flow design module, used for carrying out flow definition, starting and stopping, debugging, monitoring, and operation and maintenance visually through the interface;
the data access module, used for acquiring various multi-source heterogeneous data, offering wide data source adaptation, high-performance data acquisition and flexible scheduling modes to meet various data acquisition requirements;
the data loading module, used for importing data into various data storage services;
the data cleaning module, used for checking and cleaning the acquired data;
the data conversion module, used for converting the acquired data;
the custom module, used for writing processors with specific functions in Java code.
A computer-readable storage medium stores a computer program executable by a processor to implement the multi-data-source dynamic data synchronization management method based on internet supervision described above.
The multi-data-source dynamic data synchronization management method and system based on internet supervision have the following advantages:
the invention fuses structured, semi-structured, unstructured and other data, provides a one-stop data development environment, visual flow design, rich data types and intelligent task monitoring, and helps users quickly build big data processing and analysis flows and a data center at low cost; the invention is an easy-to-use, powerful and reliable data processing and distribution system that supports powerful, highly configurable, directed-graph-based data routing, transformation and system mediation logic, supports dynamically pulling data from a variety of data sources, and makes full use of the association, intersection and fusion of data to maximize its value;
drawing on Inspur's years of experience and practice in the Internet Plus supervision industry, the invention provides visual task orchestration, fuses multi-source heterogeneous data and stores it in a big data center, and implements the acquisition, storage and access of multi-source heterogeneous data; it helps customers extract and integrate all relevant data, build a unified data center, break down data silos, achieve data interconnection, support data analysis and insight, release the value of data, and complete their big data informatization transformation;
the invention provides users with visual task orchestration: without installing any client program, the orchestration, debugging, starting, stopping and monitoring of task flows can be completed in the browser through simple drag-and-drop operations;
the invention has rich built-in data development types, including SQL, Hive, MapReduce, Spark, Streaming, Flink, Kylin, Jar, RestAPI, PySpark, machine learning and deep learning;
the invention provides rich scheduling configuration strategies and large-scale job scheduling capability, and supports multiple scheduling modes such as periodic scheduling, event-driven scheduling and manual scheduling;
in summary, the invention offers visual task orchestration, rich data development types and rich scheduling configuration strategies.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of a multi-data-source dynamic data synchronization management method based on Internet supervision.
Detailed Description
The method and system for synchronously managing the dynamic data of multiple data sources based on internet supervision according to the present invention are described in detail below with reference to the drawings and the specific embodiments of the specification.
Example 1:
As shown in FIG. 1, the multi-data-source dynamic data synchronization management method based on internet supervision of the present invention fuses various structured, semi-structured and unstructured data, provides a one-stop data development environment, visual flow design, rich data types and intelligent task monitoring, and enables users to quickly build big data processing and analysis flows and a data center at low cost. The method comprises the following steps:
S1, data source management: managing data connection services;
S2, data management: data flow design, data flow debugging, data flow monitoring, and data flow operation and maintenance, wherein each data processing flow is defined as a data flow job, and data processing flows are managed through data flow jobs;
S3, template management: migrating and reusing flows, and providing functions for uploading, deleting and downloading data flow templates. The data flow templates include a complaint report template, a risk early warning template, a knowledge base template and the like.
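Template management in step S3 can be pictured as exporting a flow definition to a portable text form and storing it under a name. The sketch below assumes JSON as the template format and uses hypothetical names (TemplateStore, export_flow, import_flow); the patent does not state how templates are serialized.

```python
import json


def export_flow(flow):
    """Serialize a flow definition so it can be migrated to another environment."""
    return json.dumps(flow, sort_keys=True)


def import_flow(text):
    """Rebuild a flow definition from a downloaded template."""
    return json.loads(text)


class TemplateStore:
    """In-memory stand-in for the upload, download and delete functions."""

    def __init__(self):
        self._templates = {}

    def upload(self, name, text):
        json.loads(text)  # reject uploads that are not valid JSON
        self._templates[name] = text

    def download(self, name):
        return self._templates[name]

    def delete(self, name):
        del self._templates[name]

    def names(self):
        return sorted(self._templates)
```

A flow exported from one deployment and uploaded to another round-trips unchanged, which is the property that makes reuse across environments possible.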
The data source management in step S1 in this embodiment is specifically as follows:
S101, the user defines data source connections in one place, so that they can be referenced directly when designing a data processing flow;
S102, data source connections use a connection pool, which avoids occupying a large number of connections on the data source;
the data source connection types include the following:
① JDBC connection types, covering MySQL, Oracle, MSSQL, DB2 and other databases that support JDBC;
② FTP connection type;
③ SFTP connection type;
④ HDFS connection type;
⑤ HBase connection type;
⑥ Hive connection type;
⑦ Elasticsearch connection type;
⑧ Kafka connection type;
⑨ Excel, CSV and other connection types.
The data flow design of step S2 in this embodiment is specifically as follows:
S201, flow grouping: functions for adding, deleting and modifying groups and for starting and stopping flows are provided; data flow jobs are classified hierarchically by group, which reduces the difficulty of managing, operating and maintaining data processing flows;
S202, flow tree display: all jobs created by the current user are displayed as a tree, and job names are shown in different colors to indicate run status: green means the job is running normally, red means warnings were raised during the run, and black means the job has not run;
S203, visual job flow design;
S204, data access: a plurality of data access processors are provided for acquiring various multi-source heterogeneous data, offering wide data source adaptation, high-performance data acquisition and flexible scheduling modes to meet various data acquisition requirements;
S205, data loading: a plurality of data loading processors are provided for importing data into various data storage services;
S206, data cleaning: a plurality of data cleaning processors are provided for checking and cleaning the acquired data;
S207, data conversion: a plurality of data conversion processors are provided for converting the acquired data;
S208, custom processor: a processor implementing a specific function is written in Java code and loaded into the flow job to implement more complex functions, such as splitting data or checking whether flow data exists in a data table.
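Steps S201 and S202 together describe grouped jobs rendered as a tree with a color per run state. A minimal sketch follows, assuming three status names (running, warning, not_run) and a plain-text rendering; both are illustrative choices, not details from the patent.

```python
# Color convention taken from the text: green = running normally,
# red = warnings raised during the run, black = has not run.
STATUS_COLOR = {"running": "green", "warning": "red", "not_run": "black"}


def render_tree(groups):
    """Render groups and their jobs as indented lines, tagging each job with its color."""
    lines = []
    for group, jobs in groups.items():
        lines.append(group)
        for name, status in jobs:
            lines.append(f"  {name} [{STATUS_COLOR[status]}]")
    return lines


# Hypothetical groups and jobs for one user.
GROUPS = {
    "complaints": [("sync_reports", "running"), ("dedupe", "warning")],
    "risk": [("load_alerts", "not_run")],
}
```

A real interface would color the job name itself; the bracketed tag here simply makes the mapping visible in text.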
The visual job flow design in step S203 in this embodiment is specifically as follows:
S20301, each data flow design manages an independent canvas, on which one or more flow nodes are defined to form one or more data flows;
S20302, the canvas toolbar provides rich data processing types, and flow nodes are defined by dragging and then connected to one another;
S20303, scheduling rules and attributes are configured for flow nodes, flows or nodes are started and stopped, and the running state of flows is debugged and monitored;
S20304, auxiliary functions for aligning and highlighting flow nodes are provided;
S20305, flow definition, starting and stopping, debugging, monitoring, and operation and maintenance are all carried out visually through the interface.
In this embodiment, the data sources supported by the data access in step S204 include the following:
① collecting data over JDBC from MySQL, Oracle, DB2 and other databases that support JDBC;
② collecting Oracle data through Oracle logs, capturing all INSERT, UPDATE and DELETE operations on the database;
③ collecting MySQL data through MySQL logs, capturing all INSERT, UPDATE and DELETE operations on the database;
④ collecting FTP/SFTP file data;
⑤ collecting HDFS file data;
⑥ collecting HBase data;
⑦ collecting Hive data;
⑧ consuming Kafka data.
The data storage services of step S205 in this embodiment include the following:
① importing data into MySQL, Oracle, DB2 and other databases that support JDBC;
② importing data into FTP/SFTP;
③ importing data into HDFS;
④ importing data into HBase;
⑤ importing data into Hive;
⑥ importing data into Elasticsearch;
⑦ importing data into Kafka.
The data cleansing types of step S206 in this embodiment include the following:
① null and non-null checks;
② prefix and suffix checks;
③ data length check;
④ numerical range check;
⑤ enumeration value check;
⑥ regular expression check;
The data conversion types of step S207 in this embodiment include the following:
① data mapping;
② character set conversion;
③ data format conversion;
④ data splitting;
⑤ data merging;
⑥ date format conversion;
⑦ character string replacement;
⑧ null value replacement;
⑨ dictionary value replacement.
Example 2:
The multi-data-source dynamic data synchronization management system based on internet supervision of the present invention comprises:
the data source management unit, used for managing data connection services;
the data management unit, used for managing data, defining each data processing flow as a data flow job and managing data processing flows through data flow jobs; the data management unit comprises a data flow design subunit, a data flow debugging subunit, a data flow monitoring subunit and a data flow operation and maintenance subunit;
the template management unit, used for migrating and reusing flows and providing functions for uploading, deleting and downloading data flow templates.
The data flow design subunit in this embodiment comprises:
the flow grouping module, used for providing functions for adding, deleting and modifying groups and for starting and stopping flows, and for classifying data flow jobs hierarchically by group so as to reduce the difficulty of managing, operating and maintaining data processing flows;
the tree display module, used for displaying all jobs created by the current user as a tree, with job names shown in different colors to indicate run status: green means the job is running normally, red means warnings were raised during the run, and black means the job has not run;
the visual flow design module, used for carrying out flow definition, starting and stopping, debugging, monitoring, and operation and maintenance visually through the interface;
the data access module, used for acquiring various multi-source heterogeneous data, offering wide data source adaptation, high-performance data acquisition and flexible scheduling modes to meet various data acquisition requirements;
the data loading module, used for importing data into various data storage services;
the data cleaning module, used for checking and cleaning the acquired data;
the data conversion module, used for converting the acquired data;
the custom module, used for writing processors with specific functions in Java code.
Example 3:
An embodiment of the invention also provides a computer-readable storage medium storing a plurality of instructions, which are loaded by a processor so that the processor executes the multi-data-source dynamic data synchronization management method based on internet supervision of any embodiment of the invention. Specifically, a system or apparatus may be provided with a storage medium storing software program code that implements the functions of any of the above embodiments, and a computer (or a CPU or MPU) of the system or apparatus reads out and executes the program code stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then causes a CPU or the like mounted on the expansion board or the expansion unit to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A multi-data-source dynamic data synchronization management method based on internet supervision, characterized in that the method fuses various structured, semi-structured and unstructured data, provides a one-stop data development environment, visual flow design, rich data types and intelligent task monitoring, and enables users to quickly build big data processing and analysis flows and a data center at low cost; the method comprises the following steps:
data source management: managing data connection services;
data management: data flow design, data flow debugging, data flow monitoring, and data flow operation and maintenance, wherein each data processing flow is defined as a data flow job, and data processing flows are managed through data flow jobs;
template management: migrating and reusing flows, and providing functions for uploading, deleting and downloading data flow templates.
2. The multi-data-source dynamic data synchronization management method based on internet supervision according to claim 1, wherein the data source management is specifically as follows:
the user defines data source connections in a unified manner;
the data source connections use a connection pool; the types of data source connection comprise the following:
① JDBC connection type;
② FTP connection type;
③ SFTP connection type;
④ HDFS connection type;
⑤ HBase connection type;
⑥ Hive connection type;
⑦ Elasticsearch connection type;
⑧ Kafka connection type;
⑨ Excel and CSV connection types.
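As a rough illustration of uniformly defined, pooled data source connections, the following Java sketch models the nine connection types as an enum and keeps one small pool per type. The names `ConnectionType`, `SimplePool` and `DataSourceRegistry` are assumptions made for this example, and the pooled "connections" are placeholder strings rather than real JDBC, Kafka or HDFS clients.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical enum mirroring the nine connection types listed in claim 2.
enum ConnectionType {
    JDBC, FTP, SFTP, HDFS, HBASE, HIVE, ELASTICSEARCH, KAFKA, EXCEL_CSV
}

// Illustrative fixed-size pool; handles are strings standing in for real connections.
class SimplePool {
    private final Deque<String> idle = new ArrayDeque<>();

    SimplePool(ConnectionType type, int size) {
        for (int i = 0; i < size; i++) {
            idle.push(type + "-conn-" + i);
        }
    }

    // Borrow a handle; returns null when the pool is exhausted.
    synchronized String borrow() {
        return idle.poll();
    }

    // Return a previously borrowed handle to the pool.
    synchronized void release(String handle) {
        idle.push(handle);
    }

    synchronized int idleCount() {
        return idle.size();
    }
}

// Registry in which a user defines one pooled data source per connection type.
class DataSourceRegistry {
    private final Map<ConnectionType, SimplePool> pools = new EnumMap<>(ConnectionType.class);

    void define(ConnectionType type, int poolSize) {
        pools.put(type, new SimplePool(type, poolSize));
    }

    SimplePool pool(ConnectionType type) {
        SimplePool p = pools.get(type);
        if (p == null) {
            throw new IllegalArgumentException("No data source defined for " + type);
        }
        return p;
    }
}
```

Pooling lets many flow nodes share a bounded number of connections to the same source instead of opening a new one per task.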
3. The multi-data-source dynamic data synchronization management method based on internet supervision according to claim 1, wherein the data flow design is specifically as follows:
flow grouping: functions are provided for adding, deleting and modifying groups and for starting and stopping flows; data flow jobs are classified hierarchically by group, which reduces the difficulty of managing, operating and maintaining the data processing flows;
flow tree display: all jobs created by the current user are displayed as a tree, and the running state of each job is distinguished by the colour of its name: green indicates that the job is running normally, red indicates that warning information was raised during running, and black indicates that the job is not running;
visual job flow design;
data access: a plurality of data access processors are provided for acquiring various multi-source heterogeneous data, offering wide data source adaptation, high-performance data acquisition and flexible scheduling modes to meet various data acquisition requirements;
data loading: a plurality of data loading processors are provided for importing data into various data storage services;
data cleaning: a plurality of data cleaning processors are provided for checking and cleaning the acquired data;
data conversion: a plurality of data conversion processors are provided for converting the acquired data;
custom processor: a processor implementing a specific function is written in Java code and loaded into the data flow job.
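The custom processor item can be sketched as a small Java contract plus one example implementation. The patent does not disclose its processor interface, so `Processor`, `MaskFieldProcessor` and `FlowJob` below are hypothetical names, and the record type (a map of field names to values) is an assumption chosen to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical processor contract: takes one record and returns the transformed
// record, or null to drop the record from the flow.
interface Processor {
    Map<String, Object> process(Map<String, Object> record);
}

// Example custom processor: masks all but the last four characters of one field.
class MaskFieldProcessor implements Processor {
    private final String field;

    MaskFieldProcessor(String field) {
        this.field = field;
    }

    @Override
    public Map<String, Object> process(Map<String, Object> record) {
        Object value = record.get(field);
        if (value == null) {
            return record; // nothing to mask
        }
        String s = value.toString();
        int keep = Math.min(4, s.length());
        StringBuilder masked = new StringBuilder();
        for (int i = 0; i < s.length() - keep; i++) {
            masked.append('*');
        }
        masked.append(s.substring(s.length() - keep));
        Map<String, Object> out = new HashMap<>(record);
        out.put(field, masked.toString());
        return out;
    }
}

// A data flow job that runs each record through an ordered chain of processors.
class FlowJob {
    private final List<Processor> chain = new ArrayList<>();

    // "Load" a processor into the job; returns this for chaining.
    FlowJob load(Processor p) {
        chain.add(p);
        return this;
    }

    List<Map<String, Object>> run(List<Map<String, Object>> input) {
        List<Map<String, Object>> output = new ArrayList<>();
        for (Map<String, Object> record : input) {
            Map<String, Object> current = record;
            for (Processor p : chain) {
                current = p.process(current);
                if (current == null) {
                    break; // record dropped by a processor
                }
            }
            if (current != null) {
                output.add(current);
            }
        }
        return output;
    }
}
```

Because `Processor` has a single method, built-in access, cleaning, conversion and loading steps could share the same contract as user-written ones; that design choice is an assumption of this sketch, not a statement about the patented system.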
4. The multi-data-source dynamic data synchronization management method based on internet supervision according to claim 3, wherein the visual job flow design is specifically as follows:
each data flow design manages an independent canvas on which one or more flow nodes are defined, forming one or more data flows;
rich data processing types are provided in the canvas toolbar, and flow nodes are defined and connected by dragging;
flow node scheduling rules and flow node attributes are configured, flows or nodes are started and stopped, and the running state of the flows is debugged and monitored;
auxiliary functions for flow node alignment and highlighting are provided;
flow definition, starting and stopping, debugging, monitoring, and operation and maintenance are all performed through the interface, so that flow design is completed visually.
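One way to picture the canvas of connected flow nodes is as a directed graph whose nodes are executed in dependency order. The Java sketch below uses Kahn's topological sort for that ordering; treating a data flow as an acyclic graph is an assumption of this example, and `FlowCanvas` and its methods are hypothetical names not found in the patent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical canvas model: named flow nodes plus directed connections.
class FlowCanvas {
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    void addNode(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
    }

    // Connect two nodes, creating them if they are not yet on the canvas.
    void connect(String from, String to) {
        addNode(from);
        addNode(to);
        edges.get(from).add(to);
    }

    // Kahn's algorithm: repeatedly emit nodes whose upstream nodes have all run.
    List<String> executionOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String node : edges.keySet()) {
            inDegree.put(node, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String t : targets) {
                inDegree.put(t, inDegree.get(t) + 1);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            order.add(node);
            for (String t : edges.get(node)) {
                int remaining = inDegree.get(t) - 1;
                inDegree.put(t, remaining);
                if (remaining == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("Flow contains a cycle");
        }
        return order;
    }
}
```

A drag-and-drop editor would call `addNode` when a node is dropped on the canvas and `connect` when two nodes are linked; the cycle check gives the designer a natural validation step before a flow is started.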
5. The multi-data-source dynamic data synchronization management method based on internet supervision according to claim 3, wherein the data sources supported by data access include the following:
① data collected via JDBC;
② Oracle data collected via Oracle logs, capturing all INSERT, UPDATE and DELETE operations on the database;
③ MySQL data collected via MySQL logs, capturing all INSERT, UPDATE and DELETE operations on the database;
④ FTP/SFTP file data;
⑤ HDFS file data;
⑥ HBase data;
⑦ Hive data;
⑧ Kafka data (consumed).
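Log-based collection of INSERT, UPDATE and DELETE operations can be illustrated by replaying a list of change events onto a target store. The Java sketch below does not parse any real Oracle or MySQL log; the `ChangeEvent` shape (operation, key, value) and the class names are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;

// The three operation kinds named in the claim.
enum Operation { INSERT, UPDATE, DELETE }

// Hypothetical change event, as it might look after a database log has been parsed.
class ChangeEvent {
    final Operation op;
    final String key;
    final String value; // null for DELETE

    ChangeEvent(Operation op, String key, String value) {
        this.op = op;
        this.key = key;
        this.value = value;
    }
}

class ChangeApplier {
    // Replays events in log order onto a keyed target store and returns that store.
    static Map<String, String> apply(List<ChangeEvent> events, Map<String, String> target) {
        for (ChangeEvent e : events) {
            switch (e.op) {
                case INSERT:
                case UPDATE:
                    target.put(e.key, e.value);
                    break;
                case DELETE:
                    target.remove(e.key);
                    break;
            }
        }
        return target;
    }
}
```

Applying events strictly in log order is what keeps the target consistent with the source: an UPDATE that arrives after a DELETE of the same key would otherwise resurrect a removed row.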
6. The multi-data-source dynamic data synchronization management method based on internet supervision according to claim 3, wherein the data storage services comprise the following:
① importing data into databases that support JDBC, such as MySQL, Oracle and DB2;
② importing data into FTP/SFTP;
③ importing data into HDFS;
④ importing data into HBase;
⑤ importing data into Hive;
⑥ importing data into Elasticsearch;
⑦ importing data into Kafka.
7. The multi-data-source dynamic data synchronization management method based on internet supervision according to claim 3, wherein the data cleaning types include the following:
① null and non-null check;
② prefix and suffix check;
③ data length check;
④ numerical range check;
⑤ enumeration value check;
⑥ regular expression check;
the data conversion types include the following:
① data mapping;
② character set conversion;
③ data format conversion;
④ data splitting;
⑤ data merging;
⑥ date format conversion;
⑦ string replacement;
⑧ null value replacement;
⑨ dictionary value replacement.
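Several of the cleaning checks and conversions listed above map naturally onto small reusable functions. The Java sketch below covers a subset (non-null, prefix, length, numeric range, enumeration and regular expression checks; null replacement, date format conversion and dictionary replacement). The class and method names are hypothetical, and treating an empty string as null is an assumption of this example.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical cleaning checks; each returns a predicate that accepts valid values.
class CleaningChecks {
    static Predicate<String> nonNull() {
        return v -> v != null && !v.isEmpty();
    }

    static Predicate<String> hasPrefix(String prefix) {
        return v -> v != null && v.startsWith(prefix);
    }

    static Predicate<String> lengthBetween(int min, int max) {
        return v -> v != null && v.length() >= min && v.length() <= max;
    }

    static Predicate<String> numberBetween(double min, double max) {
        return v -> {
            if (v == null) {
                return false;
            }
            try {
                double d = Double.parseDouble(v);
                return d >= min && d <= max;
            } catch (NumberFormatException e) {
                return false; // not a number at all
            }
        };
    }

    static Predicate<String> oneOf(String... allowed) {
        Set<String> set = new HashSet<>(Arrays.asList(allowed));
        return set::contains;
    }

    static Predicate<String> matches(String regex) {
        Pattern p = Pattern.compile(regex);
        return v -> v != null && p.matcher(v).matches();
    }
}

// Hypothetical conversions applied to values that passed cleaning.
class Conversions {
    static String replaceNull(String v, String fallback) {
        return v == null ? fallback : v;
    }

    static String convertDate(String v, String fromPattern, String toPattern) {
        LocalDate d = LocalDate.parse(v, DateTimeFormatter.ofPattern(fromPattern));
        return d.format(DateTimeFormatter.ofPattern(toPattern));
    }

    // Dictionary replacement: unknown codes are passed through unchanged.
    static String dictionary(Map<String, String> dict, String v) {
        return dict.getOrDefault(v, v);
    }
}
```

Because the checks are plain `Predicate<String>` values, they can be combined with `and`, `or` and `negate`, for example `CleaningChecks.nonNull().and(CleaningChecks.lengthBetween(1, 32))`.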
8. A multi-data-source dynamic data synchronization management system based on internet supervision, characterized by comprising:
a data source management unit for managing data connection services;
a data management unit for managing data, defining each data processing flow as a data flow job and managing the data processing flows through the data flow jobs; the data management unit comprises a data flow design subunit, a data flow debugging subunit, a data flow monitoring subunit, and a data flow operation and maintenance subunit;
and a template management unit for migrating and reusing flows and providing functions for uploading, deleting and downloading data flow templates.
9. The multi-data-source dynamic data synchronization management system based on internet supervision according to claim 8, wherein the data flow design subunit comprises:
a flow grouping module for providing functions for adding, deleting and modifying groups and for starting and stopping flows, and for classifying data flow jobs hierarchically by group so as to reduce the difficulty of managing, operating and maintaining the data processing flows;
a tree display module for displaying all jobs created by the current user as a tree, the running state of each job being distinguished by the colour of its name: green indicates that the job is running normally, red indicates that warning information was raised during running, and black indicates that the job is not running;
a visual flow design module for performing flow definition, starting and stopping, debugging, monitoring, and operation and maintenance through an interface, so that flow design is completed visually;
a data access module for acquiring various multi-source heterogeneous data, offering wide data source adaptation, high-performance data acquisition and flexible scheduling modes to meet various data acquisition requirements;
a data loading module for importing data into various data storage services;
a data cleaning module for checking and cleaning the acquired data;
a data conversion module for converting the acquired data;
and a custom module for writing a processor with a specific function in Java code.
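The colour coding used by the tree display module (green for running normally, red for warnings, black for not running) can be sketched as a small enum and a text renderer. `JobStatus` and `JobTreeRenderer` are hypothetical names, and a two-level tree of groups and jobs is an assumption made to keep the example short.

```java
import java.util.Map;

// Job running states and the display colour the claims associate with each.
enum JobStatus {
    NORMAL("green"), WARNING("red"), NOT_RUNNING("black");

    private final String color;

    JobStatus(String color) {
        this.color = color;
    }

    String color() {
        return color;
    }
}

class JobTreeRenderer {
    // Renders groups and their jobs as an indented text tree annotated with colours.
    static String render(Map<String, Map<String, JobStatus>> groups) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Map<String, JobStatus>> group : groups.entrySet()) {
            sb.append(group.getKey()).append('\n');
            for (Map.Entry<String, JobStatus> job : group.getValue().entrySet()) {
                sb.append("  ").append(job.getKey())
                  .append(" [").append(job.getValue().color()).append("]\n");
            }
        }
        return sb.toString();
    }
}
```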
10. A computer-readable storage medium having stored thereon a computer program executable by a processor to implement the multi-data-source dynamic data synchronization management method based on internet supervision as claimed in any one of claims 1 to 7.
CN202110234138.8A 2021-03-03 2021-03-03 Multi-data-source dynamic data synchronization management method and system based on internet supervision Pending CN112925767A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110234138.8A CN112925767A (en) 2021-03-03 2021-03-03 Multi-data-source dynamic data synchronization management method and system based on internet supervision

Publications (1)

Publication Number Publication Date
CN112925767A true CN112925767A (en) 2021-06-08

Family

ID=76173125

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110234138.8A Pending CN112925767A (en) 2021-03-03 2021-03-03 Multi-data-source dynamic data synchronization management method and system based on internet supervision

Country Status (1)

Country Link
CN (1) CN112925767A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114296417A (en) * 2022-03-11 2022-04-08 中国人民解放军海军工程大学 General flow control system for efficient fusion of multi-source data
CN116882826A (en) * 2023-07-14 2023-10-13 广东东方思维科技有限公司 Highway engineering quality management system and method based on Internet of things

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846076A (en) * 2018-06-08 2018-11-20 山大地纬软件股份有限公司 The massive multi-source ETL process method and system of supporting interface adaptation
CN109739922A (en) * 2019-01-10 2019-05-10 江苏徐工信息技术股份有限公司 A kind of industrial data intelligent analysis system
CN109947746A (en) * 2017-10-26 2019-06-28 亿阳信通股份有限公司 A kind of quality of data management-control method and system based on ETL process
CN111857659A (en) * 2020-06-30 2020-10-30 太极计算机股份有限公司 Data visualization design platform for dragging heterogeneous data source
CN111880837A (en) * 2020-07-21 2020-11-03 上海伯俊软件科技有限公司 Business process engine system supporting dynamic expansion and visual configuration
CN111917887A (en) * 2020-08-17 2020-11-10 普元信息技术股份有限公司 System for realizing data governance under big data environment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114296417A (en) * 2022-03-11 2022-04-08 中国人民解放军海军工程大学 General flow control system for efficient fusion of multi-source data
CN114296417B (en) * 2022-03-11 2022-07-29 中国人民解放军海军工程大学 General flow control system for efficient fusion of multi-source data
CN116882826A (en) * 2023-07-14 2023-10-13 广东东方思维科技有限公司 Highway engineering quality management system and method based on Internet of things
CN116882826B (en) * 2023-07-14 2024-05-03 广东东方思维科技有限公司 Highway engineering quality management system and method based on Internet of things

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210608