CN117667942A - Data synchronous integration method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN117667942A
Authority
CN
China
Prior art keywords
data
integrated
synchronization
source
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311684517.2A
Other languages
Chinese (zh)
Inventor
李海博
王长生
胡斐
王鑫毅
王臻
魏鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202311684517.2A
Publication of CN117667942A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data synchronous integration method, a data synchronous integration device, electronic equipment and a storage medium. The data synchronization integration method comprises the steps of obtaining source data sets of various databases through reconstructed storage analysis software, wherein the source data sets comprise source data of various data structures; determining a time stamp of each source data, carrying out data identification on the time stamp, determining a data synchronization mode of each source data, and carrying out data synchronization on the source data based on the data synchronization mode to obtain target synchronization data corresponding to the source data; and taking the target synchronous data as data to be integrated, carrying out data integration on the data to be integrated to obtain target integrated data, and storing the target integrated data into a target storage table. According to the technical scheme, the effect that the storage analysis software directly analyzes and stores source data of various sources and various data structures can be achieved, the efficiency and convenience of synchronous integration of multi-source heterogeneous data are improved, and the user experience is improved.

Description

Data synchronous integration method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and apparatus for synchronously integrating data, an electronic device, and a storage medium.
Background
With continued economic development, large enterprises offer an increasingly wide range of service types, so the types of data involved are also increasingly diverse. Data of different service types is generally stored in different databases, so unified management and analysis of the data as a whole is highly valuable.
In the related art, data from multiple databases is generally synchronized and integrated by data storage analysis software to achieve unified management and analysis. However, existing data storage analysis software, such as Hudi, cannot perform data desensitization and cannot identify and warehouse zipper tables or data tables with special fields or special separators. Operations such as manual desensitization or special data table processing must be performed before data acquisition. The software therefore lacks functionality, data synchronization and integration is inefficient, and the user experience is poor.
Disclosure of Invention
The invention provides a data synchronization integration method, a data synchronization integration device, electronic equipment and a storage medium, which are used for solving the technical problems of functional deficiency of current storage analysis software and poor data synchronization integration efficiency.
According to an aspect of the present invention, there is provided a data synchronization integration method, wherein the method includes:
acquiring a source data set of various databases through the reconstructed storage analysis software, wherein the source data set comprises source data of various data structures;
determining a time stamp of each source data, carrying out data identification on the time stamp, and determining a data type corresponding to each source data;
determining a data synchronization mode based on the data type, and performing data synchronization on the source data based on the data synchronization mode to obtain target synchronization data corresponding to the source data;
and taking the target synchronous data as data to be integrated, carrying out data integration on the data to be integrated to obtain target integrated data, and storing the target integrated data into a target storage table corresponding to the reconstructed storage analysis software.
According to another aspect of the present invention, there is provided a data synchronization integration apparatus, wherein the apparatus includes:
the data acquisition module is used for acquiring source data sets of various databases through the reconstructed storage analysis software, wherein the source data sets comprise source data of various data structures;
The data identification module is used for determining the time stamp of each source data, carrying out data identification on the time stamp and determining the data type corresponding to each source data;
the data synchronization module is used for determining a data synchronization mode based on the data type, and performing data synchronization on the source data based on the data synchronization mode to obtain target synchronization data corresponding to the source data;
and the data integration module is used for taking the target synchronous data as data to be integrated, integrating the data to be integrated to obtain target integrated data, and storing the target integrated data into a target storage table corresponding to the reconstructed storage analysis software.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data synchronization integration method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement the data synchronization integration method according to any one of the embodiments of the present invention when executed.
According to the technical solution of the embodiments of the present invention, source data sets of various databases are acquired through the reconstructed storage analysis software, wherein the source data sets comprise source data of various data structures; a time stamp of each source data is determined, data identification is performed on the time stamp, and a data type corresponding to each source data is determined; a data synchronization mode is determined based on the data type, and data synchronization is performed on the source data based on the data synchronization mode to obtain target synchronization data corresponding to the source data; and the target synchronization data is taken as data to be integrated, data integration is performed on the data to be integrated to obtain target integrated data, and the target integrated data is stored into a target storage table corresponding to the reconstructed storage analysis software. In this way, the storage analysis software can directly analyze and store source data from multiple sources and with multiple data structures, which improves the efficiency and convenience of synchronizing and integrating multi-source heterogeneous data and improves the user experience.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a data synchronization integration method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a data synchronization integration method according to a second embodiment of the present invention;
Fig. 3 is an overall flowchart of a data synchronization integration method according to an embodiment of the present invention;
Fig. 4 is a flowchart of a data integration method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a data synchronization integration device according to a third embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an electronic device implementing a data synchronization integration method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a data synchronization integration method according to an embodiment of the present invention, where the method may be applied to a case of unified management and analysis of data, and the method may be performed by a data synchronization integration device, and the data synchronization integration device may be implemented in a form of hardware and/or software, and the data synchronization integration device may be configured in a computer. As shown in fig. 1, the method includes:
S110, acquiring source data sets of various databases through the reconstructed storage analysis software, wherein the source data sets comprise source data of various data structures.
The storage analysis software has the functions of storing data and analyzing data. Optionally, the storage analysis software may be software that cannot identify and store a data table with special fields, a data table with special separators, or a zipper table. For example, the storage analysis software may be the original Hudi.
The reconstructed storage analysis software may be understood as software that can identify and store the data table with special fields, the data table with special separators, and the zipper table. For example, the reconstructed storage analysis software may be a reconstructed Hudi.
The database may be understood as a database storing the source data. In an embodiment of the present invention, the database may include a plurality of data tables, and the database stores the source data in data tables. By way of example, the databases may include Oracle, MySQL, DB, TDSQL, HBase, Hive, and the like.
The source data set may be understood as a set of source data. The source data may be understood as data to be synchronized and integrated. In the embodiment of the present invention, the source data may be preset according to scenario requirements, which is not specifically limited herein. Optionally, the source data may be a data table, and may include the data in the table and the table structure information. By way of example, the source data may include medical data tables, banking data tables, grid data tables, and the like.
S120, determining a time stamp of each source data, carrying out data identification on the time stamp, and determining a data type corresponding to each source data.
Wherein the time stamp can be understood as the time of a certain moment.
The data type may be understood as the type of the source data. Optionally, the data types may include first-time synchronization data, non-first-time synchronization data to be tracked, and non-first-time synchronization update data.
The first-time synchronization data may be understood as a data table that is being synchronized for the first time.
The non-first-time synchronization data to be tracked may be understood as a data table that has been synchronized before and for which historical data still needs to be tracked. Specifically, when the source data is a data table that is not being synchronized for the first time and the current time stamp of the source data is missing a certain moment, the data type of the source data is determined to be non-first-time synchronization data to be tracked.
The non-first-time synchronization update data may be understood as a data table that has been synchronized before and for which incremental update data exists. Specifically, when the source data is a data table that is not being synchronized for the first time and the current time stamp of the source data contains an update time, the data type of the source data is determined to be non-first-time synchronization update data.
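The identification rule described above can be illustrated with a minimal Python sketch. The class `SourceTable`, its fields (`synced_before`, `expected_dates`, `present_dates`), and the function `classify` are hypothetical names introduced only for illustration; the embodiment does not specify how a missing moment is detected, so a simple comparison between expected and present dates is assumed here.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class DataType(Enum):
    FIRST_SYNC = "first-time synchronization data"
    TO_BE_TRACKED = "non-first-time synchronization data to be tracked"
    UPDATE = "non-first-time synchronization update data"


@dataclass
class SourceTable:
    name: str
    synced_before: bool  # hypothetical flag: has this table been synchronized already?
    expected_dates: list = field(default_factory=list)  # moments that should be covered
    present_dates: set = field(default_factory=set)     # moments found in the time stamps


def classify(table: SourceTable) -> DataType:
    """Map a source table to one of the three data types (assumed rule)."""
    if not table.synced_before:
        return DataType.FIRST_SYNC
    missing = [d for d in table.expected_dates if d not in table.present_dates]
    if missing:
        # Some moment is absent from the time stamps: history must be tracked.
        return DataType.TO_BE_TRACKED
    # Nothing is missing, so only newly updated data remains to be imported.
    return DataType.UPDATE
```

In this sketch a table that has never been synchronized is always treated as first-time synchronization data, regardless of its time stamps.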
S130, determining a data synchronization mode based on the data type, and performing data synchronization on the source data based on the data synchronization mode to obtain target synchronization data corresponding to the source data.
The data synchronization mode may be understood as a mode of performing data synchronization on the source data. Alternatively, the data synchronization mode may include a data bottoming mode, a history tracking mode, and an incremental update mode.
The target synchronization data may be understood as the source data that has been synchronized. Optionally, the target synchronization data includes full data corresponding to the source data, historical tracking data corresponding to the source data, and incremental update data corresponding to the source data.
Wherein the full amount data may be understood as full amount data of the source data.
The historical tracking data can be understood as the tracking-obtained historical data corresponding to the source data.
The incremental update data may be understood as updated incremental data corresponding to the source data.
Optionally, the determining a data synchronization mode based on the data type, and performing data synchronization on the source data based on the data synchronization mode to obtain target synchronization data corresponding to the source data, includes:
when the data type is the first-time synchronization data, performing data synchronization on the source data in a batch addition manner under the data bottoming mode to obtain the full data corresponding to the source data;
when the data type is the non-first-time synchronization data to be tracked, determining a time point parameter in response to a parameter setting operation for the history tracking mode, and importing the historical tracking data corresponding to the source data based on the time point parameter;
and when the data type is the non-first-time synchronization update data, importing the incremental update data corresponding to the source data in an update insertion manner under the incremental update mode.
The batch addition manner may be understood as a manner of adding data in batches. It can be understood that the full data is the complete data corresponding to the source data and is characterized by covering the data as a whole. The batch addition manner is therefore suitable for acquiring the full data. For example, the batch addition manner may be the bulk insert manner in Hudi.
The parameter setting operation may be understood as an operation of setting the time point parameter. In the embodiment of the present invention, the parameter setting operation may be preset according to a scene requirement, which is not specifically limited herein. Alternatively, the parameter setting operation may be a click operation for a point-in-time option.
The time point parameter may be understood as a parameter corresponding to a time point at which the timestamp corresponding to the source data is missing.
The update insertion manner may be understood as a manner of inserting the incremental update data. It can be understood that the incremental update data is the incremental data corresponding to the source data and consists of individual records, so a record-level insertion manner is suitable for obtaining it. For example, the update insertion manner may be the upsert manner in Hudi.
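The mapping from data type to data synchronization mode can be sketched as follows. This is only an illustration under stated assumptions: the names `plan_sync`, `SyncPlan`, and the string labels for data types and modes are hypothetical; the update insertion manner is assumed to correspond to Hudi's `upsert` write operation; and using `upsert` for the history tracking mode is an assumption, since the embodiment only states that the historical tracking data is imported based on the time point parameter. The `hoodie.*` keys are standard Hudi Spark datasource write options.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Data type -> data synchronization mode, as described in the embodiment.
MODE_BY_DATA_TYPE = {
    "first_sync": "data_bottoming",
    "to_be_tracked": "history_tracking",
    "update": "incremental_update",
}

# Mode -> Hudi write operation. The history_tracking entry is an assumption.
OPERATION_BY_MODE = {
    "data_bottoming": "bulk_insert",
    "history_tracking": "upsert",
    "incremental_update": "upsert",
}


@dataclass
class SyncPlan:
    mode: str
    hudi_options: Dict[str, str]
    time_point: Optional[str] = None  # only set for the history tracking mode


def plan_sync(table_name: str, data_type: str, record_key: str,
              precombine_field: str, time_point: Optional[str] = None) -> SyncPlan:
    """Choose the synchronization mode and write options for one source table."""
    mode = MODE_BY_DATA_TYPE[data_type]
    if mode == "history_tracking" and time_point is None:
        raise ValueError("history tracking requires a time point parameter")
    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": OPERATION_BY_MODE[mode],
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
    }
    return SyncPlan(mode, options, time_point if mode == "history_tracking" else None)
```

With Spark available, the resulting options could be passed to a writer such as `df.write.format("hudi").options(**plan.hudi_options)`; how the time point parameter is applied is left open here because the embodiment does not detail it.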
And S140, taking the target synchronous data as data to be integrated, carrying out data integration on the data to be integrated to obtain target integrated data, and storing the target integrated data into a target storage table corresponding to the reconstructed storage analysis software.
The data to be integrated may be understood as data to be integrated. Optionally, the data to be integrated may be the source data after data synchronization.
The target integrated data may be understood as data after data integration. Optionally, the target integrated data have the same data structure.
The target storage table may be understood as a data table storing the target integrated data. Optionally, the target storage table may be a Hudi table. The target storage table may comprise a set of target integrated data, that is, a plurality of pieces of target integrated data, all of which have the same data structure.
Optionally, the storing the target integrated data in a target storage table corresponding to the reconstructed storage analysis software includes:
and directly identifying the target integrated data, and storing the target integrated data into a target storage table corresponding to the reconstructed storage analysis software.
The reconstructed storage analysis software has a function of directly identifying the target integrated data.
According to the technical solution of this embodiment, source data sets of various databases are acquired through the reconstructed storage analysis software, wherein the source data sets comprise source data of various data structures; a time stamp of each source data is determined, data identification is performed on the time stamp, and a data type corresponding to each source data is determined; a data synchronization mode is determined based on the data type, and data synchronization is performed on the source data based on the data synchronization mode to obtain target synchronization data corresponding to the source data; and the target synchronization data is taken as data to be integrated, data integration is performed on the data to be integrated to obtain target integrated data, and the target integrated data is stored into a target storage table corresponding to the reconstructed storage analysis software. In this way, the storage analysis software can directly analyze and store source data from multiple sources and with multiple data structures, which improves the efficiency and convenience of synchronizing and integrating multi-source heterogeneous data and improves the user experience.
Example 2
Fig. 2 is a flowchart of a data synchronization integration method according to a second embodiment of the present invention. On the basis of the foregoing embodiment, this embodiment refines the step of performing data integration on the data to be integrated to obtain the target integrated data. As shown in fig. 2, the method includes:
S210, acquiring source data sets of various databases through the reconstructed storage analysis software, wherein the source data sets comprise source data of various data structures.
S220, determining a time stamp of each source data, carrying out data identification on the time stamp, and determining a data type corresponding to each source data.
S230, determining a data synchronization mode based on the data type.
S240, determining a data integration file based on the source data set, wherein the data integration file comprises a read-in rule file and/or a conversion rule file.
The read-in rule file may be understood as a file defining read-in rules. Optionally, the read-in rule file may be a Source avsc file.
The conversion rule file may be understood as a file defining conversion rules. Optionally, the conversion rule file may be a target avsc file.
S250, carrying out data integration on the data to be integrated through the data integration file to obtain the target integrated data, wherein the data integration comprises data conversion and/or a data zipper.
The data to be integrated comprises at least one of a data table with special fields, a data table with special separators and a zipper table. In the embodiment of the invention, the storage analysis software cannot identify at least one of the data table with the special field, the data table with the special separator and the zipper table. The reconstructed storage analysis software can identify at least one of a data table with special fields, a data table with special separators and a zipper table.
The data conversion may be understood as an operation of performing field conversion on the data to be integrated. In the embodiment of the invention, the data table with the special field and the data table with the special separator included in the data to be integrated after the data conversion can be directly identified by the storage analysis software after the reconstruction.
The data zipper may be understood as an operation of applying zipper processing to the data to be integrated. In the embodiment of the invention, the zipper table included in the data to be integrated after the data zipper can be directly identified by the reconstructed storage analysis software.
The zipper table may be understood as a data model, defined in database design according to the manner in which data is stored, that maintains both historical-state and latest-state data.
Optionally, the data integration further comprises data cleansing and/or data desensitization.
The data cleaning may be understood as an operation of removing various dirty data, such as null values, abnormal values, and error values, from the data to be integrated.
The data desensitization may be understood as an operation of desensitizing sensitive data in the data to be integrated based on a preset desensitization rule.
Optionally, the data to be integrated may further include a sensitive field.
In the embodiment of the invention, the reconstructed storage analysis software has the function of identifying data to be integrated that includes sensitive fields, which solves the problem that the original storage analysis software cannot identify and import data that includes sensitive fields.
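As an illustration of the data cleaning and data desensitization operations, a minimal Python sketch is given below. The cleaning rule (dropping rows with null or empty required fields) and the desensitization rule (masking the middle of a value) are assumptions chosen for the example; the embodiment leaves both rules to be preset. The names `clean`, `mask_middle`, `desensitize`, and the `phone` field are hypothetical.

```python
def clean(rows, required_fields):
    """Drop rows whose required fields are null or empty (assumed cleaning rule)."""
    return [
        row for row in rows
        if all(row.get(name) not in (None, "") for name in required_fields)
    ]


def mask_middle(value, keep_head=3, keep_tail=4):
    """Replace the middle of a string with '*' (assumed desensitization rule)."""
    if len(value) <= keep_head + keep_tail:
        return "*" * len(value)
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + "*" * hidden + value[-keep_tail:]


def desensitize(row, rules):
    """Apply a per-field masking function to the sensitive fields of one row."""
    return {
        name: rules[name](value) if name in rules else value
        for name, value in row.items()
    }


# Usage: clean first, then mask the hypothetical sensitive field "phone".
rows = [
    {"id": "1", "phone": "13812345678"},
    {"id": "2", "phone": ""},
]
safe_rows = [desensitize(r, {"phone": mask_middle}) for r in clean(rows, ["id", "phone"])]
```

Keeping the rules in a plain mapping from field name to function makes it straightforward to preset different desensitization rules per table.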
Optionally, the integrating the data to be integrated by the data integration file to obtain the target integrated data includes:
determining the read-in data to be integrated based on the field read-in type defined in the read-in rule file;
performing data conversion on the read-in data to be integrated based on the field conversion type defined in the conversion rule file to obtain the data to be integrated after data conversion;
and performing data zipper on the data to be integrated after data conversion based on a zipper algorithm to obtain the target integrated data.
The field read-in type may be understood as a field type for reading in the data to be integrated. Alternatively, the field read type may be a string type.
The field conversion type may be understood as converting the field type of the data to be integrated.
In the embodiment of the invention, the data conversion is performed according to the field conversion type defined in the target avsc file. This solves the problem that special field types in a data table, such as int, long, short, float, decimal, uuid, datetime, and timestamp, cannot be identified and processed by the storage analysis software, so that the reconstructed storage analysis software can identify data to be integrated that has special fields.
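The read-in and conversion rules can be pictured with a small Python sketch in which every field is first read as a string and then cast according to a target schema. The two schemas below are hypothetical examples written as Python dictionaries in the shape of Avro (avsc) record schemas; the field names and the casting table are assumptions, and the sketch casts to Python values rather than producing Avro-encoded output.

```python
from datetime import datetime
from decimal import Decimal
from uuid import UUID

# Hypothetical read-in schema: every field is read as a string.
SOURCE_SCHEMA = {
    "type": "record", "name": "AccountSource",
    "fields": [
        {"name": "acct_id", "type": "string"},
        {"name": "balance", "type": "string"},
        {"name": "opened_at", "type": "string"},
    ],
}

# Hypothetical target schema: fields carry their real (logical) types.
TARGET_SCHEMA = {
    "type": "record", "name": "AccountTarget",
    "fields": [
        {"name": "acct_id", "type": "long"},
        {"name": "balance",
         "type": {"type": "bytes", "logicalType": "decimal", "precision": 18, "scale": 2}},
        {"name": "opened_at", "type": {"type": "long", "logicalType": "timestamp-millis"}},
    ],
}

PRIMITIVE_CASTS = {"int": int, "long": int, "float": float, "double": float, "string": str}
LOGICAL_CASTS = {
    "decimal": Decimal,
    "uuid": UUID,
    "timestamp-millis": datetime.fromisoformat,  # assumes ISO-8601 input strings
}


def cast_field(raw, avro_type):
    """Cast one string value according to a primitive or logical type."""
    if isinstance(avro_type, dict):
        return LOGICAL_CASTS[avro_type["logicalType"]](raw)
    return PRIMITIVE_CASTS[avro_type](raw)


def convert(record, target_schema):
    """Convert an all-string record into typed values per the target schema."""
    return {
        f["name"]: cast_field(record[f["name"]], f["type"])
        for f in target_schema["fields"]
    }
```

Reading everything as a string first means the read-in step never fails on an unexpected type; type errors surface only in `convert`, where they can be attributed to a specific field.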
The zipper algorithm may be understood as an algorithm for identifying a zipper table included in the data to be integrated.
Optionally, before the data to be integrated after the data conversion is subjected to data zipper based on the zipper algorithm, the method further comprises: reconstructing an original warehousing algorithm in the storage analysis software, and taking the reconstructed warehousing algorithm as the zipper algorithm.
In the embodiment of the invention, the zipper algorithm can directly identify the zipper table included in the data to be integrated.
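The embodiment does not disclose the internals of the reconstructed warehousing algorithm. The following Python sketch only illustrates the general idea of zipper-table maintenance, assuming that each record carries a start time, an end time, and an update time, and that an open record is marked by a sentinel end date. The names `zipper_merge`, `OPEN_END`, `start_date`, `end_date`, and `update_time` are hypothetical.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed sentinel marking a currently valid record


def zipper_merge(history, incoming, key, load_date):
    """Apply incoming rows to a zipper table and return the new history.

    A changed key closes its open record (end_date = load_date) and opens a
    new one; an unchanged key is left alone; a new key just opens a record.
    """
    result = [dict(row) for row in history]  # copy so the input is not mutated
    open_by_key = {row[key]: row for row in result if row["end_date"] == OPEN_END}
    for row in incoming:
        current = open_by_key.get(row[key])
        if current is not None:
            if all(current.get(name) == value for name, value in row.items()):
                continue  # nothing changed for this key
            current["end_date"] = load_date
            current["update_time"] = load_date
        new_row = {**row, "start_date": load_date, "end_date": OPEN_END,
                   "update_time": load_date}
        result.append(new_row)
        open_by_key[row[key]] = new_row
    return result
```

A record is valid at a given moment when that moment falls between its start time and end time, so the latest state is simply the set of records whose end time equals the sentinel.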
And S260, taking the target synchronous data as data to be integrated, carrying out data integration on the data to be integrated to obtain target integrated data, and storing the target integrated data into a target storage table corresponding to the reconstructed storage analysis software.
According to the technical solution of this embodiment, a data integration file is determined based on the source data set, wherein the data integration file comprises a read-in rule file and/or a conversion rule file; and data integration is performed on the data to be integrated through the data integration file to obtain the target integrated data, wherein the data integration comprises at least data conversion and/or a data zipper. Based on the reconstructed storage analysis software, field desensitization of data tables is performed directly, and zipper tables, data tables with special separators, and data tables with special fields are directly identified and warehoused. The convenience and efficiency of data synchronization and integration are improved while data security is ensured, and the user experience is improved.
FIG. 3 is an overall flowchart of a method for data synchronization integration according to an embodiment of the present invention; as shown in fig. 3, the overall flow of the data synchronization integration method may be:
1. Large enterprises have many application systems. Taking a banking enterprise as an example, multiple databases are in use at the same time; depending on the business scenario, these include Oracle, MySQL, DB, TDSQL, HBase, Hive, and the like.
2. The source data stored in the source systems is first processed by an ETL tool and transmitted to the big data platform (HDFS); it is then processed by topic and finally stored in the data warehouse (MPP).
3. The source system data is loaded into the data lakehouse platform, and the table structures are maintained in a unified management platform, namely the metadata management system.
4. When Hudi performs data synchronization and conversion, the data and table structure information in the lakehouse is acquired and used as the source data.
5-7. Hudi is highly sensitive to the time stamps generated by the source data, and a different data synchronization mode is selected for each scenario: the data bottoming mode, the history tracking mode, or the incremental update mode.
8. For source data that is synchronized for the first time, if data bottoming is required, the data bottoming mode is used, and the bulk insert manner improves the speed and efficiency of data synchronization.
9. If the data table corresponding to the source data already exists, the history tracking mode is used to track the historical data: the time stamp in the Hudi table is adjusted by manually setting a checkpoint parameter, and the historical tracking data is then imported.
10. If the data table corresponding to the source data already exists, the incremental update mode is used to update the incremental data periodically; the incremental update data can be synchronized by means of the upsert manner.
11. Different data synchronization/update modes are selected for different service scenarios, and the synchronized source data is finally merged into the target storage table (Hudi table).
FIG. 4 is a flow chart of a data integration method according to an embodiment of the present invention; as shown in fig. 4, the flow of the data integration method may be:
1. in the field of bank data analysis, it is difficult to read the acquired incremental update data, full-size data, or history run-out data, and in consideration of data diversity, it is difficult for the existing data separation field (or, or; or', or the like) to cover all the data. Specific separators need to be set to accommodate the traffic scenario. However, the original hudi technology does not support data import based on user-defined separators, and a platform side is required to carry out transformation and adaptation based on the original hudi of the cloud.
2. Reading the data obtained in the step 1 as data to be integrated, and obtaining a Source avsc file corresponding to user-defined Source data for generating Source data of a user-defined converter, wherein in the Source data, each table field needs to be read in and analyzed by a string type, such as decimal, datetime, int, long, float, short and the like. Wherein the avsc type file defines metadata information.
3. To ensure data quality, the data acquired in step 2 is cleaned according to business-defined cleaning rules, including but not limited to removing dirty data such as null values, outliers, and erroneous values.
4. The data obtained in step 3 is converted according to the data types defined in the target avsc metadata file, which solves the problem that special types such as int, long, short, float, decimal, uuid, datetime, and timestamp cannot be processed.
5. The data converted in step 4 is desensitized according to user-defined desensitization rules.
6. In the design of a zipper table, three timestamps are additionally kept for each record, indicating the start time, the end time, and the update time, which are used to judge whether a given record in the table is currently valid. When an upsert is performed on a Hudi table, the conventional warehousing algorithm of native Hudi can hardly import zipper table data effectively, so the platform side must adapt native Hudi. In the present invention, the warehousing algorithm of the upsert method is reconstructed so that tables of any type can be imported. Specifically, the data processed in step 5 is read and processed by the zipper algorithm to obtain zipper-processed data.
7. The data obtained in step 6 is written into the Hudi table according to the rules of the target avsc file, completing operations on the Hudi table such as insertion, update, deletion, and query.
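As a non-limiting illustration, steps 2 to 5 of the above flow can be sketched in Python as follows. The sketch makes several assumptions that do not come from the embodiment: the separator `|@|`, the field names, the `TARGET_TYPES` mapping (which merely stands in for the target avsc file), the cleaning rule, and the phone-number masking rule are all hypothetical examples.

```python
from datetime import datetime
from decimal import Decimal

SEPARATOR = "|@|"  # hypothetical user-defined separator

# Hypothetical target types, standing in for the types defined in the target avsc file.
TARGET_TYPES = {"id": int, "name": str, "balance": Decimal, "opened": datetime, "phone": str}
FIELDS = list(TARGET_TYPES)


def read_row(line: str) -> dict:
    """Step 2: split on the custom separator and keep every field as a string."""
    values = line.rstrip("\n").split(SEPARATOR)
    return dict(zip(FIELDS, values))


def is_clean(row: dict) -> bool:
    """Step 3: one example cleaning rule, rejecting rows with missing or empty fields."""
    return len(row) == len(FIELDS) and all(v.strip() for v in row.values())


def convert(row: dict) -> dict:
    """Step 4: convert each string value to its target type."""
    out = {}
    for name, value in row.items():
        target = TARGET_TYPES[name]
        if target is datetime:
            out[name] = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        else:
            out[name] = target(value)
    return out


def desensitize(row: dict) -> dict:
    """Step 5: one example rule, masking all but the last four characters of the phone number."""
    phone = row["phone"]
    return {**row, "phone": "*" * (len(phone) - 4) + phone[-4:]}


def integrate(lines):
    """Run read-in, cleaning, conversion, and desensitization in order."""
    rows = (read_row(line) for line in lines)
    return [desensitize(convert(row)) for row in rows if is_clean(row)]
```

Reading every field as a string first keeps parsing independent of the separator handling, so that type errors surface only in the conversion step, where the target type is known.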
This embodiment implements field desensitization of data tables based on the reconstructed storage analysis software, and directly identifies and warehouses zipper tables, data tables with special separators, and data tables with special fields. While ensuring data security, it improves the convenience and efficiency of data synchronization and integration and improves the user experience.
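As a non-limiting illustration of the zipper table handling mentioned above, the following Python sketch shows one common way to maintain the start time, end time, and update time of each record during an upsert. It is not the reconstructed warehousing algorithm of the invention, whose details are not given here; the names `ZipperRow` and `zipper_upsert` and the open-ended sentinel `9999-12-31` are assumptions.

```python
from dataclasses import dataclass, replace
from typing import List

OPEN_END = "9999-12-31"  # assumed sentinel marking a currently valid record


@dataclass(frozen=True)
class ZipperRow:
    key: str
    value: str
    start_time: str   # when this version became valid
    end_time: str     # when this version stopped being valid, or OPEN_END
    update_time: str  # when this row was last written


def zipper_upsert(table: List[ZipperRow], key: str, value: str, now: str) -> List[ZipperRow]:
    """Return a new table in which (key, value) is the valid version as of `now`."""
    result = []
    needs_new_version = True
    for row in table:
        if row.key == key and row.end_time == OPEN_END:
            if row.value == value:
                # Nothing changed: keep the current version open.
                needs_new_version = False
                result.append(row)
            else:
                # Close the old version at the moment the new one starts.
                result.append(replace(row, end_time=now, update_time=now))
        else:
            result.append(row)
    if needs_new_version:
        result.append(ZipperRow(key, value, now, OPEN_END, now))
    return result


def current_rows(table: List[ZipperRow]) -> List[ZipperRow]:
    """Rows whose end time is still open are the currently valid records."""
    return [row for row in table if row.end_time == OPEN_END]
```

In this sketch the history of a key is never overwritten: an update closes the previous version and appends a new one, so both the current state and any past state can be queried from the same table.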
Example III
Fig. 5 is a schematic structural diagram of a data synchronization integration device according to a third embodiment of the present invention. As shown in fig. 5, the apparatus includes: a data acquisition module 310, a data identification module 320, a data synchronization module 330, and a data integration module 340; wherein,
the data acquisition module 310 is configured to acquire a source data set of multiple databases through the reconstructed storage analysis software, where the source data set includes source data of multiple data structures;
the data identification module 320 is configured to determine a timestamp of each source data, identify the timestamp, and determine a data type corresponding to each source data;
the data synchronization module 330 is configured to determine a data synchronization mode based on the data type, and perform data synchronization on the source data based on the data synchronization mode to obtain target synchronization data corresponding to the source data;
the data integration module 340 is configured to use the target synchronization data as data to be integrated, perform data integration on the data to be integrated to obtain target integrated data, and store the target integrated data into a target storage table corresponding to the reconstructed storage analysis software.
According to the technical scheme, the source data sets of various databases are obtained through the reconstructed storage analysis software, wherein the source data sets comprise source data of various data structures; determining a time stamp of each source data, carrying out data identification on the time stamp, and determining a data type corresponding to each source data; determining a data synchronization mode based on the data type, and performing data synchronization on the source data based on the data synchronization mode to obtain target synchronization data corresponding to the source data; and taking the target synchronous data as data to be integrated, carrying out data integration on the data to be integrated to obtain target integrated data, and storing the target integrated data into a target storage table corresponding to the reconstructed storage analysis software. According to the method and the device, the effect that the storage analysis software directly analyzes and stores source data of multiple sources and multiple data structures is achieved, the efficiency and convenience of synchronous integration of the source data of multiple sources and heterogeneous sources are improved, and the user experience is improved.
Optionally, the data type includes first synchronization data, non-first synchronization data to be tracked and non-first synchronization update data, the target synchronization data includes full data corresponding to the source data, historical tracking data corresponding to the source data and incremental update data corresponding to the source data, and the data synchronization mode includes a data bottoming mode, a historical tracking mode and an incremental update mode.
Optionally, the data synchronization module 330 is configured to:
under the condition that the data type is the first synchronous data, aiming at the data bottoming mode, carrying out data synchronization on the source data in a batch new adding mode to obtain the full data corresponding to the source data;
when the data type is the data to be tracked which is not synchronized for the first time, responding to parameter setting operation aiming at the historical tracking mode, determining a time point parameter, and importing the historical tracking data corresponding to the source data based on the time point parameter;
and under the condition that the data type is the non-first synchronization update data, importing the incremental update data corresponding to the source data in an update insertion mode aiming at the incremental update mode.
Optionally, the data integration module 340 includes: a file integrating unit and a data integrating unit; wherein,
the file integration unit is used for determining a data integration file based on the source data set, wherein the data integration file comprises a reading rule file and/or a conversion rule file;
the data integration unit is used for carrying out data integration on the data to be integrated through the data integration file to obtain the target integrated data, wherein the data integration comprises data conversion and/or a data zipper.
Optionally, the data to be integrated includes at least one of a data table with a special field, a data table with a special separator, and a zipper table, and the data integration unit is configured to:
determining the read-in data to be integrated based on the field read-in type defined in the read-in rule file;
performing data conversion on the read-in data to be integrated based on a field conversion type defined in the conversion rule file to obtain the data to be integrated after data conversion;
and carrying out data zipper on the data to be integrated after data conversion based on a zipper algorithm to obtain the target integrated data.
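As a non-limiting illustration, the relationship between a read-in rule file and a conversion rule file can be sketched with two Avro schemas, built here as Python dictionaries. The record name, the field names, and the idea of deriving the read-in schema from the target schema are assumptions made for this sketch; in the embodiment the two avsc files are separate inputs. The logical types used (`decimal`, `timestamp-millis`, `uuid`) are standard Avro logical types.

```python
import json

# Hypothetical target schema (conversion rule file); field names are illustrative only.
TARGET_AVSC = {
    "type": "record",
    "name": "Account",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "balance",
         "type": {"type": "bytes", "logicalType": "decimal", "precision": 18, "scale": 2}},
        {"name": "opened",
         "type": {"type": "long", "logicalType": "timestamp-millis"}},
        {"name": "trace_id",
         "type": {"type": "string", "logicalType": "uuid"}},
    ],
}


def read_in_schema(target: dict) -> dict:
    """Build a read-in schema in which every field is read as a plain string."""
    return {
        "type": "record",
        "name": target["name"] + "Source",
        "fields": [{"name": f["name"], "type": "string"} for f in target["fields"]],
    }


def to_avsc_text(schema: dict) -> str:
    """Serialize a schema dictionary to the JSON text stored in an .avsc file."""
    return json.dumps(schema, indent=2)


SOURCE_AVSC = read_in_schema(TARGET_AVSC)
```

Keeping the field names identical in both schemas lets the conversion step look up the target type of each string field by name.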
Optionally, the data integration module 340 is configured to:
and directly identifying the target integrated data, and storing the target integrated data into a target storage table corresponding to the reconstructed storage analysis software.
Optionally, the data integration further comprises data cleansing and/or data desensitization.
The data synchronization integration device provided by the embodiment of the invention can execute the data synchronization integration method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example IV
Fig. 6 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 6, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data synchronization integration method.
In some embodiments, the data synchronization integration method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more of the steps of the data synchronization integration method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data synchronization integration method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for synchronously integrating data, comprising:
acquiring a source data set of various databases through the reconstructed storage analysis software, wherein the source data set comprises source data of various data structures;
determining a time stamp of each source data, carrying out data identification on the time stamp, and determining a data type corresponding to each source data;
determining a data synchronization mode based on the data type, and performing data synchronization on the source data based on the data synchronization mode to obtain target synchronization data corresponding to the source data;
and taking the target synchronous data as data to be integrated, carrying out data integration on the data to be integrated to obtain target integrated data, and storing the target integrated data into a target storage table corresponding to the reconstructed storage analysis software.
2. The method of claim 1, wherein the data types include first synchronization data, non-first synchronization data to be tracked, and non-first synchronization update data, the target synchronization data includes full data corresponding to the source data, historical tracking data corresponding to the source data, and incremental update data corresponding to the source data, and the data synchronization modes include a data bottoming mode, a historical tracking mode, and an incremental update mode.
3. The method according to claim 2, wherein determining a data synchronization pattern based on the data type, and performing data synchronization on the source data based on the data synchronization pattern, to obtain target synchronization data corresponding to the source data, includes:
under the condition that the data type is the first synchronous data, aiming at the data bottoming mode, carrying out data synchronization on the source data in a batch new adding mode to obtain the full data corresponding to the source data;
when the data type is the data to be tracked which is not synchronized for the first time, responding to parameter setting operation aiming at the historical tracking mode, determining a time point parameter, and importing the historical tracking data corresponding to the source data based on the time point parameter;
and under the condition that the data type is the non-first synchronization update data, importing the incremental update data corresponding to the source data in an update insertion mode aiming at the incremental update mode.
4. The method of claim 1, wherein the data integration of the data to be integrated to obtain target integrated data comprises:
determining a data integration file based on the source data set, wherein the data integration file comprises a reading rule file and/or a conversion rule file;
and carrying out data integration on the data to be integrated through the data integration file to obtain the target integrated data, wherein the data integration comprises data conversion and/or a data zipper.
5. The method of claim 4, wherein the data to be integrated includes at least one of a data table with special fields, a data table with special separators, and a zipper table, and the data to be integrated is integrated by the data integration file to obtain the target integrated data, including:
determining the read-in data to be integrated based on the field read-in type defined in the read-in rule file;
performing data conversion on the read-in data to be integrated based on a field conversion type defined in the conversion rule file to obtain the data to be integrated after data conversion;
and carrying out data zipper on the data to be integrated after data conversion based on a zipper algorithm to obtain the target integrated data.
6. The method according to claim 1, wherein storing the target integration data in a target storage table corresponding to the reconstructed storage analysis software comprises:
and directly identifying the target integrated data, and storing the target integrated data into a target storage table corresponding to the reconstructed storage analysis software.
7. The method of claim 4, wherein the data integration further comprises data cleansing and/or data desensitization.
8. A data synchronization integration apparatus, comprising:
the data acquisition module is used for acquiring source data sets of various databases through the reconstructed storage analysis software, wherein the source data sets comprise source data of various data structures;
the data identification module is used for determining the time stamp of each source data, carrying out data identification on the time stamp and determining the data type corresponding to each source data;
the data synchronization module is used for determining a data synchronization mode based on the data type, and performing data synchronization on the source data based on the data synchronization mode to obtain target synchronization data corresponding to the source data;
and the data integration module is used for taking the target synchronous data as data to be integrated, integrating the data to be integrated to obtain target integrated data, and storing the target integrated data into a target storage table corresponding to the reconstructed storage analysis software.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data synchronization integration method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the data synchronization integration method of any one of claims 1-7.
CN202311684517.2A 2023-12-08 2023-12-08 Data synchronous integration method and device, electronic equipment and storage medium Pending CN117667942A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311684517.2A CN117667942A (en) 2023-12-08 2023-12-08 Data synchronous integration method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117667942A true CN117667942A (en) 2024-03-08

Family

ID=90084306

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311684517.2A Pending CN117667942A (en) 2023-12-08 2023-12-08 Data synchronous integration method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117667942A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination