CN117171272A - Data synchronization method and device - Google Patents


Info

Publication number
CN117171272A
Authority
CN
China
Prior art keywords
data
partition
target
target data
database
Legal status
Pending
Application number
CN202311234953.XA
Other languages
Chinese (zh)
Inventor
王宇奇
黄兵华
Current Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Hubei Topsec Network Security Technology Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Hubei Topsec Network Security Technology Co Ltd
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd, Hubei Topsec Network Security Technology Co Ltd
Priority to CN202311234953.XA
Publication of CN117171272A


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of data processing and provides a data synchronization method and device. The method comprises: grouping the pieces of target data according to the preset partition keys of the partitions of a target database to determine the partition to which each piece of target data belongs; and synchronizing each piece of target data to the target database according to the partition to which it belongs. The target data is obtained by format conversion of data to be synchronized, and the data to be synchronized is obtained by combining all field data of an event extracted from an original database. The data synchronization method provided by the embodiments of the application can improve the efficiency of synchronizing data to a database.

Description

Data synchronization method and device
Technical Field
The application relates to the technical field of data processing, and in particular to a data synchronization method and device.
Background
With the arrival of the Internet of Things (IoT) era, the volume of data sensed by IoT devices and stored as alarms keeps growing, and the requirements for storing and analyzing that data keep rising. Conventional databases, such as relational databases, cannot meet the storage and analysis requirements of massive data, so more and more users choose to synchronize their data to ClickHouse, a columnar storage database with an MPP (massively parallel processing) architecture.
In the related art, data in the original database is synchronized to ClickHouse by directly converting the data of each field of the original database into a binary file format and then synchronizing it to ClickHouse. With this approach, however, every write causes ClickHouse to generate a new partition directory, and the scattered partition directories are then merged into new directories according to the merge rules. Each added piece of data therefore produces another directory to be merged, and synchronization efficiency is low.
Disclosure of Invention
The present application aims to solve at least one of the technical problems in the related art. To this end, the application provides a data synchronization method that can improve the efficiency of synchronizing data to a database.
The application also provides a data synchronization device.
The application further provides an electronic device.
The application also proposes a computer readable storage medium.
According to an embodiment of the first aspect of the present application, a data synchronization method includes:
grouping the pieces of target data according to the preset partition keys of the partitions of a target database, and determining the partition to which each piece of target data belongs;
synchronizing each piece of target data to the target database according to the partition to which it belongs;
where the target data is obtained by format conversion of data to be synchronized, and the data to be synchronized is obtained by combining all field data of an event extracted from an original database.
In the data synchronization method provided by the embodiments of the application, the target data is grouped by the preset partition keys of the partitions of the target database to determine the partition to which each piece belongs, and each piece is then synchronized to the target database according to that partition. During synchronization, the partition in which a piece of target data is to be stored is therefore known in advance from the preset partition keys, and the data can be stored directly into that partition. The target database does not need to generate new partition directories and then merge them, which improves the efficiency of synchronizing data to the database.
According to one embodiment of the present application, the method further comprises:
obtaining each field data of an event from the event tables of the original database according to the primary key of the event;
combining the field data according to the preset field order of the target database to obtain the data to be synchronized.
According to one embodiment of the present application, the method further comprises:
merging all attachment information of the event in the data to be synchronized to obtain initial data;
performing format conversion on the initial data according to the field types of the target database to obtain the target data.
According to one embodiment of the present application, the preset partition key includes a preset time interval;
grouping the pieces of target data according to the preset partition keys of the partitions of the target database and determining the partition to which each piece belongs includes:
comparing the preset time interval of each partition with the timestamp of the target data to determine the preset time interval into which the timestamp falls;
determining the partition to which the target data belongs according to the preset time interval into which the timestamp falls;
where the timestamp of the target data is the time point at which the event corresponding to the target data was generated.
According to one embodiment of the present application, synchronizing each piece of target data to the target database according to the partition to which it belongs includes:
storing the target data belonging to the same partition in the same CSV file;
synchronizing the CSV files to the target database one after another according to the partition corresponding to each CSV file;
where the synchronization of the next CSV file is executed only after the synchronization of the current CSV file is completed.
According to one embodiment of the present application, storing the target data belonging to the same partition in the same CSV file includes:
storing the target data belonging to the same partition in the same CSV file in the order given by the sorting key of the partition.
According to one embodiment of the present application, synchronizing the CSV files to the target database one after another according to the partition corresponding to each CSV file includes:
determining the storage order of the target data in a CSV file according to the sorting key of the partition corresponding to the CSV file;
synchronizing the target data in the CSV file, in that storage order, to the partition of the target database corresponding to the CSV file.
According to an embodiment of the second aspect of the present application, a data synchronization device includes:
a data grouping module, configured to group the pieces of target data according to the preset partition keys of the partitions of a target database and determine the partition to which each piece of target data belongs;
a data synchronization module, configured to synchronize each piece of target data to the target database according to the partition to which it belongs;
where the target data is obtained by format conversion of data to be synchronized, and the data to be synchronized is obtained by combining all field data of an event extracted from an original database.
An electronic device according to an embodiment of the third aspect of the present application includes a processor and a memory storing a computer program, where the processor implements the data synchronization method of any of the above embodiments when executing the computer program.
A computer-readable storage medium according to an embodiment of the fourth aspect of the present application stores a computer program which, when executed by a processor, implements the data synchronization method of any of the above embodiments.
A computer program product according to an embodiment of the fifth aspect of the present application includes a computer program which, when executed by a processor, implements the data synchronization method of any of the above embodiments.
The above technical solutions in the embodiments of the present application have at least one of the following technical effects:
the target data is grouped by the preset partition keys of the partitions of the target database to determine the partition to which each piece belongs, and each piece is synchronized to the target database according to that partition. During synchronization, the partition in which the target data is to be stored can thus be determined in advance from the preset partition keys, and the target data is stored directly into that partition; the target database does not need to generate new partition directories and then merge them, which improves the efficiency of synchronizing data to the database.
Drawings
To illustrate the technical solutions of the application or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the application, and a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a first schematic flowchart of the data synchronization method according to an embodiment of the present application;
FIG. 2 is a second schematic flowchart of the data synchronization method according to an embodiment of the present application;
FIG. 3 is a third schematic flowchart of the data synchronization method according to an embodiment of the present application;
FIG. 4 is a fourth schematic flowchart of the data synchronization method according to an embodiment of the present application;
FIG. 5 is a fifth schematic flowchart of the data synchronization method according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of the data synchronization device according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of the electronic device according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the application are described clearly and completely below with reference to the accompanying drawings of the embodiments. The described embodiments are some, but not all, of the embodiments of the application. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the application.
The data synchronization method and apparatus provided by the embodiments of the present application will be described and illustrated in detail below by means of several specific embodiments.
With the arrival of the IoT era, IoT devices sense and store more and more alarm data, and the demands for storing and analyzing such data keep growing, so more and more users choose to synchronize their data to ClickHouse. ClickHouse is a columnar storage database with an MPP architecture; its advantages include independence from the Hadoop ecosystem, simple installation and maintenance, and fast queries. For data already stored in an original database, such as a relational database, how to synchronize it to ClickHouse quickly is a problem that currently needs to be solved.
To synchronize the data stored in the original database to ClickHouse, the related art converts each field data of the original database into the underlying binary files corresponding to the target table in the ClickHouse database, places the binary files into the folder of the corresponding table on the ClickHouse node, and then executes a command that maps the data to the table metadata, thereby completing the synchronization. However, with this method ClickHouse generates a new partition directory for every write and then merges the scattered partition directories into new directories according to its merge rules, so every new piece of data produces another directory to be merged, and synchronization efficiency is low.
To this end, one embodiment provides a data synchronization method, applied to a server, for synchronizing data in an original database to a target database such as ClickHouse. The server may be an independent server, a server cluster formed by multiple servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud credential pools, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence sampling point devices.
As shown in fig. 1, the data synchronization method provided in this embodiment includes:
step 101, grouping the pieces of target data according to the preset partition keys of the partitions of a target database, and determining the partition to which each piece of target data belongs;
step 102, synchronizing each piece of target data to the target database according to the partition to which it belongs;
where the target data is obtained by format conversion of data to be synchronized, and the data to be synchronized is obtained by combining all field data of an event extracted from an original database.
In some embodiments, the server may first extract the field data of an event from an original database, such as a relational database. The field data can be selected by the primary key of the event, where the primary key is a field that uniquely identifies a record in the database. Because the field data belonging to the same event share the same primary key, all field data of the event can be extracted from the original database through the event's primary key. While the field data of an event is being extracted, the extracted field data can be cached in Redis, Kafka, or other caching middleware on the server until all field data of the event has been extracted, and then combined to form the data to be synchronized.
Illustratively, an event typically consists of several parts, such as the event sender, the receiver, violation information, and original data information. Take an SMTP mail protocol event as an example: the sender/receiver information includes the IP address, port, physical address, mail name, and so on; the violation information includes the specific violation, for example, a configured policy that disallows outbound mail from jack@test.com is violated when the sender is jack, and an event record is generated; the original data information includes the body text, attachments, subject, and so on of the original mail. In the original database, such as a PostgreSQL relational database, there are usually several event tables, and each part of the data is stored separately in a different event table. The event tables include an event main table that stores the basic data of the event and event sub-tables that store the specific field data of each item of basic data. The event main table may store the metadata of the event's field data, such as "sender information" and/or "receiver information", while an event sub-table stores the field data corresponding to a given metadata item, such as the sender's IP address, port, physical address, and mail name. Records in the event main table and the event sub-tables are associated through the unique auto-increment primary key incident_id generated by the event main table; that is, the metadata of an event and its corresponding field data share the same incident_id. When data synchronization is needed, the field data of an event can be looked up in each event table of the original database by the event's incident_id, and the retrieved field data are combined to obtain the data to be synchronized.
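The lookup of an event's field data by its shared primary key can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: an in-memory SQLite database stands in for PostgreSQL, the shared key is written `incident_id`, and the table names (`event_main`, `event_sender`, `event_receiver`), their columns, and the helper `fetch_event_fields` are all hypothetical.

```python
import sqlite3

# Hypothetical schema: sqlite3 stands in for the PostgreSQL event tables.
SCHEMA = """
CREATE TABLE event_main (
    incident_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- self-incrementing unique key
    protocol TEXT
);
CREATE TABLE event_sender (incident_id INTEGER, ip TEXT, port INTEGER, mail TEXT);
CREATE TABLE event_receiver (incident_id INTEGER, ip TEXT, port INTEGER, mail TEXT);
"""

# Fixed whitelist of sub-tables, so interpolating the table name into SQL is safe.
SUB_TABLES = ("event_sender", "event_receiver")


def fetch_event_fields(conn: sqlite3.Connection, incident_id: int) -> dict:
    """Look up each table's row for one event via its shared incident_id."""
    fields = {
        "event_main": conn.execute(
            "SELECT protocol FROM event_main WHERE incident_id = ?", (incident_id,)
        ).fetchone()
    }
    for table in SUB_TABLES:
        fields[table] = conn.execute(
            f"SELECT ip, port, mail FROM {table} WHERE incident_id = ?", (incident_id,)
        ).fetchone()
    return fields


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
iid = conn.execute("INSERT INTO event_main (protocol) VALUES ('smtp')").lastrowid
conn.execute("INSERT INTO event_sender VALUES (?, '10.0.0.1', 25, 'jack@test.com')", (iid,))
conn.execute("INSERT INTO event_receiver VALUES (?, '10.0.0.2', 25, 'bob@test.com')", (iid,))
print(fetch_event_fields(conn, iid))
```

In a real deployment the per-table results would be cached (the text mentions Redis or Kafka) until every table for the event has been read.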
For example, in PostgreSQL, if the sender's field data is (incident_id, s_1, s_2, s_3) and the receiver's field data is (incident_id, r_1, r_2, r_3), the two can be combined, and the resulting data to be synchronized is (s_1, r_1, s_2, r_2, s_3, r_3).
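The combination in this example can be sketched as an interleave. This is a minimal sketch assuming each sub-table row is a tuple whose first element is the shared event key and whose remaining elements line up one-to-one (sender field i pairs with receiver field i); the function name `combine_fields` is hypothetical.

```python
from itertools import chain


def combine_fields(sender: tuple, receiver: tuple) -> tuple:
    """Interleave sender and receiver fields, dropping the shared event key."""
    if sender[0] != receiver[0]:
        raise ValueError("rows belong to different events")
    s_fields, r_fields = sender[1:], receiver[1:]
    if len(s_fields) != len(r_fields):
        raise ValueError("sender and receiver must have the same number of fields")
    # zip pairs (s_1, r_1), (s_2, r_2), ...; chain.from_iterable flattens the pairs.
    return tuple(chain.from_iterable(zip(s_fields, r_fields)))


print(combine_fields((7, "s_1", "s_2", "s_3"), (7, "r_1", "r_2", "r_3")))
# ('s_1', 'r_1', 's_2', 'r_2', 's_3', 'r_3')
```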
In some embodiments, after the data to be synchronized is obtained, it may be format-converted into a data format that can be stored in the target database. If the target database is ClickHouse, the data to be synchronized can be converted into CSV format, which ClickHouse can import directly, and the converted data is taken as the target data.
In some embodiments, the target database is preset with multiple partitions, each corresponding to a preset partition key. For example, partitions can be set up by event sender, with each partition corresponding to one event sender and that sender's information serving as the partition's preset partition key. After the target data is obtained, all target data matching the preset partition key of a given partition can be selected and assigned to that partition, which determines the partition to which each piece of target data belongs. For example, if the preset partition key of partition 1 is the sender information U0001, all target data containing the field data U0001 can be selected and assigned to partition 1. In this way, all target data can be grouped and the partition of any piece of target data determined.
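Grouping by a sender-based partition key can be sketched as below. It assumes each piece of target data is a tuple of field values and that a row belongs to a partition when that partition's key (for example U0001) appears among its fields. The names `group_by_sender_key` and `partition_keys` are hypothetical, and rows matching no key are returned separately instead of creating a new partition.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_by_sender_key(
    rows: List[Sequence[str]], partition_keys: Dict[str, str]
) -> Tuple[Dict[str, List[Sequence[str]]], List[Sequence[str]]]:
    """Assign each row to the first partition whose key appears among its fields."""
    groups: Dict[str, List[Sequence[str]]] = defaultdict(list)
    unmatched: List[Sequence[str]] = []
    for row in rows:
        for partition, key in partition_keys.items():
            if key in row:
                groups[partition].append(row)
                break
        else:  # no partition key matched this row
            unmatched.append(row)
    return dict(groups), unmatched


keys = {"partition 1": "U0001", "partition 2": "U0002"}
rows = [("U0001", "a"), ("U0002", "b"), ("U0001", "c"), ("U0009", "d")]
print(group_by_sender_key(rows, keys))
```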
After the partition of each piece of target data is determined, each piece can be synchronized into the corresponding partition of the target database by executing SQL INSERT statements, completing the synchronization of the data set.
By grouping the target data according to the preset partition keys of the partitions of the target database, determining the partition to which each piece belongs, and synchronizing each piece to the target database according to that partition, the partition in which the target data is to be stored is known in advance during synchronization, and the target data is stored directly into that partition. The target database does not need to generate new partition directories and then merge them, which improves the efficiency of synchronizing data to the database.
To further improve data synchronization efficiency, in some embodiments, as shown in FIG. 2, the data synchronization method further includes:
step 201, obtaining each field data of an event from the event tables of the original database according to the primary key of the event;
step 202, combining the field data according to the preset field order of the target database to obtain the data to be synchronized.
In some embodiments, all field data of an event may be extracted in advance from each event table of the original database using the primary key of the event, such as its incident_id.
Because a target database such as ClickHouse usually stores the data of each partition according to a sorting key, that is, the target database presets a field order for each partition, the field data of an event can be combined in this preset field order after extraction. For example, if the preset field order of each partition is User ID, ReceiveTime, FromAddr, ToAddr, …, the field data corresponding to User ID is placed first, the field data corresponding to ReceiveTime follows it, and so on, yielding data to be synchronized that is combined in the preset field order. When the target data converted from this data is stored in the target database, it can then be inserted in the target database's preset field order, which improves the synchronization efficiency of the target data.
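Combining field data in a preset field order amounts to projecting a name-to-value mapping onto a fixed column list. A minimal sketch, assuming the field data of one event has already been collected into a dict; the column names (written `UserID` rather than `User ID`) and the helper `order_fields` are assumptions for illustration.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical preset field order of a partition.
PRESET_FIELD_ORDER: Sequence[str] = ("UserID", "ReceiveTime", "FromAddr", "ToAddr")


def order_fields(field_data: Dict[str, str],
                 order: Sequence[str] = PRESET_FIELD_ORDER) -> Tuple[str, ...]:
    """Return the event's field values laid out in the target table's column order."""
    missing = [name for name in order if name not in field_data]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return tuple(field_data[name] for name in order)


print(order_fields({"ToAddr": "bob@test.com", "UserID": "U0001",
                    "FromAddr": "jack@test.com", "ReceiveTime": "2023-09-15 00:00:00"}))
# ('U0001', '2023-09-15 00:00:00', 'jack@test.com', 'bob@test.com')
```

Raising on a missing field is a design choice here; the format-conversion step could instead fill a default value.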
After the data to be synchronized is obtained, to further improve data synchronization efficiency, in some embodiments, as shown in FIG. 3, the data synchronization method further includes:
step 301, merging the attachment information of the event in the data to be synchronized to obtain initial data;
step 302, performing format conversion on the initial data according to the field types of the target database to obtain the target data.
In some embodiments, after the data to be synchronized is obtained, the multiple attachment names of the event in it, together with attachment information such as the corresponding attachment sizes, sub-file names, attachment MD5 values, and names of violated policies, may each be merged into a ClickHouse array to obtain the initial data. Illustratively, the multiple attachment names of the event corresponding to the data to be synchronized occupy multiple rows of the attachment information table in the original database, such as (incident_id, filename): (1, attachment 1), (2, attachment 2). This attachment information can be merged and converted into an array (file_id, [attachment 1, attachment 2]).
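The attachment merge can be sketched as a group-by on the event key followed by rendering each group as a ClickHouse array literal. This sketch assumes the attachment rows of one event share its key (written `incident_id`); both helper names are hypothetical, and the escaping handles only backslashes and single quotes.

```python
from collections import defaultdict


def merge_attachments(rows):
    """Collapse (incident_id, filename) rows into one list of filenames per event."""
    merged = defaultdict(list)
    for incident_id, filename in rows:
        merged[incident_id].append(filename)
    return dict(merged)


def to_clickhouse_array(values):
    """Render strings as a ClickHouse array literal, e.g. ['a','b']."""
    escaped = (v.replace("\\", "\\\\").replace("'", "\\'") for v in values)
    return "[" + ",".join(f"'{v}'" for v in escaped) + "]"


merged = merge_attachments([(1, "attachment 1"), (1, "attachment 2"), (2, "report.pdf")])
print(to_clickhouse_array(merged[1]))  # ['attachment 1','attachment 2']
```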
After the attachment information of the event in the data to be synchronized is merged into the initial data, the initial data can be format-converted according to the field types of the target database to obtain target data that can be stored in the target database. For example, if the initial data records a time as the string "2023-09-15 00:00:00" and the corresponding field type in ClickHouse is a timestamp, the string is converted to a timestamp: 2023-09-15 00:00:00 -> 1694707200 (midnight Beijing time, UTC+8). Meanwhile, for fields newly added in ClickHouse, default values are filled into the initial data. Fields in the initial data that are unused or have been replaced are discarded directly; for example, incident_id only associates the sub-tables of the original database and does not need to be converted into ClickHouse, so it can be discarded. In this way, target data for storage in the target database is obtained.
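The conversion rules above (type conversion, default filling, discarding unused fields) can be sketched as a per-column table of converters. The column set and converters are hypothetical; the sketch assumes the source time strings are Beijing time (UTC+8), the offset under which 2023-09-15 00:00:00 maps to 1694707200.

```python
from datetime import datetime, timedelta, timezone

# Assumption: source time strings are Beijing time (UTC+8).
SOURCE_TZ = timezone(timedelta(hours=8))


def to_unix_timestamp(text: str) -> int:
    """'2023-09-15 00:00:00' -> seconds since the Unix epoch."""
    naive = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    return int(naive.replace(tzinfo=SOURCE_TZ).timestamp())


# Hypothetical target columns: name -> (converter, default for rows lacking the field).
TARGET_COLUMNS = {
    "UserID": (str, ""),
    "ReceiveTime": (to_unix_timestamp, 0),
    "RiskLevel": (int, 0),  # a column that exists only in the ClickHouse table
}


def to_target_row(initial: dict) -> dict:
    """Convert field types, fill defaults, and drop fields ClickHouse does not need."""
    return {
        column: convert(initial[column]) if column in initial else default
        for column, (convert, default) in TARGET_COLUMNS.items()
    }


print(to_target_row({"incident_id": 7, "UserID": "U0001",
                     "ReceiveTime": "2023-09-15 00:00:00"}))
# {'UserID': 'U0001', 'ReceiveTime': 1694707200, 'RiskLevel': 0}
```

Because only columns listed in `TARGET_COLUMNS` are emitted, `incident_id` is dropped without a special case.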
After the data to be synchronized is format-converted into target data, the target data can be grouped according to the preset partition keys of the partitions of the target database to determine the partition of each piece. To further improve data synchronization efficiency, in some embodiments the preset partition key includes a preset time interval. After each piece of target data is obtained, the preset time intervals of the partitions are compared with the timestamp of the target data to determine the interval into which the timestamp falls, and the partition of the target data is determined from that interval. The timestamp of the target data is the time point at which the corresponding event was generated.
For example, the preset partition keys may be the 12 months, i.e., the target database may preset 12 partitions, one per month. After the target data is obtained, the generation time of the corresponding event can be read from the date-type field of the target data, which determines the month of the target data and hence its partition. If the event corresponding to the target data was generated at 2023-09-15 00:00:00, the target data belongs to September, so its partition is the one corresponding to September. Because the preset partition key is a preset time interval representing a month, only 12 partitions (one per month) need to be set up to guarantee that every piece of target data has a partition. This avoids having to add a new partition directory when no matching partition can be found, further improving data synchronization efficiency.
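Month-based partitioning can be sketched as follows, assuming the event generation time is stored as a `YYYY-MM-DD HH:MM:SS` string under a field named `ReceiveTime` (a hypothetical name). All 12 partitions are created up front, so every piece of target data always finds a partition.

```python
from datetime import datetime


def month_partition(event_time: str) -> int:
    """Map an event's generation time to its month partition (1-12)."""
    return datetime.strptime(event_time, "%Y-%m-%d %H:%M:%S").month


def group_by_month(rows, time_field="ReceiveTime"):
    """All 12 partitions exist in advance, so no row ever needs a new partition."""
    groups = {month: [] for month in range(1, 13)}
    for row in rows:
        groups[month_partition(row[time_field])].append(row)
    return groups


groups = group_by_month([
    {"UserID": "U0001", "ReceiveTime": "2023-09-15 00:00:00"},
    {"UserID": "U0002", "ReceiveTime": "2023-08-01 12:30:00"},
])
print({m: len(r) for m, r in groups.items() if r})  # {8: 1, 9: 1}
```

Keying on the month alone places September 2023 and September 2024 in the same partition, as the text describes; ClickHouse tables are more commonly declared with `PARTITION BY toYYYYMM(...)`, which also separates years.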
After the target data is grouped and the partition of each piece determined, the target data can be synchronized to the target database according to its partition. To improve the efficiency of this step, in some embodiments, as shown in FIG. 4, synchronizing each piece of target data to the target database according to the partition to which it belongs includes:
step 401, storing the target data belonging to the same partition in the same CSV file;
step 402, synchronizing the CSV files to the target database one after another according to the partition corresponding to each CSV file;
where the synchronization of the next CSV file is executed only after the synchronization of the current CSV file is completed.
In some embodiments, after the target data is grouped, the target data belonging to the same partition may be stored in the same CSV file. Moreover, because the Web component of the network data leakage prevention system has read-write permission only in a specific directory, the field data from the original database can be moved or copied into a file directory that the Web component reads, so that the CSV files do not end up undisplayable.
After the target data of each partition is stored in its own CSV file, one CSV file per partition is obtained, and SQL INSERT statements can then be executed to synchronize the CSV files, one by one, to their corresponding partitions in the target database. Each data insertion synchronizes exactly one CSV file, i.e., each synchronization inserts only the CSV file of one partition, and the next CSV file is synchronized to its partition only after the current one has completed.
For example, if the preset partition key of each partition of the target database is a month, i.e., each partition corresponds to one month, storing the target data of each partition in its own CSV file yields 12 CSV files, each holding the data of only one month. The 12 CSV files can then be synchronized to the target database one at a time in descending order of month, or in random order, picking one CSV file at a time. After one CSV file has been synchronized, the next one is picked and synchronized, so each INSERT inserts data of only one partition.
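The one-file-per-partition, one-insert-at-a-time flow can be sketched as below. The CSV "files" are in-memory strings, and `insert_csv` is a hypothetical callable standing in for whatever actually runs the INSERT against the target database; the sketch only guarantees that the next partition is sent after the previous call has returned. Both function names are hypothetical.

```python
import csv
import io
from typing import Callable, Dict, List, Sequence


def build_partition_csvs(groups: Dict[int, List[Sequence]]) -> Dict[int, str]:
    """One CSV text per non-empty partition (in-memory stand-in for a CSV file)."""
    files = {}
    for partition, rows in groups.items():
        if rows:
            buf = io.StringIO()
            csv.writer(buf, lineterminator="\n").writerows(rows)
            files[partition] = buf.getvalue()
    return files


def sync_sequentially(files: Dict[int, str],
                      insert_csv: Callable[[int, str], None],
                      descending: bool = True) -> List[int]:
    """Insert one partition's CSV at a time; the next starts after insert_csv returns."""
    order = sorted(files, reverse=descending)
    for partition in order:
        insert_csv(partition, files[partition])
    return order


sent = []
files = build_partition_csvs({9: [("U0001", 1694707200)], 8: [("U0002", 1693497600)], 7: []})
print(sync_sequentially(files, lambda p, text: sent.append((p, text))))
# [9, 8]
```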
By storing the target data of each partition in its own CSV file and synchronizing the CSV files to the target database one at a time according to their partitions, many records can be imported at once through a CSV file, and all records imported in one batch belong to the same partition. If a single import contained data for several partitions, the imported data would have to be split and merged into different partitions; here, the data of one partition can be synchronized with a single query in the target database rather than multiple queries, which improves synchronization efficiency.
In some embodiments, when the target data of a partition is stored in its CSV file, the target data may be written to the file in the order given by the partition's sorting key. For example, the sorting key may define the chronological order of the target data within a partition. Once the sorting key of a partition is determined, the target data belonging to that partition can be written to its CSV file in order of the generation time of the corresponding events. The rows of the generated CSV file are then already in detection-time order, which makes the subsequent insertion of the CSV file into the target database more efficient.
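Writing a partition's rows in sorting-key order is a sort followed by CSV serialization. A minimal sketch, assuming each row is a tuple and the sorting key is the event-time column at index 1; `write_sorted_csv` and its `sort_columns` parameter are hypothetical.

```python
import csv
import io
from operator import itemgetter


def write_sorted_csv(rows, sort_columns=(1,)):
    """Serialize one partition's rows to CSV text, ordered by its sorting key columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(sorted(rows, key=itemgetter(*sort_columns)))
    return buf.getvalue()


rows = [("U0002", 1694710000), ("U0001", 1694707200), ("U0003", 1694708000)]
print(write_sorted_csv(rows), end="")
```

Passing several indices in `sort_columns` (for example `(0, 1)`) sorts by a compound key, since `itemgetter` then returns a tuple.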
After the target data of each partition are stored in their CSV file, the CSV files can be synchronized to the target database one after another according to their partitions. To further improve data synchronization efficiency, in some embodiments, as shown in fig. 5, synchronizing the CSV files to the target database in sequence according to the partition corresponding to each CSV file includes:
step 501, determining the storage order of the target data in the CSV file according to the sort key of the partition corresponding to the CSV file;
step 502, synchronizing, in that storage order, each target data in the CSV file to the partition of the target database that corresponds to the CSV file.
In some embodiments, since each partition has a corresponding sort key, once a CSV file is obtained, the storage order of the target data in it can be determined from the sort key of the corresponding partition. If the sort key orders the target data chronologically, the target data in the CSV file are stored in the chronological order of their corresponding events: the earlier an event was generated, the earlier its target data appear in the file. After the storage order is determined, SQL INSERT statements can be executed to synchronize the target data to the corresponding partition in that order. The target data are thus written directly to the corresponding partition in storage order, without first querying the column/row where each target data should be stored, which further improves data synchronization efficiency.
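Steps 501 and 502 can be sketched as a loop that loads one CSV (one partition) per transaction and starts the next only after the current one has committed. This sketch uses Python's built-in `sqlite3` purely as a stand-in target: SQLite has no table partitions, so the hypothetical `part_key` column only records which partition each row came from, and the table and column names are assumptions. A real partitioned target database would normally use its own bulk CSV import path instead of `executemany`.

```python
import csv
import io
import sqlite3


def sync_partitions(conn, csv_by_partition, order):
    """Synchronize one partition's CSV at a time, in the given order.

    Each CSV is loaded as a single batch inside one transaction; the next
    CSV is processed only after the current one has been committed.
    """
    for part in order:
        reader = csv.reader(io.StringIO(csv_by_partition[part]))
        # Rows are already in sort-key order, so they are inserted as stored.
        rows = [(part, int(eid), ts, kind) for eid, ts, kind in reader]
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO events (part_key, event_id, event_time, kind) "
                "VALUES (?, ?, ?, ?)",
                rows,
            )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (part_key TEXT, event_id INTEGER, event_time TEXT, kind TEXT)"
)
# Hypothetical per-partition CSV contents, already sorted by event time.
csv_by_partition = {
    "2023-01": "1,2023-01-05T08:00:00,login\n3,2023-01-20T12:00:00,alert\n",
    "2023-02": "2,2023-02-01T09:30:00,logout\n",
}
sync_partitions(conn, csv_by_partition, order=sorted(csv_by_partition, reverse=True))
```

No lookup of the destination row or column happens before inserting: each batch is appended in file order, which is the property the text relies on.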
The data synchronization device provided by the application is described below; the device described below and the data synchronization method described above may be cross-referenced with each other.
In one embodiment, as shown in fig. 6, there is provided a data synchronization apparatus, including:
the data grouping module 210 is configured to group each target data according to preset partition keys of each partition of the target database, and determine the partition to which each target data belongs;
a data synchronization module 220, configured to synchronize each target data to the target database according to the partition to which each target data belongs;
the target data is obtained after format conversion of data to be synchronized, and the data to be synchronized is obtained by combining all field data of any event extracted from an original database.
The target data are grouped by the preset partition keys of the partitions of the target database to determine the partition to which each target data belongs, and each target data is then synchronized to the target database according to that partition. During synchronization, the partition in which each target data must be stored is therefore known in advance from the preset partition keys, so the target data can be written directly to that partition. No new partition directories need to be generated in the target database and later merged, which improves the efficiency of synchronizing data to the database.
In an embodiment, the data grouping module 210 is further configured to:
obtain the field data of an event from the event table of the original database according to the primary key of the event;
and combine the field data in the preset field order of the target database to obtain the data to be synchronized.
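A minimal sketch of fetching an event's fields by primary key and reordering them into the target database's preset field order. It uses an in-memory `sqlite3` database as a stand-in original database; the `event` table layout and `TARGET_FIELD_ORDER` are hypothetical examples, chosen so that the source column order differs from the target order.

```python
import sqlite3

# Hypothetical preset field order of the target database.
TARGET_FIELD_ORDER = ["event_id", "event_time", "src_ip", "level"]


def build_record(conn, event_id):
    """Fetch all fields of one event by primary key and order them for the target."""
    cur = conn.execute("SELECT * FROM event WHERE event_id = ?", (event_id,))
    row = cur.fetchone()
    if row is None:
        raise KeyError(event_id)
    columns = [d[0] for d in cur.description]  # source column names
    fields = dict(zip(columns, row))
    return tuple(fields[name] for name in TARGET_FIELD_ORDER)


# Stand-in original database whose column order differs from the target's.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (event_id INTEGER PRIMARY KEY, level TEXT, src_ip TEXT, event_time TEXT)"
)
conn.execute("INSERT INTO event VALUES (42, 'high', '10.0.0.8', '2023-09-21 10:00:00')")
record = build_record(conn, 42)
```

Mapping by column name rather than by position keeps the result correct even if the source table's column order changes.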
In an embodiment, the data grouping module 210 is further configured to:
merging all the accessory information of the event in the data to be synchronized to obtain initial data;
and carrying out format conversion on the initial data according to the field type of the target database to obtain the target data.
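The two steps above could look like the following sketch. How accessory information is merged is not specified here, so this assumes, purely for illustration, that all accessory items of an event are collected into one list-valued field named `extras`; the target field types and converters in `TARGET_TYPES` and `CONVERTERS` are likewise hypothetical.

```python
from datetime import datetime

# Hypothetical target schema: field name -> target field type.
TARGET_TYPES = {"event_id": "int", "event_time": "datetime", "level": "str", "extras": "array"}

# One converter per assumed target field type.
CONVERTERS = {
    "int": int,
    "str": str,
    "datetime": lambda v: datetime.strptime(v, "%Y-%m-%d %H:%M:%S"),
    "array": list,
}


def merge_accessory(record, accessory_items):
    """Merge all accessory items of the event into one field, giving the initial data."""
    initial = dict(record)
    initial["extras"] = list(accessory_items)
    return initial


def convert(initial):
    """Convert each field of the initial data to its target field type."""
    return {name: CONVERTERS[TARGET_TYPES[name]](value) for name, value in initial.items()}


# Hypothetical data to be synchronized, with all values still as strings.
record = {"event_id": "42", "event_time": "2023-09-21 10:00:00", "level": "high"}
target = convert(merge_accessory(record, ("ip:10.0.0.8", "user:alice")))
```

Driving the conversion from a field-type table means a new target column only needs a schema entry, not new conversion code.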
In one embodiment, the preset partition key includes a preset time interval; the data grouping module 210 is specifically configured to:
comparing the preset time intervals of all the partitions with the time stamp corresponding to the target data, and determining the preset time interval to which the time stamp belongs;
determining a partition to which the target data belongs according to a preset time interval to which the time stamp belongs;
the time stamp corresponding to the target data is the point in time at which the event corresponding to the target data was generated.
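Comparing a time stamp with the preset time intervals can be sketched as a binary search over interval start times. The partition names and monthly boundaries below are hypothetical, and the intervals are assumed to be half-open `[start, end)` and sorted by start time.

```python
import bisect
from datetime import datetime

# Hypothetical preset partition intervals, half-open [start, end), sorted by start.
PARTITIONS = [
    ("p202307", datetime(2023, 7, 1), datetime(2023, 8, 1)),
    ("p202308", datetime(2023, 8, 1), datetime(2023, 9, 1)),
    ("p202309", datetime(2023, 9, 1), datetime(2023, 10, 1)),
]
STARTS = [start for _, start, _ in PARTITIONS]


def partition_for(ts):
    """Return the partition whose [start, end) interval contains the event time stamp."""
    i = bisect.bisect_right(STARTS, ts) - 1  # last interval starting at or before ts
    if i >= 0:
        name, start, end = PARTITIONS[i]
        if start <= ts < end:
            return name
    raise ValueError(f"no partition covers {ts}")
```

For example, `partition_for(datetime(2023, 9, 21, 10, 0))` selects `p202309`, while a time stamp outside every preset interval raises `ValueError` rather than silently creating a new partition.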
In one embodiment, the data synchronization module 220 is specifically configured to:
storing each target data belonging to the same partition into the same CSV file;
according to the partitions corresponding to the CSV files, sequentially synchronizing the CSV files to the target database;
and after the synchronization of the current CSV file is completed, executing the synchronization operation of the next CSV file.
In one embodiment, the data synchronization module 220 is specifically configured to:
sequentially store the target data belonging to the same partition into the same CSV file according to the sort key of the partition.
In one embodiment, the data synchronization module 220 is specifically configured to:
determine the storage order of the target data in the CSV file according to the sort key of the partition corresponding to the CSV file;
and according to the storage sequence, sequentially synchronizing each target data in the CSV file to a partition corresponding to the CSV file in the target database.
Fig. 7 illustrates the physical structure of an electronic device. As shown in fig. 7, the electronic device may include a processor 810, a communication interface 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with each other through the communication bus 840. The processor 810 may invoke a computer program in the memory 830 to perform the data synchronization method, which includes, for example:
grouping each target data according to preset partition keys of each partition of a target database, and determining the partition to which each target data belongs;
synchronizing each target data to the target database according to the partition to which the target data belongs;
the target data is obtained after format conversion of data to be synchronized, and the data to be synchronized is obtained by combining all field data of any event extracted from an original database.
Further, the logic instructions in the memory 830 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
In another aspect, an embodiment of the present application further provides a processor readable storage medium, where a computer program is stored, where the computer program is configured to cause a processor to perform a method provided in the foregoing embodiments, for example, including:
grouping each target data according to preset partition keys of each partition of a target database, and determining the partition to which each target data belongs;
synchronizing each target data to the target database according to the partition to which the target data belongs;
the target data is obtained after format conversion of data to be synchronized, and the data to be synchronized is obtained by combining all field data of any event extracted from an original database.
The processor-readable storage medium may be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic memory (e.g., floppy disk, hard disk, magnetic tape, magneto-optical disk (MO), etc.), optical memory (e.g., CD, DVD, BD, HVD, etc.), and semiconductor memory (e.g., ROM, EPROM, EEPROM, nonvolatile memory (NAND FLASH), solid-state drive (SSD), etc.).
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present application without undue effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method of data synchronization, comprising:
grouping each target data according to preset partition keys of each partition of a target database, and determining the partition to which each target data belongs;
synchronizing each target data to the target database according to the partition to which the target data belongs;
the target data is obtained after format conversion of data to be synchronized, and the data to be synchronized is obtained by combining all field data of any event extracted from an original database.
2. The data synchronization method according to claim 1, further comprising:
according to the primary key of any event, obtaining each field data of the event from an event table of the original database;
and combining the field data according to the preset field sequence of the target database to obtain the data to be synchronized.
3. The data synchronization method according to claim 1 or 2, characterized by further comprising:
merging all the accessory information of the event in the data to be synchronized to obtain initial data;
and carrying out format conversion on the initial data according to the field type of the target database to obtain the target data.
4. The data synchronization method according to claim 1, wherein the preset partition key includes a preset time interval;
grouping each target data according to preset partition keys of each partition of the target database, and determining the partition to which each target data belongs, wherein the method comprises the following steps:
comparing the preset time intervals of all the partitions with the time stamp corresponding to the target data, and determining the preset time interval to which the time stamp belongs;
determining a partition to which the target data belongs according to a preset time interval to which the time stamp belongs;
the time stamp corresponding to the target data is the point in time at which the event corresponding to the target data was generated.
5. The data synchronization method according to claim 1, 2 or 4, wherein synchronizing each of the target data to the target database according to the partition to which the target data belongs, comprises:
storing each target data belonging to the same partition into the same CSV file;
according to the partitions corresponding to the CSV files, sequentially synchronizing the CSV files to the target database;
and after the synchronization of the current CSV file is completed, executing the synchronization operation of the next CSV file.
6. The method of claim 5, wherein storing each target data belonging to the same partition in the same CSV file comprises:
sequentially storing each target data belonging to the same partition into the same CSV file according to the sort key of the partition.
7. The method of claim 5, wherein sequentially synchronizing each of the CSV files to the target database according to the partition to which each of the CSV files corresponds, comprises:
determining the storage order of each target data in the CSV file according to the sort key of the partition corresponding to the CSV file;
and according to the storage sequence, sequentially synchronizing each target data in the CSV file to a partition corresponding to the CSV file in the target database.
8. A data synchronization device, comprising:
the data grouping module is used for grouping each target data according to preset partition keys of each partition of the target database and determining the partition to which each target data belongs;
the data synchronization module is used for synchronizing each target data to the target database according to the partition to which the target data belong;
the target data is obtained after format conversion of data to be synchronized, and the data to be synchronized is obtained by combining all field data of any event extracted from an original database.
9. An electronic device comprising a processor and a memory storing a computer program, characterized in that the processor implements the data synchronization method of any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the data synchronization method of any one of claims 1 to 7.
CN202311234953.XA 2023-09-21 2023-09-21 Data synchronization method and device Pending CN117171272A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311234953.XA CN117171272A (en) 2023-09-21 2023-09-21 Data synchronization method and device

Publications (1)

Publication Number Publication Date
CN117171272A (en) 2023-12-05

Family

ID=88944959

Country Status (1)

Country Link
CN (1) CN117171272A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination