CN115455020A - Incremental data synchronization method and device, computer equipment and storage medium - Google Patents

Incremental data synchronization method and device, computer equipment and storage medium

Info

Publication number
CN115455020A
Authority
CN
China
Prior art keywords
data
incremental
merged
merging
physical deletion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211123309.0A
Other languages
Chinese (zh)
Inventor
邱荣明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202211123309.0A
Publication of CN115455020A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275 Synchronous replication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The embodiments of the application belong to the field of artificial intelligence and big data, and relate to an incremental data synchronization method, which comprises: obtaining incremental data and an operation log in the current synchronization time period; parsing the operation log to extract physical deletion data and non-standard operation data from the operation log; merging the non-standard operation data with the incremental data to obtain first merged data, and merging the first merged data with the physical deletion data to obtain second merged data; and synchronizing the second merged data into the data warehouse. The application also provides an incremental data synchronization device, computer equipment and a storage medium. In addition, the application relates to blockchain technology: the incremental data and the operation log may be stored in a blockchain. The method and the device effectively guarantee data consistency between the two synchronized parties.

Description

Incremental data synchronization method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence and big data technologies, and in particular, to an incremental data synchronization method, apparatus, computer device, and storage medium.
Background
A data warehouse serves as the data source for an enterprise's intelligent operation analysis. It needs to access the data of each application system of the enterprise, and its data must be kept consistent with the data of those application systems.
At present, data is generally loaded into a data warehouse by incremental synchronization. However, existing incremental synchronization cannot capture two kinds of records in the service database: non-standard incremental records, whose timestamp was not modified when the data was added or modified, and physically deleted records. As a result, the data warehouse becomes inconsistent with the upstream database (such as a service database), which affects data tracing and checking. Although physical deletion can be replaced by logical deletion, the longer the service database runs, the more redundant data it accumulates, so operating costs rise and operation slows down, and the cost outweighs the benefit.
Disclosure of Invention
An object of the embodiments of the present application is to provide an incremental data synchronization method, an incremental data synchronization device, a computer device, and a storage medium, which mainly aim to ensure data consistency between two synchronized parties.
In order to solve the above technical problem, an embodiment of the present application provides an incremental data synchronization method, which adopts the following technical solutions:
acquiring incremental data and an operation log in the current synchronization time period;
parsing the operation log to extract physical deletion data and non-standard operation data from the operation log;
merging the non-standard operation data with the incremental data to obtain first merged data, and merging the first merged data with the physical deletion data to obtain second merged data;
synchronizing the second merged data into a data warehouse.
Further, the step of extracting physical deletion data and non-standard operation data from the operation log comprises:
extracting the operation identifier of each first primary key from the operation log, wherein the operation log comprises a plurality of first primary keys;
determining the operation type of the first primary key according to the operation identifier;
aggregating all the first primary keys whose operation type is a deletion operation to obtain the physical deletion data, and determining the non-standard operation data from all the first primary keys whose operation type is an addition operation or a modification operation.
Further, the step of determining the non-standard operation data from all the first primary keys whose operation type is an addition operation or a modification operation comprises:
extracting, as second primary keys, the first primary keys that meet the non-standard operation condition from all the first primary keys whose operation type is an addition operation or a modification operation;
aggregating all the second primary keys to obtain the non-standard operation data.
Further, before the step of merging the non-standard operation data with the incremental data, the method further includes:
extracting the second primary keys from the non-standard operation data, and storing the second primary keys in a non-standard operation table;
the step of merging the non-standard operation data with the incremental data comprises:
extracting, from the non-standard operation table, the second primary keys whose operation type is the non-standard operation type, and merging the second primary keys into the incremental data.
Further, before the step of merging the first merged data with the physical deletion data, the method further includes:
storing each first primary key in the physical deletion data whose operation type is the physical deletion type into a physical deletion table;
the step of merging the first merged data with the physical deletion data includes:
extracting, from the physical deletion table, the first primary keys whose operation type is the physical deletion type, and determining redundant data in the first merged data according to the first primary keys extracted from the physical deletion table;
deleting the redundant data from the first merged data.
Further, after the step of synchronizing the second consolidated data into the data warehouse, the method further comprises:
deleting the physical deletion table;
or deleting all the first primary keys in the physical deletion table.
Further, the step of synchronizing the second merged data into a data warehouse comprises:
acquiring historical data of the data warehouse;
identifying data that is identical in the second merged data and the historical data as duplicate data;
deleting the duplicate data from the second merged data to obtain target data;
synchronously updating the target data to the data warehouse.
In order to solve the foregoing technical problem, an embodiment of the present application further provides an incremental data synchronization apparatus, which adopts the following technical solutions:
the acquisition module is used for acquiring incremental data and an operation log in the current synchronization time period;
the extraction module is used for parsing the operation log so as to extract physical deletion data and non-standard operation data from the operation log;
the merging module is used for merging the non-standard operation data with the incremental data to obtain first merged data, and then merging the first merged data with the physical deletion data to obtain second merged data;
the synchronization module is used for synchronizing the second merged data into the data warehouse.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
the computer device comprises a memory and a processor, wherein the memory stores computer-readable instructions which, when executed by the processor, implement the steps of the incremental data synchronization method described above.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
the computer readable storage medium has stored thereon computer readable instructions which, when executed by a processor, implement the steps of the incremental data synchronization method as described above.
Compared with the prior art, the embodiments of the application mainly have the following beneficial effects: incremental data and an operation log in the current synchronization time period are obtained; the operation log is parsed to extract physical deletion data and non-standard operation data from it; the non-standard operation data is merged with the incremental data to obtain first merged data, and the first merged data is merged with the physical deletion data to obtain second merged data; and the second merged data is synchronized into a data warehouse. In the application, the deleted data and the added and modified data are extracted from the operation log, the non-standard operation data identified among the added and modified data is merged with the incremental data, and the result is then merged with the physical deletion data. This solves the prior-art problem that the data of the two parties becomes abnormal or incomplete when records are added or modified without their timestamps being updated and/or when data is physically deleted, and thereby guarantees the consistency of the data of the two parties.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of an incremental data synchronization method according to the present application;
FIG. 3 is a schematic block diagram illustrating one embodiment of an incremental data synchronization apparatus according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the foregoing drawings are used for distinguishing between different objects and not for describing a particular sequential order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the incremental data synchronization method provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the incremental data synchronization apparatus is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow diagram of one embodiment of a method of incremental data synchronization in accordance with the present application is shown. The incremental data synchronization method comprises the following steps:
Step S201, obtaining incremental data and an operation log in the current synchronization time period.
In this embodiment, an electronic device (for example, the server/terminal device shown in fig. 1) on which the incremental data synchronization method runs may acquire the incremental data and the operation log in the current synchronization period through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G/5G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connections now known or developed in the future.
The synchronization time period is the period at which data synchronization is performed, and can be set by an operator. In some embodiments, data synchronization is performed on the incremental data and the operation log of the most recently completed synchronization period; for example, if the synchronization period is 1 day and the current date is 2022-08-15, the incremental data and the operation log of the previous period (i.e., 2022-08-14) are synchronized.
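The window selection in the example above can be sketched as follows. This is only an illustration under stated assumptions: the function name `previous_sync_window` and the half-open `[start, end)` convention are hypothetical and do not come from the application.

```python
from datetime import date, datetime, time, timedelta


def previous_sync_window(today, period_days=1):
    """Return the half-open window [start, end) of the last completed period.

    Assumption: periods are whole days aligned to midnight.
    """
    end = datetime.combine(today, time.min)      # midnight at the start of today
    start = end - timedelta(days=period_days)    # one period earlier
    return start, end


# With a 1-day period on 2022-08-15, the window covers all of 2022-08-14.
start, end = previous_sync_window(date(2022, 8, 15))
print(start, end)
```

A half-open window is used here so that a record stamped exactly at midnight belongs to one period only.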
Step S202, analyzing the operation log to extract physical deletion data and non-standard operation data from the operation log.
In this embodiment, the physical deletion data is data that has actually been deleted and therefore can no longer be queried; the non-standard operation data is data whose timestamp was not modified when the data was added or modified.
In some embodiments, after the current synchronization period is reached, the operation log in the service database is synchronized to an intermediate database (e.g., a Kafka platform) through the GoldenGate platform; the operation log is then consumed from the intermediate database and parsed with a log parsing tool (e.g., the fn_dblog tool) to extract the physical deletion data and the non-standard operation data.
Step S203, merging the non-standard operation data with the incremental data to obtain first merged data, and merging the first merged data with the physical deletion data to obtain second merged data.
In this embodiment, the non-standard operation data is merged with the incremental data so that the non-standard operation data overwrites the corresponding records in the incremental data, yielding the first merged data; this resolves the inconsistency between the two parties caused by timestamps not being modified when data is added or modified. The first merged data is then merged with the physical deletion data, that is, the physically deleted data is removed from the first merged data, yielding the second merged data; this resolves the inconsistency caused by physical deletion and further improves the consistency of the data of the two parties.
In some embodiments, the non-standard operation data is merged with the incremental data by a synchronization tool (e.g., Sqoop or an ETL tool).
Step S204, synchronizing the second merged data to a data warehouse.
In this embodiment, the second merged data is synchronized into the data warehouse by overwriting; for example, if the age of user A in the data warehouse is 20 and the age of user A in the second merged data is 22, the age of user A in the data warehouse is modified to 22 after the second merged data is synchronized to the data warehouse.
In the application, the deleted data and the added and modified data are extracted from the operation log, the non-standard operation data identified among the added and modified data is merged with the incremental data, and the result is then merged with the physical deletion data. This solves the prior-art problem that the data of the two parties becomes abnormal or incomplete when records are added or modified without their timestamps being updated and/or when data is physically deleted, and thereby ensures the consistency of the data of the two parties.
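Steps S203 and S204 can be illustrated with a minimal in-memory sketch. It assumes, purely for illustration, that every data set is a dictionary keyed by primary key; the function name `merge_and_sync` and its parameters are hypothetical and are not taken from the application.

```python
def merge_and_sync(incremental, non_standard, deleted_keys, warehouse):
    """Merge the three inputs and overwrite-synchronize them into the warehouse."""
    # S203, part 1: non-standard operation data overwrites the incremental data.
    first_merged = {**incremental, **non_standard}
    # S203, part 2: physically deleted records are removed from the merged data.
    second_merged = {
        pk: row for pk, row in first_merged.items() if pk not in deleted_keys
    }
    # S204: overwrite synchronization into the data warehouse.
    warehouse.update(second_merged)
    return second_merged


warehouse = {"A": {"age": 20}}
second_merged = merge_and_sync(
    incremental={"A": {"age": 22}, "B": {"age": 31}},
    non_standard={"C": {"age": 40}},
    deleted_keys={"B"},
    warehouse=warehouse,
)
print(warehouse["A"])  # user A's age is overwritten, as in the example above
```

The sketch only removes deleted records from the merged data, as the text describes; whether matching rows already in the warehouse are also removed is not stated and is left out here.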
In some optional implementations of this embodiment, the step of extracting physical deletion data and non-standard operation data from the operation log includes:
extracting the operation identifier of each first primary key from the operation log, wherein the operation log comprises a plurality of first primary keys;
determining the operation type of the first primary key according to the operation identifier;
aggregating all the first primary keys whose operation type is a deletion operation to obtain the physical deletion data, and determining the non-standard operation data from all the first primary keys whose operation type is an addition operation or a modification operation.
In this embodiment, the operation log includes a plurality of first primary keys; in some embodiments, the operation log is presented in the form of a table, and each row of data in the operation log table is a first primary key.
The operation types comprise deletion operation, addition operation and modification operation, wherein the operation identifications corresponding to the deletion operation, the addition operation and the modification operation are different; in an example, the operation identifier corresponding to the delete operation is D, the operation identifier corresponding to the add operation is I, and the operation identifier corresponding to the modify operation is U, so that after the operation identifier is obtained, the operation type can be determined according to the operation identifier.
In practical application, the first primary keys in the operation log are divided into physical deletion data and non-standard operation data according to their operation types, so that the physical deletion data, the non-standard operation data and the incremental data can be merged in the subsequent step S203, thereby ensuring the consistency of the data of the two synchronized parties.
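The classification by operation identifier can be sketched as below. The identifiers D, I and U follow the example in the text; the row layout (a dictionary with an `op` field) and the function name `split_by_operation` are assumptions made for illustration.

```python
# Operation identifiers from the example in the text.
OPERATION_TYPES = {"D": "delete", "I": "add", "U": "modify"}


def split_by_operation(log_rows):
    """Split log rows into physical deletions and add/modify candidates."""
    physical_deletion, add_or_modify = [], []
    for row in log_rows:
        op_type = OPERATION_TYPES.get(row["op"])
        if op_type is None:
            raise ValueError(f"unknown operation identifier: {row['op']!r}")
        if op_type == "delete":
            physical_deletion.append(row)
        else:
            # Add and modify rows are only candidates; the non-standard
            # operation condition is applied to them in a later step.
            add_or_modify.append(row)
    return physical_deletion, add_or_modify


log_rows = [{"pk": 1, "op": "I"}, {"pk": 2, "op": "D"}, {"pk": 3, "op": "U"}]
deleted, candidates = split_by_operation(log_rows)
print(deleted, candidates)
```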
In some optional implementations of this embodiment, the step of determining the non-standard operation data from all the first primary keys whose operation type is an addition operation or a modification operation includes:
extracting, as second primary keys, the first primary keys that meet the non-standard operation condition from all the first primary keys whose operation type is an addition operation or a modification operation;
aggregating all the second primary keys to obtain the non-standard operation data.
In this embodiment, the non-standard operation condition is that a first primary key whose operation type is an addition operation or a modification operation has an unmodified timestamp. The first primary keys meeting the non-standard operation condition are extracted separately as second primary keys, and finally all the second primary keys are aggregated to obtain the non-standard operation data.
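One way to express the non-standard operation condition is sketched below. The text does not say how an unmodified timestamp is detected, so the test used here is an assumption: a row counts as non-standard when its `updated_at` value is missing or predates the start of the synchronization window. The field names and the function name `extract_second_primary_keys` are hypothetical.

```python
from datetime import datetime


def extract_second_primary_keys(candidates, window_start):
    """Keep add/modify rows whose timestamp was not refreshed in this period.

    Assumption: a stale or missing `updated_at` means the operation did not
    modify the timestamp.
    """
    return [
        row
        for row in candidates
        if row["op"] in ("I", "U")
        and (row["updated_at"] is None or row["updated_at"] < window_start)
    ]


window_start = datetime(2022, 8, 14)
candidates = [
    {"pk": 1, "op": "U", "updated_at": datetime(2022, 8, 14, 9, 30)},  # refreshed
    {"pk": 2, "op": "U", "updated_at": datetime(2022, 7, 1)},          # stale
    {"pk": 3, "op": "I", "updated_at": None},                          # never set
]
non_standard = extract_second_primary_keys(candidates, window_start)
print([row["pk"] for row in non_standard])
```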
In some optional implementations of this embodiment, in step S203, before the step of merging the non-standard operation data with the incremental data, the method further includes:
extracting the second primary keys from the non-standard operation data, and storing the second primary keys in a non-standard operation table;
In this embodiment, the second primary keys are first stored in the non-standard operation table, which is a temporary table. Holding them there temporarily allows the second primary keys to be retrieved from the non-standard operation table and merged with the incremental data, which improves the synchronization efficiency of the incremental data.
In step S203, the step of merging the non-standard operation data with the incremental data includes:
extracting, from the non-standard operation table, the second primary keys whose operation type is the non-standard operation type, and merging the second primary keys into the incremental data.
In this embodiment, during merging, the timestamp and the second primary key information are first extracted from the second primary key, and the position of the second primary key in the incremental data is determined according to its timestamp. If the incremental data contains multiple timestamps identical to the timestamp of the second primary key, the position is determined according to the second primary key information; for example, if the second primary key information is "name: Zhang San", then "name: Zhang San" is matched in the incremental data to determine the position of the second primary key. Once the position is determined, the second primary key overwrites the data at that position in the incremental data, which completes the merging of the second primary key with the incremental data.
Further, after step S204 is executed, the non-standard operation table is deleted to release system memory and reduce the load on the system.
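The locate-then-overwrite procedure described above can be sketched as follows. The record layout (`ts` for the timestamp, a `key` dictionary for the second primary key information, `row` for the replacement record) and both function names are hypothetical. Appending a row that matches nothing in the incremental data is an assumption; the text does not say how that case is handled.

```python
def locate(records, ts, key_info):
    """Index of the record to overwrite, or None if nothing matches."""
    same_ts = [i for i, rec in enumerate(records) if rec["ts"] == ts]
    if len(same_ts) <= 1:
        # Zero or one record shares the timestamp: the timestamp decides.
        return same_ts[0] if same_ts else None
    # Several records share the timestamp: fall back to the key information.
    for i in same_ts:
        if all(records[i].get(k) == v for k, v in key_info.items()):
            return i
    return None


def merge_non_standard(incremental, non_standard_table):
    """Overwrite matching incremental records with non-standard records."""
    merged = [dict(rec) for rec in incremental]  # leave the input untouched
    for entry in non_standard_table:
        idx = locate(merged, entry["ts"], entry["key"])
        if idx is None:
            merged.append(dict(entry["row"]))  # assumption: keep unmatched rows
        else:
            merged[idx] = dict(entry["row"])
    return merged


incremental = [
    {"ts": 1, "name": "Li Si", "age": 30},
    {"ts": 1, "name": "Zhang San", "age": 20},
]
table = [{"ts": 1, "key": {"name": "Zhang San"},
          "row": {"ts": 1, "name": "Zhang San", "age": 22}}]
print(merge_non_standard(incremental, table))
```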
In some optional implementations of this embodiment, in step S203, before the step of merging the first merged data with the physical deletion data, the method further includes:
storing each first primary key in the physical deletion data whose operation type is the physical deletion type into a physical deletion table;
In this embodiment, the first primary keys whose operation type is the physical deletion type are stored in the physical deletion table, which is a temporary table. Holding them there temporarily allows the first primary keys to be retrieved from the physical deletion table and merged with the first merged data, which improves the synchronization efficiency of the incremental data.
In step S203, the step of merging the first merged data with the physical deletion data includes:
extracting, from the physical deletion table, the first primary keys whose operation type is the physical deletion type, and determining redundant data in the first merged data according to the first primary keys extracted from the physical deletion table;
deleting the redundant data from the first merged data.
In this embodiment, after a first primary key is extracted from the physical deletion table, the timestamp and the first primary key information are first extracted from it, and the position of the first primary key in the first merged data is determined according to its timestamp. If the first merged data contains multiple timestamps identical to the timestamp of the first primary key, the position is determined according to the first primary key information; for example, if the first primary key information is "identity card number: 441900 xxxxxxxxxxxx 201", then "identity card number: 441900 xxxxxxxxxxxx 201" is matched in the first merged data to determine the position of the first primary key. Once the position is determined, the data at that position in the first merged data is identified as redundant data and deleted, which completes the merging of the first merged data with the physical deletion data.
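The removal of redundant data can be sketched in the same style. The record fields `ts` and `id_number`, the placeholder identifiers, and the function name `remove_physically_deleted` are hypothetical; the matching order (timestamp first, key information only when several records share the timestamp) follows the description above.

```python
def remove_physically_deleted(first_merged, deletion_table):
    """Return the second merged data: first merged data minus redundant rows."""
    redundant = set()
    for entry in deletion_table:
        same_ts = [
            i for i, rec in enumerate(first_merged) if rec["ts"] == entry["ts"]
        ]
        if len(same_ts) > 1:
            # Several records share the timestamp: narrow down by key info.
            same_ts = [
                i for i in same_ts
                if first_merged[i]["id_number"] == entry["id_number"]
            ]
        redundant.update(same_ts)
    return [rec for i, rec in enumerate(first_merged) if i not in redundant]


first_merged = [
    {"ts": 5, "id_number": "ID-001", "age": 22},
    {"ts": 5, "id_number": "ID-002", "age": 31},
    {"ts": 7, "id_number": "ID-003", "age": 40},
]
deletion_table = [{"ts": 5, "id_number": "ID-002"}]
print(remove_physically_deleted(first_merged, deletion_table))
```

A stricter variant would compare the key information even when only one record shares the timestamp, which guards against deleting an unrelated record that happens to carry the same timestamp.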
In some optional implementations of this embodiment, after the step in step S204 of synchronizing the second merged data into the data warehouse, the method further includes:
deleting the physical deletion table;
or deleting all the first primary keys in the physical deletion table.
In this embodiment, after the step S204 is executed, if the physical deletion table is a temporary table, the physical deletion table may be directly deleted, and if the physical deletion table is a non-temporary table, all the first primary keys in the physical deletion table may be deleted to release the memory of the system and reduce the load of the system.
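The two cleanup options can be illustrated with SQLite from the Python standard library. This is a sketch under assumptions: the application does not name a database engine, and the table name `physical_deletion` and the function `cleanup_deletion_table` are hypothetical.

```python
import sqlite3


def cleanup_deletion_table(conn, table="physical_deletion", temporary=True):
    """Drop a temporary table, or empty a non-temporary one."""
    # `table` is interpolated into SQL, so it must be a trusted identifier.
    if temporary:
        conn.execute(f"DROP TABLE IF EXISTS {table}")
    else:
        conn.execute(f"DELETE FROM {table}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TEMP TABLE physical_deletion (pk TEXT PRIMARY KEY)")
conn.execute("INSERT INTO physical_deletion VALUES ('k1')")
# ... steps S203 and S204 would run here ...
cleanup_deletion_table(conn, temporary=True)
```

Dropping a temporary table releases it entirely, while `DELETE FROM` keeps the table definition so that the next synchronization period can reuse it.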
In some optional implementations of this embodiment, in step S204, the step of synchronizing the second merged data into the data warehouse includes:
acquiring historical data of the data warehouse;
identifying the same data of the second merged data and the historical data as duplicate data;
deleting the repeated data in the second merged data to obtain target data;
and synchronously updating the target data to the data warehouse.
In this embodiment, a plurality of historical data are stored in the data warehouse, and while the incremental data and the operation log in the current synchronization time period are acquired, the database identifier of the upstream database generating the incremental data and the operation log is acquired, so as to match the corresponding historical data from the data warehouse according to the database identifier.
In practical application, the second merged data and the historical data in the data warehouse may contain duplicate data. To reduce data redundancy, data that is identical in the second merged data and the historical data in the data warehouse is identified as duplicate data, and the duplicate data is then deleted from the second merged data; the data remaining in the second merged data is the target data. The target data is then merged with the historical data, which achieves incremental data synchronization.
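The deduplication step can be sketched as below, assuming for illustration that the warehouse maps a database identifier to a dictionary of historical rows keyed by primary key. The structure and the names `select_target_data` and `sync_target_data` are hypothetical.

```python
def select_target_data(second_merged, history):
    """Drop rows that are identical to what the warehouse already holds."""
    return {
        pk: row for pk, row in second_merged.items() if history.get(pk) != row
    }


def sync_target_data(warehouse, db_id, second_merged):
    """Match history by database identifier, then write only the target data."""
    history = warehouse.setdefault(db_id, {})
    target = select_target_data(second_merged, history)
    history.update(target)
    return target


warehouse = {"policy_db": {"A": {"age": 22}, "B": {"age": 31}}}
second_merged = {"A": {"age": 22}, "B": {"age": 32}, "C": {"age": 40}}
target = sync_target_data(warehouse, "policy_db", second_merged)
print(target)  # only the changed row B and the new row C remain
```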
It is emphasized that, in order to further ensure the privacy and security of the incremental data and the operation log, the incremental data and the operation log may also be stored in nodes of a blockchain.
The block chain referred by the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a string of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, which is used for verifying the validity (anti-counterfeiting) of the information and generating a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence base technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Those skilled in the art will understand that all or part of the processes of the method embodiments described above may be implemented by computer-readable instructions that direct the relevant hardware. The instructions may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or a volatile storage medium such as a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of an incremental data synchronization apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 3, the incremental data synchronization apparatus 300 according to the present embodiment includes: an acquisition module 301, an extraction module 302, a merging module 303, and a synchronization module 304. Wherein:
the acquisition module 301 is configured to acquire incremental data and an operation log in a current synchronization time period;
the extraction module 302 is configured to parse the operation log to extract physical deletion data and non-standard operation data from the operation log;
the merging module 303 is configured to merge the non-standard operation data with the incremental data to obtain first merged data, and then merge the first merged data with the physical deletion data to obtain second merged data;
the synchronization module 304 is configured to synchronize the second merged data to a data warehouse.
In this application, the physical deletion data and the added and modified data are extracted from the operation log; the non-standard operation data identified among the added and modified data is merged with the incremental data; and the result is then merged with the physical deletion data. This solves the prior-art problem that the data on the two sides becomes inconsistent, or data is omitted, when data is added or modified without updating the timestamp record and/or when data is physically deleted, and thereby ensures the consistency of the data on both sides.
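How modules 302 to 304 fit together can be sketched end to end as follows. The sketch makes several assumptions that the text does not state: each log entry is a tuple `(primary_key, op, timestamp_updated)`, the operation names are `"add"`, `"modify"` and `"delete"`, data sets are dictionaries keyed by primary key, and merging with the physical deletion data means dropping the deleted keys. The acquisition module 301 is environment-specific, so its outputs are simply passed in as arguments.

```python
from typing import Dict, Iterable, Set, Tuple

Rows = Dict[int, dict]  # primary key -> row
LogEntry = Tuple[int, str, bool]  # (primary_key, op, timestamp_updated), assumed layout


def extract(log: Iterable[LogEntry]) -> Tuple[Set[int], Set[int]]:
    """Extraction module 302: split the log into deleted and non-standard keys."""
    deleted, non_standard = set(), set()
    for pk, op, timestamp_updated in log:
        if op == "delete":
            deleted.add(pk)
        elif op in ("add", "modify") and not timestamp_updated:
            non_standard.add(pk)
    return deleted, non_standard


def merge(incremental: Rows, source: Rows, non_standard: Set[int], deleted: Set[int]) -> Rows:
    """Merging module 303: build the first merged data, then the second."""
    first_merged = dict(incremental)
    # Rows written without a timestamp update are missing from the incremental
    # data, so they are fetched from the source by primary key.
    first_merged.update({pk: source[pk] for pk in non_standard if pk in source})
    return {pk: row for pk, row in first_merged.items() if pk not in deleted}


def synchronize(second_merged: Rows, warehouse: Rows) -> None:
    """Synchronization module 304: write the second merged data to the warehouse."""
    warehouse.update(second_merged)
```

A short usage example: with `log = [(2, "modify", False), (3, "delete", True)]`, key 2 is picked up even though its timestamp was never updated, and key 3 is kept out of the data that reaches the warehouse.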
In some optional implementations of the present embodiment, the extraction module 302 includes an extraction sub-module, a first determining sub-module, and a second determining sub-module. Wherein:
the extraction sub-module is configured to extract the operation identifier of each first primary key from the operation log, wherein the operation log contains a plurality of first primary keys;
the first determining sub-module is configured to determine the operation type of each first primary key according to its operation identifier;
the second determining sub-module is configured to aggregate all first primary keys whose operation type is the deletion type to obtain the physical deletion data, and to determine the non-standard operation data from all first primary keys whose operation type is an addition operation or a modification operation.
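The three sub-modules can be sketched as a lookup from operation identifier to operation type, followed by grouping of the primary keys. The identifiers `"I"`, `"U"`, `"D"` and the `(primary_key, operation_id)` entry layout are hypothetical; the text does not specify the log format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical operation identifiers and the operation types they denote.
OPERATION_TYPES = {"I": "add", "U": "modify", "D": "delete"}


def group_keys_by_type(log: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each (primary_key, operation_id) entry to its operation type."""
    grouped = defaultdict(list)
    for primary_key, operation_id in log:
        grouped[OPERATION_TYPES[operation_id]].append(primary_key)
    return dict(grouped)


log = [("k1", "I"), ("k2", "D"), ("k3", "U"), ("k4", "D")]
grouped = group_keys_by_type(log)

# Keys of deletion type form the physical deletion data.
physical_deletion_data = grouped.get("delete", [])
# Keys of addition or modification type are candidates for non-standard operation data.
add_or_modify_candidates = grouped.get("add", []) + grouped.get("modify", [])
```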
In some optional implementations of this embodiment, the second determining sub-module includes an extraction unit and a merging unit. Wherein:
the extraction unit is configured to extract, as second primary keys, the first primary keys that satisfy a non-standard operation condition from all first primary keys whose operation type is an addition operation or a modification operation;
the merging unit is configured to aggregate all the second primary keys to obtain the non-standard operation data.
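This passage does not spell out the non-standard operation condition. The sketch below assumes one plausible reading, based on the problem the application describes: an addition or modification is non-standard when the row's timestamp column was not updated, so the timestamp falls outside the current synchronization window. The class `KeyedOperation` and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class KeyedOperation:
    primary_key: str
    operation_type: str      # "add", "modify" or "delete"
    row_timestamp: datetime  # value of the row's timestamp column after the write


def is_non_standard(op: KeyedOperation, window_start: datetime, window_end: datetime) -> bool:
    """Assumed condition: the row's timestamp lies outside the current window."""
    return not (window_start <= op.row_timestamp < window_end)


def collect_second_primary_keys(
    ops: Iterable[KeyedOperation], window_start: datetime, window_end: datetime
) -> List[str]:
    """Extraction unit plus merging unit: filter, then gather the second primary keys."""
    return [
        op.primary_key
        for op in ops
        if op.operation_type in ("add", "modify")
        and is_non_standard(op, window_start, window_end)
    ]


window_start = datetime(2022, 9, 15)
window_end = datetime(2022, 9, 16)
ops = [
    KeyedOperation("k1", "add", datetime(2022, 9, 15, 8, 0)),  # timestamp updated
    KeyedOperation("k3", "modify", datetime(2022, 9, 1)),      # stale timestamp
    KeyedOperation("k4", "delete", datetime(2022, 9, 1)),      # not add/modify
]
non_standard_keys = collect_second_primary_keys(ops, window_start, window_end)
```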
In some optional implementations of this embodiment, the apparatus further includes a first saving module. Wherein:
the first saving module is configured to extract the second primary keys from the non-standard operation data and store them in a non-standard operation table.
The merging module 303 includes a first merging sub-module. Wherein:
the first merging sub-module is configured to extract, from the non-standard operation table, the second primary keys whose operation type is the non-standard operation type, and to merge them into the incremental data.
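A minimal sketch of the non-standard operation table, using an in-memory SQLite database. The table and column names (`source`, `incremental`, `non_standard_ops`, `pk`, `val`, `op_type`) are hypothetical, and merging a key into the incremental data is assumed to mean copying the current source row for that key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (pk INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE incremental (pk INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE non_standard_ops (pk INTEGER PRIMARY KEY, op_type TEXT NOT NULL);
    INSERT INTO source VALUES (1, 'a'), (2, 'b2'), (3, 'c');
    INSERT INTO incremental VALUES (1, 'a');
""")


def save_second_primary_keys(conn, keys):
    """First saving module: store the second primary keys in the table."""
    conn.executemany(
        "INSERT OR IGNORE INTO non_standard_ops (pk, op_type) VALUES (?, 'non_standard')",
        [(key,) for key in keys],
    )


def merge_non_standard_rows(conn):
    """First merging sub-module: copy the rows for the stored keys into the incremental data."""
    conn.execute("""
        INSERT OR REPLACE INTO incremental (pk, val)
        SELECT pk, val FROM source
        WHERE pk IN (SELECT pk FROM non_standard_ops WHERE op_type = 'non_standard')
    """)


save_second_primary_keys(conn, [2])
merge_non_standard_rows(conn)
first_merged = conn.execute("SELECT pk, val FROM incremental ORDER BY pk").fetchall()
```

Keeping the keys in a table rather than in memory lets the merge run as a single set-based statement, which is one possible reason for the intermediate table.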
In some optional implementations of this embodiment, the apparatus further includes a second saving module. Wherein:
the second saving module is configured to store, in a physical deletion table, each first primary key in the physical deletion data whose operation type is the physical deletion type.
The merging module 303 includes a third determining sub-module and a first deleting sub-module. Wherein:
the third determining sub-module is configured to extract, from the physical deletion table, the first primary keys whose operation type is the physical deletion type, and to determine the redundant data in the first merged data according to the first primary keys extracted from the physical deletion table;
the first deleting sub-module is configured to delete the redundant data from the first merged data.
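The determination and deletion of redundant data can be sketched as a key lookup. The physical deletion table is modelled here as a list of `(primary_key, op_type)` pairs and the type label `"physical_delete"` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def remove_physically_deleted(
    first_merged: Dict[int, dict],
    deletion_table: Iterable[Tuple[int, str]],
) -> Tuple[List[int], Dict[int, dict]]:
    """Return the redundant keys and the second merged data."""
    # Third determining sub-module: keys of physical deletion type.
    deleted_keys = {pk for pk, op_type in deletion_table if op_type == "physical_delete"}
    redundant = sorted(pk for pk in first_merged if pk in deleted_keys)
    # First deleting sub-module: drop the redundant rows.
    second_merged = {pk: row for pk, row in first_merged.items() if pk not in deleted_keys}
    return redundant, second_merged


first_merged = {1: {"v": "a"}, 2: {"v": "b2"}, 3: {"v": "c"}}
deletion_table = [(3, "physical_delete"), (9, "physical_delete")]
redundant, second_merged = remove_physically_deleted(first_merged, deletion_table)
```

Key 9 appears in the deletion table but not in the first merged data, so it contributes no redundant row.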
In some optional implementations of this embodiment, the apparatus further includes a deletion module. Wherein:
the deletion module is configured to delete the physical deletion table, or to delete all the first primary keys in the physical deletion table.
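The two alternatives differ in whether the table itself survives. A small SQLite sketch, with a hypothetical table name `physical_deletion`, makes the difference concrete:

```python
import sqlite3


def new_deletion_table() -> sqlite3.Connection:
    """Create an in-memory physical deletion table holding two keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE physical_deletion (pk INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO physical_deletion VALUES (?)", [(3,), (9,)])
    return conn


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


# Option 1: delete the table itself; it must be recreated before it is used again.
conn_a = new_deletion_table()
conn_a.execute("DROP TABLE physical_deletion")

# Option 2: keep the table and delete every primary key stored in it.
conn_b = new_deletion_table()
conn_b.execute("DELETE FROM physical_deletion")
remaining = conn_b.execute("SELECT COUNT(*) FROM physical_deletion").fetchone()[0]
```

Either way, keys from one synchronization period are not carried over into the next, which is presumably the purpose of the deletion module.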
In some optional implementations of the present embodiment, the synchronization module 304 includes an acquisition sub-module, an identification sub-module, a second deleting sub-module, and a synchronization sub-module. Wherein:
the acquisition sub-module is configured to acquire historical data of the data warehouse;
the identification sub-module is configured to identify data that is the same in the second merged data and the historical data as duplicate data;
the second deleting sub-module is configured to delete the duplicate data from the second merged data to obtain target data;
the synchronization sub-module is configured to synchronously update the target data to the data warehouse.
To solve the above technical problem, an embodiment of the application further provides a computer device. Refer to fig. 4, which is a block diagram of the basic structure of the computer device according to this embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43, which are communicatively connected to each other via a system bus. Note that only a computer device 4 having components 41-43 is shown; it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The computer device can interact with a user through a keyboard, a mouse, a remote controller, a touch panel, a voice control device, or the like.
The memory 41 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disks, optical disks, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system and various application software installed in the computer device 4, such as the computer-readable instructions of the incremental data synchronization method. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 42 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the computer-readable instructions stored in the memory 41 or to process data, for example to execute the computer-readable instructions of the incremental data synchronization method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
In this application, the physical deletion data and the added and modified data are extracted from the operation log; the non-standard operation data identified among the added and modified data is merged with the incremental data; and the result is then merged with the physical deletion data. This solves the prior-art problem that the data on the two sides becomes inconsistent, or data is omitted, when data is added or modified without updating the timestamp record and/or when data is physically deleted, and thereby ensures the consistency of the data on both sides.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the incremental data synchronization method as described above.
In this application, the physical deletion data and the added and modified data are extracted from the operation log; the non-standard operation data identified among the added and modified data is merged with the incremental data; and the result is then merged with the physical deletion data. This solves the prior-art problem that the data on the two sides becomes inconsistent, or data is omitted, when data is added or modified without updating the timestamp record and/or when data is physically deleted, and thereby ensures the consistency of the data on both sides.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present application or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, and an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It should be understood that the above-described embodiments are merely exemplary of some, and not all, embodiments of the present application, and that the drawings illustrate preferred embodiments of the present application without limiting the scope of the claims appended hereto. This application is capable of embodiments in many different forms and the embodiments are provided so that this disclosure will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to one skilled in the art that modifications can be made to the embodiments described in the foregoing detailed description, or equivalents can be substituted for some of the features described therein. All equivalent structures made by using the contents of the specification and the drawings of the present application are directly or indirectly applied to other related technical fields and are within the protection scope of the present application.

Claims (10)

1. An incremental data synchronization method, characterized by comprising the following steps:
acquiring incremental data and an operation log in the current synchronization time period;
parsing the operation log to extract physical deletion data and non-standard operation data from the operation log;
merging the non-standard operation data with the incremental data to obtain first merged data, and merging the first merged data with the physical deletion data to obtain second merged data;
synchronizing the second merged data into a data warehouse.
2. The incremental data synchronization method of claim 1, wherein said step of extracting physical deletion data and non-standard operation data from said operation log comprises:
extracting the operation identifier of each first primary key from the operation log, wherein the operation log comprises a plurality of first primary keys;
determining the operation type of each first primary key according to its operation identifier;
aggregating all first primary keys whose operation type is the deletion type to obtain the physical deletion data, and determining the non-standard operation data from all first primary keys whose operation type is an addition operation or a modification operation.
3. The incremental data synchronization method of claim 2, wherein said step of determining the non-standard operation data from all first primary keys whose operation type is an addition operation or a modification operation comprises:
extracting, as second primary keys, the first primary keys that satisfy a non-standard operation condition from all first primary keys whose operation type is an addition operation or a modification operation;
aggregating all the second primary keys to obtain the non-standard operation data.
4. The incremental data synchronization method of claim 3, further comprising, prior to said step of merging said non-standard operation data with said incremental data:
extracting the second primary keys from the non-standard operation data, and storing the second primary keys in a non-standard operation table;
the step of merging the non-standard operation data with the incremental data comprises:
extracting, from the non-standard operation table, the second primary keys whose operation type is the non-standard operation type, and merging them into the incremental data.
5. The incremental data synchronization method of claim 2, further comprising, prior to said step of merging said first merged data with said physical deletion data:
storing, in a physical deletion table, each first primary key in the physical deletion data whose operation type is the physical deletion type;
the step of merging the first merged data with the physical deletion data comprises:
extracting, from the physical deletion table, the first primary keys whose operation type is the physical deletion type, and determining redundant data in the first merged data according to the first primary keys extracted from the physical deletion table;
deleting the redundant data from the first merged data.
6. The incremental data synchronization method of claim 5, further comprising, after said step of synchronizing said second merged data into a data warehouse:
deleting the physical deletion table;
or deleting all the first primary keys in the physical deletion table.
7. The incremental data synchronization method of any one of claims 1 to 5, wherein said step of synchronizing said second merged data into a data warehouse comprises:
acquiring historical data of the data warehouse;
identifying data that is the same in the second merged data and the historical data as duplicate data;
deleting the duplicate data from the second merged data to obtain target data;
synchronously updating the target data to the data warehouse.
8. An incremental data synchronization apparatus, comprising:
an acquisition module, configured to acquire incremental data and an operation log in the current synchronization time period;
an extraction module, configured to parse the operation log to extract physical deletion data and non-standard operation data from the operation log;
a merging module, configured to merge the non-standard operation data with the incremental data to obtain first merged data, and then merge the first merged data with the physical deletion data to obtain second merged data;
a synchronization module, configured to synchronize the second merged data to a data warehouse.
9. A computer device comprising a memory and a processor, the memory having computer-readable instructions stored therein, wherein the processor, when executing the computer-readable instructions, implements the steps of the incremental data synchronization method of any one of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor, implement the steps of the incremental data synchronization method of any one of claims 1 to 7.
CN202211123309.0A 2022-09-15 2022-09-15 Incremental data synchronization method and device, computer equipment and storage medium Pending CN115455020A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211123309.0A CN115455020A (en) 2022-09-15 2022-09-15 Incremental data synchronization method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211123309.0A CN115455020A (en) 2022-09-15 2022-09-15 Incremental data synchronization method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115455020A true CN115455020A (en) 2022-12-09

Family

ID=84304705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211123309.0A Pending CN115455020A (en) 2022-09-15 2022-09-15 Incremental data synchronization method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115455020A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117453730A (en) * 2023-12-21 2024-01-26 深圳海智创科技有限公司 Data query method, device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117453730A (en) * 2023-12-21 2024-01-26 深圳海智创科技有限公司 Data query method, device, equipment and storage medium
CN117453730B (en) * 2023-12-21 2024-03-08 深圳海智创科技有限公司 Data query method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination