CN110321383A - Data synchronization method, device, computer equipment and storage medium for a big data platform - Google Patents

Data synchronization method, device, computer equipment and storage medium for a big data platform

Info

Publication number
CN110321383A
CN110321383A CN201910418941.XA CN201910418941A CN110321383A CN 110321383 A CN110321383 A CN 110321383A CN 201910418941 A CN201910418941 A CN 201910418941A CN 110321383 A CN110321383 A CN 110321383A
Authority
CN
China
Prior art keywords
data
record
data record
target
incremental
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910418941.XA
Other languages
Chinese (zh)
Inventor
赵乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Puhui Enterprise Management Co Ltd
Original Assignee
Ping An Puhui Enterprise Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Puhui Enterprise Management Co Ltd
Priority to CN201910418941.XA
Publication of CN110321383A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a data synchronization method and a data synchronization device for a big data platform. The data synchronization method includes: in response to a data synchronization instruction, monitoring incremental data records and deletion data records in a source database; generating an incremental data synchronization trigger instruction and a data deletion instruction according to the incremental data records and the deletion data records; synchronizing the incremental data in the source database to a target database according to the incremental data synchronization trigger instruction; and deleting target data records from the target database according to the data deletion instruction. The invention screens the large volume of incremental data and deletion record information to determine the target incremental data and target deletion data that actually matter to the final synchronization operation. This avoids meaningless synchronization of intermediate incremental data or intermediate deletion data and thereby improves data synchronization efficiency.

Description

Data synchronization method, device, computer equipment and storage medium for a big data platform
Technical field
The present invention relates to the field of database technology, and in particular to a data synchronization method, device, computer equipment and storage medium that keep the data in a big data platform highly consistent with upstream data tables.
Background art
In some big data scenarios, upstream data changes every day, so the downstream systems that use this data need to refresh the changed data daily. When the data volume is small, the simple and crude approach is to update the full data set every time. As the business grows, however, the data volume increases geometrically (even reaching the order of hundreds of millions of records), and a full update on every run becomes so time-consuming and laborious that the business side cannot tolerate it. At that point full data synchronization needs to be replaced by incremental data synchronization.
Incremental data synchronization means that the whole of the upstream data no longer needs to be copied to the big data platform; only the content that has changed incrementally in the upstream data is extracted. Common incremental extraction tools, such as ETL tools, can synchronize incremental data. However, these tools have an inherent defect: they can only synchronize newly added and changed data according to creation time or update time. When the upstream database system deletes certain data, existing incremental synchronization tools cannot learn of it, so data records that have already been deleted upstream remain in the big data platform.
On the other hand, a specific field of the same data record in the upstream data may be changed several times. For a big data platform with a synchronization requirement, it is only necessary to keep the stored data record consistent with the latest data upstream; the intermediate changes need not be tracked. In the prior art, however, all incrementally changed data is usually synchronized, which produces many unnecessary extra operations. For example, suppose an account amount record in the upstream data changes from 1000 yuan to 3000 yuan, then from 3000 yuan to 5000 yuan, and then from 5000 yuan to 2000 yuan. When the big data platform performs data synchronization, it only needs to record that the account amount changed from the initial 1000 yuan to the current 2000 yuan; the intermediate values of 3000 yuan and 5000 yuan need not be reflected.
Therefore, how to ensure both accuracy and efficiency when synchronizing data between a big data platform and an upstream database system has become a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
The object of the present invention is to provide a data synchronization method, device, computer equipment and storage medium for a big data platform based on incremental data synchronization, in order to solve the problems of the prior art.
To achieve the above object, the present invention provides a data synchronization method for a big data platform, comprising the following steps:
In response to a data synchronization instruction, monitoring incremental data records and deletion data records in a source database;
Generating an incremental data synchronization trigger instruction and a data deletion instruction according to the incremental data records and the deletion data records;
In response to the incremental data synchronization trigger instruction, synchronizing the incremental data in the source database to a target database;
Deleting target data records from the target database according to the data deletion instruction.
In the data synchronization method provided by the present invention, the step of generating the incremental data synchronization trigger instruction and the data deletion instruction according to the incremental data records and the deletion data records includes:
Extracting all candidate incremental data records and candidate deletion data records that have the same key field;
Sorting the candidate incremental data records and the candidate deletion data records in chronological order;
Determining a target incremental data record and a target deletion data record from the chronologically sorted candidate incremental data records and candidate deletion data records;
Generating the incremental data synchronization trigger instruction and the data deletion instruction from the target incremental data record and the target deletion data record, respectively.
In the data synchronization method provided by the present invention, the step of deleting target data records from the target database according to the data deletion instruction includes:
Putting the data deletion instructions into a message queue in order;
Reading the data deletion instructions from the message queue one by one, and deleting the target data record according to each data deletion instruction.
In the data synchronization method provided by the present invention, the step of deleting the target data record according to the data deletion instruction includes:
Extracting the keyword from the data deletion instruction;
Searching the target database for the target data record corresponding to the keyword;
Deleting the target data record.
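The deletion steps above can be sketched as follows. This is only a minimal illustration under stated assumptions: Python's in-process queue.Queue stands in for the message queue, an in-memory SQLite database stands in for the target database, and the table name student, the instruction format {"key": ...} and the function name apply_delete_instructions are hypothetical and do not come from the patent.

```python
import queue
import sqlite3


def apply_delete_instructions(target, instructions):
    """Queue the deletion instructions in order, then execute them one by one."""
    pending = queue.Queue()
    for instruction in instructions:
        pending.put(instruction)

    deleted = 0
    while not pending.empty():
        instruction = pending.get()
        keyword = instruction["key"]  # extract the keyword from the instruction
        row = target.execute(
            "SELECT id FROM student WHERE id = ?", (keyword,)
        ).fetchone()  # look up the matching record in the target database
        if row is not None:
            target.execute("DELETE FROM student WHERE id = ?", (keyword,))
            deleted += 1
    target.commit()
    return deleted


# Hypothetical target table holding two records.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT)")
target.executemany(
    "INSERT INTO student VALUES (?, ?)", [("022", "Lee"), ("037", "Wang")]
)

# "999" has no matching record, so only one deletion is carried out.
deleted_count = apply_delete_instructions(target, [{"key": "022"}, {"key": "999"}])
remaining = [r[0] for r in target.execute("SELECT id FROM student ORDER BY id")]
```

Looking the record up before deleting it mirrors the three sub-steps of the method; in practice a single DELETE by primary key would have the same effect.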
In the data synchronization method provided by the present invention, the step of monitoring the incremental data records and deletion data records in the source database includes:
Determining the incremental data records by monitoring a temporary increment table or timestamps in the source database, where the temporary increment table records the data that has changed incrementally and a timestamp indicates the moment at which a data change occurred;
Determining the deletion data records by extracting the delete operation logs from the operation log of the source database.
To achieve the above object, the present invention also provides a data synchronization device for a big data platform, comprising:
A data record monitoring module, adapted to monitor the incremental data records and deletion data records in the source database;
An instruction generation module, adapted to generate an incremental data synchronization trigger instruction and a data deletion instruction according to the incremental data records and the deletion data records;
An increment synchronization module, adapted to synchronize the incremental data in the source database to the target database in response to the incremental data synchronization trigger instruction;
A deletion module, adapted to delete target data records from the target database according to the data deletion instruction.
In the data synchronization device provided by the present invention, the instruction generation module includes:
A candidate record extraction submodule, adapted to extract all candidate incremental data records and candidate deletion data records that have the same key field;
A sorting submodule, adapted to sort the candidate incremental data records and the candidate deletion data records in chronological order;
A target record determination submodule, adapted to determine a target incremental data record and a target deletion data record from the chronologically sorted candidate incremental data records and candidate deletion data records;
An instruction generation submodule, adapted to generate the incremental data synchronization trigger instruction and the data deletion instruction from the target incremental data record and the target deletion data record, respectively.
In the data synchronization device provided by the present invention, the deletion module includes:
A keyword extraction submodule, adapted to extract the keyword from the delete operation log;
A record search submodule, adapted to search the target database for the target data record corresponding to the keyword;
A deletion submodule, adapted to delete the target data record.
To achieve the above object, the present invention also provides computer equipment, including a memory, a processor and a computer program that is stored in the memory and can run on the processor, where the processor implements the steps of the above method when executing the computer program.
To achieve the above object, the present invention also provides a computer-readable storage medium on which a computer program is stored, where the computer program implements the steps of the above method when executed by a processor.
The data synchronization method, device, computer equipment and computer-readable storage medium for a big data platform provided by the invention achieve high consistency between the big data platform and the upstream data on the basis of incremental data synchronization. On the one hand, the invention synchronizes the incremental data in the upstream data by creation time and update time; on the other hand, it obtains the deletion record information in the upstream data by collecting delete operation logs and deletes the related data in the big data platform according to that information. This ensures that the data in the big data platform corresponds exactly to the data in the upstream database system and improves the data quality and reliability of the big data platform. Further, the invention screens the large volume of incremental data and deletion record information, determines the target incremental data and target deletion data that actually matter to the final synchronization operation, and generates the incremental synchronization instruction and the data deletion instruction from them. This screening avoids meaningless synchronization of intermediate incremental data or intermediate deletion data and thereby improves data synchronization efficiency.
Brief description of the drawings
Fig. 1 is an application scenario diagram of embodiment one of the data synchronization method of the invention;
Fig. 2 is a flow chart of embodiment one of the data synchronization method of the invention;
Fig. 3 is a flow chart of generating the incremental data synchronization trigger instruction and the data deletion instruction in embodiment one of the data synchronization method of the invention;
Fig. 4 is a schematic diagram of the program modules of embodiment one of the data synchronization device of the invention;
Fig. 5 is a schematic diagram of the hardware structure of embodiment one of the data synchronization device of the invention;
Fig. 6 is a flow chart of embodiment two of the data synchronization method of the invention;
Fig. 7 is a schematic diagram of the program modules of embodiment two of the data synchronization device of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the invention without creative work fall within the protection scope of the invention.
The data synchronization method, device, computer equipment and storage medium for a big data platform provided by the invention are applicable to the field of database technology and provide an efficient, fast and automated way of accurately matching data between a big data platform and an upstream database system. On the one hand, the invention synchronizes the incremental data in the upstream data by creation time and update time; on the other hand, it obtains the deletion record information in the upstream data by collecting delete operation logs and deletes the related data in the big data platform according to that information. This ensures that the data in the big data platform corresponds exactly to the data in the upstream database system and improves the data quality and reliability of the big data platform. Further, the invention screens the large volume of incremental data and deletion record information, determines the target incremental data and target deletion data that actually matter to the final synchronization operation, and generates the incremental synchronization instruction and the data deletion instruction from them. This screening avoids meaningless synchronization of intermediate incremental data or intermediate deletion data and thereby improves data synchronization efficiency.
Embodiment one
The data synchronization method for a big data platform proposed by the invention can be applied in the scenario shown in Fig. 1. The target database server 601 communicates over a network with the scheduling server 600 and with the source database server 602. Following the instructions of the scheduling server 600, the target database server 601 queries the existing data from the source database server 602 and synchronizes that data into the target database server 601 according to the corresponding storage path. The source database server 602 may be a set of relatively scattered, independent upstream database systems, for example a class grade database system that stores the final exam grades of each class, or a class student information database system that stores the basic information of the students of each class. The target database server 601 may be a big data platform system for comprehensive statistics, for example a school grade database system that stores the final exam grades of all classes in the school, or a school-wide student information database system that stores the basic information of all students in the school. The data in the school grade database system and the school-wide student information database system comes from the class grade database systems and the class student information database systems respectively, and the data across these systems needs to be kept highly consistent.
Referring to Fig. 2, this embodiment proposes a data synchronization method for a big data platform. It is the data synchronization process carried out in the application scenario shown in Fig. 1 with the scheduling server as the executing entity, and specifically includes the following steps:
S1: In response to a data synchronization instruction from the target database, monitoring the incremental data records and deletion data records in the source database.
In this step, the target database server first sends a data synchronization instruction to the scheduling server. After receiving the instruction, the scheduling server performs the incremental data synchronization operation.
The data synchronization instruction is the instruction that triggers the data synchronization operation. It is generally triggered at a specified point in time, for example every night at midnight. It can also be triggered manually: a database administrator can run the data synchronization instruction at any time by clicking a button. Data synchronization migrates the data in the source database server to the target database server so that the data in the source database and the target database are synchronized. The data synchronization instruction can specify, among other things, the data to be synchronized, the point in time of the synchronization and the synchronization mode. In this embodiment, the data synchronization instruction indicates that the data in a particular source database is to be synchronized to the target database of the big data platform.
Specifically, the scheduling server extracts the incremental data from the source database and then migrates it to the target database. The incremental data can be extracted with existing data synchronization tools, such as ETL tools or Sqoop, and the extraction can use a trigger mode, a timestamp mode and so on. In the trigger mode, triggers are created on the table to be extracted, for example an insert trigger and an update trigger. Whenever data in a table of the source database changes, the corresponding trigger writes the changed data to a temporary table, and an extraction thread reads the data from the temporary table and migrates it to the target database. The timestamp mode is a snapshot-comparison-based way of detecting changed data: a timestamp field is added to the data table in the source database, and whenever data in the table is updated or modified, the value of the timestamp field is modified at the same time. During extraction, the system time is compared with the value of the timestamp field to decide which data to extract.
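The trigger mode described above can be illustrated with a small sketch. It assumes an in-memory SQLite database in place of the source database; the table names account and account_changes, the trigger names and the function drain_changes are hypothetical and chosen only for illustration.

```python
import sqlite3

# An insert trigger and an update trigger copy every change into a temporary
# change table, as in the trigger mode described above.
SCHEMA = """
CREATE TABLE account (card_no TEXT PRIMARY KEY, balance INTEGER);
CREATE TABLE account_changes (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, card_no TEXT, balance INTEGER
);
CREATE TRIGGER account_ins AFTER INSERT ON account BEGIN
    INSERT INTO account_changes (op, card_no, balance)
    VALUES ('insert', NEW.card_no, NEW.balance);
END;
CREATE TRIGGER account_upd AFTER UPDATE ON account BEGIN
    INSERT INTO account_changes (op, card_no, balance)
    VALUES ('update', NEW.card_no, NEW.balance);
END;
"""


def drain_changes(conn):
    """Read the captured changes in order and clear the change table."""
    rows = conn.execute(
        "SELECT seq, op, card_no, balance FROM account_changes ORDER BY seq"
    ).fetchall()
    conn.execute("DELETE FROM account_changes")
    return rows


source = sqlite3.connect(":memory:")
source.executescript(SCHEMA)
source.execute("INSERT INTO account VALUES ('003576', 2000)")
source.execute("UPDATE account SET balance = 500 WHERE card_no = '003576'")

first_batch = drain_changes(source)   # the insert and the update, in order
second_batch = drain_changes(source)  # nothing new since the last drain
```

An extraction thread would call something like drain_changes periodically and forward the returned rows to the target database.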
For example, the timestamp fields established in the data table of the source database in this embodiment include a creation time and an update time. The creation time indicates when a data record was newly added to the data table, and the update time indicates when a data record in the data table was modified. The step of synchronizing the incremental data in the source database to the target database in this embodiment then includes: synchronizing the newly added data in the source database to the target database according to the creation time, and synchronizing the changed data in the source database to the target database according to the update time.
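A minimal sketch of timestamp-based extraction follows, assuming in-memory SQLite databases for both sides. The column names created_at and updated_at, the sample rows (including the student Zhang and all dates) and the function names are hypothetical. Timestamps are stored as "YYYY-MM-DD HH:MM:SS" strings, which compare chronologically as plain text.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT, age INTEGER, "
        "created_at TEXT, updated_at TEXT)"
    )
    return conn


def extract_increment(source, last_sync):
    """Return the rows created or updated after the last synchronization."""
    return source.execute(
        "SELECT id, name, age, created_at, updated_at FROM student "
        "WHERE created_at > ? OR updated_at > ? ORDER BY id",
        (last_sync, last_sync),
    ).fetchall()


def apply_increment(target, rows):
    """Insert new rows and overwrite changed rows in the target table."""
    target.executemany(
        "INSERT OR REPLACE INTO student VALUES (?, ?, ?, ?, ?)", rows
    )
    target.commit()


source, target = make_db(), make_db()
source.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", [
    ("010", "Zhang", 12, "2019-05-01 09:00:00", "2019-05-01 09:00:00"),  # unchanged
    ("037", "Wang", 13, "2019-05-01 09:00:00", "2019-05-15 10:30:00"),   # updated
    ("054", "Lin", 12, "2019-05-16 08:00:00", "2019-05-16 08:00:00"),    # new
])
target.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", [
    ("010", "Zhang", 12, "2019-05-01 09:00:00", "2019-05-01 09:00:00"),
    ("037", "Wang", 12, "2019-05-01 09:00:00", "2019-05-01 09:00:00"),
])

changed = extract_increment(source, "2019-05-13 00:00:00")
apply_increment(target, changed)
synced = target.execute("SELECT id, age FROM student ORDER BY id").fetchall()
```

Note that a row deleted from the source never shows up in the result of extract_increment, which is exactly the gap that the delete operation logs are meant to close.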
Deletion information in the invention refers to information about data records that have been deleted from the source database. Because such a data record has already been deleted, no deletion marker can be obtained directly from the source database, so this deleted data is hard to notice when synchronizing to the target server of the big data platform.
To obtain the details of the data records deleted from the source database, the invention introduces delete operation logs: the details of deleted data records are obtained by analyzing the operation log of the source database itself. In practice, a log collection tool can collect the operation log of the data tables in the source database in real time, and the delete operation logs are extracted from it.
S2: Generating an incremental data synchronization trigger instruction and a data deletion instruction according to the incremental data records and the deletion data records.
Referring to Fig. 3, on the basis of all the incremental data records and deletion data records obtained from the source database, this step filters those records and removes the intermediate data for which no synchronization operation is needed, for example incremental data records and deletion data records that belong to an intermediate process. This yields the target incremental data records and target deletion data records that are relevant to the final synchronization action of the target database. The incremental data synchronization trigger instruction and the data deletion instruction are then generated from the target incremental data records and the target deletion data records.
S21: Extracting, from all the incremental data records and deletion data records, all candidate incremental data records and candidate deletion data records that have the same key field.
Having the same key field here means being data records with the same operation object. For example, suppose the member account balance record of a certain fitness center member changes over a period of time as follows: the initial value is 2000 yuan; after the first change, 500 yuan remain; at the second change, the member does not renew after the membership expires, and the member's account balance record is deleted from the fitness center's database; at the third change, the member asks to reopen the account and pays 3000 yuan, so the account balance is 3000 yuan; at the fourth change, the member pays a further 2000 yuan, so the account balance is 5000 yuan. These member account balance records share the same member card number (for example 003576) and the same operated field (account balance). The identical member card number and identical operated field are what the invention calls the same key field; they indicate operations on the same field of the same data record. The member account balances at the different points in time are then the candidate incremental data records or, for the deletion caused by non-renewal, the candidate deletion data record.
S22: Sorting the candidate incremental data records and the candidate deletion data records in chronological order.
Taking the member account balance records above as an example, sorting them chronologically gives:
Initial data: member account balance record, 2000;
Incremental data 1: member account balance record, 500;
Deletion data: member account deleted;
Incremental data 2: member account balance record, 3000;
Incremental data 3: member account balance record, 5000.
S23: Determining the target incremental data record and the target deletion data record from the chronologically sorted candidate incremental data records and candidate deletion data records.
In the example above, the member's account balance is ultimately reflected by incremental data 3. Incremental data 1, incremental data 2 and the deletion data all belong to the intermediate change process and have no effect on the data that finally needs to be synchronized. The target incremental data record determined in this embodiment is therefore incremental data 3, which does affect the data that finally needs to be synchronized, and there is no target deletion data record.
S24: Generating the incremental data synchronization trigger instruction and the data deletion instruction from the target incremental data record and the target deletion data record, respectively.
In this embodiment, the incremental data synchronization trigger instruction "member account balance record 5000" is generated from incremental data 3, and no data deletion instruction is generated.
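Steps S21 to S24 can be sketched as follows, using the member account example. The sketch makes a simplifying assumption that the patent does not state in this form: for each key field, only the chronologically last candidate record determines the target. The Change class, the "upsert"/"delete" labels and the function select_targets are hypothetical names.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass(frozen=True)
class Change:
    key: tuple                   # key field, e.g. (member card number, operated field)
    ts: int                      # time of the change (any sortable value)
    op: str                      # "upsert" for incremental data, "delete" for deletion data
    value: Optional[int] = None  # new value for an upsert


def select_targets(changes):
    """Group candidates by key field, sort by time, keep only the last one."""
    upserts, deletes = [], []
    ordered = sorted(changes, key=lambda c: (c.key, c.ts))
    for _, group in groupby(ordered, key=lambda c: c.key):
        last = list(group)[-1]   # the final state decides what to synchronize
        (deletes if last.op == "delete" else upserts).append(last)
    return upserts, deletes


KEY = ("003576", "account_balance")
changes = [
    Change(KEY, 1, "upsert", 500),    # incremental data 1
    Change(KEY, 2, "delete"),         # deletion data
    Change(KEY, 3, "upsert", 3000),   # incremental data 2
    Change(KEY, 4, "upsert", 5000),   # incremental data 3
]
target_upserts, target_deletes = select_targets(changes)
```

For the example, target_upserts contains only the change to 5000 and target_deletes is empty, matching the outcome described in S23 and S24. If the last candidate for a key were a deletion, it would instead become the target deletion data record.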
S3: Synchronizing the incremental data in the source database to the target database according to the incremental data synchronization trigger instruction; deleting the target data records from the target database according to the data deletion instruction.
This step covers two aspects: incremental data synchronization and deletion data synchronization. For incremental data synchronization, once the incremental data synchronization trigger instructions have been determined, synchronization is performed according to those instructions in turn. For deletion data synchronization, the deleted data is determined from the operation log of the source database, as described in detail below.
Since a big data platform generates a large number of logs every day, a dedicated log system is needed to process them. For collecting and transmitting the operation logs, the invention combines the Kafka component with the Flume component. Both are popular log systems in the prior art, each with its own characteristics. Flume is a distributed, reliable and highly available system for collecting, aggregating and transmitting massive logs. It supports customizing various kinds of data senders in the log system for collecting data, and it can also perform simple processing on the data and write it to various (customizable) data receivers. Kafka is a distributed, partitionable, replicated messaging system that maintains message queues. In this embodiment, Flume can be chosen for real-time collection of the source database's operation logs. Because the speed of data collection is not necessarily in step with the speed of data processing, a message middleware is added as a buffer; for example, the Kafka messaging system is chosen to process the operation log stream.
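The role of the message middleware as a buffer between collection and processing can be illustrated without Flume or Kafka. In the sketch below, a bounded in-process queue.Queue stands in for the message system, one thread stands in for the log collector and another for the log processor; all function names and the capacity value are hypothetical.

```python
import queue
import threading


def collect(log_lines, buffer):
    """Collector: push log lines into the buffer; blocks while the buffer is full."""
    for line in log_lines:
        buffer.put(line)
    buffer.put(None)  # sentinel: no more log lines


def process(buffer, handled):
    """Processor: take log lines from the buffer at its own pace."""
    while True:
        line = buffer.get()
        if line is None:
            break
        handled.append(line)


def run_pipeline(log_lines, capacity=2):
    buffer = queue.Queue(maxsize=capacity)
    handled = []
    producer = threading.Thread(target=collect, args=(log_lines, buffer))
    consumer = threading.Thread(target=process, args=(buffer, handled))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return handled


handled_lines = run_pipeline(["log-1", "log-2", "log-3", "log-4", "log-5"])
```

Because the queue is first-in, first-out with a single collector and a single processor, the log lines are handled in the order in which they were collected even though the two sides run at different speeds.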
Take a source database that stores the basic information of the students of a class as an example. Assume each data record contains the fields ID number, name, age, gender and ethnicity, with the ID number as the primary key. This week the students' basic information changed as follows. A transfer student joined the class, with the basic information [No. 054, Lin, 12 years old, male, Han nationality]. The original basic information of student Wang was recorded as [No. 037, Wang, 12 years old, female, the Manchu]; an entry error was found, so Wang's age needs to be changed from 12 to 13, and Wang's basic information becomes [No. 037, Wang, 13 years old, female, the Manchu]. The basic information of student Lee was deleted because Lee transferred to another school; before the deletion it was [No. 022, Lee, female, 12 years old, Han nationality], and after the deletion it is no longer present in the data table.
In a specific embodiment of the invention, if the timestamp approach is used and a timestamp field is appended to every data record, then the last field of Lin's data record shows the timestamp of the creation time and the last field of Wang's data record shows the timestamp of the change time. For Lee, however, the whole data record has been deleted from the data table, so no timestamp can be added to it any more, and the data synchronization of the big data platform cannot learn that Lee's data record has been deleted. The invention therefore additionally determines which data records have been deleted by collecting operation logs. For example, when log information is collected from the class student information database for this week, the following log information is obtained: new record [No. 054, Lin, 12 years old, male, Han nationality], change record [No. 037, Wang, 12 years old, female, the Manchu], deletion record [No. 022, Lee, female, 12 years old, Han nationality]. Since new records and change records are already obtained through timestamps, the invention is only concerned with deletion records at this point. Filtering retains only the log content related to deletion information, that is, the log entry for the deletion record 'name: Lee'. This deletion record for the student named Lee is the target data record of interest to the invention.
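The filtering of delete operation logs can be sketched as follows. The patent does not specify a log format, so the sketch assumes, purely for illustration, one JSON object per log line with the hypothetical fields op and row; the function name extract_delete_logs is likewise hypothetical.

```python
import json


def extract_delete_logs(lines):
    """Keep only the log entries that describe a delete operation."""
    entries = (json.loads(line) for line in lines)
    return [entry for entry in entries if entry.get("op") == "delete"]


# Hypothetical operation log for the class student information example.
operation_log = [
    '{"op": "insert", "row": {"id": "054", "name": "Lin", "age": 12}}',
    '{"op": "update", "row": {"id": "037", "name": "Wang", "age": 12}}',
    '{"op": "delete", "row": {"id": "022", "name": "Lee", "age": 12}}',
]
delete_logs = extract_delete_logs(operation_log)
deleted_keys = [entry["row"]["id"] for entry in delete_logs]
```

Only the entry for the deleted record of Lee is retained, and its primary key can then be used to locate the record in the target database.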
Having obtained the delete operation log that characterizes the deleted data records, the dispatch server deletes the corresponding data records one by one from the target database according to that information.
In a specific embodiment, the present invention extracts the keyword in the delete operation log and uses that keyword to locate the corresponding data record in the target database, thereby deleting the data record that corresponds to the delete operation log. Preferably, the keyword is the primary key.
For example, suppose the obtained delete operation log is: deleted record [No. 022, Lee, 12 years old, female, Han]. The primary key of this record is the student ID, namely No. 022. The present invention therefore searches the target database with the keyword "No. 022", locates the data record [No. 022, Lee, 12 years old, female, Han], and deletes it.
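The lookup and deletion by primary key can be sketched as follows. This is an illustration under stated assumptions: an in-memory SQLite database stands in for the target database (the text names Hive as one option), and the table name `student` and the function `delete_by_primary_key` are hypothetical.

```python
import sqlite3


def delete_by_primary_key(conn, delete_log):
    """Delete the target row whose primary key matches the delete log."""
    student_id = delete_log[0]  # the primary key is the first field
    cur = conn.execute("DELETE FROM student WHERE id = ?", (student_id,))
    conn.commit()
    return cur.rowcount  # number of rows actually removed


# Stand-in target database holding two previously synchronized records.
target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE student ("
    "id TEXT PRIMARY KEY, name TEXT, age INTEGER, gender TEXT, ethnicity TEXT)"
)
target.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
    [
        ("022", "Lee", 12, "female", "Han"),
        ("037", "Wang", 13, "female", "Manchu"),
    ],
)

removed = delete_by_primary_key(target, ("022", "Lee", 12, "female", "Han"))
remaining = [row[0] for row in target.execute("SELECT id FROM student ORDER BY id")]
```

Using the primary key alone keeps the deletion unambiguous even when other fields, such as the name, are shared by several records.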
Because more than one data record may be deleted, the present invention stores the delete operation logs in a message queue to ensure that every delete operation is executed correctly. For example, the Kafka message subscription system can be connected to the target database of the big data platform, such as Hive, to achieve daily data synchronization.
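The role of the message queue can be sketched with Python's standard `queue.Queue` as a stand-in for Kafka; the real Kafka client API is deliberately not shown. The function name `drain_delete_queue`, the dictionary used as the target, and the extra record "031, Zhao" are assumptions for illustration.

```python
import queue


def drain_delete_queue(pending, target_rows):
    """Apply queued delete operations one by one, in arrival order."""
    applied = []
    while True:
        try:
            key = pending.get_nowait()
        except queue.Empty:
            break  # every queued delete has been handled
        if target_rows.pop(key, None) is not None:
            applied.append(key)
        pending.task_done()
    return applied


# Two delete operations wait in the queue; "031" is a hypothetical record.
pending = queue.Queue()
for key in ("022", "031"):
    pending.put(key)

target_rows = {"022": "Lee", "031": "Zhao", "037": "Wang"}
applied = drain_delete_queue(pending, target_rows)
```

Because each operation is taken from the queue exactly once, no delete is skipped when several records are removed in the same period.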
Further, after the synchronization of deleted data records is completed, a record of the last synchronization time can be added in the dispatch server, so that the next data synchronization instruction directly synchronizes the data records changed after the last synchronization time. This avoids repeating synchronization for data that have already been synchronized and improves synchronization efficiency.
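A record of the last synchronization time could be kept as in the following sketch. The class `SyncState`, the field name `changed_at` and the dates are hypothetical; how the dispatch server actually persists this value is not specified in the text.

```python
from datetime import datetime


class SyncState:
    """Remembers when the last synchronization finished."""

    def __init__(self):
        self.last_sync = datetime.min  # nothing synchronized yet

    def select_pending(self, records):
        """Return the records changed strictly after the last synchronization."""
        return [r for r in records if r["changed_at"] > self.last_sync]

    def mark_synced(self, when):
        self.last_sync = when


records = [
    {"id": "037", "changed_at": datetime(2019, 5, 13, 9, 0)},
    {"id": "054", "changed_at": datetime(2019, 5, 15, 14, 30)},
]

state = SyncState()
first_run = state.select_pending(records)   # both records are pending
state.mark_synced(datetime(2019, 5, 14))
second_run = state.select_pending(records)  # only the later change remains
```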
Referring further to Fig. 2, a data synchronization device of a big data platform is shown. In the present embodiment, the data synchronization device 10 may include, or be divided into, one or more program modules. The one or more program modules are stored in a storage medium and executed by one or more processors to carry out the present invention and to implement the data synchronization method described above. A program module in the sense of the present invention is a series of computer program instruction segments capable of performing a specific function, and is better suited than the program itself for describing the execution of the data synchronization device 10 in the storage medium. The functions of the program modules of the present embodiment are described below:
Data record monitoring module 11, configured to monitor incremental data records and deleted data records in the source database;
Instruction generation module 12, configured to generate an incremental data synchronization trigger instruction and a data deletion instruction according to the incremental data records and the deleted data records;
Increment synchronization module 13, configured to synchronize the incremental data in the source database to the target database in response to the incremental data synchronization trigger instruction;
Deletion module 14, configured to delete the target data record from the target database according to the data deletion instruction.
In the data synchronization device provided by the present invention, the instruction generation module 12 includes:
Candidate record extraction submodule 121, configured to extract all candidate incremental data records and candidate deleted data records that have the same key field;
Sorting submodule 122, configured to sort the candidate incremental data records and the candidate deleted data records in chronological order;
Target record determination submodule 123, configured to determine a target incremental data record and a target deleted data record from the chronologically sorted candidate incremental data records and candidate deleted data records;
Instruction generation submodule 124, configured to generate the incremental data synchronization trigger instruction and the data deletion instruction from the target incremental data record and the target deleted data record, respectively.
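The cooperation of submodules 121 to 124 can be sketched as follows. The rule used here to determine the targets, namely that the latest operation on a key decides whether the key is synchronized or deleted, is an assumption made for illustration; this passage does not state the rule. The function name `resolve_targets` and the dictionary fields are likewise hypothetical.

```python
from collections import defaultdict


def resolve_targets(changes):
    """Group changes by key, sort each group by time, keep the latest one.

    Returns (keys_to_sync, keys_to_delete). Assumes the most recent
    operation on a key decides its fate.
    """
    by_key = defaultdict(list)
    for change in changes:
        by_key[change["key"]].append(change)

    to_sync, to_delete = [], []
    for key, group in by_key.items():
        group.sort(key=lambda c: c["ts"])  # chronological order
        latest = group[-1]
        (to_delete if latest["op"] == "delete" else to_sync).append(key)
    return sorted(to_sync), sorted(to_delete)


changes = [
    {"key": "022", "op": "update", "ts": 1},
    {"key": "022", "op": "delete", "ts": 3},  # updated, then deleted
    {"key": "037", "op": "update", "ts": 2},
    {"key": "054", "op": "delete", "ts": 1},
    {"key": "054", "op": "insert", "ts": 4},  # deleted, then re-created
]

to_sync, to_delete = resolve_targets(changes)
```

Sorting by time before deciding prevents a stale update from re-creating a record that was later deleted, and prevents a stale delete from removing a record that was later re-created.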
In the data synchronization device provided by the present invention, the deletion module 14 includes:
Keyword extraction submodule 141, configured to extract the keyword in the delete operation log;
Record search submodule 142, configured to search the target database for the target data record corresponding to the keyword;
Deletion submodule 143, configured to delete the target data record.
The present embodiment also provides computer equipment capable of executing programs, such as a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server or cabinet server (including an independent server or a server cluster composed of multiple servers). The computer equipment 20 of the present embodiment includes at least, but is not limited to, a memory 21 and a processor 22 that are communicatively connected to each other through a system bus, as shown in Fig. 3. It should be noted that Fig. 3 shows only the computer equipment 20 with components 21-22; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
In the present embodiment, the memory 21 (that is, a readable storage medium) includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer equipment 20, such as the hard disk or internal memory of the computer equipment 20. In other embodiments, the memory 21 may also be an external storage device of the computer equipment 20, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the computer equipment 20. Of course, the memory 21 may also include both the internal storage unit of the computer equipment 20 and its external storage device. In the present embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the computer equipment 20, such as the program code of the data synchronization device 10 of Embodiment One. In addition, the memory 21 may also be used to temporarily store various data that have been output or are to be output.
In some embodiments the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip. The processor 22 is generally used to control the overall operation of the computer equipment 20. In the present embodiment, the processor 22 is used to run the program code stored in the memory 21 or to process data, for example to run the data synchronization device 10 so as to implement the data synchronization method of Embodiment One.
The present embodiment also provides a computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, a server, an App store, and the like, on which a computer program is stored; the program implements the corresponding functions when executed by a processor. The computer-readable storage medium of the present embodiment is used to store the data synchronization device 10, which, when executed by a processor, implements the data synchronization method of Embodiment One.
Embodiment two
The present embodiment addresses the case in which data in multiple source databases are synchronized. Embodiment One describes data synchronization for a single source database only. The steps for synchronizing multiple source databases in the present embodiment are similar to those of Embodiment One, except that serial processing is added, that is, each individual source database is synchronized in turn.
Referring to Fig. 4, the data synchronization method of the present embodiment is based on Embodiment One and includes the following steps:
S1: In response to a data synchronization instruction for the target database, synchronize the incremental data in the multiple source databases to the target database.
In this step, the dispatch server extracts the incremental data from each of the multiple source databases and migrates it to the target database. The extraction can be implemented with existing data synchronization tools, such as an ETL tool or Sqoop, and the specific extraction mode may be the trigger mode, the timestamp mode, and so on. In the trigger mode, triggers are created on the table to be extracted, for example one insert trigger and one update trigger; whenever data in the data table of the source database change, the corresponding trigger writes the changed data to a temporary table, and an extraction thread reads the data from the temporary table and migrates them to the target database. The timestamp mode is a change data capture approach based on snapshot comparison: a timestamp field is added to the data table of the source database, and whenever data in the table are updated or modified the value of the timestamp field is modified at the same time. During extraction, the data to be extracted are determined by comparing the system time with the value of the timestamp field.
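The trigger mode described above can be sketched with SQLite, which supports triggers; SQLite is only a stand-in for the source database, and the table names `student` and `student_changes`, the trigger names and the function `extract_changes` are assumptions for illustration.

```python
import sqlite3


def extract_changes(conn):
    """Read the captured changes in order, then empty the temporary table."""
    rows = conn.execute(
        "SELECT id, name, age, op FROM student_changes ORDER BY rowid"
    ).fetchall()
    conn.execute("DELETE FROM student_changes")
    conn.commit()
    return rows


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT, age INTEGER)")
src.execute("INSERT INTO student VALUES ('037', 'Wang', 12)")  # existing row

# One insert trigger and one update trigger write changes to a temporary table.
src.executescript("""
CREATE TABLE student_changes (id TEXT, name TEXT, age INTEGER, op TEXT);
CREATE TRIGGER student_after_insert AFTER INSERT ON student
BEGIN
    INSERT INTO student_changes VALUES (NEW.id, NEW.name, NEW.age, 'insert');
END;
CREATE TRIGGER student_after_update AFTER UPDATE ON student
BEGIN
    INSERT INTO student_changes VALUES (NEW.id, NEW.name, NEW.age, 'update');
END;
""")

src.execute("INSERT INTO student VALUES ('054', 'Lin', 12)")
src.execute("UPDATE student SET age = 13 WHERE id = '037'")

changes = extract_changes(src)
```

In this sketch the extraction step empties the temporary table after reading it, so a second call returns nothing until further changes occur.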
For example, the timestamp fields created in the data table of the source database in the present embodiment include a creation time and an update time, where the creation time characterizes the time at which a data record was added to the data table and the update time characterizes the time at which a data record in the data table was modified. The step of synchronizing the incremental data in the source database to the target database then includes: synchronizing the newly added data in the source database to the target database according to the creation time, and synchronizing the changed data in the source database to the target database according to the update time.
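Synchronization by creation time and update time can be sketched as follows. Two in-memory SQLite databases stand in for the source and target databases; the column names `create_time` and `update_time`, the function `sync_by_timestamp`, the extra record "060, Chen" and all dates are assumptions for illustration.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT, age INTEGER, "
    "create_time TEXT, update_time TEXT)"
)


def sync_by_timestamp(src, dst, last_sync):
    """Copy rows created or updated after last_sync from src to dst."""
    rows = src.execute(
        "SELECT id, name, age, create_time, update_time FROM student "
        "WHERE create_time > ? OR update_time > ?",
        (last_sync, last_sync),
    ).fetchall()
    # INSERT OR REPLACE adds new rows and overwrites changed ones by primary key.
    dst.executemany("INSERT OR REPLACE INTO student VALUES (?, ?, ?, ?, ?)", rows)
    dst.commit()
    return len(rows)


src_db = sqlite3.connect(":memory:")
dst_db = sqlite3.connect(":memory:")
for db in (src_db, dst_db):
    db.execute(SCHEMA)

src_db.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", [
    ("037", "Wang", 13, "2019-05-01 08:00:00", "2019-05-15 10:00:00"),  # updated
    ("054", "Lin", 12, "2019-05-14 09:00:00", "2019-05-14 09:00:00"),   # new
    ("060", "Chen", 12, "2019-05-01 08:00:00", "2019-05-01 08:00:00"),  # unchanged
])
dst_db.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", [
    ("037", "Wang", 12, "2019-05-01 08:00:00", "2019-05-01 08:00:00"),
    ("060", "Chen", 12, "2019-05-01 08:00:00", "2019-05-01 08:00:00"),
])

copied = sync_by_timestamp(src_db, dst_db, "2019-05-10 00:00:00")
synced = dict(dst_db.execute("SELECT id, age FROM student").fetchall())
```

The timestamps are stored as ISO-formatted text, so plain string comparison orders them chronologically.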
Each source database has a corresponding fixed storage location in the target database. When incremental data are synchronized, the incremental data of each source database must be stored in the location of the target database that corresponds to that source database.
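The fixed correspondence between source databases and storage locations can be sketched as a simple lookup table. The source names, the location names and the function `store_increment` are hypothetical.

```python
# Hypothetical mapping from each source database to its fixed target location.
TARGET_LOCATIONS = {
    "class_a_db": "ods.student_class_a",
    "class_b_db": "ods.student_class_b",
}


def store_increment(source_name, rows, target_store):
    """Append incremental rows to the location reserved for their source."""
    location = TARGET_LOCATIONS[source_name]  # KeyError if the source is unmapped
    target_store.setdefault(location, []).extend(rows)
    return location


target_store = {}
store_increment("class_a_db", [("054", "Lin")], target_store)
store_increment("class_b_db", [("054", "Zhou")], target_store)
```

Keeping the locations separate means that records with the same ID number from different source databases do not overwrite each other.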
S2: Obtain the deletion information in the multiple source databases, the deletion information characterizing the target data records deleted from the source databases.
The deletion information of the present invention refers to information about data records that have been deleted from a source database. Because such a data record has been deleted, no deletion mark can be obtained directly from the source database, so the deleted data are difficult to notice when synchronizing to the target server of the big data platform.
To obtain the details of the data records deleted from a source database, the present invention introduces the delete operation log and obtains those details by analyzing the operation log of the source database itself. In a specific operation, a log collection tool can collect the operation log of the data tables in the source database in real time, and the delete operation log is extracted from the operation log.
Because a big data platform generates a large number of logs every day, a dedicated log system is needed to process them. For collecting and transmitting the operation log, the present invention combines the Kafka component and the Flume component; both are popular log systems in the prior art, each with its own characteristics. Flume is a distributed, reliable and highly available system for collecting, aggregating and transmitting massive logs. It supports customizing various data senders in the log system for collecting data, and it also provides the ability to perform simple processing on the data and write them to various (customizable) data receivers. Kafka is a distributed, partitionable, replicated messaging system that maintains message queues. In the present embodiment, Flume can be chosen for real-time collection of the operation log of the source database. Because the speed of data collection is not necessarily matched to the speed of data processing, a message middleware is added as a buffer; for example, the Kafka messaging system is chosen to process the operation log stream.
Take a source database that stores the basic information of the students in a class as an example. Assume that each data record contains the fields ID number, name, age, gender and ethnicity, with the ID number as the primary key. During the current week the student information changes as follows. A transfer student is added, with the basic information [No. 054, Lin, 12 years old, male, Han]. The original record of student Wang is [No. 037, Wang, 12 years old, female, Manchu]; an entry error is found, so Wang's age must be changed from 12 to 13, and the record becomes [No. 037, Wang, 13 years old, female, Manchu]. The record of student Lee is deleted because Lee has transferred to another school; before deletion the record was [No. 022, Lee, 12 years old, female, Han], and after deletion it no longer exists in the data table. In a specific embodiment of the present invention, if the timestamp mode is used and a timestamp field is appended to each data record, the last field of Lin's record shows the creation time and the last field of Wang's record shows the modification time. For student Lee, however, the whole data record has been deleted from the data table, so no timestamp can be attached to it, and the data synchronization of the big data platform cannot learn that Lee's record was deleted. The present invention therefore additionally determines which data records have been deleted by collecting the operation log. For example, collecting the log of the class database for this week yields the following entries: new record [No. 054, Lin, 12 years old, male, Han], changed record [No. 037, Wang, 12 years old, female, Manchu], deleted record [No. 022, Lee, 12 years old, female, Han]. Since new and changed records are already obtained through timestamps, the present invention is concerned only with the deleted record at this point. Filtering retains only the log content related to deletion, namely the deleted record 'name: Lee'; this deleted record of the student named Lee is the target data record of interest.
A delete operation log from multiple source databases must indicate the source database to which it belongs, so that the corresponding storage location in the target database can be located quickly afterwards.
S3: Delete the target data record from the target database.
Having obtained the delete operation log that characterizes the deleted data records, the dispatch server deletes the corresponding data records one by one from the target database according to that information.
In a specific embodiment, the present invention extracts the keyword in the delete operation log and uses that keyword to locate the corresponding data record in the target database, thereby deleting the data record that corresponds to the delete operation log. Preferably, the keyword is the primary key.
For example, suppose the obtained delete operation log is: deleted record [No. 022, Lee, 12 years old, female, Han]. The primary key of this record is the student ID, namely No. 022. The present invention therefore searches the target database with the keyword "No. 022", locates the data record [No. 022, Lee, 12 years old, female, Han], and deletes it.
Because more than one data record may be deleted, the present invention stores the delete operation logs in a message queue to ensure that every delete operation is executed correctly. For example, the Kafka message subscription system can be connected to the target database of the big data platform, such as Hive, to achieve daily data synchronization.
For delete operation logs from multiple source databases, the deletions are performed in turn according to the storage location of each source database in the target database.
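Deletion driven by source-tagged delete logs can be sketched as follows. Nested dictionaries stand in for the target database and its per-source storage locations; the function `apply_tagged_deletes`, the source names and the record "022, Sun" are assumptions for illustration.

```python
def apply_tagged_deletes(delete_logs, target_store, locations):
    """Apply deletes in order; each log is a (source_name, primary_key) pair."""
    removed = []
    for source_name, key in delete_logs:
        # Look only in the storage location that belongs to this source.
        partition = target_store[locations[source_name]]
        if partition.pop(key, None) is not None:
            removed.append((source_name, key))
    return removed


locations = {"class_a_db": "class_a", "class_b_db": "class_b"}
target_store = {
    "class_a": {"022": "Lee", "037": "Wang"},
    "class_b": {"022": "Sun"},  # same ID number, different source database
}

removed = apply_tagged_deletes([("class_a_db", "022")], target_store, locations)
```

The source tag is what keeps record No. 022 of the second class untouched when record No. 022 of the first class is deleted.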
Further, after the synchronization of deleted data records is completed, a record of the last synchronization time can be added in the dispatch server, so that the next data synchronization instruction directly synchronizes the data records changed after the last synchronization time. This avoids repeating synchronization for data that have already been synchronized and improves synchronization efficiency.
Referring further to Fig. 5, the data synchronization device 30 of the present embodiment is based on Embodiment One and implements the data synchronization method of Embodiment Two. The functions of its program modules are as follows:
Increment synchronization module 31, configured to synchronize the incremental data in the source database to the target database in response to a data synchronization instruction for the target database;
Deletion information acquisition module 32, configured to obtain the deletion information in the source database, the deletion information characterizing the target data record deleted from the source database;
Deletion module 33, configured to delete the target data record from the target database.
Further, the deletion information acquisition module 32 includes:
Log acquisition module 321, configured to obtain the operation log in the source database and to extract the delete operation log from the operation log.
Further, the deletion module 33 includes:
Keyword extraction submodule 331, configured to extract the keyword in the delete operation log;
Record search submodule 332, configured to search the target database for the target data record corresponding to the keyword;
Deletion submodule 333, configured to delete the target data record.
In summary, the information query method, device, computer equipment and storage medium for associated data proposed by the present invention can reduce the need to create a large number of indexes in the database; by establishing, in text format, an index structure data model between data tables, they can effectively reduce the access pressure on the data. The present invention also allows data to be obtained directly from the cache during business processing, which improves data query efficiency, provides good scalability, and can support storage and query requirements under large data volumes.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiment methods can be completed by a program instructing relevant hardware. The program can be stored in a computer-readable medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Through the above description of the embodiments, those skilled in the art can clearly understand that the above embodiment methods can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the preferred implementation.
The above are only preferred embodiments of the present invention and do not limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of protection of the present invention.

Claims (10)

1. A data synchronization method for a big data platform, characterized by comprising:
in response to a data synchronization instruction, monitoring incremental data records and deleted data records in a source database;
generating an incremental data synchronization trigger instruction and a data deletion instruction according to the incremental data records and the deleted data records;
synchronizing the incremental data in the source database to a target database according to the incremental data synchronization trigger instruction;
deleting a target data record from the target database according to the data deletion instruction.
2. The data synchronization method according to claim 1, characterized in that the step of generating an incremental data synchronization trigger instruction and a data deletion instruction according to the incremental data records and the deleted data records comprises:
extracting all candidate incremental data records and candidate deleted data records that have the same key field;
sorting the candidate incremental data records and the candidate deleted data records in chronological order;
determining a target incremental data record and a target deleted data record from the chronologically sorted candidate incremental data records and candidate deleted data records;
generating the incremental data synchronization trigger instruction and the data deletion instruction from the target incremental data record and the target deleted data record, respectively.
3. The data synchronization method according to claim 2, characterized in that the step of deleting a target data record from the target database according to the data deletion instruction comprises:
placing the data deletion instructions into a message queue in order;
reading the data deletion instructions one by one from the message queue, and deleting the target data record according to each data deletion instruction.
4. The data synchronization method according to claim 3, characterized in that the step of deleting the target data record according to the data deletion instruction comprises:
extracting the keyword in the data deletion instruction;
searching the target database for the target data record corresponding to the keyword;
deleting the target data record.
5. The data synchronization method according to claim 1, characterized in that the step of monitoring incremental data records and deleted data records in the source database comprises:
determining the incremental data records by monitoring a temporary increment table or a timestamp in the source database, wherein the temporary increment table records the data that have undergone incremental change, and the timestamp characterizes the moment at which the data change occurred;
determining the deleted data records by extracting the delete operation log from the operation log of the source database.
6. A data synchronization device for a big data platform, characterized by comprising:
a data record monitoring module, configured to monitor incremental data records and deleted data records in a source database;
an instruction generation module, configured to generate an incremental data synchronization trigger instruction and a data deletion instruction according to the incremental data records and the deleted data records;
an increment synchronization module, configured to synchronize the incremental data in the source database to a target database in response to the incremental data synchronization trigger instruction;
a deletion module, configured to delete a target data record from the target database according to the data deletion instruction.
7. The data synchronization device according to claim 6, characterized in that the instruction generation module comprises:
a candidate record extraction submodule, configured to extract all candidate incremental data records and candidate deleted data records that have the same key field;
a sorting submodule, configured to sort the candidate incremental data records and the candidate deleted data records in chronological order;
a target record determination submodule, configured to determine a target incremental data record and a target deleted data record from the chronologically sorted candidate incremental data records and candidate deleted data records;
an instruction generation submodule, configured to generate the incremental data synchronization trigger instruction and the data deletion instruction from the target incremental data record and the target deleted data record, respectively.
8. The data synchronization device according to claim 7, characterized in that the deletion module comprises:
a keyword extraction submodule, configured to extract the keyword in the delete operation log;
a record search submodule, configured to search the target database for the target data record corresponding to the keyword;
a deletion submodule, configured to delete the target data record.
9. Computer equipment, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
CN201910418941.XA 2019-05-20 2019-05-20 Big data platform method of data synchronization, device, computer equipment and storage medium Pending CN110321383A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910418941.XA CN110321383A (en) 2019-05-20 2019-05-20 Big data platform method of data synchronization, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110321383A true CN110321383A (en) 2019-10-11

Family

ID=68113169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910418941.XA Pending CN110321383A (en) 2019-05-20 2019-05-20 Big data platform method of data synchronization, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110321383A (en)

Similar Documents

Publication Publication Date Title
CN110321383A (en) Big data platform method of data synchronization, device, computer equipment and storage medium
EP3456360B1 (en) Device and method for tuning relational database
US9130971B2 (en) Site-based search affinity
CN108536752B (en) Data synchronization method, device and equipment
CN109034993A (en) Account checking method, equipment, system and computer readable storage medium
US20140236890A1 (en) Multi-site clustering
KR101125911B1 (en) Information processing method and device for work process analysis
CN109376196B (en) Method and device for batch synchronization of redo logs
CN110489699B (en) Asynchronous data acquisition method and system
US20140006358A1 (en) Creation and replay of a simulation workload using captured workloads
CN107239382A (en) The log processing method and system of a kind of container application
EP2815335A1 (en) Method of machine learning classes of search queries
KR101740271B1 (en) Method and device for constructing on-line real-time updating of massive audio fingerprint database
CN110245145A (en) Structure synchronization method and apparatus of the relevant database to Hadoop database
CN110417873B (en) Network information extraction system for realizing recording webpage interactive operation
CN110019469A (en) Distributed data base data processing method, device, storage medium and electronic device
JP2016076003A (en) Instruction history analysis program, instruction history analysis device, and instruction history analysis method
CN109271545A (en) A kind of characteristic key method and device, storage medium and computer equipment
CN106802928B (en) Power grid historical data management method and system
CN114416868B (en) Data synchronization method, device, equipment and storage medium
CN113094442B (en) Full data synchronization method, device, equipment and medium
CN101952843A (en) Workflow processing program, method, and device
CN107291938A (en) Order Query System and method
CN113778996A (en) Large data stream data processing method and device, electronic equipment and storage medium
CN109857768B (en) Big data aggregation query method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination