CN108197155A - Information data synchronization method, device and computer-readable storage medium - Google Patents


Info

Publication number
CN108197155A
Authority
CN
China
Prior art keywords
information
data
task
synchronous
information data
Legal status
Pending
Application number
CN201711293634.0A
Other languages
Chinese (zh)
Inventor
卢道和
邸帅
谢健
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Application filed by WeBank Co Ltd
Priority to CN201711293634.0A
Publication of CN108197155A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275 Synchronous replication

Abstract

The invention discloses an information data synchronization method, comprising: obtaining job configuration information set by a user; and synchronizing information data at a source location to a destination location according to the job configuration information. The invention also discloses an information data synchronization device and a computer-readable storage medium. The invention improves the flexibility of information data synchronization.

Description

Information data synchronization method, device and computer-readable storage medium
Technical field
The present invention relates to the technical field of data processing, and in particular to an information data synchronization method, device and computer-readable storage medium.
Background
With the development of big data, many companies build their data warehouses on the open-source Hadoop ecosystem. As data volumes and business demands grow, the reliability requirements on data warehouses rise sharply; deployments evolve into multiple data warehouses, and frequent data synchronization between them becomes routine. A cross-cluster, accurate, reliable and efficient information data synchronization tool is therefore needed to guarantee the timeliness and completeness of cross-cluster data transfer, and such a tool is especially important for synchronizing TB- and PB-scale information data on big data platforms. Two synchronization approaches exist in the prior art. The first uses replication, replaying all operations on the master database from logs; for Hive, general replication mainly relies on exporting and importing data and metadata and on event replay, which is inefficient. The second uses open-source tools, which suffer from problems such as synchronization tasks that cannot be custom-configured, poor user interaction (for example, synchronization progress is not visible), and data inconsistency. With the prior art, users cannot flexibly and controllably synchronize information data as needed.
Summary of the invention
The main object of the present invention is to provide an information data synchronization method, aiming to solve the problem in the prior art that users cannot flexibly and controllably synchronize information data as needed.
To achieve the above object, the present invention provides an information data synchronization method, comprising:
Obtaining job configuration information set by a user;
Synchronizing information data at a source location to a destination location according to the job configuration information.
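As a minimal sketch of the two claimed steps, assuming in-memory dictionaries in place of a real Hive metastore and HDFS, the flow could look like the Python below. `JobConfig`, its field names and `synchronize` are hypothetical illustrations, not names taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class JobConfig:
    """Hypothetical user-set job configuration (field names are assumptions)."""
    source: str                 # identifier of the source location
    destination: str            # identifier of the destination location
    sync_metadata: bool = True  # synchronize metadata?
    sync_data: bool = True      # synchronize data?


def synchronize(config: JobConfig, stores: dict) -> None:
    """Copy the selected kinds of information data from source to destination.

    `stores` maps a location name to {"metadata": {...}, "data": {...}};
    this in-memory layout only stands in for a real metastore and file system.
    """
    src = stores[config.source]
    dst = stores.setdefault(config.destination, {"metadata": {}, "data": {}})
    if config.sync_metadata:
        dst["metadata"].update(src["metadata"])  # metadata first ...
    if config.sync_data:
        dst["data"].update(src["data"])          # ... then data
```

The user's choices live entirely in the configuration object, so the same `synchronize` call covers metadata-only, data-only or combined runs.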
Preferably, synchronizing the information data at the source location to the destination location according to the job configuration information comprises:
Generating a synchronization task according to the job configuration information;
Synchronizing the information data at the source location to the destination location according to the synchronization task.
Preferably, the information data is metadata, and synchronizing the information data at the source location to the destination location according to the synchronization task comprises:
Synchronizing the metadata at the source location to the destination location by multithreading according to the metadata synchronization task.
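Multithreaded metadata synchronization can be sketched with a bounded thread pool from the Python standard library. Here `fetch` and `write` are hypothetical callables (for example, thin wrappers around a metastore client), and `max_workers` plays the role of the concurrency control; none of these names come from the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def sync_metadata_parallel(names: Iterable[str],
                           fetch: Callable[[str], dict],
                           write: Callable[[str, dict], None],
                           max_workers: int = 4) -> List[str]:
    """Run one metadata synchronization task per object on a bounded thread pool.

    `fetch` reads an object's metadata at the source and `write` stores it at
    the destination; `max_workers` caps how many tasks run at once.
    """
    def run(name: str) -> str:
        write(name, fetch(name))
        return name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # results come back in input order; a task's exception is re-raised here
        return list(pool.map(run, names))
```

Metadata calls are small and I/O-bound, which is why threads (rather than a distributed job) are a reasonable fit for this step.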
Preferably, the information data is data, and synchronizing the information data at the source location to the destination location according to the synchronization task further comprises:
Aggregating at least two data synchronization tasks according to a preset aggregation rule to obtain an aggregated synchronization task;
Synchronizing the data at the source location to the destination location according to the aggregated synchronization task.
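The patent does not define the preset aggregation rule. As one hypothetical rule, partition paths can be grouped by their parent (table) directory, capped per job, with each group turned into a single `hadoop distcp` command line. The namenode URIs are placeholders; `-m` (maximum simultaneous maps) and `-bandwidth` (MB/s per map) are real DistCp options, and the commands are only built here, not run.

```python
import posixpath
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate(paths: List[str], max_per_job: int = 100) -> List[Tuple[str, List[str]]]:
    """Hypothetical aggregation rule: group paths by parent directory, capped per job."""
    by_parent: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        by_parent[posixpath.dirname(path)].append(path)
    groups = []
    for parent, members in sorted(by_parent.items()):
        for i in range(0, len(members), max_per_job):
            groups.append((parent, members[i:i + max_per_job]))
    return groups


def distcp_commands(paths: List[str], src_fs: str, dst_fs: str,
                    max_per_job: int = 100, maps: int = 20,
                    bandwidth_mb: int = 100) -> List[List[str]]:
    """Build one `hadoop distcp` command per aggregated group.

    With several sources and no -update, DistCp copies each source directory
    under the target, so dt=... partitions land inside the destination table dir.
    """
    commands = []
    for parent, members in aggregate(paths, max_per_job):
        commands.append(["hadoop", "distcp",
                         "-m", str(maps),                  # max simultaneous maps
                         "-bandwidth", str(bandwidth_mb),  # MB/s per map
                         *[src_fs + p for p in members],
                         dst_fs + parent])
    return commands
```

Each resulting list could be handed to `subprocess.run`; submitting one job per group instead of one per partition reduces per-job startup overhead on YARN.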
Preferably, after the step of generating a synchronization task according to the job configuration information, the method further comprises:
Generating a monitoring task according to the synchronization task, and obtaining execution progress information of the synchronization task through the monitoring task;
Determining the execution state of the synchronization task according to the execution progress information;
When the execution state of the synchronization task is execution failed, returning to the step of obtaining the job configuration information set by the user, so as to perform information data synchronization again.
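The monitor, state and restart loop can be sketched as below. The progress format (`{"percent": ..., "error": ...}`), the callables `get_config`, `submit` and `poll`, and the `max_attempts` cap are all assumptions added for illustration; the text itself only says that a failed task leads back to obtaining the job configuration.

```python
import time
from enum import Enum
from typing import Any, Callable, Dict


class State(Enum):
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def state_from_progress(progress: Dict[str, Any]) -> State:
    """Map hypothetical progress info {"percent": float, "error": str|None} to a state."""
    if progress.get("error"):
        return State.FAILED
    if progress.get("percent", 0.0) >= 100.0:
        return State.SUCCEEDED
    return State.RUNNING


def run_with_monitor(get_config: Callable[[], Any],
                     submit: Callable[[Any], Any],
                     poll: Callable[[Any], Dict[str, Any]],
                     max_attempts: int = 3,
                     interval: float = 0.0) -> int:
    """Submit, monitor, and on failure restart from obtaining the job configuration.

    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, max_attempts + 1):
        handle = submit(get_config())       # obtain config, then submit the task
        state = state_from_progress(poll(handle))
        while state is State.RUNNING:       # the monitoring task
            time.sleep(interval)
            state = state_from_progress(poll(handle))
        if state is State.SUCCEEDED:
            return attempt
    raise RuntimeError(f"synchronization failed after {max_attempts} attempts")
```

Re-reading the configuration on each attempt means a user who fixes a bad setting after a failure has the correction picked up by the retry.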
Preferably, after the step of determining the execution state of the synchronization task according to the execution progress information, the method further comprises:
When the execution state of the synchronization task is execution completed, obtaining first information of the data at the source location and second information of the data at the destination location, the first information or the second information including at least file size information, file count information or checksum information;
Performing a data check according to the first information and the second information.
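The data check can be illustrated by computing the "first information" and "second information" as a small summary (file count, total size, combined checksum) for each side and reporting which fields differ. The in-memory file mapping and the use of MD5 are stand-ins chosen for this sketch; a real deployment would read listings and checksums from the two HDFS clusters.

```python
import hashlib
from typing import Dict, List


def summarize(files: Dict[str, bytes]) -> Dict[str, object]:
    """Collect file count, total size and a combined checksum for one side.

    `files` maps relative paths to contents, standing in for an HDFS listing;
    MD5 is used only for illustration.
    """
    combined = hashlib.md5()
    for path in sorted(files):  # sorted so the result does not depend on dict order
        combined.update(path.encode("utf-8"))
        combined.update(hashlib.md5(files[path]).digest())
    return {
        "file_count": len(files),
        "total_size": sum(len(b) for b in files.values()),
        "checksum": combined.hexdigest(),
    }


def check(source_files: Dict[str, bytes], dest_files: Dict[str, bytes]) -> List[str]:
    """Return the names of the summary fields that differ between the two sides."""
    first, second = summarize(source_files), summarize(dest_files)
    return [key for key in first if first[key] != second[key]]
```

Count and size are cheap and catch missing or truncated files; the checksum catches same-size corruption, which the first two cannot.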
Preferably, after the step of generating a synchronization task according to the job configuration information, the method further comprises:
Receiving a query request, triggered by the user, for the execution progress of the synchronization task;
Displaying the execution progress information of the synchronization task according to the query request.
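One way to serve progress queries is a small thread-safe registry that the monitoring task writes to and the query handler reads from. `ProgressRegistry` and its byte-based progress fields are hypothetical; the patent does not specify how progress is stored.

```python
import threading
from typing import Dict, Tuple


class ProgressRegistry:
    """Hypothetical store written by the monitoring task and read by user queries."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._progress: Dict[str, Tuple[int, int]] = {}

    def update(self, task_id: str, copied_bytes: int, total_bytes: int) -> None:
        with self._lock:
            self._progress[task_id] = (copied_bytes, total_bytes)

    def query(self, task_id: str) -> Dict[str, object]:
        """Answer a progress query; unknown tasks report zero progress."""
        with self._lock:
            copied, total = self._progress.get(task_id, (0, 0))
        percent = 100.0 * copied / total if total else 0.0
        return {"task_id": task_id, "copied_bytes": copied,
                "total_bytes": total, "percent": round(percent, 1)}
```

Decoupling writes from reads this way keeps a user's query from ever touching the running job directly.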
Preferably, the synchronization task types include incremental synchronization and batch synchronization; the job configuration information includes a task generation rule selected by the user, the task generation rule including at least a generation rule based on metadata attributes, a generation rule based on Hive Hook, or a generation rule based on the last modification time of files.
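The metadata-attribute rule can be sketched as a time-window filter over Hive's `transient_lastDdlTime` table parameter (epoch seconds, stored as a string). The window from the previous completion time to the current scheduled time, the `objects` mapping and the function name are assumptions for this illustration.

```python
from typing import Dict, List


def incremental_targets(objects: Dict[str, Dict[str, str]],
                        last_completed: int, scheduled_at: int) -> List[str]:
    """Select tables/partitions changed in (last_completed, scheduled_at].

    `objects` maps a name to its parameter dictionary as read from the
    metastore. Objects without transient_lastDdlTime are skipped.
    """
    selected = []
    for name, params in sorted(objects.items()):
        raw = params.get("transient_lastDdlTime")
        if raw is not None and last_completed < int(raw) <= scheduled_at:
            selected.append(name)
    return selected
```

The half-open window ensures an object changed exactly at one boundary is picked up by exactly one scheduled run.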
In addition, to achieve the above object, the present invention further provides an information data synchronization device, comprising a memory, a processor, and an information data synchronization program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the information data synchronization method described above.
In addition, to achieve the above object, the present invention further provides a computer-readable storage medium storing an information data synchronization program which, when executed by a processor, implements the steps of the information data synchronization method described above.
The present invention provides an information data synchronization method, device and computer-readable storage medium. The method comprises: obtaining job configuration information set by a user; and synchronizing information data at a source location to a destination location according to the job configuration information. In this way, the user can set the job configuration information as needed for the actual synchronization scenario, synchronization is performed according to that configuration, and the flexibility of information data synchronization is improved.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the device in the hardware operating environment involved in embodiments of the present invention;
Fig. 2 is a schematic flow diagram of the first embodiment of the information data synchronization method of the present invention;
Fig. 3 is a schematic flow diagram of the second embodiment of the information data synchronization method of the present invention;
Fig. 4 is a schematic flow diagram of the third embodiment of the information data synchronization method of the present invention;
Fig. 5 is a schematic flow diagram of the fourth embodiment of the information data synchronization method of the present invention;
Fig. 6 is a schematic flow diagram of the fifth embodiment of the information data synchronization method of the present invention;
Fig. 7 is a schematic flow diagram of the sixth embodiment of the information data synchronization method of the present invention;
Fig. 8 is a schematic flow diagram of the seventh embodiment of the information data synchronization method of the present invention;
Fig. 9 is a schematic flow diagram of complete data synchronization in an embodiment of the information data synchronization method of the present invention;
Fig. 10 is a flowchart of Hive synchronization under a shared-metadata architecture in an embodiment of the information data synchronization method of the present invention;
Fig. 11 is a flowchart of Hive synchronization under an unshared-metadata architecture in an embodiment of the information data synchronization method of the present invention.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein merely illustrate the present invention and are not intended to limit it.
In the prior art, users cannot flexibly and controllably synchronize information data as needed.
To solve this technical problem, the present invention provides an information data synchronization method in which the job configuration information set by the user is first obtained, and the information data at the source location is then synchronized to the destination location according to that configuration.
Fig. 1 is a schematic structural diagram of the device in the hardware operating environment involved in embodiments of the present invention.
The terminal in embodiments of the present invention may be a PC, or a mobile terminal device with a display function, such as a smartphone, tablet computer or portable computer.
As shown in Fig. 1, the terminal may include a processor 1001 (for example, a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory, such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
Those skilled in the art will understand that the terminal structure shown in Fig. 1 does not limit the terminal, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and an information data synchronization program.
In the terminal shown in Fig. 1, the network interface 1004 is mainly used to connect to a background server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user terminal) and exchange data with it; and the processor 1001 may be used to call the information data synchronization program stored in the memory 1005 and perform the following operations:
Obtaining the job configuration information set by the user;
Synchronizing the information data at the source location to the destination location according to the job configuration information.
Further, the processor 1001 may call the information data synchronization program stored in the memory 1005 to also perform the following operations:
Generating a synchronization task according to the job configuration information;
Synchronizing the information data at the source location to the destination location according to the synchronization task.
Further, the processor 1001 may call the information data synchronization program stored in the memory 1005 to also perform the following operation:
Synchronizing the metadata at the source location to the destination location by multithreading according to the metadata synchronization task.
Further, the processor 1001 may call the information data synchronization program stored in the memory 1005 to also perform the following operations:
Aggregating at least two data synchronization tasks according to a preset aggregation rule to obtain an aggregated synchronization task;
Synchronizing the data at the source location to the destination location according to the aggregated synchronization task.
Further, the processor 1001 may call the information data synchronization program stored in the memory 1005 to also perform the following operations:
Generating a monitoring task according to the synchronization task, and obtaining execution progress information of the synchronization task through the monitoring task;
Determining the execution state of the synchronization task according to the execution progress information;
When the execution state of the synchronization task is execution failed, returning to the step of obtaining the job configuration information set by the user, so as to perform information data synchronization again.
Further, the processor 1001 may call the information data synchronization program stored in the memory 1005 to also perform the following operations:
When the execution state of the synchronization task is execution completed, obtaining first information of the data at the source location and second information of the data at the destination location, the first information or the second information including at least file size information, file count information or checksum information;
Performing a data check according to the first information and the second information.
Further, the processor 1001 may call the information data synchronization program stored in the memory 1005 to also perform the following operations:
Receiving a query request, triggered by the user, for the execution progress of the synchronization task;
Displaying the execution progress information of the synchronization task according to the query request.
Based on the above hardware configuration, embodiments of the information data synchronization method of the present invention are proposed.
The present invention is mainly applied to the field of information data synchronization, for example data migration between data warehouses. This embodiment takes visualized cross-cluster information data synchronization based on Hadoop as an example. With the development of big data, many companies build their data warehouses on the open-source Hadoop ecosystem. Hadoop is a software framework for distributed processing of massive data, comprising four modules: Common, HDFS, YARN and MapReduce. Common provides the shared utilities that support the other modules; HDFS is a distributed file system providing high-throughput access; YARN is a framework for job scheduling and cluster resource management; and MapReduce is a parallel computing framework for big data. As data volumes and business demands grow, the reliability requirements on data warehouses rise sharply; deployments evolve into multiple data warehouses, and frequent data synchronization between them becomes routine. A cross-cluster, accurate, reliable and efficient information data synchronization tool is therefore needed to guarantee the timeliness and completeness of cross-cluster data transfer, and it is especially important for synchronizing TB- and PB-scale information data on big data platforms. Two synchronization approaches exist in the prior art. The first uses replication, replaying all operations on the master database from logs. Hive is a Hadoop-based data warehouse tool that maps structured data files to database tables, provides SQL query functionality, and converts SQL statements into MapReduce jobs for execution. Hive's management of a data warehouse covers two aspects: management of metadata and management of data. For Hive, general replication mainly relies on exporting and importing data and metadata and on event replay, which is inefficient. The second approach uses open-source tools, which suffer from problems such as synchronization tasks that cannot be custom-configured, poor user interaction (for example, synchronization progress is not visible), and data inconsistency. With the prior art, users cannot flexibly and controllably synchronize information data as needed. The present invention provides a method for synchronizing information data according to job configuration information set by the user, so that information data can be synchronized flexibly.
Referring to Fig. 2, a first embodiment of the information data synchronization method of the present invention is proposed. In this embodiment, the information data includes data or metadata. Data is the resource data carrying the actual information content; in a data warehouse it is typically stored in HDFS, and most queries on it are executed by MapReduce. Metadata is data that describes the resource data, mainly its attributes, such as the storage location of the resource data, and is stored in a database such as MySQL. Metadata in Hive includes the table name, the table's columns and partitions and their attributes, table attributes (such as whether the table is an external table), and the directory in which the table's data resides. Hive metadata operations are served by the Hive Metastore service. The information data synchronization method of this embodiment comprises:
Step S10: obtaining the job configuration information set by the user.
Given that Hive is positioned for offline computation, the technical solution accepts a small loss in synchronization timeliness compared with real-time synchronization in exchange for an accurate, reliable, fast and visual general-purpose solution for synchronizing big data. To overcome the shortcomings of existing Hive synchronization solutions and meet Hive synchronization requirements in several scenarios, this embodiment provides a job configuration function. When synchronizing information data, the user can configure, through a web page and according to the needs of the actual scenario, the Hive databases or tables to be synchronized, including basic information about the source and destination databases or tables, the parameters governing how tasks are generated, control parameters for task execution, and parameters for task monitoring or data checking. To implement the method, this embodiment provides an information data synchronization system, which may include components such as an interface module, a task generator, a task executor and a task scheduling module. After the user sets the job configuration information, the interface module obtains it; the scheduler periodically fetches the configuration and sends it to the task generator, which generates a synchronization task list according to the generation rule configured by the user (covering full, incremental or historical ranges of data) and notifies the task executor to start executing the tasks.
Step S20: synchronizing the information data at the source location to the destination location according to the job configuration information.
Based on the above step, in this embodiment the scheduling module, after obtaining the job configuration information set by the user, sends it to the task generator and sends an instruction to start executing tasks to the task executor. The task generator generates a synchronization task list according to the parameters in the job configuration, and the task executor executes the synchronization tasks in the list to complete the synchronization. In this embodiment, the source location is where the data is originally stored, which may be a storage location at different levels such as a database or a table; the destination location is a storage location, different from the source, to which the data is transferred, and may likewise be a database-level or table-level location corresponding to the source. This embodiment covers synchronization at different levels, such as database-level or table-level synchronization; for database-level synchronization, the source and destination locations are the corresponding source and destination databases. Specifically, when executing a synchronization task, the task executor first determines whether to synchronize metadata and whether to synchronize data; if both are to be synchronized, it synchronizes the metadata first and then the data. When synchronizing data, the configuration determines whether tasks are aggregated before submission; if so, the tasks are aggregated and submitted, and synchronization is performed by executing the aggregated task. The system may also include components such as a task monitor, a task checker or an alerting module. The task monitor periodically collects task execution progress, which is displayed when the user queries it, thereby responding to the user's query requests. If the user configures a check mode in the job configuration, then after synchronization completes the data checker checks the data at the source and destination locations for consistency, completeness and availability.

Taking synchronization between different data centers as an example, a complete synchronization process is shown in Fig. 9. The user interacts with the system through a front end, for example to configure synchronization or query progress. After a task is configured, the system queries metadata from the source Hive Metastore, generates a synchronization task list and stores it in the system. The exact generation method depends on the task type. For batch synchronization, the list is generated directly by querying the corresponding database, table and partition names. For incremental data there are two ways to generate tasks. One generates the list to be synchronized from the transient_lastDdlTime attribute of Hive tables or partitions: every operation on a Hive table updates transient_lastDdlTime to the completion timestamp of the most recent operation. The other captures Hive operation events through a Hive Hook program, records them in a MySQL database, and periodically aggregates them into a synchronization task list.

After task generation completes, the system synchronizes according to the configuration. If metadata needs to be synchronized, this is done in steps 2-1 and 2-2 of Fig. 9: the system calls the Hive Metastore API from multiple threads to read metadata from the source and destination, compares them to obtain the differences, performs the corresponding operations, and writes the metadata into the destination Hive Metastore. Once metadata synchronization completes, the system continues according to the configuration. If data needs to be synchronized, it first submits DistCp jobs (distributed copy jobs) to the designated YARN cluster using multiple threads. DistCp is a tool for large-scale copying within and between clusters; it uses MapReduce for distribution, error handling and recovery, and reporting, taking a list of files and directories as the input to its map tasks, each of which copies part of the files in the source list. Before submitting a DistCp job, the system queries the source Hive Metastore for the HDFS locations (file paths) of the databases, tables and partitions to be synchronized, chooses whether to aggregate submissions according to the user's configuration, sets synchronization parameters such as bandwidth control, concurrency control or CRC checking, and then submits the DistCp job to the YARN cluster specified by the user (step 3 in Fig. 9). Once running in the YARN cluster, the DistCp job reads data from the source HDFS cluster in a distributed manner and writes it to the destination HDFS cluster (steps 4-1 and 4-2 in Fig. 9). After a DistCp job is submitted, the system generates a monitoring task for it that periodically queries the specified YARN cluster for details such as the job's execution progress (step 5 in Fig. 9). When the DistCp job finishes, the monitoring task is deleted automatically and synchronization is complete; the system then performs a data check, reading information such as file sizes, directory counts, file counts and checksums from the source and destination HDFS clusters and verifying consistency, completeness and availability (step 6 in Fig. 9).
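The metadata comparison step (read both sides, compute the differences, write the needed changes to the destination) can be sketched with plain dictionaries standing in for Hive Metastore reads. The `Metadata` shape, the function names, and the choice to report rather than drop destination-only objects are assumptions; the text does not say how deletions are handled.

```python
from typing import Dict, List, Tuple

Metadata = Dict[str, dict]  # "db.table" -> attributes (location, columns, partitions, ...)


def diff_metadata(source: Metadata,
                  destination: Metadata) -> Tuple[List[str], List[str], List[str]]:
    """Split object names into (missing at destination, differing, destination-only)."""
    to_create = sorted(source.keys() - destination.keys())
    to_update = sorted(k for k in source.keys() & destination.keys()
                       if source[k] != destination[k])
    destination_only = sorted(destination.keys() - source.keys())
    return to_create, to_update, destination_only


def apply_metadata_diff(source: Metadata, destination: Metadata) -> List[str]:
    """Write new and changed objects to the destination; report destination-only ones.

    Whether destination-only objects should be dropped is a policy decision
    left to the caller here.
    """
    to_create, to_update, destination_only = diff_metadata(source, destination)
    for name in to_create + to_update:
        destination[name] = dict(source[name])  # shallow copy of the attributes
    return destination_only
```

Writing only the difference keeps repeated runs cheap and idempotent: a second pass over unchanged metadata performs no writes.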
In addition, information data synchronization may also cover file attributes of the data, such as permissions and user groups. Every step of the system has an automatic retry mechanism on failure. The synchronization state and progress of the whole synchronization task are recorded in real time at each stage, and the user can view them in real time through the front end.
In this embodiment, the job configuration information set by the user is obtained, and the information data at the source location is synchronized to the destination location according to it. In this way, the user can set the job configuration information as needed for the actual synchronization scenario, synchronization is performed according to that configuration, the flexibility of information data synchronization is improved, and its scope of application is broadened.
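A per-step automatic retry can be sketched as a decorator. The patent does not describe its retry policy, so the attempt count, exponential back-off and the `retry` name are assumptions made for this illustration.

```python
import time
from functools import wraps
from typing import Callable, Tuple, Type


def retry(attempts: int = 3, base_delay: float = 0.0,
          exceptions: Tuple[Type[BaseException], ...] = (Exception,)) -> Callable:
    """Re-run a failed step up to `attempts` times with exponential back-off."""
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs):
            for i in range(attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if i == attempts - 1:
                        raise  # out of attempts: surface the last error
                    # waits base_delay, 2*base_delay, 4*base_delay, ...
                    time.sleep(base_delay * (2 ** i))
        return wrapper
    return decorator
```

Usage might look like decorating a metadata-write or job-submission step with `@retry(attempts=3, base_delay=5.0)`, so transient network or cluster errors are absorbed while persistent ones still fail the task.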
Further, referring to Fig. 3, which is a schematic flow diagram of the second embodiment of the information data synchronization method of the present invention.
Based on the above embodiment, in this embodiment step S20 includes:
Step S30: generating a synchronization task according to the job configuration information;
Step S40: synchronizing the information data at the source location to the destination location according to the synchronization task.
Further, the synchronization task types include incremental synchronization and batch synchronization; the job configuration information includes a task generation rule selected by the user, the task generation rule including at least a generation rule based on metadata attributes, a generation rule based on Hive Hook, or a generation rule based on the last modification time of files.
Based on the above embodiments, in this embodiment, the scheduler module sends the job configuration information set by the user to the task generator, which then generates a synchronization task list according to the generation rule configured by the user. When setting the job configuration information, the user can choose to synchronize data only, metadata only, or both metadata and data, and the task generator produces different types of synchronization tasks based on that choice. Synchronization of Hive metadata is separated from synchronization of Hive data, and each is performed independently according to the cluster architecture. Hive data is synchronized between the source and destination clusters with DistCp, which copies it in a distributed manner directly from the HDFS of the source cluster to the HDFS of the destination cluster, avoiding multiple intermediate transfers.
Because metadata and data are synchronized separately, a shared-metadata architecture can be used, in which Hive metadata does not need to be synchronized and only Hive data does, as shown in Figure 10. In Figure 10, the Hive metadata of both the source and destination clusters points to the same metastore database (MySQL). Under normal conditions the source-side MySQL is used; when a failure occurs, the Hive metadata of both clusters is switched to a standby MySQL. In an architecture without shared metadata, both Hive metadata and data must be synchronized, as shown in Figure 11, where the Hive instances of the source and destination clusters each point to their own metastore MySQL.
This embodiment provides flexible task generation modes, for example batch synchronization or incremental synchronization.
Batch synchronization: controlled by multiple parameters, a single configuration can accurately generate clear task lists at various levels, which reduces the complexity of repeated configuration and makes task details visible. For example, if only the synchronization of one database is configured, filter conditions can be used to generate detailed table-level and partition-level task lists, and the execution progress of each task can be viewed in real time.
Incremental synchronization: the task generation mode is selected through parameters from pluggable task generation rules defined by the system, for example a mode based on the Hive metadata attribute transient_lastDdlTime, a mode based on Hive Hook, or a mode based on the last modification time of HDFS files. Incremental tasks are not generated by replaying change events; instead, a configurable scheduled job generates them, using the interval from the completion time of the last synchronization to the current scheduling time as the task-generation parameter. This avoids incomplete replay sequences, which could introduce dirty data and lead to serious data-inconsistency incidents.
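The two generation modes can be sketched as follows. This is a minimal illustration, assuming the table/partition catalog and the `transient_lastDdlTime` values (seconds since the epoch) have already been read from the source Hive metastore into plain dictionaries; `SyncTask`, `expand_batch`, and `incremental_tasks` are hypothetical names. The incremental selector also assumes `transient_lastDdlTime` is refreshed whenever a partition changes, which may not hold if files are written to HDFS directly.

```python
import fnmatch
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SyncTask:
    database: str
    table: str
    partition: str  # "" for an unpartitioned table


def expand_batch(database: str,
                 catalog: Dict[str, List[str]],
                 table_pattern: str = "*") -> List[SyncTask]:
    """Expand one database-level job into table/partition-level tasks.

    `catalog` maps table name -> partition specs (empty list if unpartitioned).
    """
    tasks = []
    for table in sorted(catalog):
        if not fnmatch.fnmatch(table, table_pattern):  # user-supplied filter
            continue
        parts = catalog[table] or [""]
        tasks.extend(SyncTask(database, table, p) for p in parts)
    return tasks


def incremental_tasks(database: str,
                      ddl_times: Dict[Tuple[str, str], int],
                      last_done: int,
                      now: int) -> List[SyncTask]:
    """Select (table, partition) entries whose DDL time is in (last_done, now]."""
    if now < last_done:
        raise ValueError("scheduling time precedes last completion time")
    return [SyncTask(database, table, part)
            for (table, part), ts in sorted(ddl_times.items())
            if last_done < ts <= now]
```

Using the half-open window from the last completion time to the current scheduling time means an entry modified exactly at a scheduling boundary is picked up by exactly one run.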
Further, referring to Fig. 4, Fig. 4 is a schematic flowchart of the third embodiment of the information data synchronization method of the present invention.
Based on the above embodiments, in this embodiment, step S40 includes:
Step S50: according to the metadata synchronization task, synchronize the metadata at the source location to the destination location using multiple threads.
Based on the above embodiments, this information data synchronization scheme separates the synchronization of Hive metadata from that of Hive data and synchronizes each independently according to the architecture, so the user can synchronize only metadata or only data as the actual scenario requires. This embodiment provides a method for synchronizing metadata. When the user needs to synchronize metadata, the user selects metadata on the job configuration interface and sets the other job configuration information for metadata synchronization. The generated metadata task list is executed with multiple threads, so metadata is synchronized to the destination cluster quickly under concurrency control. In this embodiment, the task generator generates the metadata synchronization task list according to the job configuration information obtained by the scheduler, and the list may contain multiple synchronization tasks. When tasks are executed with multiple threads, different synchronization tasks are assigned to different threads, which effectively improves the efficiency and speed of task execution.
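A minimal sketch of multithreaded metadata synchronization with concurrency control, using Python's `concurrent.futures.ThreadPoolExecutor`, where `max_workers` plays the role of the concurrency limit. The per-task function `copy_one` is a hypothetical placeholder for whatever actually copies one table's metadata (for example, reading its DDL from the source metastore and applying it on the destination); the patent does not specify this step.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List, Tuple


def sync_metadata(tasks: Iterable[str],
                  copy_one: Callable[[str], None],
                  max_workers: int = 8) -> Tuple[List[str], Dict[str, str]]:
    """Run metadata sync tasks concurrently; max_workers caps concurrency.

    Returns (succeeded task names, {failed task name: error message}).
    """
    done: List[str] = []
    failed: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each task is handed to a worker thread from the bounded pool.
        futures = {pool.submit(copy_one, t): t for t in tasks}
        for fut in as_completed(futures):
            task = futures[fut]
            try:
                fut.result()  # re-raises any exception from the worker
                done.append(task)
            except Exception as exc:  # record the failure, keep the rest going
                failed[task] = str(exc)
    return sorted(done), failed
```

Collecting per-task failures instead of aborting lets the monitoring and retry steps described later decide which tasks to resubmit.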
In this embodiment, the metadata at the source location is synchronized to the destination location using multiple threads according to the metadata synchronization task, which effectively improves the efficiency and speed of metadata synchronization.
Further, referring to Fig. 5, Fig. 5 is a schematic flowchart of the fourth embodiment of the information data synchronization method of the present invention.
Based on the above embodiments, in this embodiment, step S40 further includes:
Step S60: aggregate at least two data synchronization tasks according to a preset aggregation rule to obtain an aggregated synchronization task;
Step S70: synchronize the data at the source location to the destination location according to the aggregated synchronization task.
Based on the above embodiments, the user may also choose to synchronize only data, and this embodiment provides a corresponding information data synchronization method. When setting the job configuration information, the user can choose whether to aggregate the data synchronization tasks into aggregated synchronization tasks before synchronizing; if so, the user selects the corresponding aggregation rule. The aggregated synchronization task in this embodiment is obtained, after the data synchronization task list has been generated, by aggregating two or more tasks in the list according to the selected aggregation rule. Specifically, for data synchronization, the generated task lists of various levels are adaptively aggregated before DistCp jobs are submitted, for example by merging tasks according to data volume. DistCp performs distributed copying with MapReduce and is suited to copying large volumes of data. Submitting each DistCp job takes time, so submitting many small-volume DistCp jobs takes longer than submitting a single large-volume job. Concurrency and bandwidth parameters are also applied so that synchronization does not occupy so much dedicated network bandwidth that business systems are affected. Combining this embodiment with the third embodiment, depending on the actual synchronization scenario, the user may also select both the metadata and data options when setting the job configuration information. When both metadata and data are to be synchronized, metadata is synchronized first and then data; when synchronizing data, the configuration determines whether tasks are aggregated before DistCp jobs are submitted.
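The size-based aggregation and the bandwidth and concurrency controls might look like the sketch below. It assumes a simple greedy rule that is not specified by the patent: tasks are sorted by size and packed into groups of at most `max_group_bytes`, and each group becomes one DistCp job. `-m` (maximum simultaneous copies) and `-bandwidth` (MB per second per map) are real DistCp options; the function names and thresholds are illustrative. With several source paths and no `-update`, DistCp places each source under the target directory, so the target must be chosen with that layout in mind.

```python
from typing import List, Tuple


def aggregate(tasks: List[Tuple[str, int]], max_group_bytes: int) -> List[List[str]]:
    """Greedily pack (path, size_bytes) tasks into groups of at most max_group_bytes.

    A task larger than the limit ends up in a group of its own.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in sorted(tasks, key=lambda t: t[1]):  # small tasks first
        if current and current_size + size > max_group_bytes:
            groups.append(current)  # close the full group
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        groups.append(current)
    return groups


def distcp_argv(sources: List[str], target: str,
                max_maps: int, bandwidth_mb: int) -> List[str]:
    """Build the command line for one aggregated DistCp job."""
    # -m caps simultaneous map tasks; -bandwidth caps MB/s per map.
    return ["hadoop", "distcp", "-m", str(max_maps),
            "-bandwidth", str(bandwidth_mb), *sources, target]
```

Each resulting argument list could be passed to `subprocess.run`; grouping many small partitions into one job amortizes the per-job submission overhead mentioned above.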
In this embodiment, at least two data synchronization tasks are aggregated according to a preset aggregation rule to obtain an aggregated synchronization task, and the data at the source location is synchronized to the destination location according to the aggregated synchronization task. Aggregating synchronization tasks and synchronizing data according to the aggregated task in this way saves synchronization time and improves data synchronization efficiency.
Further, referring to Fig. 6, Fig. 6 is a schematic flowchart of the fifth embodiment of the information data synchronization method of the present invention. Based on the above embodiments, in this embodiment, the method further includes, after step S30:
Step S80: generate a monitoring task according to the synchronization task, and obtain execution progress information of the synchronization task according to the monitoring task;
Step S90: determine the execution state of the synchronization task according to the execution progress information;
Step S100: when the execution state of the synchronization task is the execution-failed state, perform the step of obtaining the job configuration information set by the user, and synchronize the information data again.
Existing data synchronization technology lacks a complete visual interface, cannot track synchronization progress in a timely manner, and cannot support richer synchronization scenarios. Based on the above embodiments, this embodiment provides a method for obtaining the progress and execution state of synchronization tasks. In this embodiment, after a synchronization task starts, the task monitor periodically checks its execution. Specifically, after each DistCp job starts, its execution is checked at regular intervals. Jobs with large data volumes run longer, so to obtain execution details promptly, the monitoring interval is defined adaptively according to the volume of data being synchronized. The interval depends on the performance of the system and the cluster and on the actual visualization and monitoring requirements: querying progress too frequently increases the load on the cluster and the synchronization system, while an overly long interval fails the timeliness requirement of visualization. While the monitoring task tracks the synchronization task, it obtains the task's execution progress information, which includes the amount of data synchronized and the execution state. The execution states are: in progress (the task is executing normally), execution failed, and execution completed. In this embodiment, if the execution state of the synchronization task is execution failed, execution restarts from step S10 to synchronize the information data again. When the task reaches the execution-completed state, data verification can be triggered. The execution state and progress of each stage of the system can be written to a predetermined location and stored for users to query.
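The adaptive monitoring interval and failure handling can be sketched as follows. Everything here is an illustrative assumption: the linear rule (one extra second per GiB, capped between 5 and 300 seconds), the snapshot dictionary, and the bounded `max_attempts` retry are not specified by the patent, which only says the interval depends on data volume and that a failed task restarts from the job configuration. `fetch`, `record`, `sleep`, and `submit` are injected so the loop can be exercised without a cluster.

```python
from enum import Enum
from typing import Callable, Dict

GIB = 1 << 30


class State(Enum):
    RUNNING = "running"
    FAILED = "failed"
    SUCCEEDED = "succeeded"


def poll_interval(total_bytes: int, min_s: int = 5, max_s: int = 300) -> int:
    """Poll small jobs often and large jobs less often: +1 s per GiB, capped."""
    return min(max_s, min_s + total_bytes // GIB)


def monitor(fetch: Callable[[], Dict], total_bytes: int,
            record: Callable[[Dict], None],
            sleep: Callable[[float], None]) -> State:
    """Poll a job until it leaves RUNNING, recording every snapshot."""
    interval = poll_interval(total_bytes)
    while True:
        snap = fetch()  # e.g. {"state": State.RUNNING, "copied_bytes": n}
        record(snap)    # persist so progress queries can be answered later
        if snap["state"] is not State.RUNNING:
            return snap["state"]
        sleep(interval)


def run_with_retry(submit: Callable[[], Callable[[], Dict]], total_bytes: int,
                   record: Callable[[Dict], None],
                   sleep: Callable[[float], None],
                   max_attempts: int = 3) -> State:
    """Resubmit the job when a run fails (step S100), up to max_attempts."""
    state = State.FAILED
    for _ in range(max_attempts):
        state = monitor(submit(), total_bytes, record, sleep)
        if state is State.SUCCEEDED:
            break
    return state
```

In production, `sleep` would be `time.sleep`, and `fetch` would query the status of the submitted DistCp job.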
In this embodiment, a monitoring task is generated according to the synchronization task, and the execution progress information of the synchronization task is obtained according to the monitoring task; the execution state of the synchronization task is determined according to the execution progress information; and when the execution state is the execution-failed state, the step of obtaining the job configuration information set by the user is performed and information data synchronization is carried out again. In this way, the execution of synchronization tasks is monitored, and the information data is resynchronized when a synchronization task fails.
Further, referring to Fig. 7, Fig. 7 is a schematic flowchart of the fifth embodiment of the information data synchronization method of the present invention.
Based on the above embodiments, in this embodiment, the method includes, after step S90:
Step S110: when the execution state of the synchronization task is the execution-completed state, obtain first information of the source location data and second information of the destination location data, the first information or the second information including at least file size information, file count information, or checksum information;
Step S120: perform data verification according to the first information and the second information.
Based on the above embodiments, during data synchronization, permissions or other factors related to the data may cause synchronization anomalies, such as inconsistencies between the data at the source and destination locations. This embodiment provides a data verification method. After data synchronization completes and the task monitor detects that the execution state of the task is execution completed, the task monitor sends a data verification notification to the task verifier. On receiving the notification, the task verifier checks the data at the source and destination locations for consistency, integrity, availability, and so on, using the verification mode the user configured during job configuration. Specifically, after synchronization completes, the system reads data information, such as file size, directory count, file count, and checksum, from the HDFS clusters at the source and destination locations, and then performs consistency, integrity, and availability checks: it compares the file sizes, checksums, and similar information of the source and destination and determines whether the corresponding values match, thereby establishing whether the destination holds complete, consistent data. For the availability check, the metadata of the source and destination can be used together to determine whether the data can be read correctly according to the metadata. If verification finds no anomaly, the data is retained. If a consistency, integrity, or availability anomaly occurs, the automatic failure-retry mechanism re-executes the step of synchronizing the information data at the source location to the destination location according to the synchronization task, and data verification is performed again.
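A minimal sketch of the consistency check, assuming the source and destination summaries are gathered with `hdfs dfs -count <path>`, whose default output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME. Optional per-file checksums (for example from `hadoop fs -checksum`) are passed in as dictionaries keyed by relative path. With HDFS's default checksum type, these are comparable across clusters only when both use the same block size and checksum settings. `Summary`, `parse_count`, and `verify` are hypothetical names, and the availability check against metadata is not covered.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Summary:
    dirs: int
    files: int
    size: int


def parse_count(line: str) -> Summary:
    """Parse one line of `hdfs dfs -count <path>` output.

    Columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME (path may contain spaces).
    """
    dirs, files, size, _path = line.split(maxsplit=3)
    return Summary(int(dirs), int(files), int(size))


def verify(src: Summary, dst: Summary,
           src_checksums: Optional[Dict[str, str]] = None,
           dst_checksums: Optional[Dict[str, str]] = None) -> List[str]:
    """Return mismatch descriptions; an empty list means the check passed."""
    problems = []
    for field in ("dirs", "files", "size"):
        a, b = getattr(src, field), getattr(dst, field)
        if a != b:
            problems.append(f"{field}: source={a} destination={b}")
    if src_checksums is not None and dst_checksums is not None:
        # A file missing on either side also counts as a checksum mismatch.
        for rel_path in sorted(set(src_checksums) | set(dst_checksums)):
            if src_checksums.get(rel_path) != dst_checksums.get(rel_path):
                problems.append(f"checksum: {rel_path}")
    return problems
```

A non-empty result would feed the automatic failure-retry path described above rather than the step that retains the data.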
In this embodiment, first information of the source location data and second information of the destination location data are obtained, the first information or the second information including at least file size information, file count information, or checksum information, and data verification is performed according to the first information and the second information. This improves the accuracy and effectiveness of data synchronization.
Further, referring to Fig. 8, Fig. 8 is a schematic flowchart of the sixth embodiment of the information data synchronization method of the present invention.
Based on the above embodiments, in this embodiment, the method further includes, after step S80:
Step S130: receive a synchronization task execution progress query request triggered by the user;
Step S140: display the execution progress information of the synchronization task according to the query request.
Based on the above embodiments, after a synchronization task is generated, a corresponding monitoring task is generated and the task's execution progress information is obtained. In this embodiment, the user can trigger a query request from the front end to obtain the data synchronization status. Specifically, the system's user-facing page can provide controls for selecting a synchronization task and a preset query button. After the user performs the query, the user's front-end device sends an execution progress query request to the task monitor through the interactive interface, and the task monitor sends the most recently monitored progress information to the user's front-end page for display.
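The query path can be sketched as a thread-safe store that the task monitor writes to and the front end reads from. This is a hypothetical in-memory stand-in for the predetermined storage location mentioned earlier; a real deployment would more likely persist snapshots in a database. `ProgressStore` and its methods are illustrative names.

```python
import threading
from typing import Dict, Optional


class ProgressStore:
    """Keeps the latest monitoring snapshot per task for front-end queries."""

    def __init__(self) -> None:
        self._latest: Dict[str, dict] = {}
        self._lock = threading.Lock()  # monitor threads and queries may overlap

    def record(self, task_id: str, copied_bytes: int, total_bytes: int,
               state: str) -> None:
        """Called by the task monitor after each poll; overwrites older data."""
        with self._lock:
            self._latest[task_id] = {"copied": copied_bytes,
                                     "total": total_bytes, "state": state}

    def query(self, task_id: str) -> Optional[str]:
        """Answer a progress query with the most recent snapshot, or None."""
        with self._lock:
            snap = self._latest.get(task_id)
        if snap is None:
            return None
        total = snap["total"]
        pct = 100.0 * snap["copied"] / total if total else 100.0
        return f"{task_id}: {snap['state']} {pct:.1f}%"
```

For example, after `store.record("t1", 256, 1024, "running")`, `store.query("t1")` returns `"t1: running 25.0%"`.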
In this embodiment, a synchronization task execution progress query request triggered by the user is received, and the execution progress information of the synchronization task is displayed according to the query request. In this way, the user can visually monitor task execution progress.
Further, the present invention also provides an information data synchronization device, which includes a memory, a processor, and an information data synchronization program stored in the memory and executable on the processor. For the method implemented when the information data synchronization program is executed by the processor, refer to the embodiments of the information data synchronization method of the present invention; details are not repeated here.
In addition, an embodiment of the present invention further provides a computer-readable storage medium.
An information data synchronization program is stored on the computer-readable storage medium of the present invention, and when the program is executed by a processor, it implements the steps of the information data synchronization method described above.
For the method implemented when the information data synchronization program running on the processor is executed, refer to the embodiments of the information data synchronization method of the present invention; details are not repeated here.
It should be noted that herein, term " comprising ", "comprising" or its any other variant are intended to non-row His property includes, so that process, method, article or system including a series of elements not only include those elements, and And it further includes other elements that are not explicitly listed or further includes intrinsic for this process, method, article or system institute Element.In the absence of more restrictions, the element limited by sentence "including a ...", it is not excluded that including this Also there are other identical elements in the process of element, method, article or system.
The embodiments of the present invention are for illustration only, do not represent the quality of embodiment.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. An information data synchronization method, characterized in that the information data includes data or metadata, and the information data synchronization method includes:
obtaining job configuration information set by a user;
synchronizing information data at a source location to a destination location according to the job configuration information.
2. The information data synchronization method according to claim 1, characterized in that synchronizing the information data at the source location to the destination location according to the job configuration information includes:
generating a synchronization task according to the job configuration information;
synchronizing the information data at the source location to the destination location according to the synchronization task.
3. The information data synchronization method according to claim 2, characterized in that the information data is metadata, and synchronizing the information data at the source location to the destination location according to the synchronization task includes:
synchronizing the metadata at the source location to the destination location using multiple threads according to the metadata synchronization task.
4. The information data synchronization method according to claim 2, characterized in that the information data is data, and synchronizing the information data at the source location to the destination location according to the synchronization task further includes:
aggregating at least two data synchronization tasks according to a preset aggregation rule to obtain an aggregated synchronization task;
synchronizing the data at the source location to the destination location according to the aggregated synchronization task.
5. The information data synchronization method according to claim 2, characterized in that, after the step of generating the synchronization task according to the job configuration information, the method further includes:
generating a monitoring task according to the synchronization task, and obtaining execution progress information of the synchronization task according to the monitoring task;
determining an execution state of the synchronization task according to the execution progress information;
when the execution state of the synchronization task is an execution-failed state, performing the step of obtaining the job configuration information set by the user and performing information data synchronization again.
6. The information data synchronization method according to claim 5, characterized in that, after the step of determining the execution state of the synchronization task according to the execution progress information, the method further includes:
when the execution state of the synchronization task is an execution-completed state, obtaining first information of the source location data and second information of the destination location data, the first information or the second information including at least file size information, file count information, or checksum information;
performing data verification according to the first information and the second information.
7. The information data synchronization method according to claim 5 or 6, characterized in that, after the step of generating the monitoring task according to the synchronization task and obtaining the execution progress information of the synchronization task according to the monitoring task, the method further includes:
receiving a synchronization task execution progress query request triggered by a user;
displaying the execution progress information of the synchronization task according to the query request.
8. The information data synchronization method according to claim 2, characterized in that the synchronization task types include incremental synchronization and batch synchronization, the job configuration information includes a task generation rule selected by the user, and the task generation rule includes at least a rule based on metadata attributes, a rule based on Hive Hook, or a rule based on the last modification time of files.
9. An information data synchronization device, characterized in that the information data synchronization device includes a memory, a processor, and an information data synchronization program stored in the memory and executable on the processor, wherein the information data synchronization program, when executed by the processor, implements the steps of the information data synchronization method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that an information data synchronization program is stored on the computer-readable storage medium, and the information data synchronization program, when executed by a processor, implements the steps of the information data synchronization method according to any one of claims 1 to 8.
CN201711293634.0A 2017-12-08 2017-12-08 Information data synchronous method, device and computer readable storage medium Pending CN108197155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711293634.0A CN108197155A (en) 2017-12-08 2017-12-08 Information data synchronous method, device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN108197155A true CN108197155A (en) 2018-06-22

Family

ID=62573688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711293634.0A Pending CN108197155A (en) 2017-12-08 2017-12-08 Information data synchronous method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN108197155A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005229509A (en) * 2004-02-16 2005-08-25 Ricoh Co Ltd Content metadata transmission/reception system, content metadata synchronizing method, program for making computer execute the method, and reception terminal in which content and metadata are associated with each other
CN101551801A (en) * 2008-03-31 2009-10-07 国际商业机器公司 Data synchronization method and data synchronization system
CN101854400A (en) * 2010-06-09 2010-10-06 中兴通讯股份有限公司 Database synchronization deployment and monitoring method and device
CN103761162A (en) * 2014-01-11 2014-04-30 深圳清华大学研究院 Data backup method of distributed file system
CN103875229A (en) * 2013-12-02 2014-06-18 华为技术有限公司 Asynchronous replication method, device and system
CN106101265A (en) * 2016-07-26 2016-11-09 浪潮软件股份有限公司 A kind of method carrying out file synchronization between Dropbox and desktop end
CN106919346A (en) * 2017-02-21 2017-07-04 无锡华云数据技术服务有限公司 A kind of shared Storage Virtualization implementation method based on CLVM

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977169A (en) * 2019-03-19 2019-07-05 广州品唯软件有限公司 Method of data synchronization, device, computer readable storage medium and system
CN110175159A (en) * 2019-05-29 2019-08-27 京东数字科技控股有限公司 Method of data synchronization and system for object storage cluster
CN112395287A (en) * 2019-08-19 2021-02-23 北京国双科技有限公司 Table classification method, table creation method, device, equipment and medium
CN110704393A (en) * 2019-08-30 2020-01-17 北京浪潮数据技术有限公司 Data monitoring method and device for Hive data warehouse
CN110888760A (en) * 2019-11-26 2020-03-17 中国工商银行股份有限公司 Data recovery method and device, and data processing method and device
CN112751938A (en) * 2020-12-30 2021-05-04 上海赋算通云计算科技有限公司 Real-time data synchronization system based on multi-cluster operation, implementation method and storage medium
CN112948494A (en) * 2021-03-04 2021-06-11 北京沃东天骏信息技术有限公司 Data synchronization method and device, electronic equipment and computer readable medium
CN113742420A (en) * 2021-08-09 2021-12-03 广州市易工品科技有限公司 Data synchronization method and device
CN113742420B (en) * 2021-08-09 2024-02-02 广州市易工品科技有限公司 Data synchronization method and device
CN116383310A (en) * 2023-06-02 2023-07-04 天津金城银行股份有限公司 Method, device, equipment and storage medium for synchronizing daily terminal files
CN116383310B (en) * 2023-06-02 2023-08-04 天津金城银行股份有限公司 Method, device, equipment and storage medium for synchronizing daily terminal files

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180622