CN108197155A - Information data synchronous method, device and computer readable storage medium - Google Patents
Information data synchronous method, device and computer readable storage medium
- Publication number
- CN108197155A (application CN201711293634.0A)
- Authority
- CN
- China
- Prior art keywords
- information
- data
- task
- synchronous
- information data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/275—Synchronous replication
Abstract
The invention discloses an information data synchronization method, comprising: obtaining job configuration information set by a user; and synchronizing information data from a source location to a destination location according to the job configuration information. The invention also discloses an information data synchronization device and a computer-readable storage medium. The invention improves the flexibility of information data synchronization.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to an information data synchronization method, device and computer-readable storage medium.
Background
With the development of big data, many companies build their data warehouses on open-source Hadoop ecosystem technology. As data volumes and business demands grow, the reliability requirements placed on data warehouses rise sharply, so deploying multiple data warehouses has become a new technical solution, and frequent data synchronization between them has become the norm. A cross-cluster, accurate, reliable and efficient information data synchronization tool is therefore needed to provide key support, ensuring both the timeliness and the integrity of data transmission across clusters. For synchronizing TB-scale and PB-scale information data on a big data platform, such a synchronization tool is especially important. Two synchronization approaches exist in the prior art. The first uses replication, replaying all operations performed on the master database from a log. For Hive, general replication is implemented mainly through export and import of data and metadata together with event replay, and its synchronization efficiency is low. The second relies on open-source tools; with this approach synchronization tasks cannot be custom-configured, the user experience is poor (for example, synchronization progress is not visible), and data may become inconsistent. With the prior art, users therefore cannot carry out information data synchronization flexibly and controllably according to their needs.
Summary of the invention
The main object of the present invention is to provide an information data synchronization method, intended to solve the problem that, in the prior art, users cannot carry out information data synchronization flexibly and controllably according to their needs.
To achieve the above object, the present invention provides an information data synchronization method, which includes:
Obtaining job configuration information set by a user;
Synchronizing information data from a source location to a destination location according to the job configuration information.
Preferably, synchronizing the information data from the source location to the destination location according to the job configuration information includes:
Generating a synchronization task according to the job configuration information;
Synchronizing the information data from the source location to the destination location according to the synchronization task.
Preferably, the information data is metadata, and synchronizing the information data from the source location to the destination location according to the synchronization task includes:
Synchronizing the metadata from the source location to the destination location using multiple threads, according to the synchronization task for the metadata.
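The multi-threaded metadata synchronization described above can be pictured with a minimal sketch. It assumes that metadata is available as plain dictionaries keyed by table name, a stand-in for what a metastore client would return; `diff_table`, `sync_metadata` and the worker count are hypothetical names chosen for illustration and are not taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def diff_table(name, source, dest):
    """Decide which operation brings one table's metadata in line."""
    if name not in dest:
        return (name, "create")
    if source[name] != dest[name]:
        return (name, "alter")
    return (name, "skip")


def sync_metadata(source, dest, workers=4):
    """Compare tables in parallel, then apply the differences to dest."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as its input.
        ops = list(pool.map(lambda n: diff_table(n, source, dest), sorted(source)))
    for name, op in ops:
        if op != "skip":
            # Hypothetical stand-in for a write to the destination metastore.
            dest[name] = dict(source[name])
    return ops


source_meta = {"orders": {"cols": ["id", "amt"]}, "users": {"cols": ["id"]}}
dest_meta = {"orders": {"cols": ["id"]}}
operations = sync_metadata(source_meta, dest_meta)
```

The comparison runs in the thread pool while the writes are applied afterwards in a single thread, so the sketch never mutates the destination while other threads are reading it.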
Preferably, the information data is data, and synchronizing the information data from the source location to the destination location according to the synchronization task further includes:
Aggregating the synchronization tasks of at least two pieces of data according to a preset aggregation rule to obtain an aggregated synchronization task;
Synchronizing the data from the source location to the destination location according to the aggregated synchronization task.
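The patent does not spell out the preset aggregation rule, so the following is only a sketch under an assumed rule: tasks are grouped by database, and each aggregated job carries at most `max_paths` paths. The function name, the task dictionary layout and the example paths are all hypothetical.

```python
from collections import defaultdict


def aggregate_tasks(tasks, max_paths=2):
    """Group per-path tasks by database, at most max_paths paths per job."""
    by_db = defaultdict(list)
    for task in tasks:
        by_db[task["db"]].append(task["path"])
    jobs = []
    for db in sorted(by_db):
        paths = by_db[db]
        # Split each database's paths into chunks of max_paths.
        for start in range(0, len(paths), max_paths):
            jobs.append({"db": db, "paths": paths[start:start + max_paths]})
    return jobs


# Hypothetical example tasks; the paths are illustrative only.
tasks = [
    {"db": "sales", "path": "/warehouse/sales.db/orders/dt=2017-12-01"},
    {"db": "sales", "path": "/warehouse/sales.db/orders/dt=2017-12-02"},
    {"db": "sales", "path": "/warehouse/sales.db/orders/dt=2017-12-03"},
    {"db": "crm", "path": "/warehouse/crm.db/users"},
]
jobs = aggregate_tasks(tasks)
```

Here four single-path tasks collapse into three aggregated jobs, which is the kind of reduction in submitted jobs that aggregation is meant to achieve.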
Preferably, after the step of generating a synchronization task according to the job configuration information, the method further includes:
Generating a monitoring task according to the synchronization task, and obtaining execution progress information of the synchronization task according to the monitoring task;
Determining the execution state of the synchronization task according to the execution progress information;
When the execution state of the synchronization task is an execution-failed state, returning to the step of obtaining the job configuration information set by the user, so that information data synchronization is performed again.
Preferably, after the step of determining the execution state of the synchronization task according to the execution progress information, the method further includes:
When the execution state of the synchronization task is an execution-completed state, obtaining first information about the data at the source location and second information about the data at the destination location, where the first information or the second information includes at least file size information, file count information or checksum information;
Performing a data check according to the first information and the second information.
Preferably, after the step of generating a synchronization task according to the job configuration information, the method further includes:
Receiving a synchronization task execution progress query request triggered by the user;
Displaying the synchronization task execution progress information according to the query request.
Preferably, the synchronization task types include incremental synchronization and batch synchronization, and the job configuration information includes a task generation rule selected by the user, the task generation rule including at least a generation rule based on metadata attributes, a generation rule based on Hive Hook, or a generation rule based on the last modification time of files.
In addition, to achieve the above object, the present invention also provides an information data synchronization device, which includes: a memory, a processor, and an information data synchronization program stored in the memory and executable on the processor, where the program, when executed by the processor, implements the steps of the information data synchronization method described above.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium on which an information data synchronization program is stored, where the program, when executed by a processor, implements the steps of the information data synchronization method described above.
The present invention provides an information data synchronization method, device and computer-readable storage medium. The method includes: obtaining job configuration information set by a user; and synchronizing information data from a source location to a destination location according to the job configuration information. In this way, the user can set the job configuration information as needed for the actual synchronization scenario, and synchronization is carried out according to that information, which improves the flexibility of information data synchronization.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the device in the hardware operating environment involved in embodiments of the present invention;
Fig. 2 is a flow diagram of the first embodiment of the information data synchronization method of the present invention;
Fig. 3 is a flow diagram of the second embodiment of the information data synchronization method of the present invention;
Fig. 4 is a flow diagram of the third embodiment of the information data synchronization method of the present invention;
Fig. 5 is a flow diagram of the fourth embodiment of the information data synchronization method of the present invention;
Fig. 6 is a flow diagram of the fifth embodiment of the information data synchronization method of the present invention;
Fig. 7 is a flow diagram of the sixth embodiment of the information data synchronization method of the present invention;
Fig. 8 is a flow diagram of the seventh embodiment of the information data synchronization method of the present invention;
Fig. 9 is a flow diagram of a complete data synchronization in an embodiment of the information data synchronization method of the present invention;
Fig. 10 is a Hive synchronization flow chart for the shared-metadata architecture in an embodiment of the information data synchronization method of the present invention;
Fig. 11 is a Hive synchronization flow chart for the non-shared-metadata architecture in an embodiment of the information data synchronization method of the present invention.
Detailed description
It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
In the prior art, users cannot carry out information data synchronization flexibly and controllably according to their needs.
To solve the above technical problem, the present invention provides an information data synchronization method in which job configuration information set by a user is first obtained, and information data is then synchronized from a source location to a destination location according to that job configuration information.
As shown in Fig. 1, Fig. 1 is a schematic structural diagram of the device in the hardware operating environment involved in embodiments of the present invention.
The terminal in embodiments of the present invention may be a PC, or a portable terminal device with a display function such as a smartphone, tablet computer or laptop computer.
As shown in Fig. 1, the terminal may include: a processor 1001 (for example a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory, such as disk storage. The memory 1005 may optionally also be a storage device independent of the processor 1001.
Those skilled in the art will understand that the terminal structure shown in Fig. 1 does not limit the terminal, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a kind of computer storage medium, may include an operating system, a network communication module, a user interface module and an information data synchronization program.
In the terminal shown in Fig. 1, the network interface 1004 is mainly used to connect to a background server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user terminal) and exchange data with it; and the processor 1001 may be used to call the information data synchronization program stored in the memory 1005 and perform the following operations:
Obtaining job configuration information set by a user;
Synchronizing information data from a source location to a destination location according to the job configuration information.
Further, the processor 1001 may call the information data synchronization program stored in the memory 1005 and also perform the following operations:
Generating a synchronization task according to the job configuration information;
Synchronizing the information data from the source location to the destination location according to the synchronization task.
Further, the processor 1001 may call the information data synchronization program stored in the memory 1005 and also perform the following operation:
Synchronizing the metadata from the source location to the destination location using multiple threads, according to the synchronization task for the metadata.
Further, the processor 1001 may call the information data synchronization program stored in the memory 1005 and also perform the following operations:
Aggregating the synchronization tasks of at least two pieces of data according to a preset aggregation rule to obtain an aggregated synchronization task;
Synchronizing the data from the source location to the destination location according to the aggregated synchronization task.
Further, the processor 1001 may call the information data synchronization program stored in the memory 1005 and also perform the following operations:
Generating a monitoring task according to the synchronization task, and obtaining execution progress information of the synchronization task according to the monitoring task;
Determining the execution state of the synchronization task according to the execution progress information;
When the execution state of the synchronization task is an execution-failed state, returning to the step of obtaining the job configuration information set by the user, so that information data synchronization is performed again.
Further, the processor 1001 may call the information data synchronization program stored in the memory 1005 and also perform the following operations:
When the execution state of the synchronization task is an execution-completed state, obtaining first information about the data at the source location and second information about the data at the destination location, where the first information or the second information includes at least file size information, file count information or checksum information;
Performing a data check according to the first information and the second information.
Further, the processor 1001 may call the information data synchronization program stored in the memory 1005 and also perform the following operations:
Receiving a synchronization task execution progress query request triggered by the user;
Displaying the synchronization task execution progress information according to the query request.
Based on the above hardware structure, embodiments of the information data synchronization method of the present invention are proposed.
The present invention is mainly applied in the field of information data synchronization, for example data migration between data warehouses. This embodiment takes a Hadoop-based, cross-cluster, visualized information data synchronization process as an example. With the development of big data, many companies build their data warehouses on open-source Hadoop ecosystem technology. Hadoop is a software framework for distributed processing of massive data and comprises four modules: Common, HDFS, YARN and MapReduce. Common provides the shared utilities that support the other modules; HDFS is a distributed file system providing high-throughput access; YARN is a framework for job scheduling and cluster resource management; MapReduce is a parallel computing framework for big data. As data volumes and business demands grow, the reliability requirements placed on data warehouses rise sharply, so deploying multiple data warehouses has become a new technical solution, and frequent data synchronization between them has become the norm. A cross-cluster, accurate, reliable and efficient information data synchronization tool is therefore needed to provide key support, ensuring both the timeliness and the integrity of data transmission across clusters. For synchronizing TB-scale and PB-scale information data on a big data platform, such a synchronization tool is especially important. Two synchronization approaches exist in the prior art. The first uses replication, replaying all operations performed on the master database from a log. Hive is a Hadoop-based data warehouse tool that maps structured data files to database tables and provides SQL query functionality by converting SQL statements into MapReduce jobs. Hive's management of a data warehouse covers two aspects: management of metadata and management of data. For Hive, general replication is implemented mainly through export and import of data and metadata together with event replay, and its synchronization efficiency is low. The second approach relies on open-source tools; with this approach synchronization tasks cannot be custom-configured, the user experience is poor (for example, synchronization progress is not visible), and data may become inconsistent. With the prior art, users therefore cannot carry out information data synchronization flexibly and controllably according to their needs. The present invention provides a method of carrying out information data synchronization according to job configuration information set by the user, so that synchronization can be carried out flexibly.
With reference to Fig. 2, a first embodiment of the information data synchronization method of the present invention is proposed. In this embodiment, the information data includes data or metadata. Data is the resource data that carries the actual information content; in a data warehouse it is typically stored in HDFS, and most queries are executed by MapReduce. Metadata is data that describes the resource data, mainly its attribute information, such as its storage location. Metadata is stored in a database, such as MySQL. Metadata in Hive includes the table name, the table's columns and partitions and their attributes, the table's attributes (for example, whether it is an external table), the directory where the table's data resides, and so on. Operations on Hive metadata are served by the Hive Metastore. The information data synchronization method of this embodiment includes:
Step S10, obtaining the job configuration information set by the user;
With Hive positioned as an offline computing engine, this technical solution gives up a small amount of synchronization timeliness compared with real-time synchronization in order to provide an accurate, reliable, fast and visualized general-purpose information data synchronization solution for big data. To address the shortcomings of existing Hive information data synchronization solutions and to meet Hive synchronization requirements in a variety of scenarios, this embodiment provides a job configuration function. When performing information data synchronization, the user can, according to the needs of the actual synchronization scenario, configure the Hive databases or tables to be synchronized through a web page, customizing the basic information of the source and destination databases or tables, the parameters of the task generation mode, the control parameters used during task execution, and the parameters related to task monitoring or data checking. To implement the information data synchronization method of the present invention, this embodiment provides an information data synchronization system, which may include components such as an interface module, a task generator, a task executor and a task scheduling module. After the user sets the job configuration information, it is obtained through the interface module; the scheduler periodically retrieves the configuration information and sends it to the task generator, which generates a synchronization task list according to the generation rule configured by the user (covering full, incremental or historical data of various ranges) and notifies the task executor to start executing the tasks.
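As an illustration of what such job configuration information might contain, the sketch below models it as a small Python dataclass with a validation step. All field names, default values and rule identifiers are assumptions made for this example; the patent names the categories of parameters but not a concrete schema.

```python
from dataclasses import dataclass

# Hypothetical identifiers for the task types and generation rules in the text.
ALLOWED_TYPES = ("batch", "incremental")
ALLOWED_RULES = ("metadata_attribute", "hive_hook", "file_mtime")


@dataclass
class JobConfig:
    source: str                      # source database or table
    destination: str                 # destination database or table
    sync_metadata: bool = True
    sync_data: bool = True
    task_type: str = "batch"
    generation_rule: str = "metadata_attribute"
    aggregate: bool = False          # whether to aggregate data tasks
    bandwidth_mb: int = 100          # execution control parameter
    checks: tuple = ("file_size", "file_count", "checksum")

    def validate(self):
        """Return a list of human-readable problems (empty if valid)."""
        errors = []
        if self.task_type not in ALLOWED_TYPES:
            errors.append("unknown task_type: " + self.task_type)
        if self.generation_rule not in ALLOWED_RULES:
            errors.append("unknown generation_rule: " + self.generation_rule)
        if not (self.sync_metadata or self.sync_data):
            errors.append("nothing to synchronize")
        return errors


config = JobConfig(
    source="cluster_a.sales",
    destination="cluster_b.sales",
    task_type="incremental",
    generation_rule="hive_hook",
)
problems = config.validate()
```

Validating the configuration before handing it to a task generator keeps malformed settings from producing an empty or misleading task list.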
Step S20, synchronizing the information data from the source location to the destination location according to the job configuration information.
Based on the above step, in this embodiment, after the scheduling module obtains the job configuration information set by the user, it sends that information to the task generator and sends a start-execution instruction to the task executor. The task generator generates a synchronization task list according to the job configuration parameters set by the user, and the task executor executes the synchronization tasks in that list to complete the information data synchronization. In this embodiment, the source location is the location where the data is originally stored and may be a storage location at different levels, such as a database or a table; the destination location is the storage location, distinct from the source location, to which the data is transferred, and may likewise be a database-level or table-level storage location corresponding to the source location. This embodiment covers information data synchronization at different levels, such as database synchronization or table synchronization; for database-level synchronization, the source location and destination location are the corresponding source database and destination database. Specifically, when executing a synchronization task, the task executor first determines whether metadata is to be synchronized and whether data is to be synchronized; if both are, metadata is synchronized first and data afterwards. When synchronizing data, the configuration determines whether tasks are aggregated before submission; when aggregation is required, the tasks are aggregated and submitted together, and synchronization is carried out by executing the aggregated task. In this embodiment, components such as a task monitor, a task verifier or an alarm module may also be added to the information data synchronization system. The task monitor periodically obtains the task execution progress, displays it when the user queries, and thereby responds to the user's query request. The user configures the check mode in the job configuration information; after synchronization completes, the data verifier checks the data, comparing the data at the source location with the data at the destination location for consistency, integrity and availability.
Specifically, taking information data synchronization between different data centers as an example, a complete synchronization process is shown in Fig. 9. The user can interact with the system through the front end, for example to synchronize or to query. After a task is configured, the system queries metadata from the source Hive Metastore, generates a synchronization task list and stores it in the system. The specific generation mode depends on the task type. For batch synchronization, the list is generated by directly querying the corresponding database, table and partition names. For incremental data, there are two ways to generate tasks. One generates the to-be-synchronized list from the transient_lastDdlTime attribute of Hive tables or partitions; every operation on a Hive table updates transient_lastDdlTime to the timestamp at which the most recent operation completed. The other captures Hive operation events with a Hive Hook program, records them in a MySQL database, and periodically aggregates them into a synchronization task list.
After task generation completes, the system synchronizes according to the configuration. If metadata needs to be synchronized, this is done in steps 2-1 and 2-2 of Fig. 9: the system calls the Hive Metastore API from multiple threads to read metadata from the source and the destination, compares them to obtain the differences, and performs the corresponding operations to write the metadata to the destination Hive Metastore. Once metadata synchronization completes, the system continues according to the configuration. If data needs to be synchronized, the system first submits DistCp jobs (distributed copy tasks) from multiple threads to YARN on the specified cluster. DistCp (distributed copy) is a tool for large-scale copying within and between clusters. It uses MapReduce to implement file distribution, error handling and recovery, and report generation. It takes a list of files and directories as the input to the map tasks, and each task copies some of the files in the source list. Before submitting a DistCp job, the system queries the source Hive Metastore for the locations on HDFS (that is, the file paths) of the databases, tables and partitions to be synchronized, and then decides according to the user configuration whether to aggregate submissions. After the synchronization parameters are set, such as bandwidth control, concurrency control or CRC checking, the DistCp job is submitted to the YARN cluster specified by the user (see 3 in Fig. 9). Once the DistCp job starts running in the YARN cluster, it reads data from the source HDFS cluster in a distributed manner and writes it to the destination HDFS cluster (see 4-1 and 4-2 in Fig. 9).
After a DistCp job is submitted, the system generates a monitoring task for that job, which periodically queries the specified YARN cluster for details such as the job's execution progress (see 5 in Fig. 9). When the DistCp job finishes, the monitoring task is deleted automatically and the information data synchronization is complete. The system then performs a data check: it reads information such as file sizes, directory counts, file counts and checksums from the source and destination HDFS clusters and verifies consistency, integrity and availability (see 6 in Fig. 9).
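The final data check can be sketched as a comparison of summary information computed on each side. In the sketch below, a dictionary mapping relative file paths to bytes stands in for a listing of an HDFS directory, and CRC32 from Python's `zlib` stands in for whatever checksum the file system reports; `summarize` and `verify` are hypothetical names.

```python
import zlib


def summarize(files):
    """Summarize a {relative_path: bytes} mapping (stand-in for an HDFS listing)."""
    return {
        "file_count": len(files),
        "total_size": sum(len(content) for content in files.values()),
        "checksums": {path: zlib.crc32(content) for path, content in files.items()},
    }


def verify(source_files, dest_files):
    """Return the names of the checks that differ (empty list means consistent)."""
    src, dst = summarize(source_files), summarize(dest_files)
    return [key for key in src if src[key] != dst[key]]


source = {"part-0": b"a,1\n", "part-1": b"b,2\n"}
dest_ok = dict(source)
dest_corrupt = {"part-0": b"a,1\n", "part-1": b"b,3\n"}

ok_result = verify(source, dest_ok)
corrupt_result = verify(source, dest_corrupt)
```

The corrupted copy has the same file count and total size as the source, so only the checksum comparison catches it, which is why the text lists checksums alongside sizes and counts.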
In addition, information data synchronization may also involve synchronizing file attributes of the data, such as permissions and the owning user and group. Every step of the system has an automatic retry-on-failure mechanism. The synchronization state and progress of the whole synchronization task are written in real time at each stage, and the user can view them in real time through the front end.
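A retry mechanism that also records state at each attempt might look like the following minimal sketch. The attempt limit, the in-memory status log and the function names are assumptions for illustration; the patent does not describe how retries or status records are implemented.

```python
def run_with_retry(step, name, status_log, max_attempts=3):
    """Run step(), retrying on failure and recording every attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = step()
        except Exception as exc:
            # Record the failure so a front end could display it.
            status_log.append((name, attempt, "failed: %s" % exc))
        else:
            status_log.append((name, attempt, "succeeded"))
            return result
    raise RuntimeError("%s failed after %d attempts" % (name, max_attempts))


calls = {"n": 0}


def flaky_copy():
    """Hypothetical step that fails once and then succeeds."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise IOError("transient")
    return "done"


log = []
outcome = run_with_retry(flaky_copy, "copy", log)
```

Because every attempt is appended to the log as it happens, the same record serves both the retry logic and any progress view built on top of it.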
In this embodiment, the job configuration information set by the user is obtained, and information data is synchronized from the source location to the destination location according to that job configuration information. In this way, the user can set the job configuration information as needed for the actual synchronization scenario, and synchronization is carried out according to that information, which improves the flexibility of information data synchronization and broadens its scope of application.
Further, with reference to Fig. 3, Fig. 3 is the flow diagram of information data synchronous method second embodiment of the present invention.
Based on above-described embodiment, in the present embodiment, step S20 includes:
Step S30 generates synchronous task according to the job configuration information;
The information data of the source position is synchronized to the destination locations by step S40 according to the synchronous task.
Further, the synchronous task type includes increment synchronization and batch synchronization, and the job configuration information includes
The task create-rule of user's selection, the task create-rule are included at least based on metadata attributes create-rule, are based on
Hive Hook create-rules or based on the last modification time create-rule of file.
Based on the above embodiment, in this embodiment, after the scheduling module sends the job configuration information set by the user to the task generator, the task generator generates a synchronization task list according to the generation rule configured by the user. When setting the job configuration information, the user can choose to synchronize data only, metadata only, or metadata and data together, and the task generator produces different types of synchronization tasks based on that choice. The synchronization of Hive metadata is separated from the synchronization of Hive data, and each is performed independently according to the architecture in use. Hive data synchronization between the source cluster and the destination cluster uses distcp to copy data in a distributed manner directly from the HDFS of the source cluster to the HDFS of the destination cluster, which reduces intermediate transfers of the data.
Because metadata and data are synchronized separately, a shared-metadata architecture can be set up in which Hive metadata does not need to be synchronized and only Hive data is synchronized, as shown in Figure 10. In Figure 10, the Hive metadata of the source and destination clusters points to the same MySQL metadata database. Under normal circumstances the source MySQL is used; when a failure occurs, the Hive metadata of both clusters is switched to the standby MySQL. For an architecture that does not share metadata, both Hive metadata and Hive data need to be synchronized, as shown in Figure 11. In Figure 11, the Hive instances of the source and destination clusters each use their own MySQL metadata database.
This embodiment provides flexible task generation modes, for example batch synchronization or incremental synchronization. For batch synchronization, under the control of several parameters, a single configuration can generate accurate and clear task lists at various levels, which reduces the complexity of repeated configuration and makes task details visible. For example, when only the synchronization of one database is configured, filter conditions can be used to generate detailed table-level and partition-level task lists, and the execution progress of each task can be viewed in real time.
For incremental synchronization, the task generation mode can be selected by a parameter from the pluggable task generation rules defined by the system, for example a generation mode based on the Hive metadata attribute transient_lastDdlTime, a generation mode based on Hive Hook, or a generation mode based on the last modification time of HDFS files. Incremental tasks are not generated by replay. Instead, configurable timed scheduling is used, and the interval from the completion time of the last synchronization to the current scheduled time serves as the parameter for generating tasks. This avoids the serious accident in which an incomplete replay sequence produces dirty data and leaves the data inconsistent.
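The time-window approach to incremental task generation can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: `TableInfo`, `RULES`, and `generate_incremental_tasks` are hypothetical names, the Hive Hook rule is omitted, and treating the window as half-open (start inclusive, end exclusive) is an assumption made so that consecutive scheduled runs do not select the same change twice.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical snapshot of one Hive table's change indicators."""
    name: str
    last_ddl_time: datetime    # stands in for transient_lastDdlTime
    last_file_mtime: datetime  # stands in for the newest HDFS file mtime


# A rule decides whether a table changed inside the window [start, end).
Rule = Callable[[TableInfo, datetime, datetime], bool]

# Pluggable rule registry (names are assumptions for this sketch).
RULES: Dict[str, Rule] = {
    "metadata_attribute": lambda t, start, end: start <= t.last_ddl_time < end,
    "file_mtime": lambda t, start, end: start <= t.last_file_mtime < end,
}


def generate_incremental_tasks(
    tables: List[TableInfo],
    rule_name: str,
    last_sync_done: datetime,
    scheduled_at: datetime,
) -> List[str]:
    """Return the tables that changed between the completion of the last
    synchronization and the current scheduled time."""
    rule = RULES[rule_name]
    return [t.name for t in tables if rule(t, last_sync_done, scheduled_at)]
```

A new rule can be added by registering another callable in `RULES`, which is one simple way to keep the generation rules pluggable.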
Further, referring to Fig. 4, Fig. 4 is a schematic flowchart of the third embodiment of the information data synchronization method of the present invention.
Based on the above embodiment, in this embodiment, step S40 includes:
Step S50, synchronizing the metadata of the source location to the destination location using multithreading according to the metadata synchronization task.
Based on the above embodiment, this information data synchronization scheme separates the synchronization of Hive metadata from the synchronization of Hive data and performs each independently according to the architecture in use, so the user can synchronize only metadata or only data as the actual scenario requires. This embodiment provides a method for synchronizing metadata. When the user needs to synchronize metadata, the user selects metadata in the job configuration interface and sets the other job configuration information for metadata synchronization. For metadata synchronization, the generated task list is executed with multiple threads, and the metadata is synchronized quickly to the destination cluster under concurrency control. In this embodiment, the task generator generates the metadata synchronization task list according to the job configuration information obtained by the scheduler, and the list may contain multiple synchronization tasks. When the tasks are executed with multiple threads, different synchronization tasks are assigned to different threads, which effectively improves the efficiency and speed of task execution.
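Distributing a metadata task list across threads under a concurrency limit can be sketched as follows. This is an illustrative Python sketch only: `sync_metadata_tasks` and the `sync_one` callback are hypothetical names, and the actual copy of a table's metadata between metastores is left to the caller because the patent does not specify it.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def sync_metadata_tasks(
    tasks: Iterable[str],
    sync_one: Callable[[str], None],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run each metadata synchronization task in a worker thread.

    max_workers is the concurrency control: at most that many tasks
    run at the same time. sync_one is a caller-supplied (hypothetical)
    function that synchronizes the metadata of one task.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(sync_one, task): task for task in tasks}
        for future in as_completed(futures):
            task = futures[future]
            try:
                future.result()  # re-raises any exception from the worker
                results[task] = "completed"
            except Exception as exc:
                results[task] = f"failed: {exc}"
    return results
```

Recording a per-task result rather than stopping at the first exception lets one failed table be retried without repeating the tasks that already succeeded.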
In this embodiment, the metadata of the source location is synchronized to the destination location using multithreading according to the metadata synchronization task, which effectively improves the efficiency and speed of metadata synchronization.
Further, referring to Fig. 5, Fig. 5 is a schematic flowchart of the fourth embodiment of the information data synchronization method of the present invention.
Based on the above embodiment, in this embodiment, step S40 further includes:
Step S60, aggregating at least two data synchronization tasks according to a preset aggregation rule to obtain an aggregated synchronization task;
Step S70, synchronizing the data of the source location to the destination location according to the aggregated synchronization task.
Based on the above embodiment, in this embodiment the user may choose to synchronize data only, and this embodiment provides a data synchronization method. When setting the job configuration information, the user can choose whether to aggregate the data synchronization tasks into aggregated synchronization tasks before synchronizing. When the user chooses to aggregate, the user also selects the corresponding aggregation rule. The aggregated synchronization task in this embodiment is obtained, after the data synchronization task list has been generated, by aggregating two or more synchronization tasks in the list according to the selected aggregation rule. Specifically, for data synchronization, the generated task lists of various levels are adaptively aggregated and submitted as distcp jobs, for example merged according to data volume. A distcp job performs the copy as a distributed MapReduce job and is suited to copying large volumes of data. Each distcp job submission takes time, so submitting multiple distcp jobs with small data volumes takes longer than submitting one distcp job with a large data volume. In addition, a concurrency parameter and a bandwidth parameter are used to ensure that synchronization does not occupy so much dedicated network bandwidth that business systems are affected. Combining this embodiment with the third embodiment, and depending on the actual synchronization scenario, the user may also select both the metadata synchronization option and the data synchronization option when setting the job configuration information. When both metadata and data are to be synchronized, metadata is synchronized first and data afterwards. For the data synchronization, whether tasks are aggregated before distcp jobs are submitted is determined by the configuration.
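Merging small tasks by data volume before submission can be sketched as follows. This is a minimal Python sketch under stated assumptions: `DataTask`, `aggregate_by_volume`, and `build_distcp_command` are hypothetical names, the greedy grouping is only one possible aggregation rule, and mapping the concurrency and bandwidth parameters to the DistCp options `-m` (maximum number of maps) and `-bandwidth` (MB/s per map) is an assumption about how those parameters would be applied. The command is only built as an argument list and is not executed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataTask:
    """Hypothetical data synchronization task for one source path."""
    source_path: str
    size_bytes: int


def aggregate_by_volume(tasks: List[DataTask], target_bytes: int) -> List[List[DataTask]]:
    """Greedily pack tasks, largest first, into groups of about
    target_bytes so that each group can be submitted as one job."""
    groups: List[List[DataTask]] = []
    current: List[DataTask] = []
    current_size = 0
    for task in sorted(tasks, key=lambda t: t.size_bytes, reverse=True):
        # Close the current group when adding this task would exceed the target.
        if current and current_size + task.size_bytes > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(task)
        current_size += task.size_bytes
    if current:
        groups.append(current)
    return groups


def build_distcp_command(
    group: List[DataTask], dest_dir: str, max_maps: int, bandwidth_mb: int
) -> List[str]:
    """Build one distcp invocation for an aggregated group of tasks."""
    return [
        "hadoop", "distcp",
        "-m", str(max_maps),             # assumed concurrency parameter
        "-bandwidth", str(bandwidth_mb), # assumed bandwidth parameter
        *[t.source_path for t in group],
        dest_dir,
    ]
```

With several source paths in one invocation, distcp copies them under the destination directory, so a real aggregation rule would likely group only tasks that share a destination parent.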
In this embodiment, at least two data synchronization tasks are aggregated according to a preset aggregation rule to obtain an aggregated synchronization task, and the data of the source location is synchronized to the destination location according to the aggregated synchronization task. Aggregating the synchronization tasks in this way and synchronizing the data according to the aggregated task saves data synchronization time and improves data synchronization efficiency.
Further, referring to Fig. 6, Fig. 6 is a schematic flowchart of the fifth embodiment of the information data synchronization method of the present invention. Based on the above embodiment, in this embodiment, the following steps are further included after step S30:
Step S80, generating a monitoring task according to the synchronization task, and obtaining the execution progress information of the synchronization task according to the monitoring task;
Step S90, determining the execution state of the synchronization task according to the execution progress information;
Step S100, when the execution state of the synchronization task is execution failed, executing the step of obtaining the job configuration information set by the user, so that information data synchronization is performed again.
Existing data synchronization technology lacks a complete visual interface, cannot keep track of data synchronization progress in time, and cannot support richer data synchronization scenarios. Based on the above embodiment, this embodiment provides a method for obtaining the execution progress and execution state information of a synchronization task. In this embodiment, after a synchronization task starts executing, the task monitor periodically monitors its execution. Specifically, after each distcp job starts, the execution of the job is monitored periodically. A job with a large data volume runs longer, so, in order to obtain execution details in time, the monitoring interval is defined adaptively according to the volume of data being synchronized. The monitoring interval is determined by the performance of the system and the cluster and by the actual visual monitoring requirements: querying the execution progress too frequently increases the load on the cluster and the synchronization system, while too long a monitoring interval fails the timeliness requirement of visualization. While the synchronization task is monitored according to the monitoring task, the execution progress information of the task is obtained. The execution progress information includes the amount of data synchronized and execution state information, and the execution state is one of executing normally, execution failed, or execution completed. In this embodiment, if the execution state of the synchronization task is execution failed, the method is re-executed from step S10 so that the information data is synchronized again. When the execution state is execution completed, data verification can be triggered. The execution state and progress of each stage of the system can be written to a preset location and stored for the user to query.
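The monitoring loop and the state handling described above can be sketched as follows. This is a hedged Python illustration: all names (`SyncState`, `Progress`, `monitor_interval_seconds`, `monitor`, `next_action`) and all numeric defaults are hypothetical. The text does not say in which direction the interval adapts, so the sketch assumes that larger jobs are polled less often, within a minimum and a maximum bound.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class SyncState(Enum):
    RUNNING = "running"    # executing normally
    FAILED = "failed"      # execution failed
    FINISHED = "finished"  # execution completed


@dataclass(frozen=True)
class Progress:
    synced_bytes: int
    state: SyncState


def monitor_interval_seconds(
    total_bytes: int,
    min_s: float = 10.0,
    max_s: float = 300.0,
    bytes_per_step: int = 10 * 1024**3,
    step_s: float = 10.0,
) -> float:
    """Assumed policy: add step_s seconds per bytes_per_step of data,
    clamped to the range [min_s, max_s]."""
    interval = min_s + (total_bytes // bytes_per_step) * step_s
    return max(min_s, min(max_s, interval))


def monitor(
    poll: Callable[[], Progress],
    total_bytes: int,
    sleep: Callable[[float], None] = time.sleep,
    max_polls: int = 1000,
) -> List[Progress]:
    """Poll until the task leaves the RUNNING state; return all samples."""
    interval = monitor_interval_seconds(total_bytes)
    history: List[Progress] = []
    for _ in range(max_polls):
        progress = poll()
        history.append(progress)
        if progress.state is not SyncState.RUNNING:
            break
        sleep(interval)
    return history


def next_action(state: SyncState) -> str:
    """Map the execution state to the follow-up described in the text."""
    if state is SyncState.FAILED:
        return "restart_from_S10"
    if state is SyncState.FINISHED:
        return "trigger_data_check"
    return "keep_monitoring"
```

Passing `poll` and `sleep` as parameters keeps the loop independent of any particular cluster API, which the patent does not name.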
In this embodiment, a monitoring task is generated according to the synchronization task, and the execution progress information of the synchronization task is obtained according to the monitoring task; the execution state of the synchronization task is determined according to the execution progress information; and when the execution state of the synchronization task is execution failed, the step of obtaining the job configuration information set by the user is executed so that information data synchronization is performed again. In this way, the execution of the synchronization task is monitored, and when the synchronization task fails, the information data is synchronized again.
Further, referring to Fig. 7, Fig. 7 is a schematic flowchart of the fifth embodiment of the information data synchronization method of the present invention.
Based on the above embodiment, in this embodiment, the following steps are included after step S90:
Step S110, when the execution state of the synchronization task is execution completed, obtaining first information of the source location data and second information of the destination location data, the first information or the second information including at least file size information, file count information, or checksum information;
Step S120, performing a data check according to the first information and the second information.
Based on the above embodiment, during data synchronization, data permissions or other factors may cause the synchronization to go wrong, for example leaving the data at the source location and the destination location inconsistent. This embodiment provides a data verification method. After data synchronization is completed and the task monitor detects that the execution state of the task is execution completed, the task monitor sends a data verification notification to the task verifier. After receiving the notification, the task verifier checks the consistency, integrity, or availability of the data at the source and destination locations, using the verification mode that the user configured in the job configuration stage. Specifically, after data synchronization completes, the system performs the data check: it reads data information such as file size, directory count, file count, and checksum from the HDFS clusters of the source and destination locations and then performs consistency, integrity, and availability checks. The file size, checksum, and other information of the source location and the destination location are compared to judge whether the corresponding items match, which determines whether the destination location has received complete and consistent data. For the availability check, the metadata of the source and destination locations can be used to test whether correct data can be obtained according to the metadata. If the check finds no abnormality, the data is retained. If an abnormality is found in consistency, integrity, or availability, the step of synchronizing the information data of the source location to the destination location according to the synchronization task is re-executed under the automatic failure retry mechanism, and the data check is performed again.
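The comparison of source and destination information, together with the retry on mismatch, can be sketched as follows. This is a simplified Python sketch under stated assumptions: `DataInfo`, `check_data`, and `sync_until_consistent` are hypothetical names, the statistics are supplied by caller-provided functions rather than read from HDFS, and the availability check is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class DataInfo:
    """Hypothetical summary of the data at one location."""
    total_size: int
    file_count: int
    dir_count: int
    checksum: str


def check_data(source: DataInfo, dest: DataInfo) -> List[str]:
    """Return the names of the items that differ; an empty list means
    the destination matches the source on every compared item."""
    mismatches: List[str] = []
    for name in ("total_size", "file_count", "dir_count", "checksum"):
        if getattr(source, name) != getattr(dest, name):
            mismatches.append(name)
    return mismatches


def sync_until_consistent(
    run_sync: Callable[[], None],
    read_source: Callable[[], DataInfo],
    read_dest: Callable[[], DataInfo],
    max_attempts: int = 3,
) -> Optional[int]:
    """Synchronize, check, and retry on mismatch. Returns the attempt
    number that passed the check, or None if every attempt failed."""
    for attempt in range(1, max_attempts + 1):
        run_sync()
        if not check_data(read_source(), read_dest()):
            return attempt
    return None
```

Bounding the retries with `max_attempts` is a design choice of this sketch; the text only states that a failed check triggers re-synchronization followed by another check.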
In this embodiment, first information of the source location data and second information of the destination location data are obtained, the first information or the second information including at least file size information, file count information, or checksum information, and a data check is performed according to the first information and the second information. In this way, the accuracy of the synchronized data and the quality of the synchronization are improved.
Further, referring to Fig. 8, Fig. 8 is a schematic flowchart of the sixth embodiment of the information data synchronization method of the present invention.
Based on the above embodiment, in this embodiment, the following steps are further included after step S80:
Step S130, receiving a synchronization task execution progress query request triggered by the user;
Step S140, displaying the execution progress information of the synchronization task according to the query request.
Based on the above embodiment, after a synchronization task is generated, a corresponding monitoring task is generated and the execution progress information of the task is obtained. In this embodiment, the user can trigger a query request at the front end to obtain the data synchronization status. Specifically, on the page through which the system interacts with the user, the user can select the corresponding synchronization task and click a preset query button. After the user performs this query operation, the user's front-end device sends an execution progress query request to the task monitor through the interactive interface, and the task monitor sends the most recently monitored progress information to the interaction page of the user's front end for display.
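Answering a progress query with the most recently monitored information can be sketched as follows. This is a minimal Python sketch, assuming an in-memory store shared between the monitoring loop and the query handler; `ProgressStore` and its methods are hypothetical names, and the front-end page and the request transport are not modeled.

```python
import threading
from typing import Dict, Optional


class ProgressStore:
    """Keeps only the latest progress sample per synchronization task."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._latest: Dict[str, Dict[str, object]] = {}

    def record(self, task_id: str, synced_bytes: int, state: str) -> None:
        """Called by the task monitor after each monitoring round."""
        with self._lock:
            self._latest[task_id] = {"synced_bytes": synced_bytes, "state": state}

    def query(self, task_id: str) -> Optional[Dict[str, object]]:
        """Called when a user query request arrives; returns a copy of
        the last recorded sample, or None if the task is unknown."""
        with self._lock:
            sample = self._latest.get(task_id)
            return dict(sample) if sample is not None else None
```

Serving queries from the last recorded sample means a user request does not trigger an extra query against the cluster, which fits the concern about monitoring load raised in the earlier embodiment.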
In this embodiment, a synchronization task execution progress query request triggered by the user is received, and the execution progress information of the synchronization task is displayed according to the query request. In this way, the user can follow the task execution progress visually.
Further, the present invention also provides an information data synchronization device. The information data synchronization device includes a memory, a processor, and an information data synchronization program that is stored in the memory and can run on the processor. For the method implemented when the information data synchronization program is executed by the processor, reference may be made to the embodiments of the information data synchronization method of the present invention, and details are not repeated here.
In addition, an embodiment of the present invention also provides a computer-readable storage medium.
An information data synchronization program is stored on the computer-readable storage medium of the present invention, and the steps of the information data synchronization method described above are implemented when the information data synchronization program is executed.
For the method implemented when the information data synchronization program running on the processor is executed, reference may be made to the embodiments of the information data synchronization method of the present invention, and details are not repeated here.
It should be noted that, herein, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or system that includes the element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the present invention.
Claims (10)
1. An information data synchronization method, characterized in that the information data includes data or metadata, and the information data synchronization method comprises:
obtaining job configuration information set by a user;
synchronizing the information data of a source location to a destination location according to the job configuration information.
2. The information data synchronization method according to claim 1, characterized in that synchronizing the information data of the source location to the destination location according to the job configuration information comprises:
generating a synchronization task according to the job configuration information;
synchronizing the information data of the source location to the destination location according to the synchronization task.
3. The information data synchronization method according to claim 2, characterized in that the information data is metadata, and synchronizing the information data of the source location to the destination location according to the synchronization task comprises:
synchronizing the metadata of the source location to the destination location using multithreading according to the synchronization task of the metadata.
4. The information data synchronization method according to claim 2, characterized in that the information data is data, and synchronizing the information data of the source location to the destination location according to the synchronization task further comprises:
aggregating at least two data synchronization tasks according to a preset aggregation rule to obtain an aggregated synchronization task;
synchronizing the data of the source location to the destination location according to the aggregated synchronization task.
5. The information data synchronization method according to claim 2, characterized in that, after the step of generating a synchronization task according to the job configuration information, the method further comprises:
generating a monitoring task according to the synchronization task, and obtaining execution progress information of the synchronization task according to the monitoring task;
determining an execution state of the synchronization task according to the execution progress information;
when the execution state of the synchronization task is execution failed, executing the step of obtaining the job configuration information set by the user, so that information data synchronization is performed again.
6. The information data synchronization method according to claim 5, characterized in that, after the step of determining the execution state of the synchronization task according to the execution progress information, the method further comprises:
when the execution state of the synchronization task is execution completed, obtaining first information of the source location data and second information of the destination location data, the first information or the second information including at least file size information, file count information, or checksum information;
performing a data check according to the first information and the second information.
7. The information data synchronization method according to claim 5 or 6, characterized in that, after the step of generating a monitoring task according to the synchronization task and obtaining the execution progress information of the synchronization task according to the monitoring task, the method further comprises:
receiving a synchronization task execution progress query request triggered by the user;
displaying the execution progress information of the synchronization task according to the query request.
8. The information data synchronization method according to claim 2, characterized in that the types of the synchronization task include incremental synchronization and batch synchronization, the job configuration information includes a task generation rule selected by the user, and the task generation rule includes at least a metadata-attribute-based generation rule, a Hive Hook-based generation rule, or a generation rule based on the last modification time of a file.
9. An information data synchronization device, characterized in that the information data synchronization device comprises a memory, a processor, and an information data synchronization program that is stored in the memory and can run on the processor, wherein the steps of the information data synchronization method according to any one of claims 1 to 8 are implemented when the information data synchronization program is executed by the processor.
10. A computer-readable storage medium, characterized in that an information data synchronization program is stored on the computer-readable storage medium, and the steps of the information data synchronization method according to any one of claims 1 to 8 are implemented when the information data synchronization program is executed by a processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711293634.0A CN108197155A (en) | 2017-12-08 | 2017-12-08 | Information data synchronous method, device and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711293634.0A CN108197155A (en) | 2017-12-08 | 2017-12-08 | Information data synchronous method, device and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108197155A true CN108197155A (en) | 2018-06-22 |
Family
ID=62573688
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711293634.0A Pending CN108197155A (en) | 2017-12-08 | 2017-12-08 | Information data synchronous method, device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108197155A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977169A (en) * | 2019-03-19 | 2019-07-05 | 广州品唯软件有限公司 | Method of data synchronization, device, computer readable storage medium and system |
CN110175159A (en) * | 2019-05-29 | 2019-08-27 | 京东数字科技控股有限公司 | Method of data synchronization and system for object storage cluster |
CN110704393A (en) * | 2019-08-30 | 2020-01-17 | 北京浪潮数据技术有限公司 | Data monitoring method and device for Hive data warehouse |
CN110888760A (en) * | 2019-11-26 | 2020-03-17 | 中国工商银行股份有限公司 | Data recovery method and device, and data processing method and device |
CN112395287A (en) * | 2019-08-19 | 2021-02-23 | 北京国双科技有限公司 | Table classification method, table creation method, device, equipment and medium |
CN112751938A (en) * | 2020-12-30 | 2021-05-04 | 上海赋算通云计算科技有限公司 | Real-time data synchronization system based on multi-cluster operation, implementation method and storage medium |
CN112948494A (en) * | 2021-03-04 | 2021-06-11 | 北京沃东天骏信息技术有限公司 | Data synchronization method and device, electronic equipment and computer readable medium |
CN113742420A (en) * | 2021-08-09 | 2021-12-03 | 广州市易工品科技有限公司 | Data synchronization method and device |
CN116383310A (en) * | 2023-06-02 | 2023-07-04 | 天津金城银行股份有限公司 | Method, device, equipment and storage medium for synchronizing daily terminal files |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005229509A (en) * | 2004-02-16 | 2005-08-25 | Ricoh Co Ltd | Content metadata transmission/reception system, content metadata synchronizing method, program for making computer execute the method, and reception terminal in which content and metadata are associated with each other |
CN101551801A (en) * | 2008-03-31 | 2009-10-07 | 国际商业机器公司 | Data synchronization method and data synchronization system |
CN101854400A (en) * | 2010-06-09 | 2010-10-06 | 中兴通讯股份有限公司 | Database synchronization deployment and monitoring method and device |
CN103761162A (en) * | 2014-01-11 | 2014-04-30 | 深圳清华大学研究院 | Data backup method of distributed file system |
CN103875229A (en) * | 2013-12-02 | 2014-06-18 | 华为技术有限公司 | Asynchronous replication method, device and system |
CN106101265A (en) * | 2016-07-26 | 2016-11-09 | 浪潮软件股份有限公司 | A kind of method carrying out file synchronization between Dropbox and desktop end |
CN106919346A (en) * | 2017-02-21 | 2017-07-04 | 无锡华云数据技术服务有限公司 | A kind of shared Storage Virtualization implementation method based on CLVM |
-
2017
- 2017-12-08 CN CN201711293634.0A patent/CN108197155A/en active Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005229509A (en) * | 2004-02-16 | 2005-08-25 | Ricoh Co Ltd | Content metadata transmission/reception system, content metadata synchronizing method, program for making computer execute the method, and reception terminal in which content and metadata are associated with each other |
CN101551801A (en) * | 2008-03-31 | 2009-10-07 | 国际商业机器公司 | Data synchronization method and data synchronization system |
CN101854400A (en) * | 2010-06-09 | 2010-10-06 | 中兴通讯股份有限公司 | Database synchronization deployment and monitoring method and device |
CN103875229A (en) * | 2013-12-02 | 2014-06-18 | 华为技术有限公司 | Asynchronous replication method, device and system |
CN103761162A (en) * | 2014-01-11 | 2014-04-30 | 深圳清华大学研究院 | Data backup method of distributed file system |
CN106101265A (en) * | 2016-07-26 | 2016-11-09 | 浪潮软件股份有限公司 | A kind of method carrying out file synchronization between Dropbox and desktop end |
CN106919346A (en) * | 2017-02-21 | 2017-07-04 | 无锡华云数据技术服务有限公司 | A kind of shared Storage Virtualization implementation method based on CLVM |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109977169A (en) * | 2019-03-19 | 2019-07-05 | 广州品唯软件有限公司 | Method of data synchronization, device, computer readable storage medium and system |
CN110175159A (en) * | 2019-05-29 | 2019-08-27 | 京东数字科技控股有限公司 | Method of data synchronization and system for object storage cluster |
CN112395287A (en) * | 2019-08-19 | 2021-02-23 | 北京国双科技有限公司 | Table classification method, table creation method, device, equipment and medium |
CN110704393A (en) * | 2019-08-30 | 2020-01-17 | 北京浪潮数据技术有限公司 | Data monitoring method and device for Hive data warehouse |
CN110888760A (en) * | 2019-11-26 | 2020-03-17 | 中国工商银行股份有限公司 | Data recovery method and device, and data processing method and device |
CN112751938A (en) * | 2020-12-30 | 2021-05-04 | 上海赋算通云计算科技有限公司 | Real-time data synchronization system based on multi-cluster operation, implementation method and storage medium |
CN112948494A (en) * | 2021-03-04 | 2021-06-11 | 北京沃东天骏信息技术有限公司 | Data synchronization method and device, electronic equipment and computer readable medium |
CN113742420A (en) * | 2021-08-09 | 2021-12-03 | 广州市易工品科技有限公司 | Data synchronization method and device |
CN113742420B (en) * | 2021-08-09 | 2024-02-02 | 广州市易工品科技有限公司 | Data synchronization method and device |
CN116383310A (en) * | 2023-06-02 | 2023-07-04 | 天津金城银行股份有限公司 | Method, device, equipment and storage medium for synchronizing daily terminal files |
CN116383310B (en) * | 2023-06-02 | 2023-08-04 | 天津金城银行股份有限公司 | Method, device, equipment and storage medium for synchronizing daily terminal files |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108197155A (en) | Information data synchronous method, device and computer readable storage medium | |
CN106844198B (en) | Distributed dispatching automation test platform and method | |
CN108304255A (en) | Distributed task dispatching method and device, electronic equipment and readable storage medium storing program for executing | |
CN111026635B (en) | Software project testing system, method, device and storage medium | |
CN109582466A (en) | A kind of timed task executes method, distributed server cluster and electronic equipment | |
CN106406993A (en) | Timed task management method and system | |
US8943127B2 (en) | Techniques for capturing data sets | |
CN111125444A (en) | Big data task scheduling management method, device, equipment and storage medium | |
CN113590386B (en) | Disaster recovery method, system, terminal device and computer storage medium for data | |
CN106446168B (en) | A kind of load client realization method of Based on Distributed data warehouse | |
CN108491254A (en) | A kind of dispatching method and device of data warehouse | |
CN108446326B (en) | A kind of isomeric data management method and system based on container | |
CN111784318A (en) | Data processing method and device, electronic equipment and storage medium | |
CN110099084A (en) | A kind of method, system and computer-readable medium guaranteeing storage service availability | |
JP2004038516A (en) | Work processing system, operation management method and program for performing operation management | |
CN113760513A (en) | Distributed task scheduling method, device, equipment and medium | |
CN109992373A (en) | Resource regulating method, approaches to IM and device and task deployment system | |
CN106850724A (en) | Data push method and device | |
CN113419872A (en) | Application system interface integration system, integration method, equipment and storage medium | |
US9977726B2 (en) | System and method for smart framework for network backup software debugging | |
CN111143177B (en) | Method, system, device and storage medium for collecting RMF III data of IBM host | |
CN112015534A (en) | Configurated platform scheduling method, system and storage medium | |
Weidmann et al. | Conception and Installation of System Monitoring Using the SAP Solution Manager | |
CN112449061B (en) | Outbound task allocation method and device, computer equipment and readable storage medium | |
CN117519838B (en) | AI workflow modeling method, related device, equipment, system and medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180622 |