CN108804697A - Spark-based data synchronization method, apparatus, computer device and storage medium - Google Patents
- Publication number: CN108804697A (application CN201810620678.8A)
- Authority: CN (China)
- Prior art keywords: data, spark, record, task, resource
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
This application relates to a Spark-based data synchronization method, apparatus, computer device, and storage medium. The method includes: obtaining business data generated by a Spark task, the business data including multiple data records; generating a data digest for each data record; obtaining, from a source database, the historical digests corresponding to multiple historical records; comparing each data digest with the stored historical digests to obtain newly added data digests; writing the data records corresponding to the newly added data digests to a message queue; and, upon receiving a data pull request sent by a first terminal, synchronizing the data records in the message queue to a target database according to the data pull request. The method can improve the efficiency of synchronizing large-scale data.
Description
Technical field
This application relates to the field of computer technology, and in particular to a Spark-based data synchronization method, apparatus, computer device, and storage medium.
Background
Spark is a computing engine for large-scale data processing and is increasingly widely used because of its versatility and speed. Spark distributes the computation over a massive data set (hereinafter, a "Spark task") across multiple computer devices for execution, which achieves efficient task processing. A Spark task can generate the business data required by multiple business systems, for example the "related product push information" needed by an e-commerce system or the "birthday greeting messages" needed by a social platform. In response to a data request from a business system, the Spark task synchronously writes the generated business data to that business system. The business data generated by a Spark task is usually large in scale. However, traditional data synchronization approaches are suited only to small-scale data; for large-scale data, their synchronization efficiency is low.
Summary of the invention
In view of the above technical problem, it is necessary to provide a Spark-based data synchronization method, apparatus, computer device, and storage medium that can improve the efficiency of synchronizing large-scale data.
A Spark-based data synchronization method, the method including: obtaining business data generated by a Spark task, the business data including multiple data records; generating a data digest for each data record; obtaining, from a source database, the historical digests corresponding to multiple historical records; comparing each data digest with the stored historical digests to obtain newly added data digests; writing the data records corresponding to the newly added data digests to a message queue; and, upon receiving a data pull request sent by a first terminal, synchronizing the data records in the message queue to a target database according to the data pull request.
In one embodiment, before obtaining the business data generated by the Spark task, the method further includes: receiving a Spark task and a corresponding parameter file submitted by a second terminal; reading the resource allocation parameters of the Spark task from the parameter file, and allocating physical resources according to the resource allocation parameters; executing the Spark task on the physical resources and monitoring the execution efficiency of the Spark task; adjusting the resource allocation parameters in the parameter file according to the monitoring result; and scheduling the Spark task to execute on physical resources that match the adjusted resource allocation parameters.
In one embodiment, adjusting the resource allocation parameters in the parameter file according to the monitoring result includes: comparing whether the execution efficiency is below a threshold; calculating the total task amount and the task duration corresponding to the Spark task; if the execution efficiency is below the threshold, calculating the remaining task amount from the total task amount and the executed task amount, calculating the remaining duration from the task duration and the current time node, and estimating the physical resources that need to be added from the remaining task amount and the remaining duration; otherwise, calculating the resource utilization from the resource usage information of two adjacent time nodes recorded in the running information, and estimating the physical resources that need to be released from the resource utilization; and adjusting the resource allocation parameters according to the estimate.
In one embodiment, generating a data digest for each data record includes: extracting one or more current keywords from the data record to form a current keyword set; obtaining, from the source database, the historical keyword sets corresponding to multiple historical records; identifying whether there is a historical keyword set that matches the current keyword set; if so, extracting supplementary keywords from the data record; and building a keyword index from the extracted supplementary keywords and the current keywords, the keyword index serving as the data digest of the data record.
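The keyword-index variant above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record is assumed to arrive as an ordered list of candidate words, the first `base_count` words are taken as the current keywords, and the names `keyword_digest`, `base_count`, and `extra_count` are hypothetical and do not come from the patent.

```python
def keyword_digest(record_words, base_count, history_sets, extra_count=2):
    """Build a keyword index for one record (illustrative sketch).

    record_words: candidate keywords of the record, in priority order (assumed input).
    history_sets: set of frozensets, the keyword sets of historical records.
    """
    current = frozenset(record_words[:base_count])
    # If a historical record already uses exactly this keyword set,
    # extract supplementary keywords so the index can tell the records apart.
    if current in history_sets:
        supplement = [w for w in record_words[base_count:] if w not in current]
        current = current | frozenset(supplement[:extra_count])
    # A sorted tuple serves as the keyword index, i.e. the data digest.
    return tuple(sorted(current))


history = {frozenset({"birthday", "greeting"})}
print(keyword_digest(["birthday", "greeting", "alice", "2018"], 2, history))
```

Exact set equality is used here as the matching rule; the text leaves the matching criterion open, so a similarity measure could be substituted.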
In one embodiment, the data pull request carries a system identifier and a user identifier, and synchronizing the data records in the message queue to the target database according to the data pull request includes: detecting, according to the user identifier, whether a corresponding data record exists in the message queue; if so, invoking a data synchronization script, the data synchronization script including multiple tags; obtaining the configuration file corresponding to the system identifier and replacing the tags in the data synchronization script based on the configuration file, thereby updating the data synchronization script; and, by executing the updated data synchronization script, synchronizing the data records in the message queue that correspond to the user identifier to the target database.
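As a rough illustration of the tag replacement described above, the sketch below treats the tags as `${...}` placeholders and fills them from a per-system configuration using Python's standard `string.Template`. The script text, the `sync` command, the configuration keys, and the system identifiers are all hypothetical; the patent does not specify the script or configuration format.

```python
from string import Template

# Hypothetical synchronization script with tags written as ${...} placeholders.
SYNC_SCRIPT = Template(
    "sync --host ${db_host} --db ${db_name} --table ${table} --user-id ${user_id}"
)

# Hypothetical per-system configuration, keyed by system identifier.
CONFIGS = {
    "ecommerce": {"db_host": "10.0.0.1", "db_name": "shop", "table": "push_info"},
    "social": {"db_host": "10.0.0.2", "db_name": "social", "table": "greetings"},
}


def render_sync_script(system_id: str, user_id: str) -> str:
    """Replace the tags in the script with values from the system's configuration."""
    return SYNC_SCRIPT.substitute(CONFIGS[system_id], user_id=user_id)


print(render_sync_script("ecommerce", "u42"))
```

Keeping one script and swapping only the configuration means a new business system can be supported by adding a configuration entry rather than a new script.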
In one embodiment, the data synchronization script includes a splitting script, and synchronizing the data records in the message queue that correspond to the user identifier to the target database includes: calculating the data volume of the data records corresponding to the user identifier; detecting whether the data volume exceeds a target data volume; if so, invoking the splitting script to split the data records corresponding to the user identifier into multiple data groups; and invoking multiple threads to synchronize the data groups to the target database.
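The split-then-parallelize step can be sketched as follows. It is a minimal sketch, assuming that the data volume is measured as a record count and that `write_group` is a caller-supplied function that writes one group to the target database; the thresholds and names are illustrative, not values from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def split_records(records, group_size):
    """Split the records into consecutive groups of at most group_size items."""
    return [records[i:i + group_size] for i in range(0, len(records), group_size)]


def sync_records(records, write_group, target_volume=1000, group_size=500, workers=4):
    """Synchronize records, splitting them across threads only when the volume is large.

    Returns the number of groups written.
    """
    if len(records) <= target_volume:
        write_group(records)  # small volume: one write, no splitting
        return 1
    groups = split_records(records, group_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any exception from a worker.
        list(pool.map(write_group, groups))
    return len(groups)
```

A `write_group` used with this sketch must be safe to call from several threads at once, for example by using one database connection per call.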
A Spark-based data synchronization apparatus, the apparatus including: a data screening module, configured to obtain business data generated by a Spark task, the business data including multiple data records, to generate a data digest for each data record, to obtain, from a source database, the historical digests corresponding to multiple historical records, and to compare each data digest with the stored historical digests to obtain newly added data digests; a data storage module, configured to write the data records corresponding to the newly added data digests to a message queue; and a data synchronization module, configured to synchronize, upon receiving a data pull request sent by a first terminal, the data records in the message queue to a target database according to the data pull request.
In one embodiment, the apparatus further includes a resource allocation module, configured to: receive a Spark task and a corresponding parameter file submitted by a second terminal; read the resource allocation parameters of the Spark task from the parameter file and allocate physical resources according to the resource allocation parameters; execute the Spark task on the physical resources and monitor the execution efficiency of the Spark task; adjust the resource allocation parameters in the parameter file according to the monitoring result; and schedule the Spark task to execute on physical resources that match the adjusted resource allocation parameters.
A computer device, including a memory and a processor, the memory storing a computer program, where the processor implements the following steps when executing the computer program: obtaining business data generated by a Spark task, the business data including multiple data records; generating a data digest for each data record; obtaining, from a source database, the historical digests corresponding to multiple historical records; comparing each data digest with the stored historical digests to obtain newly added data digests; writing the data records corresponding to the newly added data digests to a message queue; and, upon receiving a data pull request sent by a first terminal, synchronizing the data records in the message queue to a target database according to the data pull request.
A computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the following steps: obtaining business data generated by a Spark task, the business data including multiple data records; generating a data digest for each data record; obtaining, from a source database, the historical digests corresponding to multiple historical records; comparing each data digest with the stored historical digests to obtain newly added data digests; writing the data records corresponding to the newly added data digests to a message queue; and, upon receiving a data pull request sent by a first terminal, synchronizing the data records in the message queue to a target database according to the data pull request.
With the above Spark-based data synchronization method, apparatus, computer device, and storage medium, a data digest can be generated for each of the data records in the business data generated by a Spark task. By comparing each data digest with the historical digests of the historical records stored in the source database, the newly added data digests can be obtained. Because the data records corresponding to the newly added data digests are written to a message queue, a data pull request sent by the first terminal can be answered from the message queue, and only the data records in the message queue are synchronized to the target database of the business system. Since only the newly added portion of the large-scale business data is synchronized to the target database, rather than all of the generated business data, the amount of data to be synchronized is reduced, which improves the synchronization efficiency for large-scale data. Building a data digest for each data record and screening for newly added data records on the basis of the digests reduces the amount of data that must be compared, which improves the efficiency of comparing business data and, in turn, the efficiency of data synchronization.
Description of the drawings
Fig. 1 is an application scenario diagram of the Spark-based data synchronization method in one embodiment;
Fig. 2 is a flow diagram of the Spark-based data synchronization method in one embodiment;
Fig. 3 is a flow diagram of the step of allocating physical resources to a Spark task in one embodiment;
Fig. 4 is a structural block diagram of the Spark-based data synchronization apparatus in one embodiment;
Fig. 5 is an internal structure diagram of the computer device in one embodiment.
Detailed description
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and are not intended to limit it.
The Spark-based data synchronization method provided by this application can be applied in the application environment shown in Fig. 1. The first terminal 102 communicates with the server 104 over a network, and the second terminal 106 likewise communicates with the server 104 over a network. The first terminal 102 and the second terminal 106 may each be, without limitation, a personal computer, laptop, smartphone, tablet, or portable wearable device; the server 104 may be implemented as a server cluster composed of multiple servers. One or more business systems are deployed on the first terminal 102. The first terminal 102 and the second terminal 106 may be the same terminal or different terminals. The server 104 receives the Spark task submitted by the second terminal 106 and, by executing the Spark task, generates the business data required by the business systems. The business data includes multiple data records. The server 104 generates a data digest for each data record. The server 104 hosts the database corresponding to the Spark task (hereinafter, the "source database"). The source database stores the historical digests corresponding to multiple historical records. The server 104 compares the data digest of each data record with the stored historical digests to obtain the newly added data digests. According to the newly added data digests, the server 104 obtains the corresponding data records, creates a message queue, and writes the obtained data records to the message queue. When a user needs to use the business data, the user can send a data pull request to the server 104 through the corresponding business system on the first terminal 102. The business system has a corresponding database (hereinafter, the "target database"). The data pull request carries a system identifier and a user identifier. The server 104 extracts the corresponding data records from the message queue according to the user identifier and synchronizes the extracted data records to the target database corresponding to the system identifier. In this data synchronization process, only the newly added portion of the large-scale business data is synchronized to the target database, rather than all of the generated business data, so the amount of data to be synchronized is reduced and the synchronization efficiency for large-scale data is improved.
In one embodiment, as shown in Fig. 2, a Spark-based data synchronization method is provided. Taking the method as applied to the server in Fig. 1 as an example, it includes the following steps:
Step 202, obtain the business data generated by a Spark task; the business data includes multiple data records.
The server cluster composed of multiple servers includes a master node (Master) and multiple worker nodes (Worker). Task scheduling personnel on the second terminal submit the Spark task to the master node with the spark-submit command. A task scheduling platform is deployed on the master node to schedule and execute the Spark tasks submitted by multiple second terminals. The task scheduling platform starts a corresponding Driver process for each Spark task, launches the Spark task through the Driver process, and allocates physical resources such as memory and CPU to the Spark task. In other words, the task scheduling platform starts a certain number of Executor processes on the worker nodes of the cluster and executes the Spark task on those Executor processes.
The Spark task generates business data according to preset business logic, such as related product push information or birthday greetings. The business data has a corresponding system identifier, which identifies the business system to which the business data applies, that is, which business system is authorized to use the business data. The business data includes multiple data records. Each data record has a corresponding user identifier, which identifies the user of the corresponding business system to whom the data record applies, that is, which user is authorized to use the data record.
Step 204, generate a data digest for each data record.
The business data generated by a Spark task is usually large in scale. To make the business data easier to search and analyze, the server generates a data digest for each data record. A data digest is brief information that identifies the corresponding data record; it may be a hash value, a keyword index, or the like.
In one embodiment, generating a data digest for each data record includes: extracting multiple keywords from the data record; calculating the hash value of each extracted keyword; and performing a preset logical operation on the hash values of the keywords, the result of which serves as the data digest of the data record. The preset logical operation may be a hash operation, an arithmetic operation, or the like.
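The hash-based digest described above can be illustrated with a short Python sketch. The choices here are assumptions, not details from the patent: SHA-256 truncated to 64 bits stands in for the keyword hash, bitwise XOR stands in for the "preset logical operation", and keyword extraction is left to the caller. The function names are hypothetical.

```python
import hashlib


def keyword_hash(keyword: str) -> int:
    """Hash one keyword to a 64-bit integer (SHA-256, truncated; an assumption)."""
    digest = hashlib.sha256(keyword.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def record_digest(keywords: list[str]) -> str:
    """Combine the keyword hashes with XOR, the logical operation assumed here."""
    combined = 0
    for kw in set(keywords):  # de-duplicate so a repeated keyword does not cancel itself
        combined ^= keyword_hash(kw)
    return f"{combined:016x}"  # fixed-width hexadecimal digest


print(record_digest(["birthday", "greeting", "alice"]))
```

Because XOR is commutative, the digest does not depend on the order in which the keywords were extracted, which keeps digests comparable across runs.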
Step 206, obtain from the source database the historical digests corresponding to multiple historical records.
The Spark task has a corresponding source database, which may be a Hive database. The business data generated by the Spark task is stored in the source database. Data records stored in the source database are historical records, and their data digests are historical digests. Historical digests may likewise be generated in the manner described above. As is readily understood, the source database stores all of the business data that the Spark task has generated at different times.
Step 208, compare each data digest with the stored historical digests to obtain the newly added data digests.
The server compares each data digest in the business data one by one with each historical digest in the source database, and identifies the data digests that match no historical digest as newly added data digests. A match means that the content of the historical digest is identical or similar to that of the data digest. To improve comparison efficiency, the historical digests in the source database can be sorted in advance according to past comparison results; for example, historical digests that have matched data digests many times in the past are placed first in the comparison order.
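A minimal Python sketch of this comparison follows. It assumes exact equality as the matching rule (the "similar" case is not covered) and keeps the match history in a `Counter`; the function names and data shapes are hypothetical.

```python
from collections import Counter


def order_history(history: list[str], match_counts: Counter) -> list[str]:
    """Put historical digests that matched most often in the past first."""
    return sorted(history, key=lambda d: match_counts[d], reverse=True)


def screen_new_records(current: dict, history: list[str], match_counts: Counter) -> dict:
    """Return the records whose digest matches no historical digest.

    current maps each data digest to its data record.
    """
    ordered = order_history(history, match_counts)
    new_records = {}
    for digest, record in current.items():
        for past in ordered:
            if past == digest:
                match_counts[past] += 1  # remember the hit for future ordering
                break
        else:
            new_records[digest] = record  # no historical digest matched
    return new_records
```

When only exact matches matter, a set lookup (`digest in set(history)`) gives the same result faster; the one-by-one loop is kept here because it mirrors the ordered comparison in the text.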
Step 210, write the data records corresponding to the newly added data digests to the message queue.
The Spark task has a corresponding message queue. The message queue is responsible for receiving, storing, and forwarding business data. The server obtains the data records corresponding to the newly added data digests and stores them in the message queue. It also stores the newly added data digests and their corresponding data records in the source database, that is, it performs a full or incremental update of the historical records and corresponding historical digests stored in the source database.
Step 212, upon receiving a data pull request sent by the first terminal, synchronize the data records in the message queue to the target database according to the data pull request.
When a user needs to use the business data, the user can send a data pull request to the server through the corresponding business system on the first terminal. The data pull request carries a system identifier and a user identifier. The business system has a corresponding target database, which may be a SQL Server, Oracle, or MySQL database. The server extracts the corresponding data records from the message queue according to the user identifier and synchronizes the extracted data records to the target database corresponding to the system identifier.
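Steps 210 and 212 can be sketched together in Python. This is an in-memory stand-in only: a real deployment would use a message broker and real databases, and the class `RecordQueue`, the `user_id` field, and the dictionary of target databases are all hypothetical names introduced for illustration.

```python
from collections import defaultdict, deque


class RecordQueue:
    """In-memory stand-in for the message queue, grouped by user identifier."""

    def __init__(self):
        self._by_user = defaultdict(deque)

    def write(self, record: dict) -> None:
        """Step 210: store a newly added data record."""
        self._by_user[record["user_id"]].append(record)

    def pull(self, user_id: str) -> list:
        """Remove and return every queued record for one user."""
        return list(self._by_user.pop(user_id, deque()))


def handle_pull_request(queue: RecordQueue, targets: dict, system_id: str, user_id: str) -> int:
    """Step 212: move the user's records to the target database of the given system.

    targets maps a system identifier to a list standing in for its target database.
    Returns the number of records synchronized.
    """
    records = queue.pull(user_id)
    targets.setdefault(system_id, []).extend(records)
    return len(records)
```

Because pulled records are removed from the queue, a second pull for the same user transfers nothing until new records arrive.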
In this embodiment, a data digest can be generated for each of the data records in the business data generated by the Spark task. By comparing each data digest with the historical digests of the historical records stored in the source database, the newly added data digests can be obtained. Because the data records corresponding to the newly added data digests are written to the message queue, a data pull request sent by the first terminal can be answered from the message queue, and only the data records in the message queue are synchronized to the target database of the business system. Since only the newly added portion of the large-scale business data is synchronized to the target database, rather than all of the generated business data, the amount of data to be synchronized is reduced, which improves the synchronization efficiency for large-scale data. Building a data digest for each data record and screening for newly added data records on the basis of the digests reduces the amount of data that must be compared, which improves the efficiency of comparing business data and, in turn, the efficiency of data synchronization.
In one embodiment, before the business data generated by the Spark task is obtained, the method further includes a step of allocating physical resources to the Spark task. As shown in Fig. 3, this step includes:
Step 302, receive the Spark task and the corresponding parameter file submitted by the second terminal.
The business logic script corresponding to the Spark task includes a Shell script. Task scheduling personnel record the resource allocation parameters of the Spark task in a parameter file and preset, in the Shell script, a callback function to the parameter file. The resource allocation parameters may be estimated in advance by the task scheduling personnel according to the task amount of the Spark task. On the second terminal, the task scheduling personnel submit the Spark task and the corresponding parameter file to the master node with the spark-submit command. The task scheduling platform deployed on the master node stores the parameter file separately, independent of the Spark task, and starts a corresponding Driver process for each Spark task. Depending on the preset deploy mode (deploy-mode), the Driver process starts the Spark task locally or on a worker node in the cluster.
Step 304, read the resource allocation parameters of the Spark task from the parameter file and allocate physical resources according to the resource allocation parameters.
The task scheduling platform launches the Spark task through the Driver process and allocates physical resources to it. Specifically, the Driver process invokes the Shell script of the Spark task, generates a callback instruction for the parameter file, and reads the resource allocation parameters in the parameter file according to the callback instruction. Based on the parameters read, the Driver process requests from the cluster manager the physical resources needed to run the Spark task. The cluster manager may be a Spark Standalone cluster, a YARN resource management cluster, or the like. Physical resources refer to memory, CPU, and so on. According to the resource allocation parameters, the cluster manager starts a certain number of Executor processes on the worker nodes of the cluster. As is readily understood, the Driver process and each Executor process themselves also occupy some physical resources.
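To illustrate how resource allocation parameters kept in a separate file could drive a submission, the sketch below builds a `spark-submit` argument list from a JSON parameter file. The JSON format and its key names are assumptions made for this example (the patent does not specify the file format, and it describes a Shell script callback rather than Python); the command-line options themselves are standard `spark-submit` options.

```python
import json


def build_submit_command(param_text: str, app_path: str) -> list:
    """Turn resource allocation parameters into a spark-submit argument list.

    param_text: contents of the parameter file, assumed here to be JSON.
    app_path: path of the Spark application to submit.
    """
    params = json.loads(param_text)
    return [
        "spark-submit",
        "--master", params["master"],
        "--deploy-mode", params["deploy_mode"],  # "client" or "cluster"
        "--num-executors", str(params["num_executors"]),
        "--executor-cores", str(params["executor_cores"]),
        "--executor-memory", params["executor_memory"],
        "--driver-memory", params["driver_memory"],
        app_path,
    ]


example = """{"master": "yarn", "deploy_mode": "cluster", "num_executors": 8,
              "executor_cores": 2, "executor_memory": "4g", "driver_memory": "2g"}"""
print(" ".join(build_submit_command(example, "job.py")))
```

Because the parameters live outside the task script, changing the executor count only requires editing the parameter file and resubmitting; the resulting list could be passed to `subprocess.run` on a machine where Spark is installed.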
Step 306, execute the Spark task on the physical resources and monitor the execution efficiency of the Spark task.
After the physical resources needed to execute the Spark task have been obtained, the task scheduling platform begins scheduling and executing the Spark task through the Driver process. Specifically, the Driver process splits the Spark task into multiple asynchronously executed task groups (stages), each of which contains multiple basic tasks (tasks) executed asynchronously and/or concurrently. The Driver process assigns the tasks of a stage to multiple Executor processes for execution. A task is the smallest unit of execution. The result of each task is stored in the memory of its Executor process or in a disk file on the worker node where it runs. When all tasks of the current stage have finished, the Driver process writes the intermediate results to local disk files on the worker nodes and schedules the next stage. This cycle repeats until the whole Spark task has been executed.
During execution of the Spark task, the task scheduling platform monitors the execution efficiency of the Spark task through the Driver process, that is, it calculates the execution speed of the tasks. As is readily understood, the execution speed of a task is directly related to the physical resources of the corresponding Executor process, such as its number of CPU cores. In general, one CPU core executes one thread at a time. When physical resources are sufficient and multiple tasks are assigned to one Executor process, multiple threads can be invoked to execute those tasks concurrently, which improves the execution efficiency of the Spark task.
Step 308, adjust the resource allocation parameters in the parameter file according to the monitoring result.
Through the Driver process, the task scheduling platform compares whether the execution efficiency is below a threshold. The threshold can be set freely according to actual needs and may also change dynamically; no limitation is imposed here. If the execution efficiency is below the threshold, the current Spark task is at risk of having insufficient physical resources; the task scheduling platform generates a stop-execution instruction and sends it to the corresponding worker nodes to terminate the corresponding Driver process and Executor processes. The task scheduling platform estimates the physical resources that need to be added and adjusts the resource allocation parameters recorded in the parameter file of the Spark task according to the estimate. If the execution efficiency is greater than or equal to the threshold, the current Spark task has no, or only a low, risk of insufficient physical resources. The task scheduling platform then determines whether the physical resources allocated to the Spark task include idle resources, estimates the physical resources that need to be released, and adjusts the resource allocation parameters recorded in the parameter file of the Spark task according to the estimate.
Step 310, schedule the Spark task to execute on physical resources that match the adjusted resource allocation parameters.
Based on the adjusted resource allocation parameters, the task scheduling platform starts a new Driver process for the Spark task and invokes the newly started Driver process to allocate physical resources to the Spark task again in the manner described above, that is, to restart a certain number of Executor processes on multiple worker nodes of the cluster. The Driver process schedules the Spark task to execute on the physical resources that match the adjusted resource allocation parameters, that is, it sends the tasks into which the Spark task is split to the newly allocated Executor processes for execution. The task scheduling platform continues to monitor the execution efficiency of the Spark task through the Driver process and adjusts the resource allocation parameters according to it, until the Spark task has finished executing.
Traditionally, resource allocation parameters are hard-coded in the Shell script of the Spark task, so they can be changed only when the Spark task undergoes a version update. This makes the parameters inconvenient to modify, which in turn affects the running efficiency and results of the Spark task.
In this embodiment, because the resource allocation parameters are stored separately in a parameter file, independent of the Spark task itself, they can be modified flexibly and freely, without being constrained by version updates of the Spark task. The execution efficiency of the Spark task is monitored in real time and the allocated physical resources are adjusted dynamically according to it, so the allocation adapts to the Spark task's actual demand for physical resources, which in turn improves the execution efficiency of the Spark task.
In one embodiment, adjusting the resource allocation parameters in the parameter file according to the monitoring result includes: comparing whether the execution efficiency is below a threshold; calculating the total task amount and the task duration corresponding to the Spark task; if the execution efficiency is below the threshold, calculating the remaining task amount from the total task amount and the executed task amount, calculating the remaining duration from the task duration and the current time node, and estimating the physical resources that need to be added from the remaining task amount and the remaining duration; otherwise, calculating the resource utilization from the resource usage information of two adjacent time nodes recorded in the running information, and estimating the physical resources that need to be released from the resource utilization; and adjusting the resource allocation parameters according to the estimate.
Through the Driver process, the task scheduling platform can automatically adjust the resource allocation parameters according to the monitored execution efficiency of the Spark task. Specifically, the Driver process compares whether the execution efficiency is below the threshold. If so, the Driver process calculates the remaining task amount from the estimated total task amount of the Spark task and the amount already executed, and calculates the remaining duration from the estimated task duration needed to execute the Spark task and the current time node. From the remaining task amount and the remaining duration, the Driver process calculates the target execution efficiency of the Spark task. The Driver process reads the resource allocation parameters recorded in the parameter file and, from the monitored actual execution efficiency of the Spark task at the current time and the physical resources currently allocated, determines the target physical resources needed to reach the target execution efficiency. As is readily understood, the difference between the target physical resources and the allocated physical resources is the physical resources that need to be added. The Driver process adjusts the resource allocation parameters recorded in the parameter file according to the target physical resources.
The operation information of the Spark task acquired by the preset task operation monitoring component further includes resource usage information of the Spark task, such as CPU usage and remaining memory capacity. If the execution efficiency is greater than or equal to the threshold, the Driver process calculates the resource utilization rate of the physical resources from the acquired resource usage information of two adjacent time nodes, and judges from the resource utilization rate whether the allocated physical resources include idle physical resources. The Driver process reads the resource allocation parameters recorded in the configuration file, and determines the idle physical resources that need to be released from the resource allocation parameters and the resource utilization rate. The Driver process adjusts the resource allocation parameters recorded in the parameter file according to the idle physical resources.
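The two branches of this adjustment can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the names `Sample` and `plan_adjustment`, the use of an executor count as the unit of physical resource, the assumption that execution efficiency grows linearly with the number of executors, and the `headroom` safety factor are all hypothetical and do not come from the specification.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring sample of a running Spark task (hypothetical structure)."""
    time: float          # seconds since the task started
    done: float          # task amount executed so far
    busy_seconds: float  # cumulative executor-seconds spent doing work


def plan_adjustment(prev, curr, total, duration, executors, threshold, headroom=1.25):
    """Return the change in executor count: positive to add, negative to release."""
    interval = curr.time - prev.time
    efficiency = (curr.done - prev.done) / interval  # task amount per second

    if efficiency < threshold:
        remaining_amount = total - curr.done
        remaining_time = duration - curr.time
        if efficiency <= 0 or remaining_time <= 0:
            raise ValueError("cannot extrapolate from a stalled or overdue task")
        target_efficiency = remaining_amount / remaining_time
        # Assumption: efficiency grows linearly with the number of executors.
        target_executors = math.ceil(executors * target_efficiency / efficiency)
        return max(target_executors - executors, 0)

    # Efficiency is acceptable, so look for idle executors to release.
    utilization = (curr.busy_seconds - prev.busy_seconds) / (interval * executors)
    needed = max(math.ceil(executors * utilization * headroom), 1)
    return -max(executors - needed, 0)
```

The returned difference corresponds to the physical resources to be added or released; a caller would then write the new total back into the parameter file.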
In this embodiment, the resource allocation parameters are adjusted automatically according to the monitoring result of Spark task execution efficiency. Physical resources are added promptly when the execution efficiency is less than the threshold, to ensure the execution efficiency and execution success rate of the Spark task; physical resources are released promptly when the execution efficiency is greater than or equal to the threshold, which can improve physical resource utilization and reduce waste of physical resources.
In one embodiment, generating the data summary corresponding to each data record includes: extracting one or more current keywords from the data record to form a current keyword set; obtaining, from the source database, the historical keyword sets respectively corresponding to a plurality of historical records; identifying whether there is a historical keyword set that matches the current keyword set; if so, extracting a supplement keyword from the data record; and establishing a keyword index from the extracted supplement keyword and the current keywords, and using the keyword index as the data summary of the data record.
In the traditional approach, data are compared one by one against the reference data; when the volume of reference data is large, this way of comparing reduces comparison efficiency. To solve this technical problem, in this embodiment the server establishes a corresponding keyword index for each data record in the business data, and performs data comparison based on the keyword index. Specifically, the server extracts one or more current keywords from a data record to form the current keyword set corresponding to that data record. The source database stores the keyword indexes respectively corresponding to a plurality of historical records. The keyword index corresponding to a historical record is its historical keyword set. The server identifies whether there is a historical keyword set that matches the current keyword set. If there is, the server extracts a supplement keyword from the data record to distinguish the record from the historical keyword set. A supplement keyword may be a word in the data record other than the current keywords. The server establishes a keyword index from the extracted supplement keyword and the current keywords, and uses the keyword index as the data summary of the data record.
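The construction of a keyword index can be sketched as follows. This is a minimal sketch under assumptions that are not in the specification: a data record is modelled as a dictionary, the values of a few designated fields (`key_fields`, a hypothetical parameter) serve as the current keywords, and supplement keywords are taken from the remaining fields in field order.

```python
def build_summary(record, key_fields, history):
    """Return a frozenset of keywords that serves as the record's data summary.

    history is a set of frozensets, one per historical record.
    """
    current = frozenset(str(record[f]) for f in key_fields)
    if current not in history:
        return current  # no matching historical keyword set, no supplement needed

    # A historical set matches: add supplement keywords from the other fields
    # until the index no longer coincides with any historical keyword set.
    index = set(current)
    for field, value in record.items():
        if field in key_fields:
            continue
        index.add(str(value))
        if frozenset(index) not in history:
            break
    return frozenset(index)


history = {frozenset({"alice", "2018-06-15"})}
record = {"user": "alice", "date": "2018-06-15", "amount": "30"}
summary = build_summary(record, ("user", "date"), history)
```

In this sketch, a record whose every field coincides with a historical record still yields a summary found in `history`, so the later comparison step would not treat it as newly added.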
In this embodiment, a keyword index is built for each data record, and newly added data records are screened based on the keyword index, which reduces the amount of data that needs to be compared and thereby improves the comparison efficiency of business data; building the keyword index from the supplement keyword and the extracted current keywords helps ensure that the keyword index identifies its corresponding data record.
In one embodiment, the data pull request carries a system identifier and a user identifier; synchronizing the data records in the message queue to the target database according to the data pull request includes: detecting, according to the user identifier, whether corresponding data records exist in the message queue; if so, calling the data synchronization script, the data synchronization script including a plurality of labels; obtaining the configuration file corresponding to the system identifier, and replacing the labels in the data synchronization script based on the configuration file, to update the data synchronization script; and, by executing the updated data synchronization script, synchronizing the data records in the message queue that correspond to the user identifier to the target database.
In the traditional approach, before a data synchronization tool is used to synchronize data between different business systems, the user needs to write a different data synchronization script for each business system in advance. In practice, however, the data synchronization scripts for different business systems are similar; if many business systems need data synchronization, the user has to repeat a large amount of work, which wastes manpower and reduces data synchronization efficiency. To reduce user operations, in this embodiment a general data synchronization script is written in advance and stored on the server. The general data synchronization script includes a first synchronization script and a second synchronization script.
The first synchronization script includes at least one label in a preset format. The preset format means that a preset mark is placed on at least one of the two sides of the label. The preset mark may be "#", "@", "*", etc.; a label in the preset format may accordingly be "#ABC#", "@DEF", "GHI*", etc. When the user needs to perform data synchronization, the user may set configuration information for the business system at the first terminal by means of page configuration; the first terminal generates a configuration file from the configuration information and sends the configuration file to the server. The configuration information includes a plurality of labels and their corresponding replacement information. The replacement information includes a Spark task identifier, a user identifier, a system identifier, a connection string of the target database, etc.
The server calls the general data synchronization script, identifies the labels in the first synchronization script according to the preset mark, and looks up the replacement information corresponding to each label in the configuration information. The server obtains the connection string, verification identifier, etc. of the message queue corresponding to the Spark task identifier. The verification identifier may be a user name and password, etc. The server replaces each label with the corresponding replacement information according to the configuration information, to update the first synchronization script.
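The label replacement can be sketched as follows. This is an illustrative sketch only: the script text, the label names, and the replacement values are hypothetical, and the regular expression is one possible reading of "a preset mark on at least one side of the label".

```python
import re

# A label carries a preset mark ("#", "@" or "*") on at least one side,
# for example #ABC#, @DEF or GHI*.
LABEL = re.compile(r"[#@*]\w+[#@*]?|\w+[#@*]")


def fill_labels(script, replacements):
    """Replace every known label in the script; leave unknown matches untouched."""
    def substitute(match):
        label = match.group(0)
        return replacements.get(label, label)
    return LABEL.sub(substitute, script)


template = "CONNECT @TARGET_DB AS #USER#; READ QUEUE TASK_ID*"
config = {
    "@TARGET_DB": "jdbc:mysql://db.example/sales",  # hypothetical connection string
    "#USER#": "u1001",                              # hypothetical user identifier
    "TASK_ID*": "spark-42",                         # hypothetical Spark task identifier
}
updated = fill_labels(template, config)
```

Looking labels up in a dictionary, rather than substituting them one at a time, keeps a replacement value that happens to contain a mark character from being rescanned.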
The second synchronization script includes a table-creation script and a synchronization script. The server establishes a connection with the target database according to the connection string of the target database, and reads the field information of the data records in the message queue corresponding to the Spark task identifier. Different field information corresponds to different table-creation scripts and synchronization scripts. The server generates the corresponding table-creation statements and synchronization statements from the field information, writes the generated table-creation statements into the corresponding table-creation script, and writes the generated synchronization statements into the corresponding synchronization script, to update the second synchronization script. By executing the updated data synchronization script, the server synchronizes the business data from the message queue to the target database.
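Generating table-creation and synchronization statements from field information can be sketched as follows. The sketch makes several assumptions that are not in the specification: field information is modelled as a mapping from column name to SQL type, the synchronization statement is a parameterized `INSERT`, and an in-memory SQLite database stands in for the target database. The table and column names are hypothetical.

```python
import sqlite3


def build_statements(table, fields):
    """Build a table-creation statement and a synchronization statement.

    fields maps column name to SQL type. Names are interpolated as-is, so
    they must come from a trusted source such as the configuration file.
    """
    columns = ", ".join(f"{name} {sql_type}" for name, sql_type in fields.items())
    create = f"CREATE TABLE IF NOT EXISTS {table} ({columns})"
    names = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    insert = f"INSERT INTO {table} ({names}) VALUES ({placeholders})"
    return create, insert


create, insert = build_statements("orders", {"user_id": "TEXT", "amount": "REAL"})

conn = sqlite3.connect(":memory:")  # stand-in for the target database
conn.execute(create)
conn.executemany(insert, [("u1001", 30.0), ("u1002", 12.5)])
conn.commit()
```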
In this embodiment, a general data synchronization script is written in advance, and a corresponding configuration file is added for each business system that needs data synchronization. When business data need to be synchronized to a different business system, the general data synchronization script only needs to be updated according to the corresponding configuration file; executing the updated data synchronization script then synchronizes the business data from the message queue to the target database, which can reduce user operations and improve data synchronization efficiency.
In one embodiment, the data synchronization script includes a splitting script, and synchronizing the data records in the message queue that correspond to the user identifier to the target database includes: calculating the data volume of the data records corresponding to the user identifier; detecting whether the data volume exceeds the target data amount; if so, calling the splitting script to split the data records corresponding to the user identifier into a plurality of data groups; and calling multiple threads to synchronize the plurality of data groups to the target database.
The second synchronization script further includes a splitting script. The server calculates the data volume of the data records in the message queue that correspond to the user identifier, and compares whether the data volume exceeds a threshold. If so, the server calls the splitting script to split the data records into a plurality of data groups, in other words, to group the data records. Specifically, the server obtains a preset target data amount. The target data amount may be set in advance, or may be generated on the fly according to the current load monitoring result of the server. The server determines the split position of each data group according to the target data amount. For example, assuming the target data amount is 80M, the position at 80M is marked as the first split position, the position at 160M is marked as the second split position, and so on.
The server detects whether each split position falls between adjacent separators. When a split position falls exactly on a separator, the server splits at that split position; when a split position falls between adjacent separators, the server splits at either of the adjacent separators, that is, at the preceding separator or the following separator, obtaining a plurality of data groups. The server calls multiple threads to synchronize the plurality of data groups to the target database, to improve the synchronization efficiency of business data.
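Splitting at separators and synchronizing the groups with multiple threads can be sketched as follows. This is a minimal sketch under assumptions not stated in the specification: the records are held as a single byte string, each record ends with a newline separator, a split position between separators is always moved to the following separator, and `write_group` is a hypothetical callable that writes one data group to the target database.

```python
from concurrent.futures import ThreadPoolExecutor


def split_records(data, target, sep=b"\n"):
    """Split data near every multiple of target, always on a record boundary."""
    if target <= 0:
        raise ValueError("target must be positive")
    cuts = []
    for nominal in range(target, len(data), target):
        # Find the separator that ends at, or first ends after, the nominal position.
        found = data.find(sep, max(nominal - len(sep), 0))
        if found == -1:
            break  # no further boundary, keep the tail in one group
        cut = found + len(sep)
        if cut < len(data) and (not cuts or cut > cuts[-1]):
            cuts.append(cut)
    bounds = [0, *cuts, len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]


def sync_groups(groups, write_group, workers=4):
    """Hand each data group to a worker thread and collect the results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(write_group, groups))


data = b"aaa\nbbbbb\ncc\ndddd\n"
groups = split_records(data, target=6)
```

Because every cut lands immediately after a separator, no record is divided between two groups, and concatenating the groups reproduces the original data.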
In this embodiment, business data with a large data volume are split into a plurality of data groups that are synchronized by multiple threads, which can improve data synchronization efficiency; determining the split positions based on the target data amount and the separators avoids splitting the same data record into different data groups, which guarantees data integrity.
It should be understood that although the steps in the flowcharts of Fig. 2 and Fig. 3 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless expressly stated otherwise herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in Fig. 2 and Fig. 3 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different moments, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in Fig. 4, a Spark-based data synchronization device is provided, including a data screening module 402, a data storage module 404, and a data synchronization module 406, wherein:
The data screening module 402 is configured to obtain the business data generated by a Spark task, the business data including a plurality of data records; generate a data summary corresponding to each data record; obtain, from the source database, the historical summaries respectively corresponding to a plurality of historical records; and compare each data summary with the stored historical summaries to obtain newly added data summaries.
The data storage module 404 is configured to write the data records corresponding to the newly added data summaries into a message queue.
The data synchronization module 406 is configured to, when a data pull request sent by a first terminal is received, synchronize the data records in the message queue to the target database according to the data pull request.
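The cooperation of the three modules can be sketched as follows. This is only an illustrative sketch: a SHA-256 hash of the serialized record is used as the data summary purely for brevity, an in-process `queue.Queue` stands in for the message queue, a dictionary stands in for the target database, and the function names `summarize`, `screen`, `store`, and `synchronize` are all hypothetical.

```python
import hashlib
import json
from queue import Queue


def summarize(record):
    """Assumed summary: a SHA-256 hash of the record serialized as sorted JSON."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def screen(records, historical_summaries):
    """Data screening: keep only records whose summary is not yet known."""
    return [r for r in records if summarize(r) not in historical_summaries]


def store(new_records, message_queue):
    """Data storage: write the newly added records into the message queue."""
    for record in new_records:
        message_queue.put(record)


def synchronize(message_queue, target_database):
    """Data synchronization: on a pull request, drain the queue into the target."""
    while not message_queue.empty():
        record = message_queue.get()
        target_database[summarize(record)] = record


historical = {summarize({"id": 1})}
queue_ = Queue()
store(screen([{"id": 1}, {"id": 2}], historical), queue_)
target = {}
synchronize(queue_, target)
```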
In one embodiment, the device further includes a resource allocation module 408, configured to receive a Spark task and a corresponding parameter file submitted by a second terminal; read the resource allocation parameters of the Spark task from the parameter file, and allocate physical resources according to the resource allocation parameters; execute the Spark task on the physical resources, and monitor the execution efficiency of the Spark task; adjust the resource allocation parameters in the parameter file according to the monitoring result; and schedule the Spark task to execute on physical resources that match the adjusted resource allocation parameters.
In one embodiment, the resource allocation module 408 is further configured to compare whether the execution efficiency is less than a threshold; estimate the task total amount and task duration corresponding to the Spark task; if so, calculate the remaining task amount from the task total amount and the task execution amount, calculate the remaining duration from the task duration and the current time node, and estimate the physical resources that need to be added from the remaining task amount and the remaining duration; otherwise, calculate the resource utilization rate from the resource usage information of two adjacent time nodes recorded in the operation information, and estimate the physical resources that need to be released from the resource utilization rate; and adjust the resource allocation parameters according to the estimation result.
In one embodiment, the data screening module 402 is further configured to extract one or more current keywords from the data record to form a current keyword set; obtain, from the source database, the historical keyword sets respectively corresponding to a plurality of historical records; identify whether there is a historical keyword set that matches the current keyword set; if so, extract a supplement keyword from the data record; and establish a keyword index from the extracted supplement keyword and the current keywords, and use the keyword index as the data summary of the data record.
In one embodiment, the data pull request carries a system identifier and a user identifier; the data synchronization module 406 is further configured to detect, according to the user identifier, whether corresponding data records exist in the message queue; if so, call the data synchronization script, the data synchronization script including a plurality of labels; obtain the configuration file corresponding to the system identifier, and replace the labels in the data synchronization script based on the configuration file, to update the data synchronization script; and, by executing the updated data synchronization script, synchronize the data records in the message queue that correspond to the user identifier to the target database.
In one embodiment, the data synchronization script includes a splitting script, and the data synchronization module 406 is further configured to calculate the data volume of the data records corresponding to the user identifier; detect whether the data volume exceeds the target data amount; if so, call the splitting script to split the data records corresponding to the user identifier into a plurality of data groups; and call multiple threads to synchronize the plurality of data groups to the target database.
For specific limitations of the Spark-based data synchronization device, refer to the limitations of the Spark-based data synchronization method above, which are not repeated here. Each module in the above Spark-based data synchronization device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer equipment in hardware form, or may be stored in a memory in the computer equipment in software form, so that the processor can call them and execute the operations corresponding to the modules.
In one embodiment, computer equipment is provided. The computer equipment may be a server, and its internal structure may be as shown in Fig. 5. The computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer equipment provides computing and control capabilities. The memory of the computer equipment includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer equipment is used to store historical records and the corresponding historical summaries. The network interface of the computer equipment is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a Spark-based data synchronization method.
Those skilled in the art will understand that the structure shown in Fig. 5 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer equipment to which the solution is applied; specific computer equipment may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, computer equipment is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program: obtaining the business data generated by a Spark task, the business data including a plurality of data records; generating a data summary corresponding to each data record; obtaining, from the source database, the historical summaries respectively corresponding to a plurality of historical records; comparing each data summary with the stored historical summaries to obtain newly added data summaries; writing the data records corresponding to the newly added data summaries into a message queue; and, when a data pull request sent by a first terminal is received, synchronizing the data records in the message queue to the target database according to the data pull request.
In one embodiment, the processor further implements the following steps when executing the computer program: receiving a Spark task and a corresponding parameter file submitted by a second terminal; reading the resource allocation parameters of the Spark task from the parameter file, and allocating physical resources according to the resource allocation parameters; executing the Spark task on the physical resources, and monitoring the execution efficiency of the Spark task; adjusting the resource allocation parameters in the parameter file according to the monitoring result; and scheduling the Spark task to execute on physical resources that match the adjusted resource allocation parameters.
In one embodiment, the processor further implements the following steps when executing the computer program: comparing whether the execution efficiency is less than a threshold; estimating the task total amount and task duration corresponding to the Spark task; if so, calculating the remaining task amount from the task total amount and the task execution amount, calculating the remaining duration from the task duration and the current time node, and estimating the physical resources that need to be added from the remaining task amount and the remaining duration; otherwise, calculating the resource utilization rate from the resource usage information of two adjacent time nodes recorded in the operation information, and estimating the physical resources that need to be released from the resource utilization rate; and adjusting the resource allocation parameters according to the estimation result.
In one embodiment, the processor further implements the following steps when executing the computer program: extracting one or more current keywords from the data record to form a current keyword set; obtaining, from the source database, the historical keyword sets respectively corresponding to a plurality of historical records; identifying whether there is a historical keyword set that matches the current keyword set; if so, extracting a supplement keyword from the data record; and establishing a keyword index from the extracted supplement keyword and the current keywords, and using the keyword index as the data summary of the data record.
In one embodiment, the data pull request carries a system identifier and a user identifier; the processor further implements the following steps when executing the computer program: detecting, according to the user identifier, whether corresponding data records exist in the message queue; if so, calling the data synchronization script, the data synchronization script including a plurality of labels; obtaining the configuration file corresponding to the system identifier, and replacing the labels in the data synchronization script based on the configuration file, to update the data synchronization script; and, by executing the updated data synchronization script, synchronizing the data records in the message queue that correspond to the user identifier to the target database.
In one embodiment, the data synchronization script includes a splitting script, and the processor further implements the following steps when executing the computer program: calculating the data volume of the data records corresponding to the user identifier; detecting whether the data volume exceeds the target data amount; if so, calling the splitting script to split the data records corresponding to the user identifier into a plurality of data groups; and calling multiple threads to synchronize the plurality of data groups to the target database.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps: obtaining the business data generated by a Spark task, the business data including a plurality of data records; generating a data summary corresponding to each data record; obtaining, from the source database, the historical summaries respectively corresponding to a plurality of historical records; comparing each data summary with the stored historical summaries to obtain newly added data summaries; writing the data records corresponding to the newly added data summaries into a message queue; and, when a data pull request sent by a first terminal is received, synchronizing the data records in the message queue to the target database according to the data pull request.
In one embodiment, the computer program further implements the following steps when executed by the processor: receiving a Spark task and a corresponding parameter file submitted by a second terminal; reading the resource allocation parameters of the Spark task from the parameter file, and allocating physical resources according to the resource allocation parameters; executing the Spark task on the physical resources, and monitoring the execution efficiency of the Spark task; adjusting the resource allocation parameters in the parameter file according to the monitoring result; and scheduling the Spark task to execute on physical resources that match the adjusted resource allocation parameters.
In one embodiment, the computer program further implements the following steps when executed by the processor: comparing whether the execution efficiency is less than a threshold; estimating the task total amount and task duration corresponding to the Spark task; if so, calculating the remaining task amount from the task total amount and the task execution amount, calculating the remaining duration from the task duration and the current time node, and estimating the physical resources that need to be added from the remaining task amount and the remaining duration; otherwise, calculating the resource utilization rate from the resource usage information of two adjacent time nodes recorded in the operation information, and estimating the physical resources that need to be released from the resource utilization rate; and adjusting the resource allocation parameters according to the estimation result.
In one embodiment, the computer program further implements the following steps when executed by the processor: extracting one or more current keywords from the data record to form a current keyword set; obtaining, from the source database, the historical keyword sets respectively corresponding to a plurality of historical records; identifying whether there is a historical keyword set that matches the current keyword set; if so, extracting a supplement keyword from the data record; and establishing a keyword index from the extracted supplement keyword and the current keywords, and using the keyword index as the data summary of the data record.
In one embodiment, the data pull request carries a system identifier and a user identifier; the computer program further implements the following steps when executed by the processor: detecting, according to the user identifier, whether corresponding data records exist in the message queue; if so, calling the data synchronization script, the data synchronization script including a plurality of labels; obtaining the configuration file corresponding to the system identifier, and replacing the labels in the data synchronization script based on the configuration file, to update the data synchronization script; and, by executing the updated data synchronization script, synchronizing the data records in the message queue that correspond to the user identifier to the target database.
In one embodiment, the data synchronization script includes a splitting script, and the computer program further implements the following steps when executed by the processor: calculating the data volume of the data records corresponding to the user identifier; detecting whether the data volume exceeds the target data amount; if so, calling the splitting script to split the data records corresponding to the user identifier into a plurality of data groups; and calling multiple threads to synchronize the plurality of data groups to the target database.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in the present application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be determined by the appended claims.
Claims (10)
1. A Spark-based data synchronization method, the method comprising:
obtaining business data generated by a Spark task, the business data comprising a plurality of data records;
generating a data summary corresponding to each data record;
obtaining, from a source database, historical summaries respectively corresponding to a plurality of historical records;
comparing each data summary with the plurality of historical summaries to obtain newly added data summaries;
writing the data records corresponding to the newly added data summaries into a message queue; and
when a data pull request sent by a first terminal is received, synchronizing the data records in the message queue to a target database according to the data pull request.
2. The method according to claim 1, characterized in that, before obtaining the business data generated by the Spark task, the method further comprises:
receiving a Spark task and a corresponding parameter file submitted by a second terminal;
reading resource allocation parameters of the Spark task from the parameter file, and allocating physical resources according to the resource allocation parameters;
executing the Spark task on the physical resources, and monitoring the execution efficiency of the Spark task;
adjusting the resource allocation parameters in the parameter file according to the monitoring result; and
scheduling the Spark task to execute on physical resources that match the adjusted resource allocation parameters.
3. The method according to claim 2, characterized in that adjusting the resource allocation parameters in the parameter file according to the monitoring result comprises:
comparing whether the execution efficiency is less than a threshold;
calculating the total task amount and the task duration corresponding to the Spark task;
if the execution efficiency is less than the threshold, calculating the remaining task amount from the total task amount and the executed task amount, calculating the remaining duration from the task duration and the current time node, and estimating the physical resources that need to be added from the remaining task amount and the remaining duration;
otherwise, calculating the resource utilization from the resource usage information recorded in the running information for two adjacent time nodes, and estimating the physical resources that need to be released from the resource utilization; and
adjusting the resource allocation parameters according to the estimation result.
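The adjustment logic of claim 3 can be sketched as a single function. The claim does not give formulas, so everything numeric here is an assumption: execution efficiency is taken as completed work per unit time, resources are counted in executors, each executor is assumed to contribute equally, and utilization is the mean of the usage fractions at two adjacent time nodes.

```python
import math


def adjust_executors(current, total, done, duration, elapsed,
                     threshold, usage_prev, usage_now):
    """Return an adjusted executor count (illustrative formulas, not from the claim)."""
    if current < 1 or done <= 0 or not 0 < elapsed < duration:
        raise ValueError("need current >= 1, done > 0 and 0 < elapsed < duration")
    efficiency = done / elapsed  # work completed per unit time
    if efficiency < threshold:
        remaining_work = total - done
        remaining_time = duration - elapsed
        per_executor_rate = efficiency / current
        # Executors needed to finish the remaining work in the remaining time.
        needed = math.ceil(remaining_work / remaining_time / per_executor_rate)
        return max(current, needed)
    # Efficiency is acceptable: release what the last two samples show as idle.
    utilization = (usage_prev + usage_now) / 2
    releasable = math.floor(current * (1 - utilization))
    return max(1, current - releasable)
```

For example, 4 executors that finished 200 of 1000 units halfway through a 100-unit time budget would need to grow to 16 under these assumptions.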
4. The method according to claim 1, characterized in that generating the data summary corresponding to each data record comprises:
extracting one or more current keywords from the data record to form a current keyword set;
obtaining, from the source database, history keyword sets corresponding to the plurality of historical records;
identifying whether there is a history keyword set that matches the current keyword set;
if so, extracting supplementary keywords from the data record; and
building a keyword index from the extracted supplementary keywords and the current keywords, and using the keyword index as the data summary of the data record.
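A minimal sketch of the keyword-based summary in claim 4 follows. It assumes that keywords are simply the values of selected record fields, that a match means set equality, and that the keyword index is a sorted, `|`-joined string; the claim fixes none of these details, and it does not say what happens when no history set matches, so the sketch then indexes the current keywords alone.

```python
def keyword_summary(record, primary_fields, extra_fields, history_sets):
    """Build a keyword index for one record (field-based keywords are an assumption)."""
    current = frozenset(str(record[f]) for f in primary_fields if f in record)
    keywords = set(current)
    if current in history_sets:
        # The current keywords collide with a historical record, so add
        # supplementary keywords to tell the two records apart.
        keywords |= {str(record[f]) for f in extra_fields if f in record}
    return "|".join(sorted(keywords))
```

Supplementary keywords are extracted only on a collision, which keeps the index short for records that are already distinguishable.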
5. The method according to claim 1, characterized in that the data pull request carries a system identifier and a user identifier, and synchronizing the data records in the message queue to the target database according to the data pull request comprises:
detecting, according to the user identifier, whether corresponding data records exist in the message queue;
if so, calling a data synchronization script, the data synchronization script including a plurality of placeholder tags;
obtaining a configuration file corresponding to the system identifier, and replacing the tags in the data synchronization script based on the configuration file to update the data synchronization script; and
executing the updated data synchronization script to synchronize the data records in the message queue that correspond to the user identifier to the target database corresponding to the system identifier.
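The tag replacement in claim 5 can be illustrated with Python's `string.Template`. The script text, the tag names (`table`, `jdbc_url`), the `crm` system identifier and the in-memory configuration dictionary are all hypothetical; they stand in for a generic synchronization script and per-system configuration files.

```python
from string import Template

# Generic synchronization script with placeholder tags (hypothetical content).
SYNC_SCRIPT = Template(
    "INSERT INTO ${table} (user_id, payload) VALUES (?, ?) -- target: ${jdbc_url}"
)

# One configuration per system identifier (hypothetical values).
CONFIGS = {
    "crm": {"table": "crm_records", "jdbc_url": "jdbc:mysql://crm-db/sync"},
}


def render_sync_script(system_id, user_id, queued_records):
    """Return the script specialised for one target system, or None if nothing to sync."""
    if not any(r["user_id"] == user_id for r in queued_records):
        return None
    return SYNC_SCRIPT.substitute(CONFIGS[system_id])
```

Keeping one generic script and swapping in per-system values avoids maintaining a separate script for every target database.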
6. The method according to claim 5, characterized in that the data synchronization script includes a splitting script, and synchronizing the data records in the message queue that correspond to the user identifier to the target database comprises:
calculating the data volume of the data records corresponding to the user identifier;
detecting whether the data volume exceeds a target data volume;
if so, calling the splitting script to split the data records corresponding to the user identifier into a plurality of data groups; and
calling multiple threads to synchronize the plurality of data groups to the target database.
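The split-and-parallelize step of claim 6 might look like the sketch below. It assumes the data volume is measured as a record count and that `write_group` is a caller-supplied, thread-safe function that writes one group to the target database; both are illustrative choices, not details from the claim.

```python
from concurrent.futures import ThreadPoolExecutor


def split_records(records, group_size):
    """Split a list of records into consecutive groups of at most group_size."""
    return [records[i:i + group_size] for i in range(0, len(records), group_size)]


def sync_records(records, write_group, target_volume, group_size, max_workers=4):
    """Write records in one call, or in parallel groups when the volume is large."""
    if len(records) <= target_volume:
        write_group(records)
        return 1
    groups = split_records(records, group_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces completion and re-raises any exception from a worker.
        list(pool.map(write_group, groups))
    return len(groups)
```

The function returns the number of write calls made, which is convenient for logging how a request was split.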
7. A Spark-based data synchronization device, characterized in that the device comprises:
a data screening module, configured to obtain business data generated by a Spark task, the business data including a plurality of data records; generate a data summary corresponding to each data record; obtain, from a source database, historical summaries corresponding to a plurality of historical records; and compare each data summary with the plurality of stored historical summaries to obtain newly added data summaries;
a data storage module, configured to write the data records corresponding to the newly added data summaries into a message queue; and
a data synchronization module, configured to, when a data pull request sent by a first terminal is received, synchronize the data records in the message queue to a target database according to the data pull request.
8. The device according to claim 7, characterized in that the device further comprises a resource allocation module, configured to receive a Spark task and a corresponding parameter file submitted by a second terminal; read resource allocation parameters of the Spark task from the parameter file, and allocate physical resources according to the resource allocation parameters; execute the Spark task on the physical resources, and monitor the execution efficiency of the Spark task; adjust the resource allocation parameters in the parameter file according to the monitoring result; and schedule the Spark task to be executed on physical resources that match the adjusted resource allocation parameters.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810620678.8A CN108804697A (en) | 2018-06-15 | 2018-06-15 | Method of data synchronization, device, computer equipment based on Spark and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810620678.8A CN108804697A (en) | 2018-06-15 | 2018-06-15 | Method of data synchronization, device, computer equipment based on Spark and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108804697A (en) | 2018-11-13 |
Family
ID=64086584
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810620678.8A (Pending) CN108804697A (en) | 2018-06-15 | 2018-06-15 | Method of data synchronization, device, computer equipment based on Spark and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108804697A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103220183A (en) * | 2013-05-02 | 2013-07-24 | 杭州电子科技大学 | Implement method of Hadoop high-availability system based on double-main-engine warm backup |
CN104142861A (en) * | 2013-05-10 | 2014-11-12 | 中国电信股份有限公司 | Processing method and processing device for configuration of server resources |
CN106681863A (en) * | 2016-12-30 | 2017-05-17 | 北京天健源达科技有限公司 | Method for storing edited contents of electronic medical records and terminal equipment |
CN106980699A (en) * | 2017-04-14 | 2017-07-25 | 中国科学院深圳先进技术研究院 | A kind of data processing platform (DPP) and system |
CN107908631A (en) * | 2017-07-25 | 2018-04-13 | 平安科技(深圳)有限公司 | Data processing method, device, storage medium and computer equipment |
CN107515786A (en) * | 2017-08-04 | 2017-12-26 | 北京奇虎科技有限公司 | Resource allocation methods, master device, from device and distributed computing system |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111190949B (en) * | 2018-11-15 | 2023-09-26 | 杭州海康威视数字技术股份有限公司 | Data storage and processing method, device, equipment and medium |
CN111190949A (en) * | 2018-11-15 | 2020-05-22 | 杭州海康威视数字技术股份有限公司 | Data storage and processing method, device, equipment and medium |
CN111258746B (en) * | 2018-11-30 | 2023-04-25 | 阿里巴巴集团控股有限公司 | Resource allocation method and service equipment |
CN111258746A (en) * | 2018-11-30 | 2020-06-09 | 阿里巴巴集团控股有限公司 | Resource allocation method and service equipment |
CN109857803A (en) * | 2018-12-13 | 2019-06-07 | 杭州数梦工场科技有限公司 | Method of data synchronization, device, equipment, system and computer readable storage medium |
CN111435356A (en) * | 2019-01-15 | 2020-07-21 | 杭州海康威视数字技术股份有限公司 | Data feature extraction method and device, computer equipment and storage medium |
CN109947429A (en) * | 2019-03-13 | 2019-06-28 | 咪咕文化科技有限公司 | Data processing method and device |
CN109947429B (en) * | 2019-03-13 | 2022-07-26 | 咪咕文化科技有限公司 | Data processing method and device |
CN110442588A (en) * | 2019-07-05 | 2019-11-12 | 深圳壹账通智能科技有限公司 | Information synchronous updating method, device, computer equipment and storage medium |
CN110705816A (en) * | 2019-08-14 | 2020-01-17 | 中国平安人寿保险股份有限公司 | Task allocation method and device based on big data |
CN110705816B (en) * | 2019-08-14 | 2023-08-25 | 中国平安人寿保险股份有限公司 | Task allocation method and device based on big data |
CN111382206A (en) * | 2020-03-20 | 2020-07-07 | 北京奇艺世纪科技有限公司 | Data storage method and device |
CN111382206B (en) * | 2020-03-20 | 2024-03-15 | 北京奇艺世纪科技有限公司 | Data storage method and device |
CN112347099A (en) * | 2020-10-27 | 2021-02-09 | 口碑(上海)信息技术有限公司 | Data processing method and device, computing equipment and computer readable storage medium |
CN112245906B (en) * | 2020-11-18 | 2023-08-25 | 腾讯科技(深圳)有限公司 | Data synchronization method, device, electronic equipment and storage medium |
CN112245906A (en) * | 2020-11-18 | 2021-01-22 | 腾讯科技(深圳)有限公司 | Data synchronization method and device, electronic equipment and storage medium |
CN112507020A (en) * | 2020-11-20 | 2021-03-16 | 平安普惠企业管理有限公司 | Data synchronization method and device, computer equipment and storage medium |
CN112905539A (en) * | 2021-03-25 | 2021-06-04 | 芝麻链(北京)科技有限公司 | Automatic data storage method and device based on message digest |
CN113449035A (en) * | 2021-06-29 | 2021-09-28 | 平安健康保险股份有限公司 | Data synchronization method and device, computer equipment and readable storage medium |
CN113806372A (en) * | 2021-09-29 | 2021-12-17 | 中国平安人寿保险股份有限公司 | New data information construction method and device, computer equipment and storage medium |
CN113806372B (en) * | 2021-09-29 | 2024-02-06 | 中国平安人寿保险股份有限公司 | New data information construction method, device, computer equipment and storage medium |
WO2023185309A1 (en) * | 2022-03-28 | 2023-10-05 | 京东方科技集团股份有限公司 | Data synchronization method and system, and computer-readable storage medium |
CN114817342A (en) * | 2022-07-04 | 2022-07-29 | 杭州安恒信息技术股份有限公司 | Data synchronization method, device, equipment and medium |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108804697A (en) | Method of data synchronization, device, computer equipment based on Spark and storage medium | |
CN109271447A (en) | Method of data synchronization, device, computer equipment and storage medium | |
CN106802826B (en) | Service processing method and device based on thread pool | |
US10567397B2 (en) | Security-based container scheduling | |
KR101600129B1 (en) | Application efficiency engine | |
CN102279730B (en) | Parallel data processing method, device and system | |
CN103365700B (en) | A kind of facing cloud calculates monitoring resource and the adjustment System of virtualized environment | |
CN108845884A (en) | Physical source distributing method, apparatus, computer equipment and storage medium | |
US8656134B2 (en) | Optimized memory configuration deployed on executing code | |
CN108920153B (en) | Docker container dynamic scheduling method based on load prediction | |
CN108279892A (en) | It is a kind of to split the method, apparatus and equipment that large-scale application service is micro services | |
CN101944114A (en) | Data synchronization method between memory database and physical database | |
US8959518B2 (en) | Window-based scheduling using a key-value data store | |
CN109492017A (en) | Business information inquiry processing method, system, computer equipment and storage medium | |
CN109814995A (en) | Method for scheduling task, device, computer equipment and storage medium | |
CN112527310A (en) | Multi-tenant data isolation method and device, computer equipment and storage medium | |
CN110442752A (en) | Organizational structure drawing generating method, device, computer equipment and storage medium | |
CN109933338B (en) | Block chain deployment method, device, computer equipment and storage medium | |
CN107870727A (en) | Method and apparatus for data storage | |
CN114416352A (en) | Computing resource allocation method and device, electronic equipment and storage medium | |
CN112347076B (en) | Data storage method and device of distributed database and computer equipment | |
CN106293541A (en) | A kind of blog management method storing system and system | |
CN102929929B (en) | A kind of data summarization method and device | |
CN110765162A (en) | Data comparison method and device, computer equipment and storage medium | |
CN112667592A (en) | Data storage method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||