CN111026768A - Data synchronization method and device capable of realizing rapid loading of data - Google Patents


Info

Publication number
CN111026768A
Authority
CN
China
Prior art keywords
data
loading
file
task queue
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910981942.5A
Other languages
Chinese (zh)
Inventor
梅纲
付铨
周淳
张驻西
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Dameng Database Co Ltd
Original Assignee
Wuhan Dameng Database Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Dameng Database Co Ltd filed Critical Wuhan Dameng Database Co Ltd
Priority to CN201910981942.5A priority Critical patent/CN111026768A/en
Publication of CN111026768A publication Critical patent/CN111026768A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2379 Updates performed during online database operations; commit processing
    • G06F16/2386 Bulk updating operations
    • G06F16/2308 Concurrency control
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor


Abstract

The invention relates to the technical field of data synchronization, and in particular to a data synchronization method and device capable of rapidly loading data. The method comprises the following steps: reading data from a data source and placing the read data into a cache; a producer thread generating a plurality of data files in batches from the data in the cache and placing the generated data files into a task queue; a consumer thread calling a batch data loading command specific to the destination database and loading the data files waiting in the task queue into the destination database one by one; the data file generation performed by the producer thread and the batch data loading performed by the consumer thread proceeding in parallel. The invention adopts a producer-consumer model: during loading, data files are continuously generated in batches and a loading command specific to the destination database is called to load the data. Because data file generation and execution of the loading command can run in parallel, the performance loss of serial execution is reduced and synchronization efficiency is improved.

Description

Data synchronization method and device capable of realizing rapid loading of data
[ technical field ]
The invention relates to the technical field of data synchronization, in particular to a data synchronization method and device capable of realizing rapid loading of data.
[ background of the invention ]
Currently, a typical data synchronization process generally includes two steps: first, reading data from a data source and placing it into a cache (data reading); second, reading data from the cache and loading it into the destination database (data loading). Assuming the source is a relational database, a typical data synchronization process is shown in fig. 1; the data loading step generally calls a JDBC (Java Database Connectivity) or ODBC (Open Database Connectivity) interface to access the destination database. JDBC and ODBC are generic database programming interfaces: they provide a complete set of programming interfaces independent of any particular database, so that applications do not need to be bound to a different programming interface for each database. When a database vendor releases a database, it also releases a driver conforming to the JDBC and ODBC standards so that applications can access the database through that driver. Since JDBC and ODBC are common standards, the operation flow when loading data through a JDBC or ODBC interface is generally fixed, as shown in fig. 2.
The data loading flow shown in fig. 2 is adequate for general database applications, but its performance is often insufficient when loading large volumes of data: to remain generic, the JDBC and ODBC interfaces generally do not support the batch data loading commands specific to individual database products, which makes fast loading difficult. Fast loading therefore requires the database-specific batch data loading command. Such commands are generally provided for database administrators to import and export data quickly and are heavily optimized internally; compared with the generic interfaces, they are several times to dozens of times faster than loading through JDBC or ODBC and support fast batch loading. Their disadvantage is that the data must already exist on disk as files in a specified format, and the batch loading commands and required data formats differ between destination databases, so they are not universal. The following is an example of a batch data loading command for a MySQL database:
load data infile "/data/mysql/test.txt" into table t1 fields terminated by ',';
This command loads the data in the /data/mysql/test.txt file into table t1; test.txt is a text file whose fields are separated by commas. Common data integration tools that support this type of bulk loading require that the data file already exist on disk; they cannot bulk-load data directly from the data source into the destination table. A user who wants to use bulk loading must first export the source table data into a file and then run the bulk loading command, and these two steps must execute serially, which is cumbersome, inefficient, and costly in performance.
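To make the non-universality concrete, the following sketch builds the destination-specific bulk-load command as a string. Only the MySQL syntax comes from the example above; the PostgreSQL branch and the function name are illustrative assumptions added here, not part of the patent.

```python
def build_bulk_load_command(db_type: str, data_file: str, table: str) -> str:
    """Return a bulk-load command for the given destination database.

    Illustrative only: each database product has its own command and
    required file format, which is exactly why no generic form exists.
    """
    if db_type == "mysql":
        # Matches the patent's example command
        return (f"load data infile \"{data_file}\" into table {table} "
                f"fields terminated by ','")
    if db_type == "postgresql":
        # PostgreSQL's COPY plays the same role as MySQL's LOAD DATA (assumed here)
        return f"COPY {table} FROM '{data_file}' WITH (FORMAT csv)"
    raise ValueError(f"no bulk-load template for {db_type}")

print(build_bulk_load_command("mysql", "/data/mysql/test.txt", "t1"))
```

A real loader would keep one such template per supported destination database, which is the role the consumer thread's command generation step plays later in this document.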
In view of the above, overcoming these drawbacks of the prior art is an urgent problem in the field.
[ summary of the invention ]
The technical problems to be solved by the invention are as follows:
when bulk loading is used in data synchronization, the data must exist on disk as files in a specified format: the source table data must first be exported to a file, and only then can a bulk loading command load the data. Executing these two steps serially is cumbersome, inefficient, and costly in performance. Moreover, the bulk loading commands and required data formats differ between destination databases, so there is no universal approach.
The invention achieves the above purpose by the following technical scheme:
in a first aspect, the present invention provides a data synchronization method capable of implementing fast loading of data, where a producer thread, a task queue, and a consumer thread are arranged in a data loading node, and the data synchronization method includes:
reading data from a data source and putting the read data into a cache;
the producer thread generates a plurality of data files in batches from the data in the cache, and puts the generated data files into a task queue; wherein each data file contains one or more pieces of data;
the consumer thread calls a batch data loading command specific to a target database, and loads data files to be loaded in the task queue to the target database one by one;
and the data file generation process corresponding to the producer thread and the data batch loading process corresponding to the consumer thread are carried out in parallel.
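The steps above can be sketched as a minimal producer-consumer pipeline. This is an illustration under simplifying assumptions: the "data files" are in-memory batches, loading is simulated with a list, and the function and variable names are invented here; real code would write files to disk and invoke the destination database's bulk-load command.

```python
import queue
import threading

def run_sync(records, batch_size=3):
    """Sketch of the patent's parallel producer-consumer loading model."""
    tasks = queue.Queue(maxsize=3)      # bounded task queue
    loaded = []                         # stands in for the destination table

    def producer():
        batch = []
        for rec in records:             # read data from the cache
            batch.append(rec)
            if len(batch) == batch_size:
                tasks.put(batch)        # finished "data file" handed over
                batch = []
        if batch:
            tasks.put(batch)            # last, partially filled batch
        tasks.put(None)                 # sentinel: no more data

    def consumer():
        while True:
            batch = tasks.get()
            if batch is None:
                break
            loaded.extend(batch)        # simulate the bulk-load command

    p = threading.Thread(target=producer)
    c = threading.Thread(target=consumer)
    p.start(); c.start()
    p.join(); c.join()
    return loaded

print(run_sync(list(range(10))))
```

The key property is that the producer keeps generating batches while the consumer is loading earlier ones, so neither stage waits for the other to finish completely.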
Preferably, during the data synchronization process, the method further comprises:
counting, during synchronization, the speed at which data is read from the data source and the speed at which data is loaded into the destination database, and comparing the reading speed with the loading speed;
if the reading speed and the loading speed are comparable, then at the next data synchronization, after the data is placed into the cache, generating a virtual pipeline file in the Linux system and reading the data in the cache into the pipeline file; then pointing the destination database's batch loading command at the pipeline file and loading the data in the pipeline file into the destination database with that command;
and if the reading speed and the loading speed are not comparable, then at the next data synchronization, after the data is placed into the cache, the producer thread generates a plurality of data files in batches from the data in the cache and places the generated data files into the task queue.
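The mode selection described above reduces to a threshold comparison. A minimal sketch follows; the relative-tolerance knob `rel_tol` is an assumption made here for illustration, since the text only says a preset difference threshold decides whether the speeds count as comparable.

```python
def choose_loading_mode(read_speed: float, load_speed: float,
                        rel_tol: float = 0.2) -> str:
    """Pick pipeline-file loading when read and load speeds are comparable,
    otherwise the producer-consumer batch-file path."""
    # Speeds are "comparable" when their difference is within the tolerance
    if abs(read_speed - load_speed) <= rel_tol * max(read_speed, load_speed):
        return "pipeline_file"
    return "batch_files"

print(choose_loading_mode(100.0, 95.0))   # speeds close: use the pipe
print(choose_loading_mode(100.0, 20.0))   # loading much slower: buffer files
```

An absolute difference threshold, as the text literally describes, would work the same way; a relative tolerance is just less sensitive to the overall throughput scale.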
Preferably, when generating the data file, the producer thread generates the data file in the corresponding format according to the format requirement of the destination database.
Preferably, the producer thread generates a plurality of data files in batches from the data in the cache, and puts the generated data files into the task queue, specifically including:
the producer thread reads data to be loaded from the cache and writes the read data into a temporarily generated data file;
and the producer thread judges whether the data volume in the current data file reaches a preset threshold value, and if the data volume in the current data file does not reach the preset threshold value, the producer thread continues to read the next data to be loaded from the cache and write the next data into the current data file.
Preferably, if the data volume in the current data file reaches a preset threshold value, continuously checking whether the task number in the task queue reaches the maximum task number limit;
if the number of the tasks in the task queue does not reach the maximum task number limit, the producer thread puts the currently written data file into the task queue;
and if the number of the tasks in the task queue reaches the maximum task number limit, the producer thread waits for the consumer thread to take out the tasks from the task queue, and then puts the currently written data file into the task queue.
Preferably, the consumer thread calls a batch data loading command specific to the destination database, and loads the data files to be loaded in the task queue to the destination database one by one, and the method specifically includes:
the consumer thread takes out the data files to be loaded from the task queue and generates a batch data loading command for loading the current data files according to the target database;
the consumer thread executes the batch data loading command, and then loads the currently taken data file to a target database;
and the consumer thread deletes the loaded data file, checks whether the data file to be loaded still exists in the task queue, and continues to load the next data file if the data file to be loaded exists in the task queue.
Preferably, at least two consumer threads are provided, and when data loading is performed, different data files in the task queue are loaded to the destination database by each consumer thread in a parallel mode.
Preferably, the data source is a relational database, a message queue, an XML file, a KV database, or a document database.
Preferably, the destination database is a relational database.
In a second aspect, the present invention provides a data synchronization apparatus capable of implementing data fast loading, including at least one processor and a memory, where the at least one processor and the memory are connected through a data bus, and the memory stores instructions executable by the at least one processor, where the instructions are used to complete the data synchronization method capable of implementing data fast loading according to the first aspect after being executed by the processor.
The invention has the beneficial effects that:
the data synchronization method provided by the invention improves the data loading process, a producer thread, a task queue and a consumer thread are arranged in a data loading node, a multi-thread model of a producer and a consumer is adopted, data files with required formats are continuously generated in batches during data loading, and a batch data loading command special for a target database is called to load data, the generation of the data files and the execution of the data loading command can be carried out in parallel, and the two steps are combined into one step, so that the performance loss during serial execution is reduced, the synchronization efficiency is improved, and the flow configuration is simplified.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a diagram illustrating a conventional data synchronization process;
FIG. 2 is a flow chart of data loading when loading data using JDBC or ODBC interfaces in a conventional data synchronization process;
FIG. 3 is a diagram illustrating a data synchronization process and a multithreading mode according to an embodiment of the present invention;
fig. 4 is a flowchart of a data synchronization method for fast loading data according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating operations of a producer thread during data loading according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating the operation of a consumer thread during data loading according to an embodiment of the present invention;
fig. 7 is an architecture diagram of a data synchronization apparatus capable of implementing fast data loading according to an embodiment of the present invention.
[ detailed description of the embodiments ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the embodiments of the present invention, the symbol "/" means "and/or"; "A and/or B" covers three cases: "A" alone, "B" alone, and "A and B" together.
In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other. The invention will be described in detail below with reference to the figures and examples.
Example 1:
the embodiment of the invention provides a data synchronization method capable of realizing rapid loading of data, wherein a target database (namely a target end) is a relational database, and a data source end is not limited and can be various types of data sources, such as a relational database, a message queue, an XML file, a KV database, a document database and the like.
Compared with the traditional data synchronization process, the method mainly improves the data loading step so that it can use the batch loading commands specific to various databases, thereby improving performance. As shown in fig. 3, the data loading node does not simply fetch data from the data cache and call the JDBC/ODBC interface to write it into the destination database. Instead it adopts a producer-consumer concurrency model in which multiple threads cooperate in parallel: a producer thread, a task queue, and a consumer thread are arranged in the data loading node. Based on the model in fig. 3, the data synchronization method provided by the embodiment of the present invention (see fig. 4) mainly includes the following steps:
step 201, reading data from a data source, and placing the read data into a cache.
This step is data reading, and the data in the data source is read into the cache one by one, which is the same as the data reading step in the conventional data synchronization process, and is not described herein again.
Step 202, the producer thread generates a plurality of data files in batches from the data in the cache, and puts the generated data files into a task queue; wherein each data file contains one or more pieces of data.
Step 203, the consumer thread calls a batch data loading command specific to the target database, and loads the data files to be loaded in the task queue to the target database one by one.
Wherein, the step 202 and the step 203 together form a data loading process: step 202 is the generation of a data file, completed by the producer thread; step 203 is the execution of a bulk data load command, completed by the consumer thread. When generating the data file, the producer thread can generate the data file with a corresponding format according to the format requirement of the destination database.
When generating data files, the producer thread continuously generates a plurality of files in batches, for example one data file for every 10000 records in the cache. After the producer thread finishes a data file and places it into the task queue, it immediately begins the next one; meanwhile the consumer thread executes the batch data loading command on the data files already waiting in the task queue, loading them into the destination database one by one. In other words, the data file generation performed by the producer thread and the batch loading performed by the consumer thread proceed in parallel without interfering with each other, which reduces the performance loss of traditional serial execution and improves loading efficiency.
In the traditional data loading process, the source table data must first be exported to a file and then loaded with a bulk loading command; the two steps execute serially and produce only a single file. That single file can grow too large, occupying too much disk space, and its loading takes so long that the interface's statistics are not updated in time. The embodiment of the invention instead continuously generates a plurality of data files in batches and executes the loading command on each file separately, avoiding the oversized single file, the excessive disk usage, and the stale interface statistics caused by an overly long single-file load. In addition, by dynamically generating data files while executing the batch loading command, the invention effectively merges the traditional two-step operation into one step, simplifying flow configuration.
The data synchronization method provided by the embodiment of the invention mainly improves the data loading process, a producer thread, a task queue and a consumer thread are arranged in a data loading node, a producer-consumer concurrent model is adopted, a plurality of threads work in a parallel and cooperative mode, data files in a required format are continuously generated in batches during data loading, a batch data loading command specific to a target database can be called to load data, the generation of the data files and the execution of the data loading command can be carried out in parallel, and the two steps are combined into one step, so that the performance loss during serial execution is reduced, the synchronization efficiency is improved, and the flow configuration is simplified.
In addition to the "producer-consumer" multi-threaded task processing model shown in fig. 3, pipeline files can be generated to achieve fast batch loading of data. In that case, the data synchronization process further includes:
and counting the reading speed of the data from the data source and the loading speed of the data loaded into the destination database in the synchronization process, and comparing the reading speed with the loading speed. The specific reading speed and the loading speed can be obtained through related test messages; or counting the data volume read from the data source and the data volume loaded into the destination database within a specified time, and calculating to obtain the corresponding reading speed and loading speed. The reading speed and the loading speed are mainly related to the hardware configuration of the source end server and the destination end server and the occupied resources of the service execution content of the source end server and the destination end server, and the more services run by the source end server and the destination end server, the more the rate is affected.
If the comparison shows that the reading speed and the loading speed are comparable, then at the next synchronization, after data is read from the source and placed into the cache, a virtual pipeline file is first generated in the Linux system and the data in the cache is read into it; the destination database's batch loading command is then pointed at the pipeline file and loads its data into the destination database. "Comparable" can be defined by a preset difference threshold: if the difference between the two speeds is smaller than the threshold, the speeds are considered comparable; otherwise they are not. The pipeline file is a virtual file that the Linux server simulates in its own memory and that does not actually exist on disk, so this loading mode needs no intermediate file; batch loading is achieved through a virtual transfer path, further simplifying the loading process and improving loading efficiency.
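The Linux "pipeline file" can be realized with a POSIX named pipe (FIFO). The sketch below assumes a POSIX system: one thread writes into the FIFO, standing in for "read cache into pipeline file", while the reading side stands in for the bulk-load command consuming it. No intermediate file with real contents ever lands on disk; the FIFO path exists only as a rendezvous point.

```python
import os
import tempfile
import threading

# Create a named pipe (FIFO); requires a POSIX system such as Linux
fifo = os.path.join(tempfile.mkdtemp(), "sync.fifo")
os.mkfifo(fifo)

def writer():
    # Stands in for the sync process writing cached data into the pipe;
    # open() blocks until a reader (the loading command) opens the FIFO
    with open(fifo, "w") as f:
        f.write("1,alice\n2,bob\n")

t = threading.Thread(target=writer)
t.start()
with open(fifo) as f:        # stands in for e.g. LOAD DATA reading the pipe
    data = f.read()          # reads until the writer closes its end
t.join()
os.remove(fifo)
print(data, end="")
```

In the real system the reader would be the database's bulk-load command pointed at the FIFO path, so the data flows cache, pipe, destination table with no buffering file in between.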
If the comparison shows that the reading speed and the loading speed are not comparable, then at the next synchronization, after data is read from the source and placed into the cache, the producer thread generates a plurality of data files in batches from the cached data and places them into the task queue; that is, steps 202 and 203 continue as shown in fig. 3 and fig. 4, and fast batch loading is completed with the "producer-consumer" multi-threaded task processing model.
Since the pipeline file relies on the pipe mechanism of the Linux server, the loading mode can also be chosen directly by operating system type rather than by comparing speeds. On a Linux system, pipeline-file loading can be used directly, skipping the intermediate file; on other systems, such as Windows, loading is done by dynamically generating data files and executing the bulk loading command under the "producer-consumer" multi-threaded task processing model. Either way, the destination database's batch data loading command is used for fast loading.
Further, as shown in step 202, the main function of the producer thread is to generate a temporary data file from the data in the buffer and place the generated data file into the task queue. The operation flow can refer to fig. 5, and mainly includes the following steps:
first, the producer thread reads a piece of data to be loaded from a cache.
Secondly, the producer thread writes the read data into a temporarily generated data file; for convenience of description herein, the currently generated data file may be referred to as data file a.
Thirdly, the producer thread determines whether the amount of data in the current data file a has reached a preset threshold, i.e. its maximum size, for example 500000 records. The file size is limited because the data file must be complete when the consumer thread executes the fast loading command on it; if the size were unlimited and all data were written into one file, the consumer thread would have to wait for the producer to finish writing everything before loading could begin. That would be essentially sequential execution despite the multiple threads and would gain no performance, so data is written to data files in batches instead. In short, the preset threshold exists so that data files are generated in batches, guaranteeing that file generation and batch loading can run in parallel.
And if the data volume in the current data file a does not reach the preset threshold value, the producer thread continues to read the next data to be loaded from the cache and write the next data into the current data file a, and then the judging process is continuously repeated until the data volume in the current data file a reaches the preset threshold value.
If the data amount in the current data file a reaches the preset threshold, the data file a is written completely, and the data cannot be written into the data file a any more, and then the fourth step is executed downwards.
Fourthly, the producer thread checks whether the number of tasks in the task queue has reached the maximum task limit, i.e. whether the queue is full. The limit prevents the files waiting in the queue from occupying too much disk space. Writing a local file is much faster than reading a file and loading it into the database, i.e. tasks are generated much faster than they are executed, so the queue is non-empty most of the time; even a small limit therefore ensures that a consumer thread almost never waits when fetching a task, preserving the system's concurrency. In an embodiment of the present invention, the maximum number of tasks in the queue may be set to 3.
And if the number of the tasks in the task queue does not reach the maximum task number limit, the producer thread puts the currently written data file a into the task queue and continues to execute the fifth step downwards.
And if the number of the tasks in the task queue reaches the maximum task number limit, the producer thread needs to wait for the consumer thread to take out the tasks from the task queue, then put the currently written data file a into the task queue, and continue to execute the fifth step downwards.
Fifthly, judging whether the reading of the data in the cache is finished. If the cache has data, the producer thread continues to read the next piece of data to be loaded from the cache and writes the data into the newly generated data file b, namely, the next batch of data is written, and the process is repeatedly executed until the synchronization is completed; if the cache has no data, the whole synchronization process is finished, and the operation is finished.
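The producer flow above, with its file-size threshold and bounded task queue, can be sketched as follows. The threshold of 4 rows and the helper names are illustrative stand-ins (the text's example threshold is 500000 records); `queue.Queue(maxsize=3)` gives the blocking behavior of the fourth step for free, since `put()` waits whenever three tasks are already queued.

```python
import queue
import tempfile

MAX_ROWS_PER_FILE = 4                   # stand-in for the example's 500000
task_queue = queue.Queue(maxsize=3)     # matches the embodiment's limit of 3

def produce(cache_rows):
    """Write cached rows into temp data files of at most MAX_ROWS_PER_FILE
    rows each, queueing every finished file for the consumer."""
    current = tempfile.NamedTemporaryFile(
        mode="w", suffix=".txt", delete=False, prefix="sync_")
    rows_in_file = 0
    for row in cache_rows:
        current.write(row + "\n")
        rows_in_file += 1
        if rows_in_file == MAX_ROWS_PER_FILE:   # file full: hand it over
            current.close()
            task_queue.put(current.name)        # blocks while queue is full
            current = tempfile.NamedTemporaryFile(
                mode="w", suffix=".txt", delete=False, prefix="sync_")
            rows_in_file = 0
    current.close()
    if rows_in_file:                            # last, partially filled file
        task_queue.put(current.name)
    task_queue.put(None)                        # end-of-data sentinel

produce([f"row{i}" for i in range(8)])  # 8 rows: two files plus the sentinel
print(task_queue.qsize())
```

In a full system this would run in its own thread alongside the consumers, so a full queue applies natural backpressure to file generation.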
Further, as shown in step 203, the main function of the consumer thread is to call a batch data loading command specific to the destination database, and load the data file to be loaded in the task queue into the destination database. Since the speed of generating the data file (i.e., "write file" in fig. 3) is generally faster than the speed of loading the file into the database (i.e., "read file" in fig. 3), the number of consumer threads can be increased to further improve the performance of data loading, that is, at least two consumer threads are provided, and when data loading is performed, each consumer thread loads different data files in the task queue into the destination database in a parallel manner. The extent to which this approach can continue to improve loading performance depends on the concurrency performance of the destination database, and the extent of improvement may vary from database to database.
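Scaling out consumers is mostly a matter of starting more threads on the same queue. The sketch below assumes two consumers and a sentinel-per-thread shutdown convention, both illustrative choices; the lock simulates the destination database serializing the actual inserts, whereas a real database would handle that concurrency itself.

```python
import queue
import threading

NUM_CONSUMERS = 2          # assumed count; the text requires "at least two"
tasks = queue.Queue()
results, lock = [], threading.Lock()

# Enqueue four "data files" (batches) and one sentinel per consumer
for batch in ([1, 2], [3, 4], [5, 6], [7]):
    tasks.put(batch)
for _ in range(NUM_CONSUMERS):
    tasks.put(None)

def consumer():
    while True:
        batch = tasks.get()
        if batch is None:          # this consumer's shutdown signal
            break
        with lock:                 # simulate the destination DB's own locking
            results.extend(batch)  # simulate executing the bulk-load command

threads = [threading.Thread(target=consumer) for _ in range(NUM_CONSUMERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))
```

Because the threads race for tasks, the load order is nondeterministic, which is harmless here: each data file is loaded exactly once, and only the destination database's concurrency limits bound the speed-up.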
No matter what the set number of the consumer threads is, the operation flow of each consumer thread can refer to fig. 6, which mainly includes the following steps:
First, the consumer thread takes a data file to be loaded out of the task queue; for convenience of description, the currently fetched data file may be referred to as data file a.
Second, the consumer thread generates, according to the destination database, a batch data load command for loading the current data file a, the batch data load command being specific to the current destination database.
Third, the consumer thread executes the batch data load command, thereby loading the currently fetched data file a into the destination database.
Fourth, the consumer thread deletes the loaded data file a.
Fifth, the consumer thread checks whether tasks remain in the task queue, i.e., whether there are still data files to load. If so, it continues with the next data file; for example, if the next data file to be loaded in the task queue is b, the consumer thread loads data file b into the destination database following the same process. If no new task exists in the task queue, the entire synchronization process ends, and the operation is finished.
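The consumer loop above, including running two consumer threads in parallel, can be sketched as follows. This is an illustrative Python sketch only: the SQL command templates, the table name `t`, and the `execute` callback (which here merely records the command rather than contacting a real database) are assumptions for demonstration.

```python
import os
import queue
import tempfile
import threading

def build_load_command(db_type, path):
    """Generate a batch load command specific to the destination database.
    The templates and table name `t` are illustrative, not from the patent."""
    templates = {
        "mysql": f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE t",
        "postgresql": f"COPY t FROM '{path}' WITH (FORMAT csv)",
    }
    return templates[db_type]

def consumer(task_queue, db_type, execute, loaded):
    """Take data files from the task queue, load each one, then delete it."""
    while True:
        try:
            path = task_queue.get_nowait()           # step 1: fetch data file a
        except queue.Empty:
            break                                    # no new task: synchronization done
        command = build_load_command(db_type, path)  # step 2: build the load command
        execute(command)                             # step 3: run it against the database
        os.remove(path)                              # step 4: delete the loaded file
        loaded.append(path)                          # step 5: loop back for the next file

# Two consumer threads draining the same task queue in parallel.
task_queue = queue.Queue()
paths = []
for _ in range(4):
    handle = tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False)
    handle.write("1,a\n")
    handle.close()
    task_queue.put(handle.name)
    paths.append(handle.name)

executed, loaded = [], []
workers = [threading.Thread(target=consumer,
                            args=(task_queue, "mysql", executed.append, loaded))
           for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(loaded))  # all four queued files loaded and deleted
```

Because each thread pops from the shared queue atomically, the two consumers never load the same file twice, which is what lets the thread count scale up to the destination database's concurrency limit.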
In summary, the data synchronization method provided by the embodiment of the present invention has the following advantages:
during data loading, a producer-consumer multithreaded model is adopted and the batch data loading command specific to the destination database can be called to load the data; the generation of data files and the execution of data loading commands proceed in parallel, reducing the performance loss of serial execution and improving synchronization efficiency;
during data loading, data files are generated dynamically and the fast load command is executed, effectively merging two operations into one and simplifying flow configuration;
when generating data files, multiple data files are continuously produced in batches and a load command is executed for each file, which avoids both the excessive disk usage of a single oversized file and the overly long loading time such a file would require;
when the reading speed and the loading speed are equivalent and the method is applied to a Linux system, batch loading can be completed directly via a pipeline file without generating intermediate files, further simplifying the loading process and improving loading efficiency;
at least two consumer threads can be provided; during data loading, each consumer thread loads a different data file from the task queue into the destination database in parallel, and increasing the number of consumer threads further improves data loading performance.
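The pipeline-file mode among the advantages above can be illustrated with a POSIX named pipe (FIFO). The sketch below is a stand-in under stated assumptions: the `reader` function plays the role of the destination database's bulk-load command reading from the pipe (a real deployment would instead point a command such as PostgreSQL's `COPY ... FROM '<fifo path>'` at it), and the row data is invented for the example.

```python
import os
import tempfile
import threading

# POSIX-only sketch of the pipeline-file mode: the producer writes rows into a
# named pipe while the bulk loader reads them concurrently, so no intermediate
# data file is ever materialized on disk.
fifo = os.path.join(tempfile.mkdtemp(), "sync.pipe")
os.mkfifo(fifo)                      # the "virtual pipeline file"

rows = ["1,a", "2,b", "3,c"]
received = []

def writer():
    with open(fifo, "w") as f:       # blocks until a reader opens the pipe
        for row in rows:
            f.write(row + "\n")

def reader():                        # stand-in for the database's bulk loader
    with open(fifo) as f:
        for line in f:
            received.append(line.strip())

t = threading.Thread(target=writer)
t.start()
reader()
t.join()
os.remove(fifo)
print(received)                      # ['1,a', '2,b', '3,c']
```

Because FIFO opens block until both ends are attached, the write and the load are naturally synchronized, which is why this mode only pays off when reading and loading speeds are comparable.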
Example 2:
On the basis of the data synchronization method capable of fast data loading provided in embodiment 1, the present invention further provides a data synchronization apparatus capable of fast data loading for implementing the above method; fig. 7 is a schematic diagram of the apparatus architecture in an embodiment of the present invention. The data synchronization apparatus of this embodiment comprises one or more processors 21 and a memory 22. In fig. 7, one processor 21 is taken as an example.
The processor 21 and the memory 22 may be connected by a bus or other means, and fig. 7 illustrates the connection by a bus as an example.
The memory 22, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as those implementing the data synchronization method capable of fast data loading in embodiment 1. By running the non-volatile software programs, instructions, and modules stored in the memory 22, the processor 21 executes the various functional applications and data processing of the data synchronization apparatus, that is, implements the data synchronization method of embodiment 1.
The memory 22 may include high speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 22 may optionally include memory located remotely from the processor 21, and these remote memories may be connected to the processor 21 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 22, and when executed by the one or more processors 21, perform the data synchronization method capable of implementing data fast loading in the above embodiment 1, for example, perform the steps shown in fig. 4 to fig. 6 described above.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the embodiments may be implemented by associated hardware as instructed by a program, which may be stored on a computer-readable storage medium, which may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, or the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A data synchronization method capable of realizing data fast loading is characterized in that a producer thread, a task queue and a consumer thread are arranged in a data loading node, and the data synchronization method comprises the following steps:
reading data from a data source and putting the read data into a cache;
the producer thread generates a plurality of data files in batches from the data in the cache, and puts the generated data files into a task queue; wherein each data file contains one or more pieces of data;
the consumer thread calls a batch data loading command specific to a target database, and loads data files to be loaded in the task queue to the target database one by one;
and the data file generation process corresponding to the producer thread and the data batch loading process corresponding to the consumer thread are carried out in parallel.
2. The data synchronization method capable of realizing data fast loading according to claim 1, wherein in the data synchronization process, the method further comprises:
counting the reading speed of data from a data source and the loading speed of data loaded to a target database in the synchronization process, and comparing the reading speed with the loading speed;
if the reading speed is equivalent to the loading speed, generating a virtual pipeline file in the linux system after the data are placed into the cache when the data are synchronized next time, and reading the data in the cache into the pipeline file; then positioning a special batch loading command of a target database to the pipeline file, and loading data in the pipeline file to the target database by using the batch loading command;
and if the reading speed and the loading speed are not equivalent, after the data are placed into the cache in the next data synchronization, the producer thread generates a plurality of data files in batches from the data in the cache, and places the generated data files into the task queue.
3. The data synchronization method capable of realizing data fast loading according to claim 1, wherein when generating the data file, the producer thread generates the data file with a corresponding format according to the format requirement of the destination database.
4. The data synchronization method capable of achieving fast data loading according to claim 1, wherein the producer thread generates a plurality of data files in batches from the data in the cache, and puts the generated data files into the task queue, specifically comprising:
the producer thread reads data to be loaded from the cache and writes the read data into a temporarily generated data file;
and the producer thread judges whether the data volume in the current data file reaches a preset threshold value, and if the data volume in the current data file does not reach the preset threshold value, the producer thread continues to read the next data to be loaded from the cache and write the next data into the current data file.
5. The data synchronization method capable of realizing data fast loading according to claim 4, characterized in that if the data amount in the current data file has reached a preset threshold, the method continues to check whether the number of tasks in the task queue reaches a maximum task number limit;
if the number of the tasks in the task queue does not reach the maximum task number limit, the producer thread puts the currently written data file into the task queue;
and if the number of the tasks in the task queue reaches the maximum task number limit, the producer thread waits for the consumer thread to take out the tasks from the task queue, and then puts the currently written data file into the task queue.
6. The data synchronization method capable of achieving fast data loading according to claim 1, wherein the consumer thread calls a batch data loading command specific to a destination database to load data files to be loaded in a task queue to the destination database one by one, and specifically comprises:
the consumer thread takes out the data files to be loaded from the task queue and generates a batch data loading command for loading the current data files according to the target database;
the consumer thread executes the batch data loading command, and then loads the currently taken data file to a target database;
and the consumer thread deletes the loaded data file, checks whether the data file to be loaded still exists in the task queue, and continues to load the next data file if the data file to be loaded exists in the task queue.
7. The data synchronization method capable of realizing data fast loading according to any one of claims 1 to 6, characterized in that at least two consumer threads are provided, and when data loading is performed, each consumer thread loads different data files in a task queue to a destination database in a parallel manner.
8. The data synchronization method for realizing data fast loading according to any one of claims 1 to 6, characterized in that the data source is a relational database, a message queue, an XML file, a KV database or a document database.
9. The data synchronization method for realizing data fast loading according to any one of claims 1 to 6, characterized in that the destination database is a relational database.
10. A data synchronization apparatus capable of fast loading data, comprising at least one processor and a memory, wherein the at least one processor and the memory are connected through a data bus, and the memory stores instructions executable by the at least one processor, and the instructions are used for completing the data synchronization method capable of fast loading data according to any one of claims 1 to 9 after being executed by the processor.
CN201910981942.5A 2019-10-16 2019-10-16 Data synchronization method and device capable of realizing rapid loading of data Pending CN111026768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910981942.5A CN111026768A (en) 2019-10-16 2019-10-16 Data synchronization method and device capable of realizing rapid loading of data


Publications (1)

Publication Number Publication Date
CN111026768A true CN111026768A (en) 2020-04-17

Family

ID=70205093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910981942.5A Pending CN111026768A (en) 2019-10-16 2019-10-16 Data synchronization method and device capable of realizing rapid loading of data

Country Status (1)

Country Link
CN (1) CN111026768A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080141268A1 (en) * 2006-12-12 2008-06-12 Tirumalai Partha P Utility function execution using scout threads
CN102591725A (en) * 2011-12-20 2012-07-18 浙江鸿程计算机系统有限公司 Method for multithreading data interchange among heterogeneous databases
CN103197979A (en) * 2012-01-04 2013-07-10 阿里巴巴集团控股有限公司 Method and device for realizing data interaction access among processes
CN103488690A (en) * 2013-09-02 2014-01-01 用友软件股份有限公司 Data integrating system and data integrating method
CN105049524A (en) * 2015-08-13 2015-11-11 浙江鹏信信息科技股份有限公司 Hadoop distributed file system (HDFS) based large-scale data set loading method
CN106156278A (en) * 2016-06-24 2016-11-23 努比亚技术有限公司 A kind of database data reading/writing method and device

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767339A (en) * 2020-05-11 2020-10-13 北京奇艺世纪科技有限公司 Data synchronization method and device, electronic equipment and storage medium
CN111767339B (en) * 2020-05-11 2023-06-30 北京奇艺世纪科技有限公司 Data synchronization method and device, electronic equipment and storage medium
CN112035057A (en) * 2020-07-24 2020-12-04 武汉达梦数据库有限公司 Hive file merging method and device
CN112035057B (en) * 2020-07-24 2022-06-21 武汉达梦数据库股份有限公司 Hive file merging method and device
CN113076290A (en) * 2021-04-12 2021-07-06 百果园技术(新加坡)有限公司 File deletion method, device, equipment, system and storage medium
CN113076290B (en) * 2021-04-12 2024-01-30 百果园技术(新加坡)有限公司 File deletion method, device, equipment, system and storage medium
CN113609226A (en) * 2021-08-09 2021-11-05 平安国际智慧城市科技股份有限公司 Data export method and device, computer equipment and storage medium
CN113609226B (en) * 2021-08-09 2024-05-14 深圳平安智慧医健科技有限公司 Data export method and device, computer equipment and storage medium
CN115292420A (en) * 2022-10-10 2022-11-04 天津南大通用数据技术股份有限公司 Method and device for rapidly loading data in distributed database

Similar Documents

Publication Publication Date Title
CN111026768A (en) Data synchronization method and device capable of realizing rapid loading of data
CN106156278B (en) Database data reading and writing method and device
US9177027B2 (en) Database management system and method
US10521268B2 (en) Job scheduling method, device, and distributed system
WO2021000758A1 (en) Robotic resource task cycle management and control method and apparatus
US20080271042A1 (en) Testing multi-thread software using prioritized context switch limits
CN106960054B (en) Data file access method and device
US10031773B2 (en) Method to communicate task context information and device therefor
US20140359636A1 (en) Multi-core system performing packet processing with context switching
WO2023160092A1 (en) Method for processing blockchain transactions, and blockchain node and electronic device
WO2023160088A1 (en) Method for processing blockchain transactions, and blockchain node and electronic device
WO2018032698A1 (en) Page turning method and device, and writing terminal
CN112800026A (en) Data transfer node, method, system and computer readable storage medium
CN112395097A (en) Message processing method, device, equipment and storage medium
US20230221865A1 (en) Method, system, and device for writing compressed data to disk, and readable storage medium
CN110990169B (en) Structure and method for inter-process byte stream communication by using shared memory
CN112948136A (en) Method for implementing asynchronous log record of embedded operating system
CN108984405B (en) Performance test method, device and computer readable storage medium
US20230393782A1 (en) Io request pipeline processing device, method and system, and storage medium
CN106407020A (en) Database processing method of mobile terminal and mobile terminal thereof
CN113986546A (en) Centralized LabVIEW processing method for appointed software and hardware resources
CN114328626A (en) Multi-data source dynamic switching method, system and storage medium
CN108027727A (en) Dispatching method, device and the computer system of internal storage access instruction
CN111858002B (en) Concurrent processing method, system and device based on asynchronous IO
CN105653377B (en) A kind of internuclear means of communication of heterogeneous multi-core system and controller

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant after: Wuhan Dameng Database Co., Ltd.

Address before: 430000 16-19 / F, building C3, future technology building, 999 Gaoxin Avenue, Donghu New Technology Development Zone, Wuhan, Hubei Province

Applicant before: WUHAN DAMENG DATABASE Co.,Ltd.

CB03 Change of inventor or designer information

Inventor after: Mei Gang

Inventor after: Zhang Zhuxi

Inventor before: Mei Gang

Inventor before: Fu Quan

Inventor before: Zhou Chun

Inventor before: Zhang Zhuxi