CN112015816A - Data synchronization method, device, medium and electronic equipment - Google Patents

Data synchronization method, device, medium and electronic equipment

Info

Publication number
CN112015816A
Authority
CN
China
Prior art keywords
data
target
identifier
directory
synchronization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010880839.4A
Other languages
Chinese (zh)
Inventor
李畅
罗齐
郝科
田博修
王宇飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202010880839.4A
Publication of CN112015816A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a data synchronization method, apparatus, medium, and electronic device. The method comprises: acquiring data to be processed from a data source end; determining a temporary directory identifier according to a synchronization identifier of a data transmission task, and storing the data to be processed in a temporary directory according to the temporary directory identifier, wherein the synchronization identifier indicates the directory information that can be synchronized to a data destination end; when a synchronization time is reached, determining, according to the synchronization identifier, target data that satisfies a data synchronization condition among the data stored in the temporary directory; and moving the target data to a target directory and updating the synchronization identifier, wherein the target directory is the directory corresponding to the data destination end. The data to be processed is thus staged in the temporary directory, and when the synchronization time is reached, the staged data that can be synchronized to the data destination end is synchronized, which enables real-time data synchronization and ensures its efficiency and timeliness.

Description

Data synchronization method, device, medium and electronic equipment
Technical Field
The present disclosure relates to the field of data processing, and in particular, to a data synchronization method, apparatus, medium, and electronic device.
Background
During the construction of a data middle platform, data usually has to be synchronized between data sources; for example, data at a data source end is imported into a data destination end for downstream data warehouse construction and metric statistics. In the related art, tools such as Sqoop support data synchronization between a relational database and HDFS, but they cannot synchronize data in real time. Data synchronization is the first layer of data warehouse construction and places high demands on data accuracy and timeliness, so such tools can hardly meet the requirements of data warehouse construction.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a data synchronization method, including:
acquiring data to be processed from a data source end;
determining a temporary directory identifier according to a synchronization identifier of a data transmission task, and storing the data to be processed in a temporary directory according to the temporary directory identifier, wherein the synchronization identifier indicates directory information that can be synchronized to a data destination end;
when a synchronization time is reached, determining, according to the synchronization identifier, target data that satisfies a data synchronization condition among the data stored in the temporary directory;
and moving the target data to a target directory and updating the synchronization identifier, wherein the target directory is a directory corresponding to the data destination end.
In a second aspect, the present disclosure provides a data synchronization apparatus, the apparatus comprising:
the acquisition module is used for acquiring data to be processed from a data source end;
the storage module is used for determining a temporary directory identifier according to a synchronization identifier of a data transmission task and storing the data to be processed in a temporary directory according to the temporary directory identifier, wherein the synchronization identifier indicates directory information that can be synchronized to a data destination end;
the first determining module is used for determining, according to the synchronization identifier and when a synchronization time is reached, target data that satisfies a data synchronization condition among the data stored in the temporary directory;
and the synchronization module is used for moving the target data to a target directory and updating the synchronization identifier, wherein the target directory is a directory corresponding to the data destination end.
In a third aspect, a computer-readable medium is provided, on which a computer program is stored which, when being executed by a processing device, carries out the steps of the method of the first aspect.
In a fourth aspect, an electronic device is provided, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method of the first aspect.
In the above technical solution, the data to be processed is acquired from the data source end and staged in the temporary directory. When the data is staged, the temporary directory identifier is determined according to the synchronization identifier of the data transmission task, and the data to be processed is stored in the temporary directory according to that identifier. When the synchronization time is reached, the target data that satisfies the data synchronization condition among the data stored in the temporary directory is determined according to the synchronization identifier; the target data is then moved to the target directory of the data destination end and the synchronization identifier is updated, which synchronizes the data from the data source end to the data destination end. On the one hand, staging the data in the temporary directory and synchronizing the eligible staged data whenever the synchronization time is reached enables real-time synchronization and ensures the efficiency and timeliness of data synchronization. On the other hand, each piece of data to be processed is synchronized successfully exactly once, that is, it is processed only once, which ensures the integrity and consistency of the synchronized data and improves the accuracy of data synchronization.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flow diagram of a data synchronization method provided according to one embodiment of the present disclosure;
FIG. 2 is a block diagram of a data synchronization apparatus provided in accordance with an embodiment of the present disclosure;
FIG. 3 is a block diagram illustrating an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Before the data synchronization method provided by the present disclosure is described, its execution environment is introduced. For example, the method may be implemented on the Flink framework. Flink is a stream processing framework and therefore supports stream processing programs, which enables real-time data synchronization.
Illustratively, the initialization operation of data synchronization may be performed by:
The input plug-in and the output plug-in may be initialized to construct the execution environment of the data transmission task. For example, the plug-in names of the data source end and the data destination end may be determined from task start configuration information generated by a user configuration operation, and the corresponding data source end and data destination end classes may then be loaded by reflection. Thereafter, the execution context of Flink may be obtained, for example by calling a method of Flink's StreamExecutionEnvironment. A data source object (DTS Source) is passed in through a StreamExecutionEnvironment method so that data at the data source end can be accessed later, and the stream processing object DataStream corresponding to the data source object is constructed. The addSink() method of the DataStream is then called with a data destination object (DTS Sink) so that the data destination end can be connected later and the data to be processed can be written to it. In the present disclosure, data is synchronized from the data source end to the data destination end by a data transmission task; for example, a data transmission task in Flink is constructed and submitted by calling a StreamExecutionEnvironment method, so that the task synchronizes the data of the data source end to the data destination end in real time.
The following describes the data synchronization method provided by the present disclosure in detail. Fig. 1 is a flowchart of a data synchronization method according to an embodiment of the present disclosure, where as shown in fig. 1, the method includes:
in step 11, data to be processed, i.e. data that needs to be synchronized to the data destination, is obtained from the data source.
In step 12, a temporary directory identifier is determined according to a synchronization identifier of the data transmission task, and the data to be processed is stored in the temporary directory according to the temporary directory identifier, wherein the synchronization identifier is used for indicating directory information that can be synchronized to a data destination.
In an embodiment of the present disclosure, to ensure that data is processed completely and without repetition, data from the data source end may first be written into a temporary directory (_dump_temporal) to ensure the consistency of data synchronization. Data synchronization may be performed by multiple concurrent data transmission tasks, and the synchronization processes of the individual tasks are independent of each other. In this embodiment, the acquired data to be processed is not synchronized directly to the data destination end; it is first staged in the temporary directory based on the synchronization identifier of the data transmission task. This avoids, to a certain extent, repeated processing of the same data and the waste of transmission resources caused by repeatedly retransmitting data from the data source end.
In step 13, in the case that the synchronization time is reached, target data satisfying the data synchronization condition among the data stored in the temporary directory is determined according to the synchronization identifier.
The synchronization time may be set according to the actual usage scenario. For example, a user may configure synchronization to run at a preset interval, such as every 5 minutes; a clock timer then determines whether the synchronization time has been reached, and when 5 minutes have elapsed, the data transmission task is triggered to perform data synchronization. As described above, the acquired data to be processed is written into the temporary directory in real time. Therefore, to ensure the accuracy of data synchronization, the data that can be synchronized to the data destination end is determined from the data stored in the temporary directory, so that data still being written in real time does not affect the synchronization.
In step 14, the target data is moved to a target directory, and the synchronization identifier is updated, where the target directory is a directory corresponding to the data destination.
For example, after the target data is determined, it is moved (move) to the target directory, that is, the target data is written into the target directory of the data destination end. The temporary directory then no longer stores the target data, so each piece of target data exists exactly once, either in the target directory or in the temporary directory. At the same time, the synchronization identifier may be updated to ensure that subsequent data synchronization proceeds in order.
In the above technical solution, the data to be processed is acquired from the data source end and staged in the temporary directory. When the data is staged, the temporary directory identifier is determined according to the synchronization identifier of the data transmission task, and the data to be processed is stored in the temporary directory according to that identifier. When the synchronization time is reached, the target data that satisfies the data synchronization condition among the data stored in the temporary directory is determined according to the synchronization identifier; the target data is then moved to the target directory of the data destination end and the synchronization identifier is updated, which synchronizes the data from the data source end to the data destination end. On the one hand, staging the data in the temporary directory and synchronizing the eligible staged data whenever the synchronization time is reached enables real-time synchronization and ensures the efficiency and timeliness of data synchronization. On the other hand, each piece of data to be processed is synchronized successfully exactly once, that is, it is processed only once, which ensures the integrity and consistency of the synchronized data and improves the accuracy of data synchronization.
In order to make those skilled in the art understand the technical solutions provided by the embodiments of the present invention, the following detailed descriptions are provided for the above steps.
Optionally, in step 11, an exemplary implementation manner of obtaining the data to be processed from the data source end is as follows, and this step may include:
acquiring a data reading identifier corresponding to the data source end, wherein the data reading identifier is used for indicating position information of processed data of the data source end;
The data source end may record the corresponding data reading identifier after data has been read, that is, consumed. For example, the data source end may update the data reading identifier each time data is read and store it in the state information of the corresponding data transmission task, so that when the data transmission task acquires data from the data source end, the data reading identifier can be obtained directly from the state information of the task.
And acquiring the data to be processed from the data source end by taking the data reading identifier as a starting position.
With this technical solution, the data reading identifier marks the position of the data already processed at the data source end, so that reading can resume from that position. This prevents data from being skipped, which ensures the integrity of the data during synchronization, and it prevents data from being read repeatedly. Subsequent synchronization therefore receives accurate input, its complexity is reduced, duplicated and lost data are avoided, and the safety and efficiency of data synchronization are improved.
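The offset-based read described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `TaskState` class, the `read_pending` function, and the use of an in-memory list as the data source are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskState:
    """Hypothetical task state; read_offset plays the role of the data reading identifier."""
    read_offset: int = 0


def read_pending(source: List[str], state: TaskState, max_records: int = 100) -> List[str]:
    """Read records starting at the recorded position, then advance the position."""
    start = state.read_offset
    batch = source[start:start + max_records]
    # Record how far we have read so the next call neither skips nor repeats records.
    state.read_offset = start + len(batch)
    return batch
```

Because the position is stored in the task state rather than recomputed, a later read resumes exactly where the previous one stopped.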
Optionally, the synchronization identifier is an incremental identifier, that is, the updated synchronization identifier is larger than the synchronization identifier before the update. For example, the synchronization identifier cp_id may be initialized to 0, and it is incremented by one each time a data synchronization operation is performed.
For example, when the temporary directory identifier is determined according to the synchronization identifier of the data transmission task in step 12, the synchronization identifier plus one may be used as the temporary directory identifier p_id, so that staged data is stored in the directory that follows the one indicated by the synchronization identifier. For example, the data to be processed may be stored under the configured path of the temporary directory ({base_path}/_dump_temporal), that is, in the storage path {base_path}/_dump_temporal/{p_id} determined according to the temporary directory identifier.
Then in step 13, an exemplary implementation manner of determining, according to the synchronization identifier, target data that satisfies the data synchronization condition in the data stored in the temporary directory is as follows, and this step may include:
and determining data of which the temporary directory identifier is less than or equal to the synchronous identifier in the data stored in the temporary directory as the target data.
Exemplarily, the temporary directory identifier p_id is the synchronization identifier plus one. When the current synchronization identifier cp_id is 10, the determined temporary directory identifier p_id is 11, and the data to be processed acquired at this time is stored in {base_path}/_dump_temporal/{p_id=11}. During real-time synchronization, data to be processed is continuously written into the temporary directory. Therefore, when the target data is determined, only the data whose temporary directory identifier is less than or equal to the synchronization identifier, that is, the data with p_id less than or equal to 10, is determined as the target data. Newly arriving data is written only under p_id 11, so the integrity of the selected data is ensured.
Therefore, by the technical scheme, the target data which can be synchronized to the data destination end in the temporary directory can be quickly and directly determined, accurate data support can be provided for the realization that the data at the data source end is processed only once, and the accuracy and convenience of data synchronization are ensured.
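The staging and selection rule above can be sketched on a local file system. This is only an illustration under stated assumptions: the `SyncTask` class, its method names, and the `part-N.txt` file naming are hypothetical, and the sketch follows the text literally (stage under p_id = cp_id + 1, move directories with p_id <= cp_id, then increment cp_id).

```python
import shutil
from pathlib import Path
from typing import List

TEMP_DIR_NAME = "_dump_temporal"  # temporary directory name as given in the text


class SyncTask:
    """Illustrative task that stages records and commits them at synchronization time."""

    def __init__(self, base_path: Path, target_dir: Path, cp_id: int = 0):
        self.temp_root = base_path / TEMP_DIR_NAME
        self.target_dir = target_dir
        self.cp_id = cp_id
        self._seq = 0  # hypothetical counter, used only to name staged files
        self.temp_root.mkdir(parents=True, exist_ok=True)
        self.target_dir.mkdir(parents=True, exist_ok=True)

    def stage(self, record: str) -> Path:
        """Write one record into {base_path}/_dump_temporal/{p_id}, with p_id = cp_id + 1."""
        staging = self.temp_root / str(self.cp_id + 1)
        staging.mkdir(exist_ok=True)
        self._seq += 1
        path = staging / f"part-{self._seq}.txt"
        path.write_text(record)
        return path

    def on_sync_time(self) -> List[str]:
        """Move every staged file whose p_id <= cp_id, then increment cp_id."""
        moved = []
        for staging in sorted(self.temp_root.iterdir(), key=lambda p: int(p.name)):
            if int(staging.name) <= self.cp_id:
                for staged in sorted(staging.iterdir()):
                    shutil.move(str(staged), str(self.target_dir / staged.name))
                    moved.append(staged.name)
                staging.rmdir()  # the temporary directory no longer holds this data
        self.cp_id += 1
        return moved
```

Since a file is moved rather than copied, it exists in exactly one of the two locations at any time, which matches the "processed only once" property the text describes.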
Optionally, the target directory comprises a plurality of files. Another exemplary implementation of moving the target data to the target directory in step 14 is as follows, and may include:
and determining a target file for storing the target data under the target directory.
As an example, the size of the files in the target directory, that is, the amount of data each file can store, may be set in advance. When the target file for storing the target data is determined, the file whose remaining storage capacity is greater than the data volume of the target data and closest to it, among the plurality of files in the target directory, may be chosen as the target file. For example, if the data volume of the target data is 3 MB and the remaining capacities of files A, B, and C in the target directory are 2 MB, 5 MB, and 10 MB respectively, file B may be determined as the target file. This ensures that the target data is stored safely, improves the utilization of the files in the target directory, and avoids excessive file fragmentation. If no file in the current target directory meets the above condition, a new file may be created in the target directory and used as the target file.
And then, moving the target data to the target file, thereby realizing the synchronization of the target data from the data source end to the data destination end.
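The file selection rule above is a best-fit choice and can be sketched as follows. The function name `pick_target_file` and the dictionary of remaining capacities are assumptions for illustration; returning `None` stands in for "create a new file".

```python
from typing import Dict, Optional


def pick_target_file(remaining: Dict[str, int], data_size: int) -> Optional[str]:
    """Return the file whose remaining capacity exceeds data_size by the least.

    Returns None when no file qualifies, in which case the caller would
    create a new file in the target directory.
    """
    candidates = {name: free for name, free in remaining.items() if free > data_size}
    if not candidates:
        return None
    # Smallest qualifying capacity is the one closest to the data size.
    return min(candidates, key=candidates.get)
```

With the capacities from the example in the text (sizes in MB), `pick_target_file({"A": 2, "B": 5, "C": 10}, 3)` selects file B.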
Accordingly, the method may further comprise:
and closing the file handle of the target file under the condition that the data volume of the stored data of the target file exceeds a data volume threshold value.
The data volume threshold may be set according to the actual usage scenario and controls the size of a single file in the target directory, which avoids the reduced write efficiency and inconvenient downstream use caused by overly large files. In this embodiment, after the target data is stored in the target file, the data volume of the data stored in the target file may be obtained and compared with the data volume threshold. If the data volume is greater than or equal to the threshold, the file handle of the target file may be closed, that is, no further data is stored in the target file. If the data volume is less than the threshold, the target file still has space for storing data, and the file handle may be kept open for subsequent writes of target data.
Therefore, by the scheme, the automatic segmentation of the files in the target directory of the data destination can be realized, the generation of overlarge files in the data synchronization process is avoided, the management and data query of the files in the target directory are facilitated, the data writing efficiency can be ensured, and the data synchronization efficiency and safety are further improved.
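The threshold-based file splitting can be sketched as a small rolling writer. The `RollingWriter` class and the `part-N.dat` naming are hypothetical; the sketch closes the handle once the written volume is greater than or equal to the threshold, as in the embodiment above.

```python
from pathlib import Path
from typing import BinaryIO, Optional


class RollingWriter:
    """Illustrative writer that closes a file once it reaches a size threshold."""

    def __init__(self, directory: Path, threshold_bytes: int):
        self.directory = directory
        self.threshold_bytes = threshold_bytes
        self.directory.mkdir(parents=True, exist_ok=True)
        self._handle: Optional[BinaryIO] = None
        self._path: Optional[Path] = None
        self._written = 0
        self._index = 0

    def write(self, data: bytes) -> Path:
        """Append data to the current file and return the file it was written to."""
        if self._handle is None:
            # No open handle: start a new file in the target directory.
            self._index += 1
            self._path = self.directory / f"part-{self._index}.dat"
            self._handle = open(self._path, "wb")
            self._written = 0
        path = self._path
        self._handle.write(data)
        self._written += len(data)
        if self._written >= self.threshold_bytes:
            self.close()  # threshold reached: stop storing data in this file
        return path

    def close(self) -> None:
        if self._handle is not None:
            self._handle.close()
            self._handle = None
```

The check runs after each write, so a file may exceed the threshold by at most one write, which mirrors checking the stored volume after the target data has been stored.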
Optionally, the method may further include:
and determining the last access time of each file handle in the use state at preset time intervals.
The preset time may be set according to the actual usage scenario, which is not limited by the present disclosure. In file I/O, to read data from a file, an application first calls an operating system function, passing the file name and path, to open the file. The function returns a sequence number, the file handle, which uniquely identifies the open file. Thus, for each data transmission task, the file handles in use can be obtained by collecting the file handle of each file the task has opened, and the last access time of each file handle can then be obtained. Illustratively, the last access time of the file handle may be obtained by the TimeService method in Flink.
And then closing the file handle when the time difference between the last access time of the file handle and the current time exceeds a time threshold.
When the time difference between the last access time of a file handle and the current processing time exceeds the time threshold, the file handle has not been accessed for a long time, and the file it refers to is unlikely to be accessed again, so the file handle can be closed. This effectively reduces the number of file handles the data transmission task holds at the same time, helps keep the task's file handles valid, and supports the safety and efficiency of data synchronization. It also reduces the risk of out-of-memory (OOM) errors caused by the task holding too many file handles at once.
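The periodic idle check can be sketched as follows. The `HandlePool` class and its methods are assumptions for illustration, and timestamps come from Python's `time.monotonic()` rather than from Flink; an explicit `now` argument is accepted so the check can be driven by any timer.

```python
import time
from typing import Dict, IO, List, Optional


class HandlePool:
    """Illustrative registry of open file handles with last-access times."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout  # the time threshold, in seconds
        self._handles: Dict[str, IO] = {}
        self._last_access: Dict[str, float] = {}

    def register(self, name: str, handle: IO, now: Optional[float] = None) -> None:
        self._handles[name] = handle
        self.touch(name, now)

    def touch(self, name: str, now: Optional[float] = None) -> None:
        """Record an access to the named handle."""
        self._last_access[name] = time.monotonic() if now is None else now

    def close_idle(self, now: Optional[float] = None) -> List[str]:
        """Close handles whose last access is older than the threshold."""
        now = time.monotonic() if now is None else now
        idle = [n for n, t in self._last_access.items() if now - t > self.idle_timeout]
        for name in idle:
            self._handles.pop(name).close()
            del self._last_access[name]
        return idle
```

A caller would invoke `close_idle()` at the preset interval; handles that were touched recently stay open for further writes.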
Illustratively, during the construction of a data middle platform, data usually has to be synchronized between heterogeneous data sources; for example, data from an MQ (Message Queue, such as Kafka or RocketMQ) is imported into Hive for downstream data warehouse construction and metric statistics. Different data sources may support different data types, and the data in an MQ may be in formats such as JSON, PB (Protocol Buffers), or Msgpack. To address data type conversion between different data sources, the present disclosure also provides the following embodiments.
Optionally, the method may further include:
and analyzing the data to be processed, and performing type conversion on the analyzed data to obtain the data to be processed converted into the target type.
The data types can be divided into basic types and composite types. The basic types may include Bool, Bytes, Double, String, Long, and the like, and the composite types may include List, Map, and the like. Composite types support nesting; for example, in the composite type List<T>, T may itself be a composite type. The definitions of these data types are well known in the art and are not described here.
In the embodiment of the present disclosure, to support the conversion of data types in different formats, an abstract class is first defined, and a plurality of abstract methods for type conversion, such as asBool(), asBytes(), asDate(), asDouble(), asString(), and asLong(), may be defined in it. The specific implementation of data type conversion for each format may be configured according to the actual usage scenario, which is not limited in this disclosure.
Therefore, in this embodiment, after the data to be processed is obtained, the data to be processed may be analyzed, and then the type conversion may be performed on the data to be processed according to the type conversion method corresponding to the format of the data to be processed, so that the data received from different data source ends may be converted into the same type for storage, which is convenient for uniform processing of the data, and the application range of the data synchronization method is expanded.
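The abstract conversion interface can be sketched as follows. The method names are snake_case counterparts of a subset of the methods named in the text; the `Column` and `JsonColumn` classes and the concrete JSON conversion rules are assumptions, since the disclosure leaves the per-format implementations open.

```python
import json
from abc import ABC, abstractmethod


class Column(ABC):
    """Abstract value with one conversion method per basic target type."""

    @abstractmethod
    def as_bool(self) -> bool: ...

    @abstractmethod
    def as_long(self) -> int: ...

    @abstractmethod
    def as_double(self) -> float: ...

    @abstractmethod
    def as_string(self) -> str: ...


class JsonColumn(Column):
    """Hypothetical implementation for values that arrive as JSON text."""

    def __init__(self, raw: str):
        self._value = json.loads(raw)  # parse first, convert afterwards

    def as_bool(self) -> bool:
        return bool(self._value)

    def as_long(self) -> int:
        return int(self._value)

    def as_double(self) -> float:
        return float(self._value)

    def as_string(self) -> str:
        # Strings are returned as-is; other values are re-serialized as JSON.
        return self._value if isinstance(self._value, str) else json.dumps(self._value)
```

A further format such as Msgpack would get its own subclass, so callers can convert values to the target type without knowing the source format.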
Correspondingly, the storing the data to be processed to a temporary directory according to the temporary directory identifier includes:
storing the data to be processed of the target type to the temporary directory according to the temporary directory identifier.
In this embodiment, data from heterogeneous data source ends can be uniformly converted into data of the target type and then stored in the temporary directory. On the one hand, this makes it convenient to write data to the files in the temporary directory and to manage those files uniformly. On the other hand, when the target data in the temporary directory is subsequently moved to the target directory, type conversion can be performed directly on the target data to obtain the data type required by the data destination end. This ensures the reliability and stability of data synchronization, achieves data compatibility between heterogeneous data sources, further expands the application range of the data synchronization method, and improves the user experience.
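The abstract class with per-format conversion methods described above can be sketched as follows. This is only an illustrative sketch in Python: the class names `Column` and `JsonColumn`, the helper `convert_record`, and the schema layout are hypothetical and do not come from the disclosure, and only a JSON source with four of the named conversions is shown.

```python
import json
from abc import ABC, abstractmethod


class Column(ABC):
    """Abstract value wrapper; methods mirror asBool(), asLong(), and so on."""

    @abstractmethod
    def as_bool(self) -> bool: ...

    @abstractmethod
    def as_long(self) -> int: ...

    @abstractmethod
    def as_double(self) -> float: ...

    @abstractmethod
    def as_string(self) -> str: ...


class JsonColumn(Column):
    """Hypothetical implementation for values parsed from a JSON message."""

    def __init__(self, value):
        self.value = value

    def as_bool(self) -> bool:
        if isinstance(self.value, str):
            return self.value.strip().lower() == "true"
        return bool(self.value)

    def as_long(self) -> int:
        return int(self.value)

    def as_double(self) -> float:
        return float(self.value)

    def as_string(self) -> str:
        # Composite values (lists, maps) are serialized back to JSON text.
        if isinstance(self.value, str):
            return self.value
        return json.dumps(self.value)


def convert_record(raw: str, schema: dict) -> dict:
    """Parse one JSON message, then convert each field to its target type.

    `schema` maps field name -> one of "bool", "long", "double", "string".
    """
    parsed = json.loads(raw)
    return {
        name: getattr(JsonColumn(parsed[name]), f"as_{target}")()
        for name, target in schema.items()
    }
```

A PB or Msgpack source would, under the same assumption, provide its own subclass of `Column`, so the code that writes to the temporary directory only ever sees values of the target type.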
In an actual usage scenario, a data transmission task may be restarted during data synchronization due to a failure or the like, and the stability of data synchronization needs to be ensured in this case as well. Based on this, the present disclosure provides the following embodiments.
Optionally, when the data transmission task moves the target data that satisfies the data synchronization condition, among the data stored in the temporary directory, to the target directory, the current synchronization identifier may be stored in the state information of the data transmission task, and the method may further include:
when the data transmission task is restarted, moving target data meeting the data synchronization condition in the data stored in the temporary directory to the target directory according to the synchronization identifier, and updating the synchronization identifier.
For example, when the data transmission task is restarted, the synchronization identifier may be obtained directly from the state information of the data transmission task, so as to perform data synchronization on the target data in the temporary directory. The manner of performing data synchronization in this embodiment is the same as that used when the synchronization time is reached, and is not described here again.
Therefore, with this technical solution, after the data transmission task fails or is interrupted, its state can be recovered from the state information of the data transmission task when the task is restarted, which ensures the accuracy of data synchronization and the stability of the data synchronization method.
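The restart path can be illustrated with a minimal sketch. Several points here are assumptions made only for illustration: the state information is kept in a JSON file, temporary directories are subdirectories named by their integer identifier, the data synchronization condition is "identifier less than or equal to the synchronization identifier", and "updating" the identifier means incrementing it by one. The function names are hypothetical.

```python
import json
import shutil
from pathlib import Path


def save_state(state_file: Path, sync_id: int) -> None:
    """Persist the synchronization identifier in the task's state information."""
    state_file.write_text(json.dumps({"sync_id": sync_id}))


def load_state(state_file: Path) -> int:
    """Read the synchronization identifier back after a restart."""
    return json.loads(state_file.read_text())["sync_id"]


def sync_ready_dirs(temp_root: Path, target_dir: Path, sync_id: int) -> list:
    """Move the files of every temporary directory whose id <= sync_id."""
    moved = []
    target_dir.mkdir(parents=True, exist_ok=True)
    for sub in sorted(temp_root.iterdir(), key=lambda p: int(p.name)):
        if int(sub.name) <= sync_id:
            for f in sorted(sub.iterdir()):
                # Prefix with the directory id so names stay unique in the target.
                shutil.move(str(f), str(target_dir / f"{sub.name}_{f.name}"))
            sub.rmdir()
            moved.append(int(sub.name))
    return moved


def on_restart(state_file: Path, temp_root: Path, target_dir: Path):
    """Recover the identifier, finish the pending move, then update the state."""
    sync_id = load_state(state_file)
    moved = sync_ready_dirs(temp_root, target_dir, sync_id)
    new_sync_id = sync_id + 1  # assumption: the identifier is updated by incrementing
    save_state(state_file, new_sync_id)
    return moved, new_sync_id
```

Because the move is driven only by the persisted identifier and the directory names, running `on_restart` again after another failure simply finds fewer directories left to move.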
The present disclosure also provides a data synchronization apparatus, as shown in fig. 2, the apparatus 10 includes:
an obtaining module 100, configured to obtain data to be processed from a data source end;
a storage module 200, configured to determine a temporary directory identifier according to a synchronization identifier of a data transmission task, and store the to-be-processed data in a temporary directory according to the temporary directory identifier, where the synchronization identifier is used to indicate directory information that can be synchronized to a data destination;
a first determining module 300, configured to determine, according to the synchronization identifier, target data that meets a data synchronization condition in the data stored in the temporary directory when the synchronization time is reached;
and a synchronization module 400, configured to move the target data to a target directory, and update the synchronization identifier, where the target directory is a directory corresponding to a data destination.
Optionally, the synchronization identifier is an incremental identifier;
the first determining module is configured to:
determine, as the target data, data whose temporary directory identifier is less than or equal to the synchronization identifier among the data stored in the temporary directory.
Optionally, the target directory comprises a plurality of files;
the synchronization module includes:
a first determining submodule, configured to determine a target file for storing the target data under the target directory;
a synchronization submodule, configured to move the target data to the target file;
the apparatus further includes:
a first processing module, configured to close the file handle of the target file when the amount of data stored in the target file exceeds a data volume threshold.
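As a rough illustration of closing a file handle once a target file grows past a data volume threshold, consider the sketch below. The class name `RollingWriter`, the `part-00001` file naming, and the choice to open a fresh file on the next write are assumptions for the example, not details taken from the disclosure.

```python
from pathlib import Path


class RollingWriter:
    """Writes to one target file at a time and closes its handle once the
    amount of data stored in it exceeds `max_bytes`."""

    def __init__(self, target_dir: Path, max_bytes: int):
        self.target_dir = target_dir
        self.max_bytes = max_bytes
        self.index = 0
        self.handle = None
        self.written = 0

    def _open_next(self) -> None:
        # Hypothetical naming scheme for files under the target directory.
        self.index += 1
        self.handle = open(self.target_dir / f"part-{self.index:05d}", "ab")
        self.written = 0

    def write(self, payload: bytes) -> None:
        if self.handle is None:
            self._open_next()
        self.handle.write(payload)
        self.written += len(payload)
        if self.written > self.max_bytes:
            self.close()  # threshold exceeded: release the file handle

    def close(self) -> None:
        if self.handle is not None:
            self.handle.close()
            self.handle = None
```

Closing on the threshold keeps the number of simultaneously open handles bounded by the number of files still being filled.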
Optionally, the target directory comprises a plurality of files;
the apparatus further includes:
a second determining module, configured to determine, at preset time intervals, the last access time of each file handle in the in-use state;
a second processing module, configured to close a file handle when the time difference between the last access time of the file handle and the current time exceeds a time threshold.
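The idle-handle check can be sketched as follows. The class `HandlePool` and its method names are hypothetical, and the clock is injectable only so the behavior can be exercised without waiting; in practice `close_idle` would be called by some periodic scheduler at the preset interval.

```python
import time
from typing import IO, Callable, Dict, List, Tuple


class HandlePool:
    """Tracks the last access time of each open handle and closes idle ones."""

    def __init__(self, idle_threshold: float,
                 clock: Callable[[], float] = time.monotonic):
        self.idle_threshold = idle_threshold
        self.clock = clock
        self._open: Dict[str, Tuple[IO, float]] = {}

    def use(self, name: str, handle: IO) -> IO:
        """Record that `handle` was accessed at the current time."""
        self._open[name] = (handle, self.clock())
        return handle

    def close_idle(self) -> List[str]:
        """Close every handle whose idle time exceeds the threshold."""
        now = self.clock()
        closed = []
        for name, (handle, last_access) in list(self._open.items()):
            if now - last_access > self.idle_threshold:
                handle.close()
                del self._open[name]
                closed.append(name)
        return closed
```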
Optionally, the apparatus further includes:
a conversion module, configured to parse the data to be processed and perform type conversion on the parsed data to obtain the data to be processed converted into a target type;
the storage module includes:
a storage submodule, configured to store the data to be processed of the target type to the temporary directory according to the temporary directory identifier.
Optionally, the obtaining module includes:
a first obtaining submodule, configured to obtain a data reading identifier corresponding to the data source end, where the data reading identifier is used to indicate location information of processed data of the data source end;
a second obtaining submodule, configured to obtain the data to be processed from the data source end by taking the data reading identifier as a starting position.
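Reading from the data reading identifier can be pictured with a very small sketch. Here the data source end is modeled as an in-memory list and the data reading identifier as the index of the first unprocessed record, similar in spirit to an offset in a message queue partition; both modeling choices and the function name `read_from` are assumptions for illustration.

```python
from typing import List, Tuple


def read_from(source: List[str], read_id: int,
              batch_size: int) -> Tuple[List[str], int]:
    """Fetch up to `batch_size` records starting at `read_id`.

    Returns the batch and the advanced identifier, which marks how far the
    source has been processed and serves as the next starting position.
    """
    batch = source[read_id:read_id + batch_size]
    return batch, read_id + len(batch)
```

Persisting the returned identifier together with the task state would let a restarted task resume from the same position instead of re-reading processed data.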
Referring now to FIG. 3, a block diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 3 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 3, the electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing means 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input means 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output means 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage means 608 including, for example, tape, hard disk, etc.; and a communication means 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 3 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer means may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be included in the electronic device, or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire data to be processed from a data source end; determine a temporary directory identifier according to a synchronization identifier of a data transmission task, and store the data to be processed to a temporary directory according to the temporary directory identifier, wherein the synchronization identifier is used for indicating directory information that can be synchronized to a data destination end; when a synchronization time is reached, determine, according to the synchronization identifier, target data meeting a data synchronization condition in the data stored in the temporary directory; and move the target data to a target directory and update the synchronization identifier, wherein the target directory is a directory corresponding to the data destination end.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or any combination thereof, including but not limited to object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module does not in some cases form a limitation of the module itself, and for example, the obtaining module may also be described as a "module for obtaining data to be processed from a data source terminal".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Example 1 provides a data synchronization method according to one or more embodiments of the present disclosure, wherein the method includes:
acquiring data to be processed from a data source end;
determining a temporary directory identifier according to a synchronization identifier of a data transmission task, and storing the data to be processed to a temporary directory according to the temporary directory identifier, wherein the synchronization identifier is used for indicating directory information which can be synchronized to a data destination end;
when a synchronization time is reached, determining, according to the synchronization identifier, target data meeting a data synchronization condition in the data stored in the temporary directory;
and moving the target data to a target directory, and updating the synchronization identifier, wherein the target directory is a directory corresponding to the data destination end.
Example 2 provides the method of example 1, wherein the synchronization identifier is an incremental identifier;
the determining, according to the synchronization identifier, target data that meets a data synchronization condition in the data stored in the temporary directory includes:
determining, as the target data, data whose temporary directory identifier is less than or equal to the synchronization identifier among the data stored in the temporary directory.
Example 3 provides the method of example 1, wherein the target directory includes a plurality of files;
the moving the target data to a target directory comprises:
determining a target file for storing the target data under the target directory;
moving the target data to the target file;
the method further comprises the following steps:
and closing the file handle of the target file under the condition that the data volume of the stored data of the target file exceeds a data volume threshold value.
Example 4 provides the method of example 1, wherein the target directory includes a plurality of files;
the method further comprises the following steps:
determining the last access time of each file handle in the use state at intervals of preset time;
and closing the file handle when the time difference between the last access time of the file handle and the current time exceeds a time threshold.
Example 5 provides the method of example 1, wherein the method further comprises:
parsing the data to be processed, and performing type conversion on the parsed data to obtain the data to be processed converted into a target type;
the storing the data to be processed to a temporary directory according to the temporary directory identifier includes:
and storing the data to be processed of the target type to the temporary directory according to the temporary directory identifier.
Example 6 provides the method of example 1, wherein the obtaining the data to be processed from the data source includes:
acquiring a data reading identifier corresponding to the data source end, wherein the data reading identifier is used for indicating position information of processed data of the data source end;
and acquiring the data to be processed from the data source end by taking the data reading identifier as a starting position.
Example 7 provides a data synchronization apparatus according to one or more embodiments of the present disclosure, wherein the apparatus includes:
an obtaining module, configured to obtain data to be processed from a data source end;
a storage module, configured to determine a temporary directory identifier according to a synchronization identifier of a data transmission task, and store the data to be processed to a temporary directory according to the temporary directory identifier, wherein the synchronization identifier is used for indicating directory information which can be synchronized to a data destination end;
a first determining module, configured to determine, according to the synchronization identifier, target data meeting a data synchronization condition in the data stored in the temporary directory when a synchronization time is reached;
and a synchronization module, configured to move the target data to a target directory and update the synchronization identifier, wherein the target directory is a directory corresponding to the data destination end.
Example 8 provides the apparatus of example 7, wherein the synchronization identifier is an incremental identifier;
the first determining module is configured to:
determine, as the target data, data whose temporary directory identifier is less than or equal to the synchronization identifier among the data stored in the temporary directory.
Example 9 provides a computer readable medium having stored thereon a computer program that, when executed by a processing apparatus, performs the steps of the method of any of examples 1-6, in accordance with one or more embodiments of the present disclosure.
Example 10 provides, in accordance with one or more embodiments of the present disclosure, an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing said computer program in said storage means to carry out the steps of the method of any of examples 1-6.
The foregoing description is only of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions that are disclosed in (but not limited to) this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Claims (10)

1. A method for synchronizing data, the method comprising:
acquiring data to be processed from a data source end;
determining a temporary directory identifier according to a synchronization identifier of a data transmission task, and storing the data to be processed to a temporary directory according to the temporary directory identifier, wherein the synchronization identifier is used for indicating directory information which can be synchronized to a data destination end;
when a synchronization time is reached, determining, according to the synchronization identifier, target data meeting a data synchronization condition in the data stored in the temporary directory;
and moving the target data to a target directory, and updating the synchronization identifier, wherein the target directory is a directory corresponding to the data destination end.
2. The method of claim 1, wherein the synchronization identifier is an incremental identifier;
the determining, according to the synchronization identifier, target data that meets a data synchronization condition in the data stored in the temporary directory includes:
determining, as the target data, data whose temporary directory identifier is less than or equal to the synchronization identifier among the data stored in the temporary directory.
3. The method of claim 1, wherein the target directory comprises a plurality of files;
the moving the target data to a target directory comprises:
determining a target file for storing the target data under the target directory;
moving the target data to the target file;
the method further comprises the following steps:
and closing the file handle of the target file under the condition that the data volume of the stored data of the target file exceeds a data volume threshold value.
4. The method of claim 1, wherein the target directory comprises a plurality of files;
the method further comprises the following steps:
determining the last access time of each file handle in the use state at intervals of preset time;
and closing the file handle when the time difference between the last access time of the file handle and the current time exceeds a time threshold.
5. The method of claim 1, further comprising:
parsing the data to be processed, and performing type conversion on the parsed data to obtain the data to be processed converted into a target type;
the storing the data to be processed to a temporary directory according to the temporary directory identifier includes:
and storing the data to be processed of the target type to the temporary directory according to the temporary directory identifier.
6. The method of claim 1, wherein the obtaining the data to be processed from the data source comprises:
acquiring a data reading identifier corresponding to the data source end, wherein the data reading identifier is used for indicating position information of processed data of the data source end;
and acquiring the data to be processed from the data source end by taking the data reading identifier as a starting position.
7. A data synchronization apparatus, the apparatus comprising:
an obtaining module, configured to obtain data to be processed from a data source end;
a storage module, configured to determine a temporary directory identifier according to a synchronization identifier of a data transmission task, and store the data to be processed to a temporary directory according to the temporary directory identifier, wherein the synchronization identifier is used for indicating directory information which can be synchronized to a data destination end;
a first determining module, configured to determine, according to the synchronization identifier, target data meeting a data synchronization condition in the data stored in the temporary directory when a synchronization time is reached;
and a synchronization module, configured to move the target data to a target directory and update the synchronization identifier, wherein the target directory is a directory corresponding to the data destination end.
8. The apparatus of claim 7, wherein the synchronization identifier is an incremental identifier;
the first determining module is configured to:
determine, as the target data, data whose temporary directory identifier is less than or equal to the synchronization identifier among the data stored in the temporary directory.
9. A computer-readable medium, on which a computer program is stored, characterized in that the program, when being executed by processing means, carries out the steps of the method of any one of claims 1 to 6.
10. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 6.
CN202010880839.4A 2020-08-27 2020-08-27 Data synchronization method, device, medium and electronic equipment Pending CN112015816A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010880839.4A CN112015816A (en) 2020-08-27 2020-08-27 Data synchronization method, device, medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112015816A true CN112015816A (en) 2020-12-01

Family

ID=73502531

Country Status (1)

Country Link
CN (1) CN112015816A (en)

Similar Documents

Publication Publication Date Title
CN110708237A (en) Message interaction method and device, readable medium and electronic equipment
CN112015816A (en) Data synchronization method, device, medium and electronic equipment
CN110909521B (en) Online document information synchronous processing method and device and electronic equipment
CN113395353B (en) File downloading method and device, storage medium and electronic equipment
CN111309747A (en) Data synchronization method, system and device
CN112256733A (en) Data caching method and device, electronic equipment and computer readable storage medium
CN110795446A (en) List updating method and device, readable medium and electronic equipment
CN112035529A (en) Caching method and device, electronic equipment and computer readable storage medium
CN111163336B (en) Video resource pushing method and device, electronic equipment and computer readable medium
CN113760536A (en) Data caching method and device, electronic equipment and computer readable medium
CN111857720A (en) Method and device for generating user interface state information, electronic equipment and medium
CN110928715A (en) Method, device, medium and electronic equipment for prompting error description information
CN111309366B (en) Method, device, medium and electronic equipment for managing registration core
CN112015746B (en) Data real-time processing method, device, medium and electronic equipment
CN112418389A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN111444457B (en) Data release method and device, storage medium and electronic equipment
CN112507676B (en) Method and device for generating energy report, electronic equipment and computer readable medium
CN114785770A (en) Mirror layer file sending method and device, electronic equipment and computer readable medium
CN114035861A (en) Cluster configuration method and device, electronic equipment and computer readable medium
CN111628913B (en) Online time length determining method and device, readable medium and electronic equipment
CN111581930A (en) Online form data processing method and device, electronic equipment and readable medium
CN111787043A (en) Data request method and device
CN115225586B (en) Data packet transmitting method, device, equipment and computer readable storage medium
CN112311840A (en) Multi-terminal data synchronization method, device, equipment and medium
CN115993942B (en) Data caching method, device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201201