WO2022063284A1 - Data synchronization method, apparatus, device, and computer-readable medium - Google Patents

Data synchronization method, apparatus, device, and computer-readable medium

Info

Publication number
WO2022063284A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
data storage
storage
update
Prior art date
Application number
PCT/CN2021/120830
Other languages
English (en)
French (fr)
Inventor
孙亮
Original Assignee
京东科技控股股份有限公司
Application filed by 京东科技控股股份有限公司
Publication of WO2022063284A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Definitions

  • The present disclosure generally relates to the technical field of data processing, and more particularly to a data synchronization method, apparatus, device, and computer-readable medium.
  • Data synchronization refers to synchronizing data from one storage medium to another.
  • The two storage media may be of the same or different types, for example from MySQL to Elasticsearch, from MySQL to HBase, or from MySQL to both Elasticsearch and HBase.
  • Data synchronization can be triggered by internal business logic or by an external trigger.
  • An example of an external trigger is a scheduled task that, at 3 a.m. every day, synchronizes the full data of a MySQL table to an index in Elasticsearch.
  • The present disclosure relates to a data synchronization method, which includes: monitoring a target transaction log of a first data storage end, where the first data storage end is used to store data generated by the operation of a business system, and the target transaction log is used to record data update information generated by the first data storage end; when it is detected that the target transaction log is updated, extracting target update data from the target transaction log; and writing the target update data into a second data storage end according to a preset configuration policy, so as to synchronize the target update data to the second data storage end.
  • Before writing the target update data into the second data storage end according to the preset configuration policy, the method further includes setting the preset configuration policy as follows: determining a target data storage end from a plurality of candidate data storage ends as the second data storage end; and configuring a target service cluster matching the second data storage end, and establishing a target index matching the second data storage end.
  • The method further includes: establishing a mapping relationship between target source data and target storage data, where the target source data is the target update data in the first data storage end, and the target storage data is the data synchronized to the second data storage end; and using a memory management system to load the mapping relationship into memory.
  • Establishing the mapping relationship between the target source data and the target storage data includes: determining the storage format, storage path, and version control field of the target storage data; and using a target expression language to encode the target source data according to the storage format, storage path, and version control field.
  • Writing the target update data into the second data storage end according to the preset configuration policy includes: converting the target update data into the target storage data according to the mapping relationship; and storing the target storage data in the second data storage end.
  • Writing the target update data into the second data storage end according to the preset configuration policy further includes: in the case of synchronizing existing data, determining the current version field of the existing data, where the target update data includes the existing data; and if no version control field larger than the current version field is found in the second data storage end, storing the existing data in the second data storage end according to the current version field.
  • When an exception occurs while the target update data is being written to the second data storage end according to the preset configuration policy, the method further includes: using a first function to catch the exception of a second function, where the first function is an outer function of the second function, and the second function is used to write the target update data into the second data storage end according to the preset configuration policy; and continuing to use the first function to write the target update data into the second data storage end according to the preset configuration policy until the exception is eliminated.
  • The present disclosure relates to a data synchronization apparatus, which includes: a log monitoring module configured to monitor a target transaction log of a first data storage end, where the first data storage end is used to store data generated by the operation of a business system, and the target transaction log is used to record data update information generated by the first data storage end; a data extraction module configured to extract target update data from the target transaction log when it is detected that the target transaction log is updated; and a data synchronization module configured to write the target update data into a second data storage end according to a preset configuration policy, so as to synchronize the target update data to the second data storage end.
  • The present disclosure relates to an electronic device comprising a memory, a processor, a communication interface, and a communication bus, wherein the memory stores a computer program executable on the processor, the memory and the processor communicate through the communication bus and the communication interface, and the above method is implemented when the processor executes the computer program.
  • The present disclosure relates to a computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the above-described method.
  • The technical solution of some embodiments of the present disclosure is to monitor the target transaction log of the first data storage end, where the first data storage end is used to store the data generated by the operation of the business system, and the target transaction log is used to record the data update information generated by the first data storage end; when it is detected that the target transaction log is updated, extract the target update data from the target transaction log; and write the target update data into the second data storage end according to the preset configuration policy, so as to synchronize the target update data to the second data storage end.
  • Some embodiments of the present disclosure can determine, from the binary log file of the source data storage end, which data has been updated and how it was updated, so that data can be synchronized directly to the target data storage end without instructions from the business function module. Data synchronization is thereby completely decoupled from business functions, which facilitates the maintenance and iteration of the two systems without either affecting the stability of the other, and the eventual consistency of the data can also be ensured through the retry mechanism.
  • FIG. 1 is a schematic diagram of a hardware environment of an optional data synchronization method provided according to an embodiment of the present disclosure
  • FIG. 2 is a flowchart of an optional data synchronization method provided according to an embodiment of the present disclosure
  • FIG. 3 is a flowchart of an optional configuration method provided according to an embodiment of the present disclosure.
  • FIG. 6 is a flowchart of an optional data synchronization method provided according to an embodiment of the present disclosure.
  • FIG. 7 is a block diagram of an optional data synchronization apparatus provided according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an optional electronic device provided by an embodiment of the present disclosure.
  • binlog: the binary log used for MySQL master-slave synchronization, which records all operations performed on the MySQL database (excluding operations such as SELECT and SHOW). Even if an operation itself does not cause the database to change, the operation is still written to the binary log file.
  • Elasticsearch: a Lucene-based search server. It provides a distributed, multi-user full-text search engine with a RESTful web interface. It is developed in Java, released as open source under the terms of the Apache license, and is a popular enterprise-level search engine.
  • HBase: a distributed, column-oriented open source database. The technology is derived from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop and is a sub-project of Apache's Hadoop project.
  • Spring Expression Language (SpEL): the expression language provided by Spring, which can query and manipulate data at runtime and provides a rich set of calculation and operation methods to facilitate configuration in a program.
  • In the related art, the data to be synchronized is generally handled by double writing.
  • For example, the business module writes data to MySQL and, because the data also needs to be synchronized to Elasticsearch, calls the API to write to Elasticsearch after the MySQL write succeeds.
  • This approach tightly couples the business function with the data synchronization function, which complicates the logic of the business system, increases the difficulty of subsequent development and maintenance, and causes the business functions and data synchronization functions to affect each other, resulting in reduced availability.
  • the related technologies also have the following problems:
  • the related technical solutions are all implemented by hard coding, so the relationship between tables and field mapping are fixed.
  • For example, the original requirement is that the table1 table of MySQL is synchronized to the index1 index of Elasticsearch, with the fields of MySQL and Elasticsearch in one-to-one correspondence; such a hard-coded arrangement is not easy to extend.
  • the related technical solution does not consider the synchronization of existing data, that is, only the data generated after going online is synchronized, and the existing data generated before going online is not considered.
  • the above data synchronization method can be applied to the hardware environment composed of the terminal 101 and the server 102 as shown in FIG. 1 .
  • The server 102 is connected to the terminal 101 through a network and can provide services for the terminal or for a client installed on the terminal. A database 103 can be set up on the server or independently of the server to provide data storage services for the server 102.
  • The above-mentioned network includes but is not limited to a wide area network, a metropolitan area network, or a local area network.
  • The terminal 101 includes but is not limited to a PC, a mobile phone, a tablet computer, etc.
  • the data synchronization method in an embodiment of the present disclosure may be executed by the server 102, or may be executed jointly by the server 102 and the terminal 101. As shown in FIG. 2, the method may include S201 to S203.
  • Step S201: monitor the target transaction log of the first data storage end, where the first data storage end is used to store data generated by the operation of the business system, and the target transaction log is used to record the data update information generated by the first data storage end.
  • The first data storage end may be the source data storage end, that is, the end whose data needs to be synchronized.
  • The second data storage end may be the target data storage end, that is, the end to which the data is synchronized.
  • The data that needs to be synchronized in the source data storage end is the target source data.
  • The data synchronized to the target data storage end is the target storage data.
  • The target source data and the target storage data can be completely identical, or the data can be transformed during synchronization according to the actual situation or the configuration policy.
  • data access before data synchronization can be achieved through the target transaction log.
  • The target transaction log can be a type of database log. Taking MySQL as an example, MySQL generally has the following types of logs:
  • Error log, which records problems encountered when starting, running, or stopping MySQL.
  • Binary log, which records statements that change data.
  • Relay log, which records data changes that a replica receives from the primary database.
  • Slow query log, which records all queries whose execution time exceeds the query time threshold, as well as queries that do not use indexes.
  • a binary log may be used as the target transaction log, which records in the form of events changes to data in the database, as well as the elapsed time of statement execution.
  • the binary log format types can be STATEMENT, ROW and MIXED.
  • STATEMENT: statement-based replication, which records the SQL statements that modify data.
  • The advantage is that the log file is small, which saves input/output (IO) resources and gives high performance.
  • The disadvantage is that, because only the executed statements are recorded, some context information for each statement must also be recorded so that the statements run correctly on the replica database and produce the same results there as on the primary database.
  • ROW: row-based replication.
  • This type does not record the context of the SQL statement; it only saves which records were modified.
  • The advantage is that the ROW-based log content clearly records the details of each row modification. It also avoids the problem that, in certain cases, stored procedures, functions, and trigger calls cannot be replicated correctly.
  • The disadvantage is that every executed statement is logged as the changes to each affected row, which may generate a large amount of log content.
  • MIXED: mixed-mode replication that combines STATEMENT and ROW.
  • General statement modifications are saved to the binary log in the STATEMENT format. For statements that STATEMENT cannot replicate correctly, for example those using certain functions, the binary log is saved in the ROW format. MySQL decides the log format for each specific SQL statement executed, that is, it chooses between STATEMENT and ROW.
  • a corresponding format can be selected for data synchronization processing according to the binary log file format adopted in the source data storage end.
  • Step S202: when it is detected that the target transaction log is updated, extract the target update data from the target transaction log.
  • The binary log (that is, the target transaction log) file of the source data storage end (that is, the first data storage end) can be monitored.
  • If the binary log file of the source data storage end is updated, the data of the source data storage end has changed.
  • In that case, the data update record can be obtained from the binary log file of the source data storage end alone and the updated data extracted, so that the subsequent steps of synchronizing the updated data from the source data storage end to the target data storage end can be performed.
  • Step S203: write the target update data into the second data storage end according to the preset configuration policy, so as to synchronize the target update data to the second data storage end.
  • The extracted update data may be synchronized to the target data storage end according to the configuration policy of the target data storage end (that is, the second data storage end).
  • The following description takes the case in which both the source and the target of data synchronization are databases as an example.
  • The changes to data in the source database can be obtained through the binary log and then synchronized to the target database. This avoids the need for the business function module to issue a data synchronization instruction and perform the corresponding operations, thereby decoupling the business function module from the data synchronization module, so that the business function module no longer needs to take part in data synchronization.
  • Message queue access and JavaServer Faces framework access can also be used.
  • The advantage of the message queue access method is that it is naturally asynchronous. In high-traffic scenarios, the message queue can serve as a consumption buffer, so there is no risk of the data synchronization system failing under excessive load.
  • the access method of the JavaServer Faces framework is relatively low, but if you want to do the asynchronous method, you need to use the thread pool to implement it yourself.
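  • The buffering role of a message queue can be illustrated with the Python standard library. This is a sketch under assumptions: an in-process `queue.Queue` stands in for a real message broker, and `consume` and `store` are hypothetical names.

```python
import queue
import threading


def consume(buffer, store):
    """Drain change events from the buffer and apply them to the store."""
    while True:
        item = buffer.get()
        if item is None:  # sentinel: the producer has finished
            break
        key, row = item
        store[key] = row


# A bounded queue makes producers wait when the consumer falls behind,
# instead of letting unprocessed events grow without limit.
buffer = queue.Queue(maxsize=1000)
store = {}
worker = threading.Thread(target=consume, args=(buffer, store))
worker.start()

for i in range(5):
    # The producer returns as soon as the event is queued (asynchronous hand-off).
    buffer.put((f"order-{i}", {"id": i}))
buffer.put(None)
worker.join()
```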
  • The method may further include setting the preset configuration policy according to S301 and S302.
  • Step S301: determine a target data storage end from a plurality of candidate data storage ends as the second data storage end.
  • The above-mentioned candidate data storage ends may be a MySQL database, an HBase database, an Elasticsearch full-text search engine, or the like.
  • The corresponding target data storage end can be selected according to actual needs.
  • Step S302: configure a target service cluster that matches the second data storage end, and establish a target index that matches the second data storage end.
  • For example, an Elasticsearch cluster and index can be applied for and configured.
  • Configuring a cluster can improve system performance, prevent the failure of a single server during data synchronization from bringing down the entire system, reduce costs, improve scalability, and enhance reliability.
  • The purpose of configuring the index is to search the sorted index instead of scanning the data of the entire table, and then to locate the corresponding data in the table through the index, so that the desired entry can be found quickly.
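  • S301 and S302 amount to building a small configuration record. The sketch below is illustrative only: `SyncPolicy`, `build_policy`, the node addresses, and the index name are assumptions, and no real cluster is contacted.

```python
from dataclasses import dataclass, field

# Candidate storage ends named in the text above.
CANDIDATE_TARGETS = ("mysql", "hbase", "elasticsearch")


@dataclass
class SyncPolicy:
    target_type: str                                    # S301: chosen second data storage end
    cluster_nodes: list = field(default_factory=list)   # S302: matching service cluster
    index_name: str = ""                                # S302: matching index


def build_policy(target_type, cluster_nodes, index_name):
    """Validate the choices and return the configuration policy."""
    if target_type not in CANDIDATE_TARGETS:
        raise ValueError(f"unsupported target: {target_type}")
    if not cluster_nodes:
        raise ValueError("at least one cluster node is required")
    return SyncPolicy(target_type, list(cluster_nodes), index_name)


policy = build_policy("elasticsearch", ["es-node-1:9200", "es-node-2:9200"], "index1")
```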
  • The method may also include S401 and S402.
  • Step S401: establish a mapping relationship between target source data and target storage data, where the target source data is the target update data in the first data storage end, and the target storage data is the data synchronized to the second data storage end.
  • The source data storage end stores the target source data, that is, the data that needs to be synchronized.
  • The target data storage end stores the target storage data, that is, the synchronized data.
  • The target source data and the target storage data can be completely identical, or can differ according to the actual situation or needs.
  • For example, a calculation policy can be configured as needed: weights can be assigned to the target source data and a weighted sum calculated, and redundant data can be removed during synchronization so that only the important data is retained.
  • Step S402: use the memory management system to load the mapping relationship into memory.
  • The mapping relationship and other configuration information can be loaded into memory through the memory manager, so that the target source data can be synchronized in real time according to the mapping relationship and configuration information.
  • The object pool pattern can be adopted. According to the actual situation or requirements, the objects in the pool are reused at configuration time, so there is no overhead of allocating memory and creating objects on the heap, nor of releasing memory and destroying objects on the heap. This reduces the burden on the garbage collector, avoids memory jitter (churn), and removes the need to repeatedly initialize object state, which can effectively improve performance.
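  • The object pool pattern mentioned above can be sketched as follows. `RowBuffer` and `ObjectPool` are hypothetical names; the disclosure does not specify an implementation, and the performance benefit depends on the runtime, so this only shows the reuse mechanics.

```python
class RowBuffer:
    """A reusable scratch object; reset() clears state instead of reallocating."""

    def __init__(self):
        self.fields = {}

    def reset(self):
        self.fields.clear()


class ObjectPool:
    def __init__(self, factory, size):
        self._factory = factory
        self._free = [factory() for _ in range(size)]

    def acquire(self):
        # Reuse a pooled object when one is free; otherwise create a new one.
        return self._free.pop() if self._free else self._factory()

    def release(self, obj):
        # Clear the object's state before returning it to the pool.
        obj.reset()
        self._free.append(obj)


pool = ObjectPool(RowBuffer, size=2)
buf = pool.acquire()
buf.fields["id"] = 1
pool.release(buf)
again = pool.acquire()  # the same instance comes back, already cleared
```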
  • the specific configuration information may be a drawing configuration, a writing configuration, an exception handling configuration, and the like.
  • related modules can also be configured adaptively according to the data tables to be synchronized and the information in the data tables to be synchronized, so that when adding data synchronization requirements, only simple configuration is required, and no development is required.
  • function expansion can be greatly facilitated, and subsequent development and maintenance workloads can be reduced.
  • establishing a mapping relationship between target source data and target storage data may include S501 and S502.
  • Step S501 Determine the storage format, storage path and version control field of the target storage data.
  • Step S502 using the target expression language, encode the target source data according to the storage format, storage path and version control field.
  • different data storage formats, storage paths, version information, etc. may be determined according to different target data storage terminals.
  • Spring Expression Language can be used for field mapping analysis and special value calculation.
  • For example, "applydate": "#{tf(map[applydate])}" means that the applydate field on the source data storage end is converted by a custom tf method before being written to the target data storage end.
  • Similarly, when a calculation policy is configured as needed and weights are assigned to the target source data, the calculation method for the weighted sum can be configured as: "totalcount": "#{orderCount}+#{amountCount}".
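  • The effect of such a field mapping can be imitated in Python, with plain functions playing the role of the Spring Expression Language expressions. This is a sketch, not the disclosed implementation: the behaviour of `tf` is not described in the text, so the date reformatting below is an assumption, as are `MAPPING`, `convert`, and the sample row.

```python
from datetime import datetime


def tf(value):
    """Hypothetical stand-in for the custom tf method: reformat a date string."""
    return datetime.strptime(value, "%Y%m%d").strftime("%Y-%m-%d")


# Target field -> function of the source row (the role of a mapping expression).
MAPPING = {
    "applydate": lambda row: tf(row["applydate"]),
    "totalcount": lambda row: row["orderCount"] + row["amountCount"],
    "name": lambda row: row["name"],
}


def convert(row, mapping=MAPPING):
    """Turn a source row into the record stored at the target end."""
    return {target_field: expr(row) for target_field, expr in mapping.items()}


source_row = {
    "applydate": "20210926",
    "orderCount": 3,
    "amountCount": 4,
    "name": "a",
    "tmp": "x",  # not mapped, so it is dropped during conversion
}
doc = convert(source_row)
```

  • Because the mapping is data rather than code paths, adding a synchronized field means adding one entry to the mapping instead of changing the synchronization logic.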
  • S601 and S602 may also be included when synchronizing offline data (that is, existing data).
  • Step S601: in the case of synchronizing existing data, determine the current version field of the existing data, where the target update data includes the existing data.
  • Step S602: if no version control field larger than the current version field is found in the second data storage end, store the existing data in the second data storage end according to the current version field.
  • In terms of timeliness, data is divided into offline data and real-time data.
  • Offline data is the existing data from before going online.
  • Real-time data is the streaming data received after going online.
  • Offline data is characterized by its large volume, so the first things to ensure when synchronizing offline data are performance and stability. Performance means completing the synchronization of all basic data quickly, and stability means ensuring that tasks finish normally rather than terminating halfway because of problems such as memory overflow.
  • There is no strict boundary between offline data and real-time data.
  • For example, data generated before the 10th is called offline data, but because data can be changed, changed real-time data may be received on or after the 10th.
  • The order between offline data and real-time data should therefore also be considered when synchronizing offline data.
  • The version field is used to represent the update order of data: the larger the version field, the later the data was updated. For example, offVersionKey can be configured as a limit: when the value corresponding to the offVersionKey of the offline data is smaller than the stored value, the current offline data is ignored.
  • The version value can be accurate to the second.
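  • The version check of S601 and S602 can be sketched as a guarded write. The sketch assumes a dict as the target store and a second-resolution timestamp in a field named `version`; `store_offline` and the sample records are hypothetical.

```python
def store_offline(target, key, row, version_key="version"):
    """Write an offline row unless the target already holds a newer version."""
    existing = target.get(key)
    if existing is not None and existing[version_key] > row[version_key]:
        return False  # a newer real-time update has already been stored
    target[key] = row
    return True


target = {"42": {"version": 1632650000, "status": "paid"}}

# Stale offline copy of record 42: older than what real-time sync already wrote.
stale_written = store_offline(target, "42", {"version": 1632600000, "status": "created"})

# Record 43 has never been synchronized, so the offline copy is stored.
fresh_written = store_offline(target, "43", {"version": 1632600500, "status": "created"})
```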
  • When an exception occurs while the target update data is being written to the second data storage end according to the preset configuration policy, the method further includes: using a first function to catch the exception of a second function, where the first function is an outer function of the second function, and the second function is used to write the target update data into the second data storage end according to the preset configuration policy; and continuing to use the first function to write the target update data into the second data storage end according to the preset configuration policy until the exception is eliminated.
  • When an exception occurs during data synchronization, such as an unstable network connection, the exception may be thrown to the outer function, which keeps retrying until the data synchronization is completed.
  • The inner function throws the exception and the outer function catches and handles it, which can greatly reduce the amount of code in the inner core function and lower the probability of a system crash, so that the retry mechanism can be used to achieve eventual data consistency.
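  • The outer-function retry described above can be sketched as follows. The names `write_with_retry` and `flaky_write` are hypothetical, and the optional `max_attempts` and `delay_seconds` parameters are additions for illustration; the text itself describes retrying until the exception is eliminated.

```python
import time


def write_with_retry(write, data, delay_seconds=0.0, max_attempts=None):
    """Outer function: call the inner write and retry while it raises."""
    attempts = 0
    while True:
        attempts += 1
        try:
            write(data)  # inner function: may raise on a transient failure
            return attempts
        except ConnectionError:
            # Assumption: only connection errors are treated as retryable.
            if max_attempts is not None and attempts >= max_attempts:
                raise
            time.sleep(delay_seconds)


store = []
calls = {"n": 0}


def flaky_write(data):
    """Inner function that fails twice, simulating an unstable network."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network unstable")
    store.append(data)


attempts = write_with_retry(flaky_write, {"id": 1})
```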
  • Some technical solutions of the present disclosure monitor the target transaction log of the first data storage end, where the first data storage end is used to store the data generated by the operation of the business system, and the target transaction log is used to record the data update information generated by the first data storage end; when it is detected that the target transaction log is updated, extract the target update data from the target transaction log; and write the target update data into the second data storage end according to the preset configuration policy, so as to synchronize the target update data to the second data storage end.
  • Some technical solutions of the present disclosure can determine, from the binary log file of the source data storage end, which data has been updated and how it was updated, so that data can be synchronized directly to the target data storage end without instructions from the business function module. Data synchronization is thereby completely decoupled from business functions, which facilitates the maintenance and iteration of the two systems without either affecting the stability of the other, and the eventual consistency of the data can also be ensured through the retry mechanism.
  • A data synchronization apparatus is provided, which includes: a log monitoring module 701 configured to monitor a target transaction log of a first data storage end, where the first data storage end is used to store the data generated by the operation of the business system, and the target transaction log is used to record the data update information generated by the first data storage end; a data extraction module 702 configured to extract the target update data from the target transaction log when it is detected that the target transaction log is updated; and a data synchronization module 703 configured to write the target update data into the second data storage end according to a preset configuration policy, so as to synchronize the target update data to the second data storage end.
  • The log monitoring module 701 in this embodiment can be used to perform step S201 in some embodiments.
  • The data extraction module 702 in this embodiment can be used to perform step S202 in some embodiments.
  • The data synchronization module 703 in this embodiment can be used to perform step S203 in some embodiments.
  • The data synchronization apparatus further includes a configuration module configured to: determine a target data storage end from a plurality of candidate data storage ends as the second data storage end; and configure a target service cluster matching the second data storage end, and establish a target index matching the second data storage end.
  • The data synchronization apparatus further includes a mapping module configured to: establish a mapping relationship between target source data and target storage data, where the target source data is the target update data in the first data storage end, and the target storage data is the data synchronized to the second data storage end; and load the mapping relationship into memory by using the memory management system.
  • The mapping module is further configured to: determine the storage format, storage path, and version control field of the target storage data; and use the target expression language to encode the target source data according to the storage format, storage path, and version control field.
  • The data synchronization module is configured to: convert the target update data into the target storage data according to the mapping relationship; and store the target storage data in the second data storage end.
  • The data synchronization module is further configured to: in the case of synchronizing existing data, determine the current version field of the existing data, where the target update data includes the existing data; and if no version control field larger than the current version field is found in the second data storage end, store the existing data in the second data storage end according to the current version field.
  • The data synchronization apparatus further includes an exception handling module configured to: use the first function to catch the exception of the second function, where the first function is an outer function of the second function, and the second function is used to write the target update data into the second data storage end according to the preset configuration policy; and continue to use the first function to write the target update data into the second data storage end according to the preset configuration policy until the exception is eliminated.
  • the present disclosure provides an electronic device, as shown in FIG. 8 , which includes a memory 801 , a processor 802 , a communication interface 803 , and a communication bus 804 .
  • the memory 801 stores a computer program that can run on the processor 802; the memory 801 and the processor 802 communicate through the communication interface 803 and the communication bus 804, and the processor 802 implements the above method when executing the computer program.
  • the memory and the processor in the above electronic device communicate through a communication bus and a communication interface.
  • the communication bus may be a Peripheral Component Interconnect (PCI for short) bus or an Extended Industry Standard Architecture (EISA for short) bus or the like.
  • the communication bus can be divided into an address bus, a data bus, a control bus, and the like.
  • the memory may include random access memory (Random Access Memory, RAM for short), or may include non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the memory may also be at least one storage device located remotely from the aforementioned processor.
  • the above-mentioned processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, referred to as CPU), a network processor (Network Processor, referred to as NP), etc.; it can also be a digital signal processor (Digital Signal Processor, referred to as DSP) , Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
  • a computer-readable medium having non-volatile program code executable by a processor.
  • the computer-readable medium is configured to store program code for the processor to perform the steps of:
  • monitoring the target transaction log of the first data storage end, where the first data storage end is used to store the data generated by the operation of the business system, and the target transaction log is used to record the data update information generated by the first data storage end;
  • upon detecting that the target transaction log has been updated, extracting the target update data from the target transaction log; and
  • writing the target update data into the second data storage end according to the preset configuration strategy, so as to synchronize the target update data to the second data storage end.
  • the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof.
  • the processing unit can be implemented in one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field-Programmable Gate Arrays (FPGA), general purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this disclosure, or a combination thereof.
  • the techniques described herein may be implemented by means of units that perform the functions described herein.
  • Software codes may be stored in memory and executed by a processor.
  • the memory can be implemented in the processor or external to the processor.
  • the disclosed apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the modules is only a logical function division. In actual implementation, there may be other division methods.
  • multiple modules or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solutions of the embodiments of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage medium includes: a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk and other mediums that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

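Disclosed are a data synchronization method, apparatus, device, and computer-readable medium. The method includes: monitoring a target transaction log of a first data storage end, where the first data storage end is used to store data generated by the operation of a business system, and the target transaction log is used to record data update information generated by the first data storage end; upon detecting that the target transaction log has been updated, extracting target update data from the target transaction log; and writing the target update data into a second data storage end according to a preset configuration strategy, so as to synchronize the target update data to the second data storage end.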

Description

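Data synchronization method, apparatus, device, and computer-readable medium
Cross-Reference to Related Applications
The present disclosure claims the full benefit of invention patent application No. 202011044851.8, entitled "Data synchronization method, apparatus, device, and computer-readable medium", filed with the China National Intellectual Property Administration on September 28, 2020, the entire contents of which are incorporated herein by reference.
Field
The present disclosure generally relates to the technical field of data processing, and more particularly, to a data synchronization method, apparatus, device, and computer-readable medium.
Background
Data synchronization refers to synchronizing data from one storage medium to another. The two storage media may be of the same type or of different types, for example from MySQL to Elasticsearch, from MySQL to HBase, or from MySQL to both Elasticsearch and HBase. Data synchronization may be triggered by internal business logic or triggered externally. Business-logic trigger: for example, when bizType=1 the data is synchronized from one MySQL database to another MySQL database, and when bizType=2 the data is synchronized from MySQL to HBase. External trigger: for example, a scheduled task at 3 a.m. every day synchronizes the full data of a MySQL table to an Elasticsearch index.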
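Summary
In one aspect, the present disclosure relates to a data synchronization method, which includes: monitoring a target transaction log of a first data storage end, where the first data storage end is used to store data generated by the operation of a business system, and the target transaction log is used to record data update information generated by the first data storage end; upon detecting that the target transaction log has been updated, extracting target update data from the target transaction log; and writing the target update data into a second data storage end according to a preset configuration strategy, so as to synchronize the target update data to the second data storage end.
In some embodiments, before the target update data is written into the second data storage end according to the preset configuration strategy, the method further includes setting the preset configuration strategy as follows: determining a target data storage end from a plurality of candidate data storage ends as the second data storage end; and configuring a target service cluster matching the second data storage end, and establishing a target index matching the second data storage end.
In some embodiments, after the target index is established, the method further includes: establishing a mapping relationship between target source data and target storage data, where the target source data is the target update data in the first data storage end, and the target storage data is the data synchronized to the second data storage end; and loading the mapping relationship into memory by means of a memory management system.
In some embodiments, establishing the mapping relationship between the target source data and the target storage data includes: determining the storage format, storage path, and version control field of the target storage data; and using a target expression language to encode the target source data according to the storage format, storage path, and version control field.
In some embodiments, writing the target update data into the second data storage end according to the preset configuration strategy includes: converting the target update data into the target storage data according to the mapping relationship; and storing the target storage data in the second data storage end.
In some embodiments, writing the target update data into the second data storage end according to the preset configuration strategy further includes: when synchronizing stock (existing) data, determining the current version field of the stock data, where the target update data includes the stock data; and, when no version control field larger than the current version field is found in the second data storage end, storing the stock data in the second data storage end according to the current version field.
In some embodiments, when an exception occurs while the target update data is being written into the second data storage end according to the preset configuration strategy, the method further includes: using a first function to catch the exception of a second function, where the first function is the outer function of the second function, and the second function is used to write the target update data into the second data storage end according to the preset configuration strategy; and continuing to use the first function to write the target update data into the second data storage end according to the preset configuration strategy until the exception is eliminated.
In another aspect, the present disclosure relates to a data synchronization apparatus, which includes: a log monitoring module, configured to monitor a target transaction log of a first data storage end, where the first data storage end is used to store data generated by the operation of a business system, and the target transaction log is used to record data update information generated by the first data storage end; a data extraction module, configured to extract target update data from the target transaction log upon detecting that the target transaction log has been updated; and a data synchronization module, configured to write the target update data into a second data storage end according to a preset configuration strategy, so as to synchronize the target update data to the second data storage end.
In yet another aspect, the present disclosure relates to an electronic device, which includes a memory, a processor, a communication interface, and a communication bus. The memory stores a computer program that can run on the processor, the memory and the processor communicate through the communication bus and the communication interface, and the processor implements the above method when executing the computer program.
In still another aspect, the present disclosure relates to a computer-readable medium having non-volatile program code executable by a processor, where the program code causes the processor to execute the above method.
The technical solution of some embodiments of the present disclosure is to monitor a target transaction log of a first data storage end, where the first data storage end is used to store data generated by the operation of a business system, and the target transaction log is used to record data update information generated by the first data storage end; upon detecting that the target transaction log has been updated, extract target update data from the target transaction log; and write the target update data into a second data storage end according to a preset configuration strategy, so as to synchronize the target update data to the second data storage end. Some embodiments of the present disclosure can determine, from the binary log file of the source data storage end, which data has been updated and how, so that the target data storage end can synchronize data directly without instructions from the business function module. Data synchronization is thereby fully decoupled from the business function, which eases the maintenance and iteration of the two systems without either affecting the other's stability; in addition, eventual data consistency can be ensured through a retry mechanism.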
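Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
To describe the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of the hardware environment of an optional data synchronization method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of an optional data synchronization method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of an optional configuration method according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of an optional configuration method according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of an optional configuration method according to an embodiment of the present disclosure;
FIG. 6 is a flowchart of an optional data synchronization method according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of an optional data synchronization apparatus according to an embodiment of the present disclosure; and
FIG. 8 is a schematic structural diagram of an optional electronic device according to an embodiment of the present disclosure.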
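Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the description of the present disclosure and have no specific meaning in themselves. Therefore, "module" and "component" may be used interchangeably.
First, some of the nouns and terms that appear in the description of the embodiments of the present disclosure are explained as follows: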
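binlog: used for MySQL master-slave replication. It records all operations that change the MySQL database (excluding operations such as SELECT and SHOW); an operation may be written to the binary log file even if it did not itself cause the database to change.
Elasticsearch: Elasticsearch is a search server based on Lucene. It provides a distributed, multi-tenant-capable full-text search engine with a RESTful web interface. It is developed in Java and released as open source under the terms of the Apache License, and is a popular enterprise search engine.
HBase: a distributed, column-oriented open-source database. The technology originates from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. It is a subproject of the Apache Hadoop project.
SpEL: short for Spring Expression Language, the expression language provided by Spring. It can query and manipulate data at runtime and offers rich computation and operation capabilities, which makes it convenient to perform configuration-driven operations in a program.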
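In the related art, data to be synchronized is generally handled by dual writes. For example, a business module writes data to MySQL and, because the data must also be synchronized to Elasticsearch, calls an API to write to Elasticsearch after the MySQL write succeeds. This approach tightly couples the business function with the data synchronization function, which complicates the logic of the business system, makes subsequent development and maintenance harder, and lets the business function and the data synchronization function affect each other, reducing availability. In addition, the related art has the following problems:
Because the related technical solution writes to Elasticsearch only after the MySQL write succeeds, a failed Elasticsearch write leaves MySQL and Elasticsearch inconsistent, since the MySQL write has already succeeded. Solving this transaction problem within such a solution would introduce a more complex design and its own problems.
The related technical solutions are all hard-coded, so the relationships between tables and the field mappings are fixed. For example, if the original requirement is to synchronize the MySQL table table1 to the Elasticsearch index index1, the MySQL and Elasticsearch fields correspond one to one, which is hard to extend.
The related technical solutions do not consider the synchronization of stock data; that is, only data produced after launch is synchronized, and stock data produced before launch is not considered. A requirement to synchronize stock data therefore cannot be met.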
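In some embodiments, the above data synchronization method can be applied in a hardware environment composed of a terminal 101 and a server 102 as shown in FIG. 1. As shown in FIG. 1, the server 102 is connected to the terminal 101 through a network and can provide services for the terminal or for a client installed on the terminal. A database 103 can be set up on the server or independently of the server to provide data storage services for the server 102. The network includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network, and the terminal 101 includes, but is not limited to, a PC, a mobile phone, a tablet computer, and the like.
The data synchronization method in an embodiment of the present disclosure may be executed by the server 102, or jointly by the server 102 and the terminal 101. As shown in FIG. 2, the method may include steps S201 to S203.
Step S201: monitor the target transaction log of the first data storage end, where the first data storage end is used to store data generated by the operation of a business system, and the target transaction log is used to record data update information generated by the first data storage end.
In some embodiments, the first data storage end may be the source data storage end, that is, the end whose data needs to be synchronized, and the second data storage end may be the target data storage end, that is, the end to which the data is to be synchronized. The data to be synchronized in the source data storage end is the target source data, and the data that has been synchronized to the target data storage end is the target storage data. The target source data and the target storage data may be exactly identical, or the target storage data may be derived from the target source data according to a configuration strategy, depending on actual conditions or needs.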
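In some embodiments, data ingestion before synchronization can be implemented through the target transaction log. The target transaction log may be one type of database log. Taking MySQL as an example, MySQL generally has the following types of logs:
the error log, which records problems encountered when starting, running, or stopping MySQL;
the general query log, which records established client connections and executed statements;
the binary log (binlog), which records statements that change data;
the relay log, which holds the data changes received from the master database for replication; and
the slow query log, which records all queries whose execution time exceeds the query time threshold, or queries that do not use indexes.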
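In some embodiments, the binary log (binlog) can serve as the target transaction log. The binary log records changes to the data in the database as events and also contains the time consumed by executing each statement. The format of the binary log may be STATEMENT, ROW, or MIXED.
STATEMENT, i.e., statement-based replication, records the SQL statements that modify data. Its advantages are small log files, savings in input/output (IO) resources, and higher performance. Its disadvantage is that, because only the executed statements are recorded, some context information about each statement at execution time must also be recorded so that the statements produce the same result on the slave database as on the master database.
ROW, i.e., row-based replication, does not record context information about the SQL statement and only stores which records were modified. Its advantage is that a ROW-format log records the details of every row modification very clearly, and it avoids the problem that, in certain cases, calls to stored procedures or functions and the firing of triggers cannot be replicated correctly. Its disadvantage is that every executed statement is logged as per-row modifications, which can produce a large volume of log content.
MIXED, i.e., a mix of STATEMENT and ROW replication. In MIXED mode, ordinary modifying statements are stored in the binary log in STATEMENT format; for operations that STATEMENT cannot replicate correctly, such as certain functions, ROW format is used. MySQL decides the log form case by case for each SQL statement executed, that is, it chooses between STATEMENT and ROW.
In some embodiments, data synchronization can be processed according to the binary log file format used by the source data storage end.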
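Step S202: upon detecting that the target transaction log has been updated, extract the target update data from the target transaction log.
In some embodiments, the binary log, i.e., the target transaction log, records changes to the data in the source data storage end. The binary log file of the source data storage end (i.e., the first data storage end) can therefore be monitored: an update to the binary log file indicates that the data in the source data storage end has changed. To synchronize the data of the source data storage end with the target data storage end, the data update records, and with them the updated data, can be extracted from the binary log file of the source data storage end alone, after which the subsequent step of synchronizing the updated data from the source data storage end to the target data storage end can be carried out.
Step S203: write the target update data into the second data storage end according to the preset configuration strategy, so as to synchronize the target update data to the second data storage end.
In some embodiments, the extracted update data can be synchronized to the target data storage end (i.e., the second data storage end) according to the configuration strategy of the target data storage end.
In some embodiments, taking the case where both the source and the target of the data synchronization are databases as an example, the changes to the data in the source database can be obtained from the binary log and then synchronized to the target database. The business function module no longer needs to issue data synchronization instructions and perform the corresponding operations, so the business function module and the data synchronization module are decoupled, and the business function module no longer needs to take part in data synchronization.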
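Steps S201 to S203 can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation described in the disclosure: the `TransactionLog` class is an in-memory stand-in for a real binary log, and the names `LogEvent`, `Synchronizer`, `poll`, and `write_to_target` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LogEvent:
    """One change record from a transaction log (hypothetical shape)."""
    table: str
    op: str                      # "insert", "update" or "delete"
    row: Dict[str, object]


@dataclass
class TransactionLog:
    """In-memory stand-in for a transaction log such as a MySQL binlog."""
    events: List[LogEvent] = field(default_factory=list)

    def append(self, event: LogEvent) -> None:
        self.events.append(event)


class Synchronizer:
    """Monitors the log, extracts new events, and writes them to the target."""

    def __init__(self, log: TransactionLog, write: Callable[[LogEvent], None]):
        self.log = log
        self.write = write
        self.position = 0        # offset of the next unread log event

    def poll(self) -> int:
        """Process events appended since the last poll; return how many."""
        new_events = self.log.events[self.position:]   # S201 / S202
        for event in new_events:
            self.write(event)                          # S203
        self.position += len(new_events)
        return len(new_events)


# The business side only changes the source; it never calls the synchronizer.
log = TransactionLog()
target: Dict[object, Dict[str, object]] = {}


def write_to_target(event: LogEvent) -> None:
    if event.op == "delete":
        target.pop(event.row["id"], None)
    else:
        target[event.row["id"]] = dict(event.row)


sync = Synchronizer(log, write_to_target)
log.append(LogEvent("orders", "insert", {"id": 1, "amount": 10}))
log.append(LogEvent("orders", "update", {"id": 1, "amount": 12}))
processed = sync.poll()
```

The point of the sketch is the decoupling: the code that appends to the log has no reference to the synchronizer or to the target store.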
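In some embodiments, data ingestion before synchronization can also use message queue access or JavaServer Faces framework access. The advantage of message queue access is that it is naturally asynchronous: in high-traffic scenarios the message queue acts as a consumption buffer, so there is no risk of a data synchronization system failing under heavy load. JavaServer Faces framework access requires relatively little integration effort, but asynchronous processing has to be implemented by the integrator with a thread pool; its disadvantage is that it cannot buffer, so a large volume of requests may exhaust the JavaServer Faces thread pool or cause failures on the consumer side.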
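The buffering effect of a message queue can be illustrated with a minimal Python sketch. It assumes an in-process `queue.Queue` as a stand-in for a real message broker; the `drain` helper and the batch size are hypothetical and only show that the consumer sets its own pace.

```python
import queue
from typing import Callable, Dict


def drain(buffer: "queue.Queue[Dict]", write: Callable[[Dict], None],
          batch_size: int) -> int:
    """Consume at most batch_size buffered messages; return how many."""
    handled = 0
    while handled < batch_size:
        try:
            message = buffer.get_nowait()
        except queue.Empty:
            break                      # nothing left to consume right now
        write(message)
        handled += 1
    return handled


# A burst of five updates arrives at once and is absorbed by the buffer.
buffer = queue.Queue(maxsize=1000)
for i in range(5):
    buffer.put({"id": i})

written = []
first_batch = drain(buffer, written.append, batch_size=3)
second_batch = drain(buffer, written.append, batch_size=3)
```

The producer side finishes immediately, while the consumer works through the backlog in batches of its own choosing.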
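With certain technical solutions of the present disclosure, the binary log file of the source data storage end can be used to determine which data has been updated and how, so that the target data storage end can synchronize data directly without instructions from the business function module. Data synchronization is thereby fully decoupled from the business function, which eases the maintenance and iteration of the two systems without either affecting the other's stability.
In some embodiments, before the target update data is written into the second data storage end according to the preset configuration strategy, as shown in FIG. 3, the method may further include setting the preset configuration strategy according to steps S301 and S302.
Step S301: determine a target data storage end from a plurality of candidate data storage ends as the second data storage end.
In some embodiments, the candidate data storage ends may be a MySQL database, an HBase database, the Elasticsearch full-text search engine, and so on. The target data storage end can be selected according to actual needs.
Step S302: configure a target service cluster matching the second data storage end, and establish a target index matching the second data storage end.
In some embodiments, taking Elasticsearch as the target data storage end as an example, an Elasticsearch cluster and index can be requested and configured before data synchronization. Configuring a cluster can improve system performance, avoid a crash of the whole system caused by the failure of a single server during data synchronization, reduce cost, improve scalability, and enhance reliability. An index is configured so that, instead of scanning the data of an entire table, the sorted index is searched and then used to locate the corresponding data in the table, so that the required entries are found quickly.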
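One possible shape for the configuration produced by steps S301 and S302 is sketched below in Python. The field names, the candidate list, and the node addresses are assumptions made for illustration; the disclosure does not specify a configuration format.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Candidate data storage ends named in the text (identifiers are hypothetical).
CANDIDATE_STORES = ("mysql", "hbase", "elasticsearch")


@dataclass(frozen=True)
class TargetConfig:
    store: str                       # chosen second data storage end (S301)
    cluster_nodes: Tuple[str, ...]   # matching service cluster (S302)
    index: str                       # matching target index (S302)


def build_target_config(store: str, cluster_nodes: Sequence[str],
                        index: str) -> TargetConfig:
    """Validate the selection and return an immutable configuration."""
    if store not in CANDIDATE_STORES:
        raise ValueError(f"unsupported target store: {store}")
    if not cluster_nodes:
        raise ValueError("at least one cluster node is required")
    return TargetConfig(store, tuple(cluster_nodes), index)


# Placeholder node addresses; real deployments would supply their own.
config = build_target_config(
    "elasticsearch", ["es-node-1:9200", "es-node-2:9200"], "index1"
)
```

Keeping the selection in data rather than in code is what allows a new target to be added by configuration alone.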
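In some embodiments, after the matching service cluster and index are established, a data mapping relationship from the source data storage end to the target data storage end also needs to be established, which, as shown in FIG. 4, may include steps S401 and S402.
Step S401: establish a mapping relationship between target source data and target storage data, where the target source data is the target update data in the first data storage end, and the target storage data is the data synchronized to the second data storage end.
In some embodiments, the source data storage end stores the target source data, i.e., the data to be synchronized, and the target data storage end stores the target storage data, i.e., the synchronized data. The target source data and the target storage data may be exactly identical, or may be modified according to actual conditions or needs. For example, a calculation strategy can be configured as needed to assign weights to the target source data and compute a weighted sum; redundant data can also be removed during data synchronization so that only important data is kept.
Step S402: load the mapping relationship into memory by means of the memory management system.
In some embodiments, to convert the target source data into the target storage data according to the mapping relationship, the mapping relationship and other configuration information can be loaded into memory by a memory manager, so that the target source data is synchronized in real time according to the mapping relationship and the configuration information.
In some embodiments, the object pool pattern can be used: depending on actual conditions or needs, objects in the pool are reused during configuration. This avoids the overhead of allocating memory and creating objects on the heap, and of freeing memory and destroying heap objects, which in turn reduces the load on the garbage collector, avoids memory churn, removes the need to re-initialize object state repeatedly, and can effectively improve performance. The specific configuration information may include data extraction configuration, data writing configuration, exception handling configuration, and the like.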
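The object pool pattern mentioned above can be shown with a small Python sketch. The `BufferPool` class is hypothetical and only illustrates the reuse idea; the performance benefits described in the text depend on the runtime and are not demonstrated here.

```python
class BufferPool:
    """Minimal object pool that hands out reusable dict buffers."""

    def __init__(self, size: int):
        self._free = [dict() for _ in range(size)]
        self.created = size            # total number of objects ever created

    def acquire(self) -> dict:
        if self._free:
            return self._free.pop()    # reuse an existing object
        self.created += 1              # pool exhausted: fall back to a new one
        return {}

    def release(self, buf: dict) -> None:
        buf.clear()                    # reset state before the next user
        self._free.append(buf)


pool = BufferPool(size=2)
first = pool.acquire()
first["id"] = 1
pool.release(first)
second = pool.acquire()                # the same object, handed out again
```

Here `second` is the very object that was released, so no new allocation takes place between the two uses.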
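In some embodiments, the relevant modules can also be configured adaptively according to the data tables to be synchronized and the information in those tables that needs to be synchronized, so that a new data synchronization requirement needs only simple configuration and no development. Certain technical solutions of the present disclosure greatly facilitate functional extension and reduce subsequent development and maintenance work.
In some embodiments, as shown in FIG. 5, establishing the mapping relationship between the target source data and the target storage data may include steps S501 and S502.
Step S501: determine the storage format, storage path, and version control field of the target storage data.
Step S502: use the target expression language to encode the target source data according to the storage format, storage path, and version control field.
In some embodiments, different data storage formats, storage paths, version information, and the like can be determined for different target data storage ends. Spring Expression Language can be used for field mapping resolution and the calculation of special values. For example, "applydate": "#{tf(map[applydate])}" indicates that the applydate field of the source data storage end is converted by a custom tf conversion into the applydate field of the target data storage end. Similarly, the calculation strategy of assigning weights to the target source data and computing a weighted sum can be configured as: "totalcount": "#{orderCount}+#{amountCount}".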
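The effect of such a mapping can be sketched in Python, using plain callables in place of Spring Expression Language. This is an analogy rather than SpEL itself: the behavior of `tf` (here, reformatting a `YYYYMMDD` string as an ISO date) is an assumption, since the text does not say what the custom conversion does, and `MAPPING` and `convert` are hypothetical names.

```python
from datetime import datetime
from typing import Callable, Dict

Row = Dict[str, object]


def tf(value: str) -> str:
    """Hypothetical custom conversion: 'YYYYMMDD' to an ISO date string."""
    return datetime.strptime(value, "%Y%m%d").date().isoformat()


# Target field name -> rule that computes it from the source row.
MAPPING: Dict[str, Callable[[Row], object]] = {
    "applydate": lambda row: tf(row["applydate"]),
    "totalcount": lambda row: row["orderCount"] + row["amountCount"],
}


def convert(row: Row, mapping: Dict[str, Callable[[Row], object]] = MAPPING) -> Row:
    """Build the target storage data; unmapped source fields are dropped."""
    return {name: rule(row) for name, rule in mapping.items()}


source_row = {"applydate": "20200928", "orderCount": 3,
              "amountCount": 4, "note": "not synchronized"}
stored = convert(source_row)
```

Because only mapped fields are emitted, the same mechanism also covers the removal of redundant data mentioned earlier.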
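In some embodiments, as shown in FIG. 6, synchronizing offline data (i.e., stock data) may further include steps S601 and S602.
Step S601: when synchronizing stock data, determine the current version field of the stock data, where the target update data includes the stock data.
Step S602: when no version control field larger than the current version field is found in the second data storage end, store the stock data in the second data storage end according to the current version field.
In some embodiments, in terms of timeliness, data is divided into offline data and real-time data. Offline data is the stock data from before launch, and real-time data is the streaming data received after launch. Offline data is characterized by its large volume, so the primary concerns when synchronizing it are performance and stability: performance means completing the synchronization of the entire data volume quickly, and stability means ensuring that the task terminates correctly rather than aborting halfway because of problems such as memory overflow.
There is no strict boundary between offline data and real-time data. For example, if the launch is on the 10th, data produced before the 10th is called offline data; but because data can change, changed real-time data for the same records may arrive on or after the 10th. The ordering between offline data and real-time data must therefore also be considered when synchronizing offline data. This solution uses a version field to represent the update order of the data: the larger the version field, the later the data was updated. For example, offVersionKey can be configured as a constraint, so that the current offline data is ignored when the value corresponding to its offVersionKey is smaller than the value already stored. With certain technical solutions of the present disclosure, the precision can reach one second.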
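The version check of steps S601 and S602 can be sketched as follows. The sketch assumes that the version is an integer timestamp in seconds and that it is stored in a field literally named `offVersionKey`; both are illustrative assumptions, as is the `sync_stock_row` helper.

```python
from typing import Dict

VERSION_KEY = "offVersionKey"   # assumed name of the version control field


def sync_stock_row(target: Dict[int, dict], row: dict) -> bool:
    """Store a stock row unless the target already holds a newer version."""
    stored = target.get(row["id"])
    if stored is not None and stored[VERSION_KEY] > row[VERSION_KEY]:
        return False            # a later real-time update already arrived
    target[row["id"]] = dict(row)
    return True


# Row 1 was already updated in real time after launch; row 2 is unknown.
target = {1: {"id": 1, "status": "paid", VERSION_KEY: 1601300000}}
applied_old = sync_stock_row(
    target, {"id": 1, "status": "new", VERSION_KEY: 1601200000})
applied_new = sync_stock_row(
    target, {"id": 2, "status": "new", VERSION_KEY: 1601200000})
```

With this check the stock import can run in any order relative to the real-time stream without overwriting newer data.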
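In some embodiments, when an exception occurs while the target update data is being written into the second data storage end according to the preset configuration strategy, the method further includes: using a first function to catch the exception of a second function, where the first function is the outer function of the second function, and the second function is used to write the target update data into the second data storage end according to the preset configuration strategy; and continuing to use the first function to write the target update data into the second data storage end according to the preset configuration strategy until the exception is eliminated.
In some embodiments, when an exception such as an unstable network connection occurs during data synchronization, the exception can be thrown to the outer function, which keeps retrying until the data synchronization completes. Having the inner function throw the exception and the outer function catch and handle it greatly reduces the amount of code in the inner core function and lowers the probability of a system crash, so that eventual data consistency can be achieved through the retry mechanism.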
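The outer/inner function arrangement can be sketched in Python as below. The text describes retrying until the exception is eliminated; the `max_attempts` bound and the `delay_s` pause are added here as assumptions so that the example always terminates, and the `failures` list merely simulates transient errors.

```python
import time
from typing import Dict, List


class TransientWriteError(Exception):
    """Raised by the inner function when a write fails temporarily."""


def write_to_target(store: Dict, key, value, failures: List[bool]) -> None:
    """Inner (second) function: one write attempt; errors propagate."""
    if failures and failures.pop(0):
        raise TransientWriteError("simulated network failure")
    store[key] = value


def sync_with_retry(store: Dict, key, value, failures: List[bool],
                    max_attempts: int = 5, delay_s: float = 0.0) -> int:
    """Outer (first) function: catches the exception and retries."""
    for attempt in range(1, max_attempts + 1):
        try:
            write_to_target(store, key, value, failures)
            return attempt                 # number of attempts that were needed
        except TransientWriteError:
            if attempt == max_attempts:
                raise                      # give up and surface the error
            time.sleep(delay_s)
    raise ValueError("max_attempts must be at least 1")


store: Dict[int, dict] = {}
attempts = sync_with_retry(store, 1, {"amount": 12}, [True, True, False])
```

The inner function contains no error handling at all, which is the simplification the text attributes to this arrangement.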
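Certain technical solutions of the present disclosure are to monitor a target transaction log of a first data storage end, where the first data storage end is used to store data generated by the operation of a business system, and the target transaction log is used to record data update information generated by the first data storage end; upon detecting that the target transaction log has been updated, extract target update data from the target transaction log; and write the target update data into a second data storage end according to a preset configuration strategy, so as to synchronize the target update data to the second data storage end. Certain technical solutions of the present disclosure can determine, from the binary log file of the source data storage end, which data has been updated and how, so that the target data storage end can synchronize data directly without instructions from the business function module. Data synchronization is thereby fully decoupled from the business function, which eases the maintenance and iteration of the two systems without either affecting the other's stability; in addition, eventual data consistency can be ensured through a retry mechanism.
According to yet another aspect of the embodiments of the present disclosure, as shown in FIG. 7, a data synchronization apparatus is provided, which includes: a log monitoring module 701, configured to monitor a target transaction log of a first data storage end, where the first data storage end is used to store data generated by the operation of a business system, and the target transaction log is used to record data update information generated by the first data storage end; a data extraction module 702, configured to extract target update data from the target transaction log upon detecting that the target transaction log has been updated; and a data synchronization module 703, configured to write the target update data into a second data storage end according to a preset configuration strategy, so as to synchronize the target update data to the second data storage end.
It should be noted that the log monitoring module 701 in this embodiment can be used to perform step S201 in some embodiments, the data extraction module 702 in this embodiment can be used to perform step S202 in some embodiments, and the data synchronization module 703 in this embodiment can be used to perform step S203 in some embodiments.
It should be noted here that the examples and application scenarios implemented by the above modules are the same as those of the corresponding steps, but are not limited to what is disclosed in the above embodiments. It should also be noted that the above modules, as part of the apparatus, can run in the hardware environment shown in FIG. 1 and can be implemented in software or in hardware.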
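In some embodiments, the data synchronization apparatus further includes a configuration module, configured to: determine a target data storage end from a plurality of candidate data storage ends as the second data storage end; and configure a target service cluster matching the second data storage end, and establish a target index matching the second data storage end.
In some embodiments, the data synchronization apparatus further includes a mapping module, configured to: establish a mapping relationship between target source data and target storage data, where the target source data is the target update data in the first data storage end, and the target storage data is the data synchronized to the second data storage end; and load the mapping relationship into memory by means of a memory management system.
In some embodiments, the mapping module is further configured to: determine the storage format, storage path, and version control field of the target storage data; and use the target expression language to encode the target source data according to the storage format, storage path, and version control field.
In some embodiments, the data synchronization module is configured to: convert the target update data into the target storage data according to the mapping relationship; and store the target storage data in the second data storage end.
In some embodiments, the data synchronization module is further configured to: when synchronizing stock data, determine the current version field of the stock data, where the target update data includes the stock data; and, when no version control field larger than the current version field is found in the second data storage end, store the stock data in the second data storage end according to the current version field.
In some embodiments, the data synchronization apparatus further includes an exception handling module, configured to: use a first function to catch the exception of a second function, where the first function is the outer function of the second function, and the second function is used to write the target update data into the second data storage end according to the preset configuration strategy; and continue to use the first function to write the target update data into the second data storage end according to the preset configuration strategy until the exception is eliminated.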
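According to another aspect of the embodiments of the present disclosure, the present disclosure provides an electronic device, as shown in FIG. 8, which includes a memory 801, a processor 802, a communication interface 803, and a communication bus 804. The memory 801 stores a computer program that can run on the processor 802, the memory 801 and the processor 802 communicate through the communication interface 803 and the communication bus 804, and the processor 802 implements the above method when executing the computer program.
The memory and the processor in the above electronic device communicate through the communication bus and the communication interface. The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and the like.
The memory may include random access memory (RAM), and may also include non-volatile memory, for example at least one disk memory. In some embodiments, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
According to yet another aspect of the embodiments of the present disclosure, a computer-readable medium having non-volatile program code executable by a processor is also provided.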
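In some embodiments, the computer-readable medium is configured to store program code for the processor to perform the following steps:
monitoring a target transaction log of a first data storage end, where the first data storage end is used to store data generated by the operation of a business system, and the target transaction log is used to record data update information generated by the first data storage end;
upon detecting that the target transaction log has been updated, extracting target update data from the target transaction log; and
writing the target update data into a second data storage end according to a preset configuration strategy, so as to synchronize the target update data to the second data storage end.
For specific examples of this embodiment, reference may be made to the examples described in the above embodiments, which are not repeated here.
For the specific implementation of the embodiments of the present disclosure, reference may be made to the above embodiments, with the corresponding technical effects.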
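It can be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing unit may be implemented in one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in the present disclosure, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by units that perform the functions described herein. Software code may be stored in a memory and executed by a processor. The memory may be implemented in the processor or external to the processor.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present disclosure.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.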
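In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc. It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes the element.
The above are only specific implementations of the present disclosure, provided to enable those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.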

Claims (10)

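  1. A data synchronization method, comprising:
    monitoring a target transaction log of a first data storage end, wherein the first data storage end is used to store data generated by the operation of a business system, and the target transaction log is used to record data update information generated by the first data storage end;
    upon detecting that the target transaction log has been updated, extracting target update data from the target transaction log; and
    writing the target update data into a second data storage end according to a preset configuration strategy, so as to synchronize the target update data to the second data storage end.
  2. The method according to claim 1, wherein, before the target update data is written into the second data storage end according to the preset configuration strategy, the method comprises setting the preset configuration strategy as follows:
    determining a target data storage end from a plurality of candidate data storage ends as the second data storage end; and
    configuring a target service cluster matching the second data storage end, and establishing a target index matching the second data storage end.
  3. The method according to claim 2, wherein, after the target index is established, the method further comprises:
    establishing a mapping relationship between target source data and target storage data, wherein the target source data is the target update data in the first data storage end, and the target storage data is the data synchronized to the second data storage end; and
    loading the mapping relationship into memory by means of a memory management system.
  4. The method according to claim 3, wherein establishing the mapping relationship between the target source data and the target storage data comprises:
    determining the storage format, storage path, and version control field of the target storage data; and
    using a target expression language to encode the target source data according to the storage format, the storage path, and the version control field.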
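  5. The method according to claim 3 or 4, wherein writing the target update data into the second data storage end according to the preset configuration strategy comprises:
    converting the target update data into the target storage data according to the mapping relationship; and
    storing the target storage data in the second data storage end.
  6. The method according to claim 4, wherein writing the target update data into the second data storage end according to the preset configuration strategy further comprises:
    when synchronizing stock data, determining the current version field of the stock data, wherein the target update data includes the stock data; and, when no version control field larger than the current version field is found in the second data storage end, storing the stock data in the second data storage end according to the current version field.
  7. The method according to any one of claims 1 to 6, wherein, when an exception occurs while the target update data is being written into the second data storage end according to the preset configuration strategy, the method further comprises:
    using a first function to catch the exception of a second function, wherein the first function is the outer function of the second function, and the second function is used to write the target update data into the second data storage end according to the preset configuration strategy; and
    continuing to use the first function to write the target update data into the second data storage end according to the preset configuration strategy until the exception is eliminated.
  8. A data synchronization apparatus, comprising:
    a log monitoring module, configured to monitor a target transaction log of a first data storage end, wherein the first data storage end is used to store data generated by the operation of a business system, and the target transaction log is used to record data update information generated by the first data storage end;
    a data extraction module, configured to extract target update data from the target transaction log upon detecting that the target transaction log has been updated; and
    a data synchronization module, configured to write the target update data into a second data storage end according to a preset configuration strategy, so as to synchronize the target update data to the second data storage end.
  9. An electronic device, comprising a memory, a processor, a communication interface, and a communication bus, wherein the memory stores a computer program that can run on the processor, the memory and the processor communicate through the communication bus and the communication interface, and the processor implements the method according to any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable medium having non-volatile program code executable by a processor, wherein the program code causes the processor to execute the method according to any one of claims 1 to 7.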
PCT/CN2021/120830 2020-09-28 2021-09-27 Data synchronization method, apparatus, device, and computer-readable medium WO2022063284A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011044851.8 2020-09-28
CN202011044851.8A CN112131237B (zh) 2020-09-28 2020-09-28 Data synchronization method, apparatus, device, and computer-readable medium

Publications (1)

Publication Number Publication Date
WO2022063284A1 true WO2022063284A1 (zh) 2022-03-31

Family

ID=73844506

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/120830 WO2022063284A1 (zh) 2020-09-28 2021-09-27 Data synchronization method, apparatus, device, and computer-readable medium

Country Status (2)

Country Link
CN (1) CN112131237B (zh)
WO (1) WO2022063284A1 (zh)

Also Published As

Publication number Publication date
CN112131237A (zh) 2020-12-25
CN112131237B (zh) 2024-06-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21871653

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21871653

Country of ref document: EP

Kind code of ref document: A1