WO2017166815A1 - Method and apparatus for updating data for a distributed database system - Google Patents

Method and apparatus for updating data for a distributed database system

Info

Publication number
WO2017166815A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
row
hash table
server
database system
Prior art date
Application number
PCT/CN2016/104690
Other languages
English (en)
French (fr)
Inventor
黄华东
王伟
林起芊
Original Assignee
杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Application filed by 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Priority to EP16896584.6A (patent EP3438845A1)
Priority to US16/089,949 (patent US11176110B2)
Publication of WO2017166815A1 (Chinese-language publication)

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING
        • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/10 File systems; File servers
                • G06F16/18 File system types
                    • G06F16/1873 Versioning file systems, temporal file systems, e.g. file system supporting different historic versions of files
                    • G06F16/182 Distributed file systems
                        • G06F16/184 Distributed file systems implemented as replicated file system
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
                • G06F16/21 Design, administration or maintenance of databases
                    • G06F16/219 Managing data history or versioning
                • G06F16/22 Indexing; Data structures therefor; Storage structures
                    • G06F16/2228 Indexing structures
                        • G06F16/2255 Hash tables
                • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
                    • G06F16/275 Synchronous replication
                    • G06F16/278 Data partitioning, e.g. horizontal or vertical partitioning

Definitions

  • The present application relates to the field of data processing technologies, and in particular to a method and apparatus for updating data for a distributed database system.
  • By deploying multiple database servers, a distributed database system improves the overall read and write performance of the database system and provides technical support for database applications with highly concurrent reads and writes. Such systems are widely used as back ends for large interactive websites, banks, and similar services.
  • Because each database server of the distributed database system stores data, the consistency of the data stored across the servers must be ensured. However, a system fault, an unexpected power failure, or a similar event can leave the data on the distributed servers inconsistent. It is therefore necessary to recover the latest complete data from the distributed database system.
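  • An embodiment of the present application provides a method for updating data for a distributed database system. The method is applied to a server in the distributed database system that maintains a hash table, in which each row of data of a data table is stored together with a key corresponding to the row data and version information of the row data. The method includes:
  • acquiring the data table saved in each server in the distributed database system and, for each acquired data table, reading each row of data in the data table;
  • for each row of data read, determining whether a key corresponding to the row data exists in the hash table saved by the server itself;
  • if so, reading a first version number of the row data in the data table and determining whether the first version number is greater than a second version number, saved in the hash table, corresponding to the row data; and if it is, updating the row data into the hash table and updating the version information corresponding to the row data;
  • if not, writing the row data into the hash table together with the key corresponding to the row data and its version information; and
  • sending the hash table to the primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.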
  • Reading each row of data in the data table includes:
  • reading each row of data in the data table by means of data fragmentation.
  • Writing the key corresponding to the row data includes:
  • writing the primary key of the row data in the data table into the hash table as the key corresponding to the row data.
  • The method further includes:
  • after the row data is written into the hash table, recording the frequency of occurrence of the row data in the hash table.
  • The method further includes:
  • when it is determined that the first version number is equal to the second version number, incrementing the frequency of occurrence of the row data in the hash table by one.
  • The method further includes:
  • before the hash table is sent to the primary server, determining, according to the frequency of occurrence of each row of data in the hash table, whether the frequency of occurrence of the row data is less than a predetermined threshold, and if so, deleting the row data.
  • An embodiment of the present application provides an apparatus for updating data for a distributed database system. The apparatus is applied to a server in the distributed database system that stores a hash table, in which each row of data of a data table is stored together with a key corresponding to the row data and version information of the row data. The apparatus includes:
  • an obtaining module, configured to acquire the data table saved in each server in the distributed database system and, for each acquired data table, read each row of data in the data table;
  • a determining module, configured to determine, for each row of data read, whether a key corresponding to the row data exists in the hash table saved by the server itself;
  • a first processing module, configured to: when the determination result of the determining module is yes, read a first version number of the row data in the data table and determine whether the first version number is greater than a second version number, saved in the hash table, corresponding to the row data; and if so, update the row data into the hash table and update the version information corresponding to the row data;
  • a second processing module, configured to: when the determination result of the determining module is no, write the row data into the hash table together with the key corresponding to the row data and its version information; and
  • a sending module, configured to send the hash table to the primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
  • The obtaining module is specifically configured to read, for each data table, each row of data in the data table by means of data fragmentation.
  • The second processing module is specifically configured to write the primary key of the row data in the data table into the hash table as the key corresponding to the row data.
  • The apparatus further includes:
  • a recording module, configured to record the frequency of occurrence of the row data in the hash table after the first processing module or the second processing module writes the row data into the hash table.
  • The apparatus further includes:
  • an execution module, configured to increment the frequency of occurrence of the row data in the hash table by one when the first processing module determines that the first version number is equal to the second version number.
  • The apparatus further includes:
  • a deleting module, configured to determine, before the sending module sends the hash table to the primary server and according to the frequency of occurrence of each row of data in the hash table, whether the frequency of occurrence of the row data is less than a predetermined threshold, and if so, delete the row data.
  • The present application provides a storage medium for storing executable program code, which, when run, executes the method for updating data for a distributed database system described herein.
  • The present application provides an application program for executing, at runtime, the method for updating data for a distributed database system described herein.
  • The present application provides an electronic device, including:
  • a processor, a memory, a communication interface, and a bus, where
  • the processor, the memory, and the communication interface are connected via the bus and communicate with one another;
  • the memory stores executable program code; and
  • the processor reads the executable program code stored in the memory and runs a program corresponding to that code, so as to execute the method for updating data for a distributed database system described herein.
  • The embodiments of the present application provide a method and apparatus for updating data for a distributed database system, applied to a server in the distributed database system that stores a hash table. The method includes: acquiring the data table saved in each server in the distributed database system and, for each acquired data table, reading each row of data in the data table; for each row of data read, determining whether a key corresponding to the row data exists in the hash table saved by the server itself; if so, reading a first version number of the row data in the data table and determining whether the first version number is greater than a second version number, saved in the hash table, corresponding to the row data, and if it is, updating the row data into the hash table and updating the version information corresponding to the row data; if not, writing the row data into the hash table together with the key corresponding to the row data and its version information; and sending the hash table to the primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
  • In this way, a hash table containing the complete and latest data in the distributed database system can be constructed from the data tables of all servers in the system and sent to the primary server. Each server in the distributed database system then recovers data according to the hash table received by the primary server, so the latest complete data in the distributed database system can be restored on every server.
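The summary above amounts to a merge keyed by primary key in which the higher version number wins. The following is a minimal Python sketch, assuming each server's copy of a data table is already available in memory as a list of `(primary_key, version, fields)` tuples; the function name `build_latest_table` and this row layout are illustrative assumptions, not taken from the application, and sending the result to the primary server is left out.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed row layout for this sketch: (primary_key, version, fields).
Row = Tuple[str, int, dict]


def build_latest_table(server_tables: Iterable[List[Row]]) -> Dict[str, Tuple[int, dict]]:
    """Merge each server's copy of one data table into a hash table that keeps,
    per primary key, the row with the highest version number seen so far."""
    hash_table: Dict[str, Tuple[int, dict]] = {}
    for table in server_tables:             # data table obtained from each server
        for key, version, fields in table:  # each row of that data table
            if key in hash_table:
                stored_version, _ = hash_table[key]
                if version > stored_version:  # first version > second version
                    hash_table[key] = (version, fields)  # update row and version
                # otherwise the stored copy is already the latest: skip it
            else:
                # key absent: write the row with its key and version
                hash_table[key] = (version, fields)
    return hash_table


# Usage: two servers disagree about row "u1"; the higher version wins.
server_a = [("u1", 2, {"name": "Alice"}), ("u2", 1, {"name": "Bob"})]
server_b = [("u1", 1, {"name": "Alicia"}), ("u3", 1, {"name": "Carol"})]
merged = build_latest_table([server_a, server_b])
print(merged["u1"])  # (2, {'name': 'Alice'})
```

Because only a strictly greater version replaces the stored row, the result does not depend on the order in which the servers' tables are read.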
  • FIG. 1 is a flowchart of a method for updating data for a distributed database system according to an embodiment of the present application;
  • FIG. 2 is another flowchart of a method for updating data for a distributed database system according to an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of an apparatus for updating data for a distributed database system according to an embodiment of the present application;
  • FIG. 4 is another schematic structural diagram of an apparatus for updating data for a distributed database system according to an embodiment of the present application.
  • The method for updating data for a distributed database system includes the following steps.
  • The method can be applied to a server in the distributed database system that maintains a hash table.
  • This server can be any server in the distributed database system.
  • The server that stores the hash table is referred to below as the target server.
  • A hash table is a data structure that is accessed directly by key: it maps a key to a position in the table and accesses the record stored there, which speeds up lookup. Keys are not repeated in a hash table.
  • A hash table can find a specified key in a bounded number of steps.
  • Its lookup complexity is constant on average, so lookups are highly efficient.
  • A hash table may be built in the memory of the target server of the distributed database system to save the latest complete data of the distributed database system. For example, the target server can access each server in the distributed database system, obtain the data in each server's data tables, and compare it with the latest complete data saved in the hash table.
  • A data table is a very important object in a server and the basis of other objects; it is the carrier of fields, keywords, primary keys, and so on.
  • A server may contain several data tables. Each row in a data table is called a record, and each record contains all the information in that row.
  • Each column in a data table is called a field, and each field has corresponding description information, such as data type and data width.
  • A primary key is one or more fields in a data table whose values uniquely identify a record in the table. Primary key values are not repeated in a data table.
  • A struct (structure) is a collection of data made up of a series of data items of the same type or of different types. An object of a structure type contains these data items.
  • The key of the hash table in the target server may be defined as the primary key of the data row in the data table, and the value of the hash table may be defined as a structure made up of all the fields of the data row.
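One way to picture this layout in Python, as a sketch: a `dict` keyed by the row's primary key, whose value is a small structure holding all of the row's fields plus the version information (and the occurrence frequency used later in this description). The class `RowValue`, its field names, and the sample columns are hypothetical. A composite primary key could be represented by a tuple of its column values.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class RowValue:
    """Value stored under one hash-table key (field names are illustrative)."""
    fields: Dict[str, Any]  # structure made up of all fields of the data row
    version: int            # version number the row had when it was written
    frequency: int = 1      # occurrence count, used by the abnormal-data check


# A Python dict is a hash table: keys are unique and average lookup is O(1).
hash_table: Dict[Any, RowValue] = {}

# Key = primary key of the row; here a hypothetical integer column "id".
row = {"id": 1001, "name": "camera-7", "status": "online"}
hash_table[row["id"]] = RowValue(fields=row, version=3)

# Writing the same key again replaces the value rather than adding a duplicate.
hash_table[row["id"]] = RowValue(fields={**row, "status": "offline"}, version=4)
print(len(hash_table), hash_table[1001].version)  # 1 4
```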
  • The target server may first acquire the data table saved in each server in the distributed database system and, for each acquired data table, read each row of data in it, so as to save some or all of the data into the hash table and obtain the latest complete data in the distributed database system.
  • For example, the target server may acquire the data tables saved in the servers one after another and, for each acquired data table, read all rows of the data table at once.
  • However, a data table may be quite large; for example, it may contain 1 million rows. Reading all rows of such a table at once can take a long time (for example, MySQL takes about 40 seconds to read 1 million rows).
  • Conversely, reading one row at a time from a large table is also very time-consuming; for example, reading 1 million rows this way may take about 4 minutes.
  • Therefore, each row of data in each data table can be read by means of data fragmentation.
  • The fragment size may be determined according to the complexity of the data table structure, and a fragment of a fixed number of rows (such as 10,000, 20,000, or 30,000 rows) may be read each time, until all the data in the data table has been read.
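Reading a table fragment by fragment could look like the following Python sketch, which uses the standard-library `sqlite3` module as a stand-in for the servers' databases (the text mentions MySQL but does not say how fragments are fetched). Paging by primary key with `LIMIT` is an assumption of this sketch, as are the function name `read_in_fragments` and the `id` column.

```python
import sqlite3
from typing import Iterator, List, Tuple


def read_in_fragments(conn: sqlite3.Connection, table: str,
                      fragment_size: int = 10_000) -> Iterator[List[Tuple]]:
    """Yield the rows of `table` in fragments of at most `fragment_size` rows.

    Keyset pagination: each query resumes after the last primary key already
    read, so later fragments do not re-scan earlier rows the way a growing
    OFFSET would. Assumes an integer primary key `id` in the first column.
    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    last_id = None
    while True:
        if last_id is None:
            rows = conn.execute(
                f"SELECT * FROM {table} ORDER BY id LIMIT ?",
                (fragment_size,)).fetchall()
        else:
            rows = conn.execute(
                f"SELECT * FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, fragment_size)).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # primary key of the last row in this fragment


# Usage: 25 rows read in fragments of 10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, version INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(i, 1, f"row{i}") for i in range(1, 26)])
sizes = [len(fragment) for fragment in read_in_fragments(conn, "t", fragment_size=10)]
print(sizes)  # [10, 10, 5]
```

The fragment size trades memory per batch against the number of round trips, which matches the text's point that neither "all at once" nor "one row at a time" is ideal for large tables.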
  • For each row of data read, the target server may determine whether a key corresponding to the row data exists in the hash table it holds.
  • The data stored in the data tables of the servers in the distributed database system should be identical. Therefore, for a given row of data in a given data table, the target server may already have obtained that row from the data table of another server and written it into the hash table.
  • Because the hash table holds the latest complete data in the distributed database system, identical data needs to be saved only once.
  • Therefore, for each row of data acquired, the target server may determine whether the key corresponding to the row data exists in its own hash table, so as to determine whether the row data has already been saved in the hash table.
  • Specifically, the target server may search the keys of its hash table for a key identical to the primary key of the row data in the data table. If such a key exists, the target server determines that the key corresponding to the row data exists in its hash table; otherwise, it determines that the key does not exist.
  • Step S103: When the key corresponding to the row data exists in the hash table saved by the target server itself, read a first version number of the row data in the data table and determine whether the first version number is greater than the second version number corresponding to the row data saved in the hash table; if so, execute step S104 to update the row data into the hash table and update the version information corresponding to the row data; if not, do not write the row data into the hash table.
  • When the target server determines that the key corresponding to the row data exists in its hash table, this may indicate that the row data has already been saved in the hash table. However, in the data table of each server, the primary key of a row does not change when the row is updated. Therefore, a matching key in the hash table does not by itself establish that the row data saved in the hash table is the same as the row data in the data table.
  • To handle this, a version number may be saved for each row of data in the data table of each server to identify how many times the row data has been updated.
  • The version information corresponding to each row of data is likewise stored in the hash table.
  • The version information of each row saved in the hash table may be the version number that the row had in the data table when it was written into the hash table.
  • After determining that the row data has been saved in the hash table, the target server can further read the first version number of the row data in the data table and determine whether it is greater than the second version number corresponding to the row data saved in the hash table.
  • When the first version number is greater than the second version number, the row data saved in the data table is the latest data, and in this case the row data may be updated into the hash table.
  • When the first version number is less than or equal to the second version number, the row data saved in the hash table is already the latest data, and in this case the row data need not be written into the hash table.
  • In step S104, the row data may be updated into the hash table, and the version information corresponding to the row data is updated.
  • Specifically, the row data in the hash table may be replaced with the row data from the data table, and the version number of the row in the data table is written into the hash table as the version number corresponding to the row data.
  • When the target server determines that the key corresponding to the row data does not exist in its hash table, this may indicate that the row data is not saved in the hash table. In this case, the row data can be written into the hash table together with the key corresponding to the row data and its version information.
  • Specifically, the primary key of the row data in the data table may be written into the hash table as the key corresponding to the row data.
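The per-row decision in steps S103 and S104, plus the insert branch for a missing key, can be sketched as a single Python function. This is a minimal illustration, assuming an in-memory `dict` keyed by primary key whose values are `(version, fields)` pairs; the name `merge_row` and its return strings are hypothetical.

```python
from typing import Any, Dict, Tuple


def merge_row(hash_table: Dict[Any, Tuple[int, dict]],
              primary_key: Any, version: int, fields: dict) -> str:
    """Apply the per-row decision to one row read from a server's data table.

    Returns "inserted", "updated" or "skipped" so the branch taken is visible;
    that return value exists only for this illustration.
    """
    if primary_key not in hash_table:
        # No matching key: write the row, keyed by its primary key, with its version.
        hash_table[primary_key] = (version, fields)
        return "inserted"
    stored_version, _ = hash_table[primary_key]      # second version number
    if version > stored_version:                     # step S103: first > second?
        hash_table[primary_key] = (version, fields)  # step S104: row and version
        return "updated"
    return "skipped"  # the hash table already holds the latest copy


# Usage: the same primary key arriving from three servers.
table: Dict[Any, Tuple[int, dict]] = {}
print(merge_row(table, 7, 1, {"v": "a"}))  # inserted
print(merge_row(table, 7, 2, {"v": "b"}))  # updated
print(merge_row(table, 7, 2, {"v": "c"}))  # skipped
print(table[7])                            # (2, {'v': 'b'})
```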
  • Step S106: Send the hash table to the primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
  • In the distributed database system, the servers may include one primary server and other, secondary servers; a secondary server can access the primary server and update its own data tables based on the data tables in the primary server.
  • The target server can therefore send the hash table to the primary server.
  • After saving the hash table, the primary server can restore the latest data into its own data tables according to the hash table.
  • The secondary servers can then synchronize the latest data from the primary server, so that the data stored in every server is the latest complete data in the distributed database system.
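A rough Python sketch of this recovery step, with in-memory `sqlite3` databases standing in for the primary and one secondary server. `INSERT OR REPLACE` is a real SQLite statement; the function names, the table `t`, the simplified `(version, name)` hash-table values, and the full-copy synchronization in `sync_secondary` are assumptions for illustration, since a real deployment would normally rely on the database's own primary/secondary replication.

```python
import sqlite3
from typing import Any, Dict, Tuple

SCHEMA = "CREATE TABLE t (id INTEGER PRIMARY KEY, version INTEGER, name TEXT)"


def restore_primary(primary: sqlite3.Connection,
                    hash_table: Dict[Any, Tuple[int, str]]) -> None:
    """Write every merged row into the primary's table.

    INSERT OR REPLACE overwrites a stale row that has the same primary key
    and inserts rows the primary did not have.
    """
    primary.executemany(
        "INSERT OR REPLACE INTO t (id, version, name) VALUES (?, ?, ?)",
        [(key, version, name) for key, (version, name) in hash_table.items()])
    primary.commit()


def sync_secondary(primary: sqlite3.Connection,
                   secondary: sqlite3.Connection) -> None:
    """Naive full copy from the primary to one secondary server."""
    rows = primary.execute("SELECT id, version, name FROM t").fetchall()
    secondary.execute("DELETE FROM t")
    secondary.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    secondary.commit()


# Usage: the primary holds a stale row 1; the merged hash table fixes it.
primary, secondary = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for db in (primary, secondary):
    db.execute(SCHEMA)
primary.execute("INSERT INTO t VALUES (1, 1, 'old')")
restore_primary(primary, {1: (2, "new"), 2: (1, "added")})
sync_secondary(primary, secondary)
print(secondary.execute("SELECT * FROM t ORDER BY id").fetchall())
# [(1, 2, 'new'), (2, 1, 'added')]
```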
  • The method provided by the embodiment of the present application can construct, from the data tables of all servers in the distributed database system, a hash table containing the complete and latest data in the system and send it to the primary server, so that each server in the distributed database system recovers data according to the hash table received by the primary server and the latest complete data can be restored on every server.
  • The data stored in the data tables of the servers in the distributed database system is data actively updated by users.
  • In some cases, abnormal data may be added to a data table.
  • Such data may be stored only in the data tables of one or a few servers. If it is written into the target server's hash table, it will subsequently be propagated into the data tables of every server, which may make the results of data queries and other processing inaccurate.
  • To ensure the accuracy of the data in each server's data tables and to avoid writing abnormal data to every server, after writing a row of data from any server's data table into the hash table, the target server can record the frequency of occurrence of the row data in the hash table. For example, the frequency may be recorded as 1 to indicate that the row has appeared once in a data table.
  • The frequency of occurrence of the row data may then be modified according to whether the row exists in the data tables of the other servers, so as to track how many times the row appears across the servers and, in turn, determine whether the row is abnormal data.
  • In a method for updating data for a distributed database system provided by an embodiment of the present application, after step S103, when it is determined that the first version number is equal to the second version number, the method may further include: increasing the frequency of occurrence of the row data in the hash table by one.
  • For each row of data, when the target server determines that the first version number of the row in the data table is equal to the second version number corresponding to the row in the hash table, this indicates that the row data saved in the hash table is the same as the row data in the data table. The target server can therefore increase the frequency of occurrence of the row data in the hash table by one.
  • In this way, the frequency of occurrence of each row of data in the hash table is the total number of times the row appears across the servers of the distributed database system.
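A minimal Python sketch of the version-based merge extended with occurrence counting, assuming hash-table values of the form `[version, fields, frequency]`. The function name is hypothetical, and resetting the frequency to 1 when a newer version replaces the stored row is this sketch's reading of "record the frequency after writing", not something the text states outright.

```python
from typing import Any, Dict, List


def merge_row_counting(hash_table: Dict[Any, List], primary_key: Any,
                       version: int, fields: dict) -> None:
    """Version-based merge that also tracks each row's occurrence frequency.

    Values are [version, fields, frequency]. Resetting the frequency to 1 when
    a newer version replaces the stored row is an interpretation of the text.
    """
    entry = hash_table.get(primary_key)
    if entry is None or version > entry[0]:
        hash_table[primary_key] = [version, fields, 1]  # written: seen once
    elif version == entry[0]:
        entry[2] += 1  # equal version numbers: same row on another server
    # version < stored version: an outdated copy, ignored


# Usage: row "k" is present on all three servers, row "j" on only one.
servers = [
    [("k", 1, {"cam": "A"}), ("j", 1, {"cam": "B"})],
    [("k", 1, {"cam": "A"})],
    [("k", 1, {"cam": "A"})],
]
table: Dict[Any, List] = {}
for rows in servers:
    for key, version, fields in rows:
        merge_row_counting(table, key, version, fields)
print(table["k"][2], table["j"][2])  # 3 1
```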
  • Before the hash table is sent to the primary server, the method may further include:
  • Step S108: Determine, according to the frequency of occurrence of each row of data in the hash table, whether the frequency of occurrence of the row data is less than a predetermined threshold, and if so, delete the row data.
  • That is, for each row of data in the hash table, the target server may determine whether its frequency of occurrence is less than the predetermined threshold and, if so, delete the row.
  • The predetermined threshold may be set to a fixed value, such as 2, 3, or 4.
  • Alternatively, the predetermined threshold may be determined based on the total number of servers in the system. For example, when there are many servers, the threshold may be set to a larger value (such as 3, 4, or 5); when there are few servers, it may be set to a smaller value (such as 1, 2, or 3).
  • In this embodiment, the frequency of occurrence of each row is recorded in the hash table and modified according to whether the row exists in the data tables of the other servers, and rows whose frequency is below the predetermined threshold are deleted before the hash table is sent to the primary server. This avoids writing abnormal data to every server and thus improves the accuracy of data processing results.
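A minimal Python sketch of step S108, run on the hash table just before it is sent to the primary server. It assumes the `[version, fields, frequency]` value layout; the function name and the fixed `threshold` argument are illustrative (the text also allows deriving the threshold from the number of servers).

```python
from typing import Any, Dict, List


def prune_rare_rows(hash_table: Dict[Any, List], threshold: int) -> List[Any]:
    """Step S108: delete rows whose occurrence frequency is below `threshold`.

    Values are [version, fields, frequency]; returns the deleted keys.
    """
    rare = [key for key, (_, _, freq) in hash_table.items() if freq < threshold]
    for key in rare:  # collect first, then delete: a dict must not change size mid-iteration
        del hash_table[key]
    return rare


# Usage: with a threshold of 2, a row seen on only one server is dropped.
table = {"a": [1, {}, 3], "b": [1, {}, 1], "c": [2, {}, 2]}
print(prune_rare_rows(table, threshold=2))  # ['b']
print(sorted(table))                        # ['a', 'c']
```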
  • Corresponding to the method embodiments above, the present application also provides apparatus embodiments.
  • FIG. 3 shows an apparatus for updating data for a distributed database system, applied to a server in the distributed database system that stores a hash table, in which each row of data of a data table is stored together with a key corresponding to the row data and version information of the row data. The apparatus includes:
  • an obtaining module 310, configured to acquire the data table saved in each server in the distributed database system and, for each acquired data table, read each row of data in the data table;
  • a determining module 320, configured to determine, for each row of data read, whether a key corresponding to the row data exists in the hash table saved by the server itself;
  • a first processing module 330, configured to: when the determination result of the determining module is yes, read the first version number of the row data in the data table and determine whether the first version number is greater than the second version number, saved in the hash table, corresponding to the row data; and if so, update the row data into the hash table and update the version information corresponding to the row data;
  • a second processing module 340, configured to: when the determination result of the determining module is no, write the row data into the hash table together with the key corresponding to the row data and its version information; and
  • a sending module 350, configured to send the hash table to the primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
  • The apparatus provided by the embodiment of the present application can construct, from the data tables of all servers in the distributed database system, a hash table containing the complete and latest data in the system and send it to the primary server, so that each server recovers data according to the hash table received by the primary server and the latest complete data can be restored on every server.
  • The obtaining module 310 is specifically configured to read, for each data table, each row of data in the data table by means of data fragmentation.
  • The second processing module 340 is specifically configured to write the primary key of the row data in the data table into the hash table as the key corresponding to the row data.
  • The apparatus further includes:
  • a recording module (not shown), configured to record the frequency of occurrence of the row data in the hash table after the first processing module 330 or the second processing module 340 writes the row data into the hash table.
  • An apparatus for updating data for a distributed database system provided by an embodiment of the present application further includes:
  • an execution module 360, configured to increase the frequency of occurrence of the row data in the hash table by one when the first processing module 330 determines that the first version number is equal to the second version number.
  • The apparatus further includes:
  • a deleting module 370, configured to determine, before the sending module 350 sends the hash table to the primary server and according to the frequency of occurrence of each row of data in the hash table, whether the frequency of occurrence of the row data is less than a predetermined threshold, and if so, delete the row data.
  • With this apparatus, the frequency of occurrence of each row is recorded in the hash table and modified according to whether the row exists in the data tables of the other servers, and rows whose frequency is below the predetermined threshold are deleted before the hash table is sent to the primary server. This avoids writing abnormal data to every server and thus improves the accuracy of data processing results.
  • The present application further provides a storage medium for storing executable program code, which, when run, executes the method for updating data for a distributed database system described herein. As described above, the method is applied to a server in the distributed database system that stores a hash table holding, for each row of data in a data table, a key corresponding to the row and its version information; the hash table built from the data tables of all servers is sent to the primary server, so that the latest complete data in the distributed database system can be recovered on each server.
  • The present application also provides an application program for executing, at runtime, the method for updating data for a distributed database system described herein, with the same hash table structure, steps, and effects as described above.
  • an electronic device is further provided, including:
  • a processor, a memory, a communication interface, and a bus;
  • the processor, the memory, and the communication interface are connected via the bus and communicate with one another;
  • the memory stores executable program code;
  • the processor reads the executable program code stored in the memory and runs a program corresponding to it, so as to execute the method for updating data of a distributed database system described herein.
  • the method is applied to a server in a distributed database system that maintains a hash table, wherein the hash table stores, for each row of data in a data table, the key corresponding to that row and the version information of that row, and the method includes:
  • writing the row data into the hash table, together with the key and the version information corresponding to the row;
  • sending the hash table to the primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
  • a hash table containing the complete and latest data of the distributed database system can be built from the data tables of the servers in the system and sent to the primary server, so that each server performs data recovery according to the hash table received by the primary server; the latest complete data of the distributed database system can thus be recovered on every server.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

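A method and device for updating data of a distributed database system. The method includes: obtaining the data table stored in each server of the distributed database system and, for each obtained data table, reading each row of data in that table (S101); for each row read, determining whether a key corresponding to the row exists in the hash table stored by the server itself (S102); if not, writing the row into the hash table together with the key and version information corresponding to the row (S105); if so, reading a first version number of the row in the data table and determining whether the first version number is greater than a second version number stored in the hash table for that row (S103); if so, updating the row into the hash table and updating the version information corresponding to the row (S104); and sending the hash table to a primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server (S106). The method can recover the latest data in a distributed database system.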

Description

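Method and device for updating data of a distributed database system
This application claims priority to Chinese patent application No. 201610191763.8, entitled "Method and device for updating data of a distributed database system" and filed with the China Patent Office on March 30, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of data processing, and in particular to a method and device for updating data of a distributed database system.
Background
With the spread of computer technology, much important data is stored on electronic devices. The more people use electronic devices, the higher their performance requirements become. Distributed database systems can effectively spread the load of database services away from any single computer, improving the overall performance of database services and providing better data-security guarantees.
By combining multiple database servers, a distributed database system improves the overall read/write performance of the database system and provides technical support for database applications with highly concurrent reads and writes. Such systems are widely used in the back ends of large interactive websites, banks, and similar services.
Since every database server in a distributed database system stores data, the consistency of the data stored on the database servers must be guaranteed. However, when faults such as system exceptions or unexpected power loss occur, the data on the distributed servers may become inconsistent. The latest complete data therefore needs to be recovered from the distributed database system.
Summary
The purpose of the embodiments of the present application is to provide a method and device for updating data of a distributed database system, so as to recover the latest data in the distributed database system. The specific technical solutions are as follows: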
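In a first aspect, an embodiment of the present application provides a method for updating data of a distributed database system, applied to a server in the distributed database system that maintains a hash table, wherein the hash table stores, for each row of data in a data table, a key corresponding to that row and version information of that row. The method includes:
obtaining the data table stored in each server of the distributed database system and, for each obtained data table, reading each row of data in that data table;
for each row of data read, determining whether a key corresponding to that row exists in the hash table stored by the server itself;
if so, reading a first version number of the row in the data table and determining whether the first version number is greater than a second version number, stored in the hash table, corresponding to the row; if so, updating the row into the hash table and updating the version information corresponding to the row;
if not, writing the row into the hash table, together with the key and the version information corresponding to the row;
sending the hash table to a primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
Further, reading each row of data in each obtained data table includes:
for each data table, reading each row of data in the data table by means of data sharding.
Further, writing the key corresponding to the row includes:
writing the primary key of the row in the data table into the hash table as the key corresponding to that row in the hash table.
Further, after the row is written into the hash table, the method further includes:
recording the occurrence frequency of the row in the hash table.
Further, when it is determined that the first version number is equal to the second version number, the method further includes:
incrementing the occurrence frequency of the row in the hash table by 1.
Further, before the hash table is sent to the primary server, the method further includes:
for the occurrence frequency of each row in the hash table, determining whether the occurrence frequency of the row is less than a predetermined threshold, and if so, deleting the row.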
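In a second aspect, an embodiment of the present application provides a device for updating data of a distributed database system, applied to a server in the distributed database system that stores a hash table, wherein the hash table stores, for each row of data in a data table, a key corresponding to that row and version information of that row. The device includes:
an obtaining module, configured to obtain the data table stored in each server of the distributed database system and, for each obtained data table, read each row of data in that data table;
a determining module, configured to determine, for each row of data read, whether a key corresponding to that row exists in the hash table stored by the server itself;
a first processing module, configured to, when the determination result of the determining module is yes, read a first version number of the row in the data table and determine whether the first version number is greater than a second version number, stored in the hash table, corresponding to the row; and if so, update the row into the hash table and update the version information corresponding to the row;
a second processing module, configured to, when the determination result of the determining module is no, write the row into the hash table, together with the key and the version information corresponding to the row;
a sending module, configured to send the hash table to a primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
Further, the obtaining module is specifically configured to read, for each data table, each row of data in the data table by means of data sharding.
Further, the second processing module is specifically configured to write the primary key of the row in the data table into the hash table as the key corresponding to that row in the hash table.
Further, the device further includes:
a recording module, configured to record the occurrence frequency of the row in the hash table after the first processing module or the second processing module writes the row into the hash table.
Further, the device further includes:
an execution module, configured to increment the occurrence frequency of the row in the hash table by 1 when the first processing module determines that the first version number is equal to the second version number.
Further, the device further includes:
a deletion module, configured to, before the sending module sends the hash table to the primary server, determine for the occurrence frequency of each row in the hash table whether it is less than a predetermined threshold, and if so, delete the row.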
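In a third aspect, the present application provides a storage medium for storing executable program code, the executable program code being used to execute, at runtime, the method for updating data of a distributed database system described in the present application.
In a fourth aspect, the present application provides an application program, which is used to execute, at runtime, the method for updating data of a distributed database system described in the present application.
In a fifth aspect, the present application provides an electronic device, including:
a processor, a memory, a communication interface, and a bus;
the processor, the memory, and the communication interface are connected via the bus and communicate with one another;
the memory stores executable program code;
the processor reads the executable program code stored in the memory and runs a program corresponding to it, so as to execute the method for updating data of a distributed database system described in the present application.
The embodiments of the present application provide a method and device for updating data of a distributed database system, applied to a server in the distributed database system that stores a hash table. The method includes: obtaining the data table stored in each server of the distributed database system and, for each obtained data table, reading each row of data in that data table; for each row read, determining whether a key corresponding to the row exists in the hash table stored by the server itself; if so, reading a first version number of the row in the data table and determining whether the first version number is greater than the second version number stored in the hash table for that row, and if so, updating the row into the hash table and updating the version information corresponding to the row; if not, writing the row into the hash table together with the key and version information corresponding to the row; and sending the hash table to a primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server. In the embodiments of the present application, a hash table containing the complete and latest data of the distributed database system can be built from the data tables of the servers in the system and sent to the primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server. The latest complete data of the distributed database system can therefore be recovered on every server.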
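Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application and of the prior art more clearly, the drawings needed for the embodiments and the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for updating data of a distributed database system according to an embodiment of the present application;
FIG. 2 is another flowchart of a method for updating data of a distributed database system according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a device for updating data of a distributed database system according to an embodiment of the present application;
FIG. 4 is another schematic structural diagram of a device for updating data of a distributed database system according to an embodiment of the present application.
Detailed Description
To enable those skilled in the art to better understand the technical solutions in the embodiments of the present application, these solutions are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in combination with the embodiments.
To recover the latest data in a distributed database system, an embodiment of the present application provides a process for a method of updating data of a distributed database system. As shown in FIG. 1, the process includes the following steps:
S101: obtain the data table stored in each server of the distributed database system and, for each obtained data table, read each row of data in that data table.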
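The method may be applied to a server in the distributed database system that maintains a hash table. This server may be any server in the distributed database system. For ease of description, the embodiments of the present application refer to the server storing the hash table as the target server.
A hash table is a data structure that is accessed directly by key value. It locates a record by mapping the record's key value to a position in the table, which speeds up lookup. Keys are never duplicated in a hash table. A hash table can find a specified key value in a bounded number of steps, and its average lookup time is constant, so lookup is very efficient.
In the embodiments of the present application, to recover the latest data in the distributed database system, a hash table can be built in the memory of the target server to hold the latest complete data of the distributed database system. For example, the target server can access every server in the distributed database system, obtain the data in each server's data tables, and compare that data so that the hash table ends up holding the latest complete data of the distributed database system.
A data table is a very important object in a server. It is the basis of other objects and the carrier for maintaining fields, keys, primary keys, and so on. Depending on how the information is classified, a server may contain several data tables. Each row in a data table can be called a "record", and each record contains all the information in that row. Each column in a data table is called a field, and each field has corresponding descriptive information, such as data type and data width. A primary key is one or more fields in a data table whose value uniquely identifies a record in the table; primary key values are never duplicated within a data table. A structure (struct) is a collection of data items of the same or different types, and an object of a structure type contains these items.
In the embodiments of the present application, the key of the hash table in the target server may be defined as the primary key of a row in the data table, and the value of the hash table may be defined as a structure composed of all fields of that row.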
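A minimal Python sketch of this key/value layout follows. It illustrates the description under stated assumptions and is not the applicant's implementation. A plain `dict` stands in for the hash table. The row's version number is assumed to live in a column named `version`. The names `Row`, `HashEntry`, and `make_entry` are hypothetical. The optional occurrence-frequency field, used later in this description, is included for completeness.

```python
from dataclasses import dataclass
from typing import Any, Dict

Row = Dict[str, Any]  # field name -> value: the "structure" of all fields in a row


@dataclass
class HashEntry:
    """Hash-table value for one primary key (hypothetical layout)."""
    row: Row            # copy of every field of the row
    version: int        # version number the row had when it was written
    frequency: int = 1  # occurrence count, used only by the optional S107/S108 steps


def make_entry(row: Row, version_field: str = "version") -> HashEntry:
    """Wrap a data-table row as a hash-table value."""
    return HashEntry(row=dict(row), version=int(row[version_field]))


# Hash table: primary key value -> HashEntry.
table: Dict[Any, HashEntry] = {}
row = {"id": 7, "name": "camera-7", "version": 3}
table[row["id"]] = make_entry(row)   # the primary key "id" is the hash-table key
print(table[7].version, table[7].frequency)  # 3 1
```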
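In the embodiments of the present application, to hold the latest complete data of the distributed database system in the hash table, the target server may first obtain the data table stored in each server of the distributed database system and, for each obtained data table, read each row of data. Some or all of this data is then saved into the hash table, yielding the latest complete data of the distributed database system.
Specifically, the target server may obtain the data tables stored in the servers of the distributed database system one by one and, for each obtained data table, read all rows of that table at once.
Optionally, in practice a data table may be large; for example, a single data table may contain 1 million rows. Reading all rows of such a table at once may therefore take a long time (for example, MySQL takes about 40 seconds to finish reading 1 million rows). Reading one row at a time, on the other hand, is very slow when the table is large; for example, reading 1 million rows this way may take about 4 minutes.
Therefore, in the embodiments of the present application, each row of data in each data table may be read by means of data sharding.
When a data table holds a large amount of data, reading all of it at once may become infeasible. Data sharding means using a reading algorithm that reads part of the table's data on each pass, so that the complete data of the table has been read after multiple passes.
Specifically, for each data table, a shard size may be determined according to the complexity of the table's structure. A shard of a fixed number of rows (such as 10,000, 20,000, or 30,000 rows) is then read each time until all the data in the table has been read.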
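The sharded read can be sketched in Python, with the standard-library `sqlite3` module standing in for the MySQL server mentioned above. This is a sketch under assumptions:

- `read_in_shards` is a hypothetical helper.
- The shard size is a plain parameter. It is 10 here for brevity, not the 10,000 to 30,000 rows suggested above.
- Each shard is fetched with keyset pagination on the primary key. This is one way of reading a table piece by piece and not necessarily the one the applicant used.

```python
import sqlite3
from typing import Iterator, List, Tuple


def read_in_shards(conn: sqlite3.Connection, table: str, pk: str,
                   shard_size: int) -> Iterator[List[Tuple]]:
    """Yield the rows of `table` in shards of at most `shard_size` rows.

    Keyset pagination: each query resumes after the last primary key already
    read. `table` and `pk` are interpolated into SQL and must be trusted names.
    """
    last_pk = None
    while True:
        if last_pk is None:
            cur = conn.execute(
                f"SELECT * FROM {table} ORDER BY {pk} LIMIT ?", (shard_size,))
        else:
            cur = conn.execute(
                f"SELECT * FROM {table} WHERE {pk} > ? ORDER BY {pk} LIMIT ?",
                (last_pk, shard_size))
        shard = cur.fetchall()
        if not shard:
            return
        yield shard
        if len(shard) < shard_size:
            return  # a short shard means the table is exhausted
        pk_index = [col[0] for col in cur.description].index(pk)
        last_pk = shard[-1][pk_index]


# Usage with an in-memory SQLite table of 25 rows (stand-in for one server's table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(i, f"row-{i}", 1) for i in range(1, 26)])
shards = list(read_in_shards(conn, "t", "id", shard_size=10))
print([len(s) for s in shards])  # [10, 10, 5]
```

Keyset pagination is used here rather than `LIMIT`/`OFFSET` paging. An offset query has to step over every earlier row on each call, so it gets slower as the read progresses through a large table.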
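Reading the data in each data table by means of data sharding improves the efficiency of reading data from data tables.
S102: for each row of data read, determine whether a key corresponding to the row exists in the hash table stored by the server itself; if so, perform step S103; if not, perform step S105.
After each row of each data table has been read, the target server may determine, for each row read, whether a key corresponding to the row exists in the hash table it stores.
It can be understood that, under normal circumstances, the data stored in the data tables of every server in the distributed database system should be the same. Therefore, for a given row in a given data table, the target server may already have written that row into the hash table from another server's data table. When the hash table holds the latest complete data of the distributed database system, identical data only needs to be stored once.
Therefore, in the embodiments of the present application, for each row obtained, the target server may determine whether a key corresponding to the row exists in the hash table it stores, so as to determine whether the hash table already holds that row.
Specifically, the target server may search the keys of the rows in its hash table for a key identical to the primary key of the row in the data table. If one is found, it determines that a key corresponding to the row exists in its hash table; otherwise, it determines that no such key exists.
S103: when a key corresponding to the row exists in the hash table stored by the server itself, read a first version number of the row in the data table and determine whether the first version number is greater than the second version number, stored in the hash table, corresponding to the row. If so, perform step S104 to update the row into the hash table and update the version information corresponding to the row; if not, do not write the row into the hash table.
When the target server determines that a key corresponding to the row exists in its hash table, this indicates that the hash table already holds the row. However, in the data tables of the servers, the primary key of a row does not change when the row is updated. Therefore, the mere presence of the key in the hash table does not establish that the row stored in the hash table is the same as the row in the data table.
Therefore, in the embodiments of the present application, a version number may be stored for each row in each server's data table to identify how many times the row has been updated. The hash table likewise stores version information for each row. Specifically, the version information stored in the hash table for each row may be the version number the row had in the data table when it was written into the hash table.
When the target server determines that the hash table already holds the row, it may further read the first version number of the row in the data table and determine whether the first version number is greater than the second version number stored in the hash table for that row.
When the first version number is greater than the second version number, this indicates that the row stored in the data table is the latest data; in this case, the row may be updated into the hash table.
When the first version number is less than or equal to the second version number, this indicates that the row stored in the hash table is already the latest data; in this case, the row need not be written into the hash table.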
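The branching in steps S102 and S103 can be expressed as a small decision function. In this hypothetical Python sketch, `versions` maps each key already in the hash table to its stored (second) version number. The `decide` function only reports which step applies and does not modify anything.

```python
from typing import Any, Dict, Literal

Action = Literal["insert", "update", "skip"]


def decide(versions: Dict[Any, int], key: Any, first_version: int) -> Action:
    """Return which step applies to one data-table row (S102/S103)."""
    if key not in versions:             # S102: key absent -> go to S105
        return "insert"
    second_version = versions[key]
    if first_version > second_version:  # S103: data table is newer -> S104
        return "update"
    return "skip"                       # hash table already holds the latest row


versions = {1: 2, 2: 5}
print(decide(versions, 3, 1))  # insert
print(decide(versions, 1, 4))  # update
print(decide(versions, 2, 5))  # skip
```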
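S104: update the row into the hash table, and update the version information corresponding to the row.
When the target server determines that the first version number is greater than the second version number, it may update the row into the hash table and update the version information corresponding to the row.
Specifically, the row in the hash table may be replaced with the row from the data table, and the version number of that row in the data table is written into the hash table as the version number of the row in the hash table.
S105: when no key corresponding to the row exists in the hash table stored by the server itself, write the row into the hash table, together with the key and the version information corresponding to the row.
When the target server determines that no key corresponding to the row exists in its hash table, this indicates that the hash table does not hold the row. In this case, the row may be written into the hash table, together with its key and version information.
Specifically, when the key corresponding to the row is written, the primary key of the row in the data table may be written into the hash table as the key corresponding to that row in the hash table.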
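Steps S104 and S105 together amount to an upsert guarded by the version number. The following Python sketch makes some hypothetical assumptions:

- Rows are dictionaries whose primary key is in the `id` field.
- The version number is in the `version` field.
- `merge_row` is an illustrative name, not an API from the source.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def merge_row(hash_table: Dict[Any, Dict[str, Any]], row: Row,
              pk_field: str = "id", version_field: str = "version") -> None:
    """Merge one data-table row into the hash table (steps S102 to S105).

    Each hash-table value holds a copy of the row and its version number.
    """
    key = row[pk_field]                     # the primary key becomes the hash key
    first_version = row[version_field]
    entry = hash_table.get(key)             # S102: look the key up
    if entry is None:                       # S105: write row, key, and version
        hash_table[key] = {"row": dict(row), "version": first_version}
    elif first_version > entry["version"]:  # S103 -> S104: replace row and version
        entry["row"] = dict(row)
        entry["version"] = first_version
    # otherwise the stored row is at least as new, so it is left unchanged


ht: Dict[Any, Dict[str, Any]] = {}
merge_row(ht, {"id": 1, "name": "a", "version": 1})
merge_row(ht, {"id": 1, "name": "b", "version": 2})
merge_row(ht, {"id": 1, "name": "stale", "version": 1})
print(ht[1]["row"]["name"], ht[1]["version"])  # b 2
```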
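S106: send the hash table to the primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
After steps S101 to S105 have been performed, the target server's hash table holds the latest complete data of the distributed database system. Finally, so that every server holds this latest complete data and the data on the servers is consistent, the target server may send the hash table to the primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
In a distributed database system, the servers may include one primary server and other secondary servers, and a secondary server can access the primary server to update its own data tables according to the primary server's data tables.
Therefore, in this embodiment, the target server may send the hash table to the primary server. After saving the hash table, the primary server may restore the latest data into its own data tables according to the hash table. The secondary servers can then synchronize the latest data from the primary server, so that the data stored on every server is the latest complete data of the distributed database system.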
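The following end-to-end sketch covers steps S101 to S106, with every server's data table reduced to an in-memory `dict` from primary key to row. It is a simplification for illustration only:

- `build_hash_table` and `recover` are hypothetical names.
- The primary server is assumed to upsert the hash table's rows into its own table.
- Synchronization of the secondary servers is modelled as a full copy of the primary's table, not as any particular database's replication protocol.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
DataTable = Dict[Any, Row]  # primary key -> row; one table per server


def build_hash_table(server_tables: List[DataTable]) -> Dict[Any, Row]:
    """S101 to S105: keep, for every primary key, the row with the highest version."""
    latest: Dict[Any, Row] = {}
    for table in server_tables:            # S101: read every server's table
        for key, row in table.items():
            stored = latest.get(key)       # S102: is the key already present?
            if stored is None or row["version"] > stored["version"]:
                latest[key] = dict(row)    # S105 (insert) or S104 (update)
    return latest


def recover(primary: DataTable, secondaries: List[DataTable],
            hash_table: Dict[Any, Row]) -> None:
    """S106: the primary upserts the hash table's rows into its own table;
    each secondary then copies the primary (simplified synchronization)."""
    for key, row in hash_table.items():
        primary[key] = dict(row)
    for table in secondaries:
        table.clear()
        table.update({key: dict(row) for key, row in primary.items()})


s1 = {1: {"id": 1, "name": "a", "version": 2}, 2: {"id": 2, "name": "x", "version": 1}}
s2 = {1: {"id": 1, "name": "old", "version": 1}}
s3 = {2: {"id": 2, "name": "y", "version": 3}}
ht = build_hash_table([s1, s2, s3])
recover(s1, [s2, s3], ht)  # here the first server plays the primary role
print(s1 == s2 == s3, s1[2]["name"])  # True y
```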
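The method provided by the embodiments of the present application can build, from the data tables of the servers in a distributed database system, a hash table containing the complete and latest data of that system, and send the hash table to the primary server so that each server in the system performs data recovery according to the hash table received by the primary server. The latest data of the distributed database system can therefore be recovered on every server.
Further, in practice, the data stored in the data tables of the servers of a distributed database system is normally data that users have actively updated. However, when one or more servers encounter a system exception or a security fault, some abnormal data may be added to their data tables. In this case, the data may be stored only in the data tables of those one or more servers. If it were written into the target server's hash table, it would then be propagated to the data tables of every server, which could make the results of data queries and other processing inaccurate.
In the embodiments of the present application, to ensure the accuracy of the data in each server's data tables and avoid writing abnormal data to every server, the target server may record the occurrence frequency of a row in the hash table after writing that row from any server's data table into the hash table. For example, the occurrence frequency of the row may be set to 1 to indicate that the row has appeared once in the data tables.
The occurrence frequency of the row may then be modified according to whether the row exists in the data tables of the other servers. This tracks how many times the row appears across the servers and allows a further determination of whether the row is abnormal data.
Therefore, as shown in FIG. 2, in the method for updating data of a distributed database system provided by an embodiment of the present application, after step S103, when it is determined that the first version number is equal to the second version number, the method may further include:
S107: increment the occurrence frequency of the row in the hash table by 1.
In the embodiments of the present application, when the first version number of a row in the data table is determined to be equal to the second version number of that row in the hash table, this indicates that the row stored in the hash table is the same as the row in the data table. The target server may therefore increment the occurrence frequency of the row in the hash table by 1.
After every row of every server's data table has been compared with the data stored in the hash table, the occurrence frequency of each row in the hash table equals the total number of times that row appears across the servers of the distributed database system.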
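The occurrence-frequency bookkeeping of S107 can be folded into the merge loop. In this hypothetical Python sketch, `merge_with_frequency` processes one server's rows at a time. When a newer version replaces an entry, the count is assumed to restart at 1, following the example above of recording a newly written row's frequency as 1. The source does not spell out this case.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def merge_with_frequency(hash_table: Dict[Any, Dict[str, Any]],
                         server_rows: List[Row], pk: str = "id") -> None:
    """Merge one server's rows, counting identical versions (S104, S105, S107)."""
    for row in server_rows:
        key, first = row[pk], row["version"]
        entry = hash_table.get(key)
        if entry is None or first > entry["version"]:
            # S105 / S104: (re)write the row; the count restarts at 1 (assumption)
            hash_table[key] = {"row": dict(row), "version": first, "frequency": 1}
        elif first == entry["version"]:
            entry["frequency"] += 1  # S107: same version seen on another server
        # first < second: an older copy, ignored


ht: Dict[Any, Dict[str, Any]] = {}
servers = [
    [{"id": 1, "version": 2}, {"id": 9, "version": 1}],  # id 9 exists only here
    [{"id": 1, "version": 2}],
    [{"id": 1, "version": 2}],
]
for rows in servers:
    merge_with_frequency(ht, rows)
print({k: e["frequency"] for k, e in ht.items()})  # {1: 3, 9: 1}
```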
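In the method provided by the embodiments of the present application, after step S104, step S105, or step S107 and before step S106 (that is, after the data of each data table has been written into the hash table and before the hash table is sent to the primary server), the method may further include:
S108: for the occurrence frequency of each row in the hash table, determine whether the occurrence frequency of the row is less than a predetermined threshold, and if so, delete the row.
In this embodiment, before sending the hash table to the primary server, the target server may check each row's occurrence frequency in the hash table against the predetermined threshold and delete the row if its frequency is lower. This avoids propagating abnormal data to the data tables of every server.
Optionally, the predetermined threshold may be set to the same value, such as 2, 3, or 4, for different distributed database systems. Alternatively, for each distributed database system, the threshold may be determined according to the total number of servers in the system. For example, when the total number of servers is large, the threshold may be set to a larger value (such as 3, 4, or 5); when the total number of servers is small, it may be set to a smaller value (such as 1, 2, or 3).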
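Step S108 then reduces to a filter over the hash table. The sketch below uses a hypothetical helper, `drop_rare_rows`. The threshold is passed in as a parameter because, as described above, its value is a deployment choice.

```python
from typing import Any, Dict


def drop_rare_rows(hash_table: Dict[Any, Dict[str, Any]], threshold: int) -> None:
    """S108: delete every row whose occurrence frequency is below `threshold`."""
    # Collect the keys first: a dict must not change size while being iterated.
    rare = [key for key, entry in hash_table.items() if entry["frequency"] < threshold]
    for key in rare:
        del hash_table[key]


ht = {1: {"version": 2, "frequency": 3}, 9: {"version": 1, "frequency": 1}}
drop_rare_rows(ht, threshold=2)
print(sorted(ht))  # [1]
```

The comparison is strictly "less than", and every stored row has been seen at least once. A threshold of 1 therefore keeps every row, so the filter only takes effect from a threshold of 2 upward.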
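In this solution, after a row from any server's data table is written into the hash table, the occurrence frequency of the row can be recorded in the hash table. That frequency is then modified according to whether the row exists in the data tables of the other servers. Before the hash table is sent to the primary server, rows whose occurrence frequency is less than the predetermined threshold can be deleted. This avoids writing abnormal data to every server and further improves the accuracy of data processing results.
Corresponding to the method embodiments above, the embodiments of the present application further provide corresponding device embodiments.
FIG. 3 shows a device for updating data of a distributed database system according to an embodiment of the present application, applied to a server in the distributed database system that stores a hash table, wherein the hash table stores, for each row of data in a data table, a key corresponding to that row and version information of that row. The device includes:
an obtaining module 310, configured to obtain the data table stored in each server of the distributed database system and, for each obtained data table, read each row of data in that data table;
a determining module 320, configured to determine, for each row of data read, whether a key corresponding to that row exists in the hash table stored by the server itself;
a first processing module 330, configured to, when the determination result of the determining module is yes, read a first version number of the row in the data table and determine whether the first version number is greater than a second version number, stored in the hash table, corresponding to the row; and if so, update the row into the hash table and update the version information corresponding to the row;
a second processing module 340, configured to, when the determination result of the determining module is no, write the row into the hash table, together with the key and the version information corresponding to the row;
a sending module 350, configured to send the hash table to a primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
The device provided by the embodiments of the present application can build, from the data tables of the servers in a distributed database system, a hash table containing the complete and latest data of that system, and send the hash table to the primary server so that each server in the system performs data recovery according to the hash table received by the primary server. The latest data of the distributed database system can therefore be recovered on every server.
Further, the obtaining module 310 is specifically configured to read, for each data table, each row of data in the data table by means of data sharding.
Further, the second processing module 340 is specifically configured to write the primary key of the row in the data table into the hash table as the key corresponding to that row in the hash table.
Further, the device further includes:
a recording module (not shown in the figures), configured to record the occurrence frequency of the row in the hash table after the first processing module 330 or the second processing module 340 writes the row into the hash table.
Further, as shown in FIG. 4, the device for updating data of a distributed database system provided by an embodiment of the present application further includes:
an execution module 360, configured to increment the occurrence frequency of the row in the hash table by 1 when the first processing module 330 determines that the first version number is equal to the second version number.
Further, the device further includes:
a deletion module 370, configured to, before the sending module 350 sends the hash table to the primary server, determine for the occurrence frequency of each row in the hash table whether it is less than a predetermined threshold, and if so, delete the row.
In this solution, after a row from any server's data table is written into the hash table, the occurrence frequency of the row can be recorded in the hash table. That frequency is then modified according to whether the row exists in the data tables of the other servers. Before the hash table is sent to the primary server, rows whose occurrence frequency is less than the predetermined threshold can be deleted. This avoids writing abnormal data to every server and further improves the accuracy of data processing results.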
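Correspondingly, the present application further provides a storage medium for storing executable program code, the executable program code being used to execute, at runtime, the method for updating data of a distributed database system described in the present application. The method is applied to a server in the distributed database system that maintains a hash table, wherein the hash table stores, for each row of data in a data table, a key corresponding to that row and version information of that row, and the method includes:
obtaining the data table stored in each server of the distributed database system and, for each obtained data table, reading each row of data in that data table;
for each row of data read, determining whether a key corresponding to that row exists in the hash table stored by the server itself;
if so, reading a first version number of the row in the data table and determining whether the first version number is greater than a second version number, stored in the hash table, corresponding to the row; if so, updating the row into the hash table and updating the version information corresponding to the row;
if not, writing the row into the hash table, together with the key and the version information corresponding to the row;
sending the hash table to a primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
In the embodiments of the present application, a hash table containing the complete and latest data of the distributed database system can be built from the data tables of the servers in the system and sent to the primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server. The latest complete data of the distributed database system can therefore be recovered on every server.
Correspondingly, the present application further provides an application program, which is used to execute, at runtime, the method for updating data of a distributed database system described in the present application. The method is applied to a server in the distributed database system that maintains a hash table, wherein the hash table stores, for each row of data in a data table, a key corresponding to that row and version information of that row, and the method includes:
obtaining the data table stored in each server of the distributed database system and, for each obtained data table, reading each row of data in that data table;
for each row of data read, determining whether a key corresponding to that row exists in the hash table stored by the server itself;
if so, reading a first version number of the row in the data table and determining whether the first version number is greater than a second version number, stored in the hash table, corresponding to the row; if so, updating the row into the hash table and updating the version information corresponding to the row;
if not, writing the row into the hash table, together with the key and the version information corresponding to the row;
sending the hash table to a primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
In the embodiments of the present application, a hash table containing the complete and latest data of the distributed database system can be built from the data tables of the servers in the system and sent to the primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server. The latest complete data of the distributed database system can therefore be recovered on every server.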
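Correspondingly, the present application provides an electronic device, including:
a processor, a memory, a communication interface, and a bus;
the processor, the memory, and the communication interface are connected via the bus and communicate with one another;
the memory stores executable program code;
the processor reads the executable program code stored in the memory and runs a program corresponding to it, so as to execute the method for updating data of a distributed database system described in the present application. The method is applied to a server in the distributed database system that maintains a hash table, wherein the hash table stores, for each row of data in a data table, a key corresponding to that row and version information of that row, and the method includes:
obtaining the data table stored in each server of the distributed database system and, for each obtained data table, reading each row of data in that data table;
for each row of data read, determining whether a key corresponding to that row exists in the hash table stored by the server itself;
if so, reading a first version number of the row in the data table and determining whether the first version number is greater than a second version number, stored in the hash table, corresponding to the row; if so, updating the row into the hash table and updating the version information corresponding to the row;
if not, writing the row into the hash table, together with the key and the version information corresponding to the row;
sending the hash table to a primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
In the embodiments of the present application, a hash table containing the complete and latest data of the distributed database system can be built from the data tables of the servers in the system and sent to the primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server. The latest complete data of the distributed database system can therefore be recovered on every server.
The device, storage medium, application program, and electronic device embodiments are basically similar to the method embodiments and are therefore described relatively briefly; for relevant details, refer to the description of the method embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion. A process, method, article, or device that includes a series of elements therefore includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner: for identical or similar parts of the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, the device embodiments are basically similar to the method embodiments and are therefore described relatively briefly; for relevant details, refer to the description of the method embodiments.
A person of ordinary skill in the art can understand that all or some of the steps in the method implementations above may be completed by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc.
The above are merely preferred embodiments of the present application and are not intended to limit its protection scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application falls within the protection scope of the present application.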

Claims (15)

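  1. A method for updating data of a distributed database system, characterized in that it is applied to a server in the distributed database system that maintains a hash table, wherein the hash table stores, for each row of data in a data table, a key corresponding to that row and version information of that row, and the method includes:
    obtaining the data table stored in each server of the distributed database system and, for each obtained data table, reading each row of data in that data table;
    for each row of data read, determining whether a key corresponding to that row exists in the hash table stored by the server itself;
    if so, reading a first version number of the row in the data table and determining whether the first version number is greater than a second version number, stored in the hash table, corresponding to the row; if so, updating the row into the hash table and updating the version information corresponding to the row;
    if not, writing the row into the hash table, together with the key and the version information corresponding to the row;
    sending the hash table to a primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
  2. The method according to claim 1, characterized in that reading each row of data in each obtained data table includes:
    for each data table, reading each row of data in the data table by means of data sharding.
  3. The method according to claim 1, characterized in that writing the key corresponding to the row includes:
    writing the primary key of the row in the data table into the hash table as the key corresponding to that row in the hash table.
  4. The method according to any one of claims 1 to 3, characterized in that, after the row is written into the hash table, the method further includes:
    recording the occurrence frequency of the row in the hash table.
  5. The method according to claim 4, characterized in that, when it is determined that the first version number is equal to the second version number, the method further includes:
    incrementing the occurrence frequency of the row in the hash table by 1.
  6. The method according to claim 5, characterized in that, before the hash table is sent to the primary server, the method further includes:
    for the occurrence frequency of each row in the hash table, determining whether the occurrence frequency of the row is less than a predetermined threshold, and if so, deleting the row.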
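  7. A device for updating data of a distributed database system, characterized in that it is applied to a server in the distributed database system that stores a hash table, wherein the hash table stores, for each row of data in a data table, a key corresponding to that row and version information of that row, and the device includes:
    an obtaining module, configured to obtain the data table stored in each server of the distributed database system and, for each obtained data table, read each row of data in that data table;
    a determining module, configured to determine, for each row of data read, whether a key corresponding to that row exists in the hash table stored by the server itself;
    a first processing module, configured to, when the determination result of the determining module is yes, read a first version number of the row in the data table and determine whether the first version number is greater than a second version number, stored in the hash table, corresponding to the row; and if so, update the row into the hash table and update the version information corresponding to the row;
    a second processing module, configured to, when the determination result of the determining module is no, write the row into the hash table, together with the key and the version information corresponding to the row;
    a sending module, configured to send the hash table to a primary server, so that each server in the distributed database system performs data recovery according to the hash table received by the primary server.
  8. The device according to claim 7, characterized in that the obtaining module is specifically configured to read, for each data table, each row of data in the data table by means of data sharding.
  9. The device according to claim 7, characterized in that the second processing module is specifically configured to write the primary key of the row in the data table into the hash table as the key corresponding to that row in the hash table.
  10. The device according to any one of claims 7 to 9, characterized in that the device further includes:
    a recording module, configured to record the occurrence frequency of the row in the hash table after the first processing module or the second processing module writes the row into the hash table.
  11. The device according to claim 10, characterized in that the device further includes:
    an execution module, configured to increment the occurrence frequency of the row in the hash table by 1 when the first processing module determines that the first version number is equal to the second version number.
  12. The device according to claim 11, characterized in that the device further includes:
    a deletion module, configured to, before the sending module sends the hash table to the primary server, determine for the occurrence frequency of each row in the hash table whether it is less than a predetermined threshold, and if so, delete the row.
  13. A storage medium, characterized in that the storage medium is used to store executable program code, the executable program code being used to execute, at runtime, the method for updating data of a distributed database system according to any one of claims 1 to 6.
  14. An application program, characterized in that the application program is used to execute, at runtime, the method for updating data of a distributed database system according to any one of claims 1 to 6.
  15. An electronic device, including:
    a processor, a memory, a communication interface, and a bus;
    the processor, the memory, and the communication interface are connected via the bus and communicate with one another;
    the memory stores executable program code;
    the processor reads the executable program code stored in the memory and runs a program corresponding to it, so as to execute the method for updating data of a distributed database system according to any one of claims 1 to 6.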
PCT/CN2016/104690 2016-03-30 2016-11-04 Method and device for updating data of a distributed database system WO2017166815A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP16896584.6A EP3438845A1 (en) 2016-03-30 2016-11-04 Data updating method and device for a distributed database system
US16/089,949 US11176110B2 (en) 2016-03-30 2016-11-04 Data updating method and device for a distributed database system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610191763.8A CN107291710B (zh) 2016-03-30 Method and device for updating data of a distributed database system
CN201610191763.8 2016-03-30

Publications (1)

Publication Number Publication Date
WO2017166815A1 true WO2017166815A1 (zh) 2017-10-05

Family

ID=59963382

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/104690 WO2017166815A1 (zh) 2016-03-30 2016-11-04 Method and device for updating data of a distributed database system

Country Status (4)

Country Link
US (1) US11176110B2 (zh)
EP (1) EP3438845A1 (zh)
CN (1) CN107291710B (zh)
WO (1) WO2017166815A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
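CN108829785A (zh) * 2018-05-31 2018-11-16 沈文策 Method and device for repairing a faulty table in a database, electronic device, and storage medium
CN109582666A (zh) * 2018-09-29 2019-04-05 阿里巴巴集团控股有限公司 Data primary key generation method and device, electronic device, and storage medium
CN109739684B (zh) * 2018-11-20 2020-03-13 清华大学 Replica repair method and device for a distributed key-value database based on vector clocks
CN110377611B (zh) * 2019-07-12 2022-07-15 北京三快在线科技有限公司 Points ranking method and device
CN113704274B (zh) * 2020-05-20 2024-03-19 中国移动通信集团福建有限公司 Data reading method and electronic device
CN111400334B (zh) * 2020-06-04 2020-10-09 腾讯科技(深圳)有限公司 Data processing method and device, storage medium, and electronic device
CN111506668B (zh) * 2020-07-01 2024-02-02 西安安森智能仪器股份有限公司 Intelligent data synchronization method and system for robot clusters
CN115987759B (zh) * 2023-02-17 2023-06-23 天翼云科技有限公司 Data processing method and device, electronic device, and storage medium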

Citations (4)

Publication number Priority date Publication date Assignee Title
US6581075B1 (en) * 2000-12-28 2003-06-17 Nortel Networks Limited System and method for database synchronization
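CN101464895A (zh) * 2009-01-21 2009-06-24 阿里巴巴集团控股有限公司 Method, system, and device for updating in-memory data
CN102426611A (zh) * 2012-01-13 2012-04-25 广州从兴电子开发有限公司 Database synchronization method and device
CN104899257A (zh) * 2015-05-18 2015-09-09 北京京东尚科信息技术有限公司 Data updating method and device in a distributed data warehouse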

Family Cites Families (16)

Publication number Priority date Publication date Assignee Title
US5924096A (en) * 1997-10-15 1999-07-13 Novell, Inc. Distributed database using indexed into tags to tracks events according to type, update cache, create virtual update log on demand
US7478096B2 (en) * 2003-02-26 2009-01-13 Burnside Acquisition, Llc History preservation in a computer storage system
US7457832B2 (en) * 2004-08-31 2008-11-25 Microsoft Corporation Verifying dynamically generated operations on a data store
US7788225B2 (en) * 2005-03-18 2010-08-31 Oracle International Corporation Apparatus and method for identifying asynchronous data in redundant data stores and for re-synchronizing same
US20090144220A1 (en) * 2007-11-30 2009-06-04 Yahoo! Inc. System for storing distributed hashtables
CN101369923B (zh) * 2008-09-24 2010-12-29 中兴通讯股份有限公司 Method for improving the performance of cluster web services using a distributed hash table
US8712964B2 (en) * 2008-12-02 2014-04-29 United States Postal Services Systems and methods for updating a data store using a transaction store
US8799231B2 (en) 2010-08-30 2014-08-05 Nasuni Corporation Versioned file system with fast restore
US9037618B2 (en) * 2011-03-31 2015-05-19 Novell, Inc. Distributed, unified file system operations
US8346810B2 (en) * 2011-05-13 2013-01-01 Simplivity Corporation Reference count propagation
US8745095B2 (en) * 2011-08-12 2014-06-03 Nexenta Systems, Inc. Systems and methods for scalable object storage
US20130268567A1 (en) * 2012-04-05 2013-10-10 Cover-All Technologies, Inc. System And Method For Updating Slowly Changing Dimensions
US9031911B2 (en) * 2012-06-05 2015-05-12 International Business Machines Corporation Preserving past states of file system nodes
US9146976B2 (en) 2013-05-21 2015-09-29 Baker Hughes Incorporated Synchronization and reconciliation through identification
US10303682B2 (en) * 2013-09-21 2019-05-28 Oracle International Corporation Automatic verification and triage of query results
US10657113B2 (en) * 2014-01-14 2020-05-19 Baker Hughes, A Ge Company, Llc Loose coupling of metadata and actual data

Non-Patent Citations (1)

Title
See also references of EP3438845A4 *

Also Published As

Publication number Publication date
EP3438845A4 (en) 2019-02-06
CN107291710B (zh) 2020-07-03
US11176110B2 (en) 2021-11-16
EP3438845A1 (en) 2019-02-06
US20190121793A1 (en) 2019-04-25
CN107291710A (zh) 2017-10-24

Similar Documents

Publication Publication Date Title
WO2017166815A1 (zh) Method and device for updating data of a distributed database system
US11256715B2 (en) Data backup method and apparatus
US8924365B2 (en) System and method for range search over distributive storage systems
US9020916B2 (en) Database server apparatus, method for updating database, and recording medium for database update program
US9418094B2 (en) Method and apparatus for performing multi-stage table updates
US8938430B2 (en) Intelligent data archiving
US8732127B1 (en) Method and system for managing versioned structured documents in a database
US10409692B1 (en) Garbage collection: timestamp entries and remove reference counts
WO2012083754A1 (zh) Method and device for processing dirty data
US8527480B1 (en) Method and system for managing versioned structured documents in a database
US9031909B2 (en) Provisioning and/or synchronizing using common metadata
US20140320498A1 (en) Terminal device, information processing method, and computer program product
US11003543B2 (en) Generic metadata tags with namespace-specific semantics in a storage appliance
EP3343395B1 (en) Data storage method and apparatus for mobile terminal
US11003540B2 (en) Method, server, and computer readable medium for index recovery using index redo log
CN113918535A (zh) Data reading method, apparatus, device, and storage medium
CN111639087A (zh) Method and device for updating data in a database, and electronic device
US11010332B2 (en) Set-based mutual exclusion using object metadata tags in a storage appliance
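CN113360571A (zh) Feature-tag-based method for synchronizing the in-memory database and the relational database of a power grid monitoring system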
US9002810B1 (en) Method and system for managing versioned structured documents in a database
US20140325271A1 (en) Terminal device, information processing method, and computer program product
US10042558B1 (en) Method to improve the I/O performance in a deduplicated storage system
JP4825504B2 (ja) Data registration/retrieval system and data registration/retrieval method
CN116257531B (zh) Database space reclamation method
US11847334B2 (en) Method or apparatus to integrate physical file verification and garbage collection (GC) by tracking special segments

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2016896584

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2016896584

Country of ref document: EP

Effective date: 20181030

121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16896584

Country of ref document: EP

Kind code of ref document: A1