WO2022002044A1 - Method and apparatus for processing a distributed database, network device, and computer-readable storage medium - Google Patents


Info

Publication number
WO2022002044A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
data
information
processing
processed
Prior art date
Application number
PCT/CN2021/103095
Inventor
吕修阳
郭龙波
徐文锋
刘志文
付裕
吕达
Original Assignee
中兴通讯股份有限公司
Application filed by 中兴通讯股份有限公司
Publication of WO2022002044A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1458 Management of the backup or restore process
    • G06F11/1469 Backup restoration techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data

Definitions

  • The present application relates to the technical field of database management, and in particular to a method and apparatus for processing a distributed database, a network device, and a computer-readable storage medium.
  • Stand-alone databases in the related art are increasingly unable to meet users' business needs, and users in various industries want to use distributed databases for data management.
  • However, users have become accustomed to stand-alone databases and find it difficult to adapt to distributed databases, especially when it comes to using, in a distributed database, the auto-increment sequences that are common in stand-alone databases.
  • A sequence is a series of numbers in a database system that auto-increments according to certain rules. Because the sequence increments, its numbers never repeat. A sequence mainly serves to continuously provide unique, increasing values to the database system. It can be used as the primary key of a data table or as a unique identifier during data operations. However, data search in distributed databases is inefficient, and the user experience is poor.
  • An embodiment of the present application provides a method for processing a distributed database, including: sorting information to be processed in a database cluster to generate a unique incremental sequence processing result, where the information to be processed comes from different threads, all distributed data nodes in the database cluster have the same sequence set, and the sequence set includes sequences.
  • An embodiment of the present application provides a processing device for a distributed database, including: an acquisition module configured to acquire information to be processed in a database cluster; and a processing module configured to sort the information to be processed and generate a unique incremental sequence processing result, where the information to be processed comes from different threads, all distributed data nodes in the database cluster have the same sequence set, and the sequence set includes sequences.
  • An embodiment of the present application provides a network device, including: one or more processors; and a memory on which one or more computer programs are stored. When the one or more computer programs are executed by the one or more processors, the one or more processors implement the method for processing a distributed database in the embodiments of the present application.
  • An embodiment of the present application provides a computer-readable storage medium in which a computer program is stored. When the computer program is executed by a processor, the method for processing a distributed database in the embodiments of the present application is implemented.
  • FIG. 1 shows a schematic flowchart of a method for processing a distributed database in an embodiment of the present application.
  • FIG. 2 shows another schematic flowchart of a method for processing a distributed database in an embodiment of the present application.
  • FIG. 3 shows a schematic structural diagram of a processing apparatus for a distributed database in an embodiment of the present application.
  • FIG. 4 shows a block diagram of a composition of a distributed database processing system in an embodiment of the present application.
  • FIG. 5 shows another block diagram of the composition of the distributed database processing system in the embodiment of the present application.
  • FIG. 6a shows a schematic diagram of the processing flow for handling an exception when the backup machine of the global sequence manager performs a full data backup in an embodiment of the present application.
  • FIG. 6b shows a schematic diagram of the processing flow when the global sequence manager host restores data in an embodiment of the present application.
  • FIG. 7 shows a structural diagram of an exemplary hardware architecture of a computing device capable of implementing methods and apparatuses according to embodiments of the present application.
  • FIG. 1 is a schematic flowchart of a method for processing a distributed database in an embodiment of the present application, and the method can be applied to a processing apparatus for a distributed database.
  • The processing method of the distributed database includes the following steps 101 and 102.
  • Step 101: Obtain the information to be processed in the database cluster.
  • The information to be processed comes from different threads, all distributed data nodes in the database cluster have the same sequence set, and the sequence set includes sequences.
  • A distributed database created in the database cluster may or may not have sequences.
  • Each sequence belongs to one distributed database, and each distributed database can have multiple sequences (for example, sequence 1, sequence 2, etc.). If a distributed database has no sequence, it cannot use the sequence-related functions.
  • The application sends multiple Structured Query Language (SQL) statements to be processed to the processing device of the distributed database, which parses them using multiple different threads to obtain the information to be processed.
  • The information to be processed includes multiple pieces of sequence information to be processed. The SQL statement to be executed is sent to the corresponding database node for execution, and the processing result of the sequence information to be processed can then be obtained.
  • Step 102: Sort the information to be processed to generate a unique incremental sequence processing result.
  • The information to be processed may include sequence creation, sequence application, sequence deletion, and similar operations. The processing device of the distributed database can therefore classify the information to be processed by sequence operation and then process each category to produce a uniquely increasing sequence processing result.
  • Step 102 may be implemented as follows: perform any one of the following operations on the information to be processed, in its order of arrival, to generate a unique incremental sequence processing result: add the sequence corresponding to the information to be processed to the sequence set; delete the sequence corresponding to the information to be processed from the sequence set; or modify the sequence set according to the information to be processed.
  • The sequence processing result includes sequences. A sequence is a number that is automatically incremented in the database cluster according to certain rules. Because it is incremented, it does not repeat, which ensures the uniqueness of the sequence.
  • A sequence can be used as a surrogate primary key to identify data. It can also be used to record the most recently changed statement in the database: whenever a statement in the database changes (for example, through an insert or delete statement), the sequence is updated with it, so the updated statements can be filtered out by sequence.
  • The above uses of sequences are only illustrative and can be set according to specific conditions. Other uses not described here are also within the protection scope of the present application and are not repeated here.
  • For example, if the first information to be processed arrives first and the second information to be processed arrives later, the first is processed first (for example, sequence 100 is allocated to it) and then the second (for example, sequence 101 is allocated to it). The resulting sequence processing results are sequence 100 and sequence 101, which ensures that the sequences are uniquely increasing and avoids two sequences both having the value 100.
  • By processing the information to be processed in its order of arrival, a uniquely increasing sequence processing result is generated, which ensures the uniqueness and auto-increment of the sequence. Because the information to be processed comes from different threads and all distributed data nodes in the database cluster have the same sequence set, which includes the sequence, data search is faster and the user experience is improved when the user uses the sequence processing result to search for data across the entire database cluster.
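  • The arrival-order processing of step 102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `SequenceSet` class, the operation names ("create", "drop", "apply"), and the sequence name `order_id` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceSet:
    """Hypothetical in-memory sequence set shared by one database cluster."""
    next_values: dict = field(default_factory=dict)

    def create(self, name, start=1):
        if name in self.next_values:
            raise ValueError(f"sequence {name!r} already exists")
        self.next_values[name] = start

    def drop(self, name):
        del self.next_values[name]

    def next_value(self, name):
        # Hand out the current value, then advance, so no value repeats.
        value = self.next_values[name]
        self.next_values[name] = value + 1
        return value


def process_in_arrival_order(sequence_set, pending):
    """Apply each pending operation strictly in list (arrival) order."""
    results = []
    for operation, name, *args in pending:
        if operation == "create":
            sequence_set.create(name, *args)
        elif operation == "drop":
            sequence_set.drop(name)
        elif operation == "apply":
            results.append((name, sequence_set.next_value(name)))
        else:
            raise ValueError(f"unknown operation {operation!r}")
    return results


pending = [("create", "order_id", 100), ("apply", "order_id"), ("apply", "order_id")]
allocated = process_in_arrival_order(SequenceSet(), pending)
print(allocated)
```

  • With the sequence created at 100, the first request to arrive receives 100 and the second receives 101, matching the example in the text.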
  • The method for processing a distributed database further includes the following steps 103 to 105.
  • Step 103: Receive the database processing statement sent by the application.
  • Step 104: Parse the database processing statement to obtain data definition language and data manipulation language statements.
  • Data Definition Language (DDL) is the part of Structured Query Language (SQL) responsible for defining data structures and database objects.
  • DDL is responsible for creating, modifying, and deleting objects in the database, as well as indexing and storage operations. Examples include creating a database, a data table, a data table index, or a view.
  • DDL must be compiled by computer software into a format that is convenient for computer storage, query, and operation. The program that performs this conversion is called a schema compiler.
  • Data Manipulation Language (DML) is the language through which users perform basic operations on the database.
  • DML is responsible for inserting, modifying, and deleting data in the database. For example, it can insert data into an existing database or data table, delete data from a data table, or modify data in a data table.
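  • As a rough illustration of the DDL/DML split in step 104, the sketch below classifies a statement by its leading keyword. The keyword sets and the `classify_statement` helper are assumptions made for this example; a real parser would build a full syntax tree, and treating everything else (including SELECT) as "OTHER" is a choice of this sketch only.

```python
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP", "TRUNCATE"}
DML_KEYWORDS = {"INSERT", "UPDATE", "DELETE"}


def classify_statement(sql):
    """Return "DDL", "DML", or "OTHER" based on the first keyword only."""
    stripped = sql.strip()
    if not stripped:
        raise ValueError("empty statement")
    keyword = stripped.split(None, 1)[0].upper()
    if keyword in DDL_KEYWORDS:
        return "DDL"
    if keyword in DML_KEYWORDS:
        return "DML"
    return "OTHER"


statements = [
    "CREATE TABLE orders (id BIGINT PRIMARY KEY)",
    "insert into orders (id) values (100)",
    "SELECT * FROM orders",
]
kinds = [classify_statement(s) for s in statements]
print(kinds)
```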
  • Step 105: Analyze the data definition language and the data manipulation language, and extract the information to be processed.
  • For example, if the database node corresponding to the data table in the DDL statement is database node A, and the database node corresponding to the data in the DML statement is also database node A, the two pieces of information are combined to obtain the information to be processed corresponding to database node A.
  • Step 105 can be implemented as follows: analyze the data definition language and the data manipulation language to obtain N sequence processing requirements; perform batch processing on the N sequence processing requirements to generate the information to be processed, where N is an integer greater than or equal to 1.
  • The data definition language and the data manipulation language are analyzed by multiple different threads to extract the information to be processed, which is then processed in batches. This reduces the number of data processing operations, avoids message redundancy and database performance degradation, and improves data processing speed.
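  • One way to picture the batching in step 105 is to fold N per-statement requirements into a single batched request. The requirement format `(sequence_name, count)` and the shape of the returned dictionary are assumptions for illustration; the source does not specify a message format.

```python
from collections import Counter


def batch_sequence_requirements(requirements):
    """Merge N (sequence_name, count) requirements into one batched request.

    The original arrival order is kept alongside per-sequence totals so that
    one message can replace N separate ones.
    """
    if len(requirements) < 1:
        raise ValueError("N must be an integer greater than or equal to 1")
    totals = Counter()
    for name, count in requirements:
        totals[name] += count
    return {"requirements": list(requirements), "totals": dict(totals)}


batched = batch_sequence_requirements([("order_id", 1), ("user_id", 2), ("order_id", 3)])
print(batched["totals"])
```

  • Three requirements collapse into one request that asks for four `order_id` values and two `user_id` values, which is the kind of message reduction the text describes.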
  • The method for processing a distributed database further includes: dividing the distributed data nodes into clusters according to their processing capabilities to obtain the database cluster.
  • Dividing the distributed data nodes into clusters according to their processing capabilities lets nodes with high processing capability handle more complex data and nodes with low processing capability handle simpler data, improving the efficiency of the database.
  • The database cluster is used to manage each distributed data node, and data is isolated at the database cluster level, so the sequence information on each database cluster is guaranteed to be unique and incremental. When the user queries data in the database cluster, the required data can be found quickly, improving the user experience.
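  • The source does not say how processing capability is measured or how clusters are formed. Purely as an assumed illustration, the sketch below groups nodes into two clusters using a numeric capability score and a threshold; the score, the threshold, and the cluster names are all hypothetical.

```python
def divide_into_clusters(node_capabilities, threshold):
    """Group nodes into "high" and "low" capability clusters by a score."""
    clusters = {"high": [], "low": []}
    for node, capability in sorted(node_capabilities.items()):
        tier = "high" if capability >= threshold else "low"
        clusters[tier].append(node)
    return clusters


clusters = divide_into_clusters({"node_a": 32, "node_b": 8, "node_c": 16}, threshold=16)
print(clusters)
```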
  • The method for processing a distributed database further includes the following steps 106 and 107.
  • Step 106: According to the sequence processing result, send the database processing statements to the distributed data nodes for processing.
  • For example, if the sequence processing result includes multiple sequences, the SQL statement to be processed corresponding to sequence 1 is sent to the first distributed data node for processing to generate the first data processing result; the SQL statement to be processed corresponding to sequence 2 is sent to the second distributed data node for processing to generate the second data processing result; ...; and the SQL statement to be processed corresponding to sequence n is sent to the nth distributed data node for processing to generate the nth data processing result.
  • Step 107: In response to the data processing result fed back by the distributed data node, forward the data processing result to the application.
  • For example, the first data processing result fed back by the first distributed data node is returned to the application through the first processing thread; the second data processing result fed back by the second distributed data node is returned to the application through the second processing thread; ...; and the nth data processing result fed back by the nth distributed data node is returned to the application through the nth processing thread.
  • Concurrent processing threads improve data processing speed and user experience.
  • The sequence processing results are sent to the corresponding distributed data nodes through different processing threads, and the data processing result fed back by each distributed data node is returned to the application through the corresponding processing thread. This improves data processing efficiency, lets the application obtain data processing results quickly, and improves the user experience.
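  • Steps 106 and 107 can be approximated with a thread pool: each statement is sent to its data node on its own worker thread, and each result is collected for return to the application. `execute_on_node` is a stand-in for a real data node, and the assignment tuple format is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def execute_on_node(node_id, statement):
    """Stand-in for a distributed data node executing one SQL statement."""
    return f"node {node_id} executed: {statement}"


def dispatch(assignments):
    """Run each (sequence_value, node_id, statement) on its own worker thread.

    Returns a dict mapping sequence_value to that node's processing result.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
        futures = [
            (sequence_value, pool.submit(execute_on_node, node_id, statement))
            for sequence_value, node_id, statement in assignments
        ]
        return {sequence_value: future.result() for sequence_value, future in futures}


results = dispatch([
    (1, 1, "INSERT INTO t VALUES (1)"),
    (2, 2, "INSERT INTO t VALUES (2)"),
])
print(results)
```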
  • FIG. 2 is another schematic flowchart of a method for processing a distributed database in an embodiment of the present application. As shown in FIG. 2, the processing method of the distributed database includes the following steps 201 to 203.
  • Step 201: Obtain the information to be processed in the database cluster.
  • Step 202: The first global sequence manager sorts the information to be processed and generates a unique incremental sequence processing result.
  • Steps 201 and 202 in this embodiment are similar to steps 101 and 102 in the previous embodiment and are not repeated here.
  • Step 203: Synchronize the sequence processing result to the second global sequence manager.
  • The second global sequence manager is a backup manager of the first global sequence manager, which includes processing threads.
  • The second global sequence manager and the first global sequence manager are backup managers for each other.
  • When the first global sequence manager fails, the second global sequence manager is used to process the information to be processed and generate a unique incremental sequence processing result.
  • When the second global sequence manager fails, the first global sequence manager can be used to process the information to be processed and generate a unique incremental sequence processing result.
  • The global sequence manager can also be implemented as a cluster.
  • For example, the global sequence manager includes one global sequence manager host and multiple global sequence manager standby machines, ensuring that multiple servers hold complete sequences.
  • A global sequence manager standby machine that has backed up the data can take over from the global sequence manager host and continue working when the host fails, maintaining normal use of the database sequence function.
  • By synchronizing the sequence processing result to the second global sequence manager, the sequence processing results are backed up to avoid data loss. When the first global sequence manager fails, the second global sequence manager can replace it, continue to sort the information to be processed, maintain normal use of the database sequence function, and improve the user experience.
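  • The synchronization in step 203 and the takeover on failure can be sketched with two in-memory managers. The `GlobalSequenceManager` class, its `alive` flag, and the synchronous copy to the standby are simplifying assumptions; the source does not describe the replication protocol.

```python
class GlobalSequenceManager:
    """Hypothetical manager that mirrors every allocation to a standby."""

    def __init__(self, name, standby=None):
        self.name = name
        self.standby = standby
        self.next_values = {}
        self.alive = True

    def allocate(self, sequence_name, start=1):
        if not self.alive:
            raise RuntimeError(f"{self.name} is down")
        value = self.next_values.get(sequence_name, start)
        self.next_values[sequence_name] = value + 1
        if self.standby is not None:
            # Step 203: synchronize the result to the backup manager.
            self.standby.next_values[sequence_name] = value + 1
        return value


def allocate_with_failover(primary, standby, sequence_name):
    """Use the primary while it is alive, otherwise fall back to the standby."""
    manager = primary if primary.alive else standby
    return manager.allocate(sequence_name)


standby = GlobalSequenceManager("standby")
primary = GlobalSequenceManager("primary", standby=standby)
first = allocate_with_failover(primary, standby, "order_id")
primary.alive = False  # simulate a failure of the first manager
second = allocate_with_failover(primary, standby, "order_id")
print(first, second)
```

  • Because the standby already holds the synchronized next value, the value it hands out after the takeover continues the sequence instead of repeating it.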
  • The method for processing the distributed database further includes: performing a full backup of the data in the first global sequence manager at every preset time interval.
  • For example, the first global sequence manager performs a full backup of the data every 10 minutes, that is, writes the data to a disk file. Regular full backups of the data in the first global sequence manager ensure data security.
  • The processing method of the distributed database further includes: counting the number of pieces of information to be processed; and, if that number is determined to be greater than a preset sequence threshold, performing a full backup of the data in the first global sequence manager.
  • For example, the number of received pieces of information to be processed is counted. If the number is greater than the preset sequence threshold (for example, 20,000), a full backup of the data in the first global sequence manager is performed to ensure the security of real-time data.
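  • The two backup triggers described above (a fixed interval and a count threshold) can be combined in one small policy object. The `BackupPolicy` class, the JSON file format, and the write-then-rename step are assumptions of this sketch; the 600-second interval and the 20,000 threshold mirror the examples in the text.

```python
import json
import os
import tempfile
import time


class BackupPolicy:
    """Write a full backup when the interval elapses or the count is exceeded."""

    def __init__(self, path, interval_seconds=600, count_threshold=20000,
                 clock=time.monotonic):
        self.path = path
        self.interval_seconds = interval_seconds
        self.count_threshold = count_threshold
        self.clock = clock
        self.last_backup = clock()
        self.pending_count = 0

    def full_backup(self, state):
        # Write to a temporary file first, then rename over the old backup.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as handle:
            json.dump(state, handle)
        os.replace(tmp_path, self.path)
        self.last_backup = self.clock()
        self.pending_count = 0

    def record(self, state, count=1):
        """Count processed items; return True if a full backup was written."""
        self.pending_count += count
        interval_elapsed = self.clock() - self.last_backup >= self.interval_seconds
        if interval_elapsed or self.pending_count > self.count_threshold:
            self.full_backup(state)
            return True
        return False


with tempfile.TemporaryDirectory() as directory:
    policy = BackupPolicy(os.path.join(directory, "sequences.json"), count_threshold=2)
    state = {"order_id": 100}
    triggers = [policy.record(state) for _ in range(3)]
    with open(policy.path, encoding="utf-8") as handle:
        restored = json.load(handle)
print(triggers, restored)
```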
  • The method for processing the distributed database further includes: when the first global sequence manager fails, replacing it with the second global sequence manager; when the first global sequence manager recovers from the failure, using the sequence data in the second global sequence manager as restoration data to restore the data in the first global sequence manager, obtaining restored sequence data; and performing persistence processing on the restored sequence data.
  • Persistence processing includes various database-related operations.
  • For example, the restored sequence data is written to a disk file. If the first global sequence manager goes down, the second global sequence manager takes over and continues to provide services for the database. After the first global sequence manager is restored, it obtains the latest sequence information from the second global sequence manager and writes it to its own disk file, so that the restored sequence data is saved by the first global sequence manager as a disk file, improving disaster tolerance. If both the first and the second global sequence manager then go down and the first global sequence manager later recovers, the latest sequence information can be read from the disk file to ensure that the sequence information remains uniquely incremental.
  • Persistence processing can also permanently save a domain object to the database; update the state of a domain object in the database; delete a domain object from the database; load a domain object from the database into memory according to a specific identifier; load one or more domain objects that meet given query conditions from the database into memory; and so on.
  • The above persistence processing is only an example and can be set according to specific circumstances. Other persistence processing methods not described here are also within the protection scope of the present application and are not repeated here.
  • Persistence processing of the restored sequence data encapsulates data access details and provides an object-oriented interface for most business logic. It can also reduce the number of database accesses and increase application execution speed. Persistence-processing code is highly reusable and can complete most database operations. At the same time, persistence processing does not depend on the underlying database or the upper-level business logic: when the database is changed, only the configuration file corresponding to the database needs to be modified.
  • When the first global sequence manager recovers from the failure, the sequence data in the second global sequence manager is used as restoration data to restore the data in the first global sequence manager, so that the first global sequence manager can continue to process the information to be processed. This ensures that data is not lost while the first global sequence manager is down and ensures data security.
  • By persisting the restored sequence data, the number of database accesses is reduced and the execution speed of the application is increased.
  • The method for processing the distributed database further includes: when the first global sequence manager fails and the second global sequence manager is not used to replace it, obtaining the global backup data at a first moment and the global backup data at a second moment; calculating a first difference between a preset recovery moment and the first moment, and a second difference between the preset recovery moment and the second moment, where the first moment and the second moment are both earlier than the recovery moment; if the second difference is determined to be smaller than the first difference, using the global backup data at the second moment as the restoration data to restore the data in the first global sequence manager, obtaining restored sequence data; and setting the starting sequence according to the maximum value of the restored sequence and the preset sequence threshold.
  • Restoring from the backup closest to the recovery moment lets the first global sequence manager quickly restore the data to a usable state and ensures the integrity of the data. Setting the starting sequence according to the restored sequence maximum value and the preset sequence threshold avoids sequence repetition.
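  • The backup selection and restart rule can be sketched as below. Picking the backup with the smallest difference to the recovery moment follows the text. The exact formula for the starting sequence is not given in the source, so `restored_max + sequence_threshold + 1` is an assumed rule that simply skips past any values that might have been handed out after the backup was taken.

```python
def choose_backup(backups, recovery_time):
    """Pick the (backup_time, max_sequence) pair closest to recovery_time.

    Only backups taken strictly before the recovery moment are considered.
    """
    candidates = [backup for backup in backups if backup[0] < recovery_time]
    if not candidates:
        raise ValueError("no backup earlier than the recovery moment")
    return min(candidates, key=lambda backup: recovery_time - backup[0])


def starting_sequence(restored_max, sequence_threshold):
    """Assumed rule: jump past every value possibly issued since the backup."""
    return restored_max + sequence_threshold + 1


backups = [(100, 5000), (400, 9000)]  # (moment, maximum sequence value)
chosen = choose_backup(backups, recovery_time=500)
start = starting_sequence(chosen[1], sequence_threshold=20000)
print(chosen, start)
```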
  • FIG. 3 shows a schematic structural diagram of a processing apparatus for a distributed database according to an embodiment of the present application.
  • The processing apparatus of the distributed database may be implemented using computing nodes that include a global sequence manager.
  • The processing device of the distributed database includes the following modules:
  • The obtaining module 301 is configured to obtain the information to be processed in the database cluster; the processing module 302 is configured to sort the information to be processed and generate a unique incremental sequence processing result. The information to be processed comes from different threads, all distributed data nodes in the database cluster have the same sequence set, and the sequence set includes sequences.
  • A sequence set is created in a database cluster, the sequence set includes sequences, and all distributed data nodes in a database cluster have the same sequence set.
  • Each database cluster corresponds to one processing thread. Within that thread, the information to be processed is processed in its order of arrival to obtain a unique incremental sequence processing result, which ensures that the sequence is uniquely incremental.
  • The processing module processes the information to be processed in its order of arrival to generate a unique incremental sequence processing result, which ensures the uniqueness and auto-increment of the sequence. Because the information to be processed comes from different threads and all distributed data nodes in the database cluster have the same sequence set, data search is faster and the user experience is improved when users use the sequence processing results to search for data across the entire database cluster.
  • FIG. 4 is a block diagram of the composition of a processing system of a distributed database in an embodiment of the present application.
  • The processing system of the distributed database includes: an application 410, a computing node 420, a global sequence manager 430, and a database cluster 440.
  • The computing node 420 includes a merge thread 422 and n processing threads, for example, a first processing thread 4211, a second processing thread 4212, ..., an nth processing thread 421n, where n is an integer greater than or equal to 1.
  • The global sequence manager 430 includes a global sequence manager host 431 and a global sequence manager backup machine 432. The global sequence manager host 431 includes m processing threads, for example, a first processing thread 4311, a second processing thread 4312, ..., an mth processing thread 431m, where m is an integer greater than or equal to 1.
  • A sequence set 441 is created in the database cluster 440, and all distributed data nodes in a database cluster 440 have the same sequence set 441.
  • The global sequence manager 430 can also be implemented as a cluster.
  • For example, the global sequence manager includes one global sequence manager host and multiple global sequence manager backup machines, ensuring that multiple servers hold complete sequences.
  • A global sequence manager standby machine that has backed up the data can take over from the global sequence manager host and continue working when the host fails, maintaining normal use of the database sequence function.
  • The global sequence manager 430 is configured to process sequences in the database cluster 440.
  • The global sequence manager 430 adopts a cluster architecture of one host and multiple standby machines, forming a multi-copy high-availability cluster. This ensures that when the global sequence manager host goes down, one of the multiple standby machines holding complete sequence copies can take over from the host and maintain normal use of the database sequence function.
  • a global sequence manager can process sequence information on one or more database clusters, ensuring high processing performance for multiple database clusters, and at the same time, ensuring cluster-level isolation and better disaster tolerance.
  • the computing node 420 is configured to parse the SQL statement sent by the application 410 and send a sequence processing request to the global sequence manager 430 . After receiving the sequence processing result returned by the global sequence manager 430, the computing node 420 sends the to-be-processed SQL statement corresponding to the sequence in the sequence processing result to each database node for processing. The computing node 420 establishes a connection with the database cluster 440 and establishes a connection with the global sequence manager 430 .
  • the database cluster 440 includes a plurality of distributed data nodes, and is a complete storage unit externally, providing highly reliable data services and ensuring data consistency.
  • Each distributed data node is configured to execute the SQL statement sent by the computing node 420 and return the data processing result to the computing node 420 .
  • Each distributed data node has the same sequence set 441. Sequence set 441 corresponds to one processing thread in global sequence manager 430. Within that processing thread, the sequence information to be processed is handled, and a response returned, in the order in which the sequence processing request messages arrive, which guarantees that the sequence is uniquely increasing.
  • the global sequence manager 430 provides uniquely increasing sequence information for one or more database clusters, ensuring that the sequences in the database clusters are uniquely increasing sequences.
  • Global sequence manager 430 only processes sequence-related information.
  • Global sequence manager 430 creates m processing threads (eg, 1st processing thread 4311, 2nd processing thread 4312, ..., mth processing thread 431m), and one processing thread serves sequence set 441 in database cluster 440 , that is, one processing thread in the global sequence manager 430 only processes the sequence information of the sequence set 441 .
  • the method for processing the SQL statement input by the application 410 by the processing system of the distributed database includes steps S401 to S408.
  • Step S401 the application 410 sends a plurality of SQL statements to be processed to the computing node 420 .
  • Step S402, the computing node 420 uses the first processing thread 4211, the second processing thread 4212, ..., the nth processing thread 421n to parse the SQL statements to be processed, obtaining sequence information 1 to be processed, sequence information 2 to be processed, ..., sequence information n to be processed, and then sends these pieces of sequence information to the merge thread 422, so that the merge thread 422 can merge them and generate and send a sequence request message to the global sequence manager 430.
  • the sequence request message includes n pieces of sequence information to be processed, and the arrival order of the sequence information to be processed, where the arrival order is the order in which the sequence information to be processed reaches a certain processing thread in the global sequence manager host 431 .
  • Step S403, after receiving the sequence request message, the global sequence manager host 431 in the global sequence manager 430 filters it to obtain the set of to-be-processed sequence information that concerns the distributed data nodes in the database cluster 440. It then uses the first processing thread 4311 to process the items in that set one by one, in their arrival order, generating uniquely increasing sequences (for example, 1, 2, ..., k, where k is an integer greater than or equal to 1), which ensures that the sequences on the database cluster 440 are uniquely increasing. Finally, it generates a synchronization message from the generated sequence 1, sequence 2, ..., sequence k and sends it to the global sequence manager standby machine 432.
  • step S404 after receiving the synchronization message, the global sequence manager standby machine 432 saves the sequence information generated in step S403 to the local disk, and sends a synchronization completion message to the global sequence manager host 431 .
  • Step S405 the global sequence manager host 431, in response to the synchronization completion message fed back by the global sequence manager standby 432, determines that the data has been successfully synchronized to the global sequence manager standby 432, and then, according to the sequence 1 generated in step S403 , sequence 2, ..., sequence k, generate and send a sequence response message to the computing node 420.
  • Step S406, the computing node 420 combines each sequence in the sequence response message with its corresponding SQL statement and sends the result directly to the designated distributed data node for execution. After execution is complete, the designated distributed data node feeds the execution result back to the computing node 420.
  • the multiple distributed data nodes in the database cluster 440 process the SQL statements to be processed in parallel, which improves data processing efficiency.
  • Step S407 the computing node 420 feeds back the data processing result fed back by the database cluster 440 to the application 410 .
  • in this embodiment, multiple processing threads in the computing node simultaneously process the multiple SQL statements to be processed sent by the application and generate a sequence request message containing multiple pieces of sequence information to be processed, which improves data processing efficiency and the user experience.
  • in addition, after the global sequence manager receives the sequence request message sent by the computing node, each processing thread handles the sequence information to be processed one by one, in the arrival order of the sequence request messages, to guarantee that the sequence is uniquely increasing.
  • FIG. 5 is another composition block diagram of the processing system of the distributed database in the embodiment of the present application.
  • the distributed database processing system includes: an application 510 , a computing node 520 , a first global sequence manager 530 , a second global sequence manager 550 , a first database cluster 540 and a second database cluster 560 .
  • the first global sequence manager 530 includes a first global sequence manager host 531 and a first global sequence manager standby machine 532; the global sequence manager host 531 includes n processing threads, for example, a first processing thread 5311, a second processing thread 5312, ..., an nth processing thread 531n, where n is an integer greater than or equal to 1.
  • the second global sequence manager 550 includes a second global sequence manager host 551 and a second global sequence manager standby machine 552; the global sequence manager host 551 includes m processing threads, for example, a first processing thread 5511, a second processing thread 5512, ..., an mth processing thread 551m, where m is an integer greater than or equal to 1.
  • the computing node 520 includes a merge thread 522 and k processing threads, eg, a first processing thread 5211, a second processing thread 5212, ..., a kth processing thread 521k, where k is greater than or equal to the sum of m and n.
  • Database cluster 540 includes sequence collection 541 .
  • Database cluster 560 includes sequence collection 561 .
  • it should be noted that, in this embodiment, the first global sequence manager 530 only processes the sequence information corresponding to the first database cluster 540, and the second global sequence manager 550 only processes the sequence information corresponding to the second database cluster 560. The merge thread in the computing node 520 therefore has to classify the to-be-processed sequence information sent by the application 510: based on the arrival order of each sequence to be processed and on which database cluster must handle its SQL statement, it assigns the sequence to the corresponding global sequence manager, and then distributes the different sequences to be processed to the different global sequence managers.
  • the method for performing data processing on the SQL statement input by the application 510 by the processing system of the distributed database in this embodiment includes steps S501 to S509.
  • Step S501 the application 510 sends a plurality of SQL statements to be processed to the computing node 520 .
  • Step S502, the computing node 520 uses the first processing thread 5211, the second processing thread 5212, ..., the kth processing thread 521k to parse the SQL statements to be processed, obtaining sequence information 1 to be processed, sequence information 2 to be processed, ..., sequence information k to be processed, and then sends these pieces of sequence information to the merge thread 522. According to the arrival order of the sequences to be processed and the database cluster that handles the SQL statement corresponding to each sequence, the merge thread 522 sorts and merges the sequence information to be processed and generates a first sequence request message and a second sequence request message; it sends the first sequence request message to the first global sequence manager 530 and the second sequence request message to the second global sequence manager 550.
  • the first sequence request message includes n pieces of sequence information to be processed; the second sequence request message includes m pieces of sequence information to be processed. It should be noted that the n pieces of sequence information to be processed in the first sequence request message belong to the first database cluster 540 , and the m pieces of sequence information to be processed in the second sequence request message belong to the second database cluster 560 .
  • Step S503, after the first global sequence manager host 531 in the first global sequence manager 530 receives the first sequence request message, it distributes the n pieces of sequence information to be processed, in order, to the first processing thread 5311 for processing, generating sequences 1, 2, ..., n, and sends a first synchronization message to the first global sequence manager standby machine 532, where the first synchronization message includes sequence 1, sequence 2, ..., sequence n.
  • Step S504 after receiving the synchronization message, the first global sequence manager standby machine 532 saves the sequence information generated in step S503 to the local disk, and sends a synchronization completion message to the first global sequence manager host 531 .
  • Step S505 the first global sequence manager host 531 responds to the synchronization completion message fed back by the first global sequence manager standby 532, and determines that the data has been successfully synchronized to the first global sequence manager standby 532, and then, according to the steps The sequence 1, sequence 2, ..., sequence n generated in S503, and the identifier of the distributed data node corresponding to each sequence, generate and send a first sequence response message to the computing node 520.
  • while steps S503 to S505 are performed, the second global sequence manager 550 simultaneously performs similar operations, as shown in steps S506 to S508.
  • Step S506, after receiving the second sequence request message, the second global sequence manager host 551 in the second global sequence manager 550 distributes the m pieces of sequence information to be processed, in order, to (for example) the second processing thread 5512 for processing, sequentially generates sequences 1, 2, ..., m, and sends a second synchronization message to the second global sequence manager standby machine 552; the second synchronization message includes sequence 1, sequence 2, ..., sequence m.
  • Step S507 after receiving the synchronization message, the second global sequence manager standby machine 552 saves the sequence information generated in step S506 to the local disk, and sends a synchronization completion message to the second global sequence manager host 551 .
  • Step S508 the second global sequence manager host 551 responds to the synchronization completion message fed back by the second global sequence manager standby 552, and determines that the data has been successfully synchronized to the second global sequence manager standby 552, and then, according to the steps For the sequence 1, sequence 2, ..., sequence m generated in S506, a second sequence response message is generated and sent to the computing node 520.
  • the sequences (sequence 1, sequence 2, ..., sequence n) generated by the first global sequence manager 530 ensure that the sequences in the first database cluster 540 are uniquely increasing; the sequences (sequence 1, sequence 2, ..., sequence m) generated by the second global sequence manager 550 ensure that the sequences in the second database cluster 560 are uniquely increasing.
  • Step S509, after receiving the first sequence response message and the second sequence response message, the computing node 520 combines each sequence in the first sequence response message with its SQL statement to be processed, and generates and sends a first data processing message to the database cluster 540 for processing. At the same time, it combines each sequence in the second sequence response message with its SQL statement to be processed, and generates and sends a second data processing message to the database cluster 560 for processing. The computing node 520 then obtains the data processing results fed back by each distributed data node in the first database cluster 540 and the second database cluster 560 and forwards each result to the application 510.
  • since the first database cluster 540 and the second database cluster 560 are physically isolated, the sequences corresponding to the generated data processing results do not interfere with each other. This ensures both the isolation of sequences at the database cluster level and their unique incrementality.
  • different global sequence managers process sequence information in different database clusters, so that sequence isolation can be performed at the level of database clusters, and at the same time, the uniqueness of sequences within each database cluster is guaranteed.
  • Multiple threads are used to process the information to be processed concurrently, which improves the efficiency of sequence processing and ensures the high performance of the distributed database system.
  • FIG. 6a shows a schematic diagram of the processing flow when an abnormality occurs while the global sequence manager standby machine performs a full data backup in an embodiment of the present application.
  • the global sequence manager standby machine can perform a full backup of the data at every preset time interval (for example, every 5 minutes), that is, write the data to a disk file.
  • the number of pieces of information to be processed is also counted; when it exceeds the preset sequence threshold (for example, a threshold of 10,000), the global sequence manager standby machine performs a full backup of the data again to ensure that the sequence information is not lost.
  • the backup machine of the global sequence manager will perform a full backup of the data again.
  • FIG. 6b shows a schematic flowchart of processing when the global sequence manager host in the embodiment of the present application restores data.
  • the global sequence manager host has returned to a normal working state due to maintenance by staff, and at this time, the global sequence manager host needs to start a data recovery operation.
  • the global sequence manager backup machine has performed full backup at 12:06 and 12:10 respectively, so the global sequence manager host needs to apply to the global sequence manager backup machine for the data before the current recovery time.
  • restoring the data of the global sequence manager host includes steps S601 to S604.
  • step S601 the global sequence manager host applies to the global sequence manager backup device to obtain, for example, the backup data at 12:06 as the current restoration data.
  • the global sequence manager host selects the full backup that is earlier than the recovery time (for example, 12:09) and closest to it as the restoration data for this recovery, to ensure the completeness and security of the data.
  • Step S602 the global sequence manager host uses, for example, the backup data at 12:06 as the current restoration data, and restores its internal data.
  • Step S603 when the data recovery is completed, set the initial value of the sequence.
  • for example, the maximum value of the restored sequence and the preset sequence threshold are added to obtain the initial value of the sequence;
  • alternatively, the preset sequence threshold is multiplied by a factor to obtain a product value, and the maximum value of the restored sequence and the product value are then added to obtain the initial value of the sequence.
  • the above methods for setting the initial value of the sequence are only examples; the value can be set according to the actual situation, and other methods are not described again here.
  • Step S604 for example, delete backup data after 12:09 time (recovery time) to prevent data duplication and improve storage space utilization.
  • the global sequence manager backup machine performs full backup of the sequence information every preset time interval to ensure that the sequence information is not lost.
  • when the global sequence manager host recovers, it obtains backup data by applying to the global sequence manager standby machine, and the full backup closest to the recovery time is used as the restoration data, so that the host can quickly restore its data to a usable state and data integrity is ensured. After data recovery is complete, backup data later than the recovery time is deleted to prevent data duplication and improve storage space utilization.
  • FIG. 7 shows a structural diagram of an exemplary hardware architecture of a computing device capable of implementing methods and apparatuses according to embodiments of the present application.
  • the computing device 700 includes an input device 701 , an input interface 702 , a central processing unit 703 , a memory 704 , an output interface 705 , and an output device 706 .
  • the input interface 702, the central processing unit 703, the memory 704, and the output interface 705 are connected to each other through the bus 707; the input device 701 and the output device 706 are connected to the bus 707 through the input interface 702 and the output interface 705, respectively, and thereby to the other components of the computing device 700.
  • the input device 701 receives input information from the outside, and transmits the input information to the central processing unit 703 through the input interface 702; the central processing unit 703 processes the input information based on the computer-executable instructions stored in the memory 704 to generate output information, temporarily or permanently store the output information in the memory 704, and then transmit the output information to the output device 706 through the output interface 705; the output device 706 outputs the output information to the outside of the computing device 700 for the user to use.
  • the computing device shown in FIG. 7 may be implemented as a network device, and the network device may include: a memory configured to store a computer program; a processor configured to execute the computer program stored in the memory , so as to execute the distributed database processing method described in the above embodiments.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method for processing a distributed database described in the foregoing embodiments is implemented.
  • the distributed database processing method and device, the network device, and the computer-readable storage medium provided by the embodiments of the present application generate a unique incremental sequence processing result by sorting the information to be processed in the database cluster, thereby ensuring the uniqueness of the sequence.
  • since the information to be processed comes from different threads and the database cluster includes distributed data nodes, the uniqueness of the sequence is guaranteed by making the sequence processing results available only to the corresponding distributed data nodes. This overcomes the problem in the related art of low data search efficiency caused by the inability of distributed databases to provide unique sequences. When users use sequence processing results to search for data across the entire database cluster, data search is faster and the user experience is improved.
  • the various embodiments of the present application may be implemented in hardware or special purpose circuits, software, logic, or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor or other computing device, although the application is not limited thereto.
  • Embodiments of the present application may be implemented by the execution of computer program instructions by a data processor of a mobile device, eg in a processor entity, or by hardware, or by a combination of software and hardware.
  • Computer program instructions may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages.
  • the block diagrams of any logic flow in the figures of the present application may represent program steps, or may represent interconnected logic circuits, modules and functions, or may represent a combination of program steps and logic circuits, modules and functions.
  • Computer programs can be stored on memory.
  • the memory may be of any type suitable for the local technical environment and may be implemented using any suitable data storage technology, such as but not limited to read-only memory (ROM), random access memory (RAM), and optical memory devices and systems (digital versatile discs (DVD) or compact discs (CD)).
  • Computer-readable media may include non-transitory storage media.
  • the data processor may be of any type suitable for the local technical environment, such as, but not limited to, a general purpose computer, special purpose computer, microprocessor, digital signal processor (DSP), application specific integrated circuit (ASIC), programmable logic device (FPGA), and processors based on multi-core processor architectures.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

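A processing method and apparatus for a distributed database, a network device, and a computer-readable storage medium. The processing method for a distributed database includes: sorting information to be processed in a database cluster to generate a uniquely increasing sequence processing result; the information to be processed is information from different threads, all distributed data nodes in the database cluster hold the same sequence set, and the sequence set includes sequences.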

Description

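Processing method and apparatus for a distributed database, network device, and computer-readable storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202010604834.9, filed on June 29, 2020, the contents of which are incorporated herein by reference in their entirety.
Technical Field
This application relates to the technical field of database management, and in particular to a processing method and apparatus for a distributed database, a network device, and a computer-readable storage medium.
Background
With the rapid development of the Internet industry, data volumes are growing explosively. Stand-alone databases in the related art are increasingly unable to meet users' business needs, and users in every industry want to manage their data with distributed databases. However, usage habits formed around stand-alone databases make it hard for users to adapt to distributed databases, particularly when it comes to applying the auto-increment sequences commonly used in stand-alone databases to a distributed database.
A sequence is a series of numbers in a database system that increments automatically according to a given rule; because a sequence is auto-incrementing, its numbers never repeat. A sequence mainly supplies the database system with a continuous stream of unique, auto-incremented values; it can serve as the primary key of a data table or as a unique identifier during data operations. However, data search in distributed databases is inefficient and the user experience is poor.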
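Summary
An embodiment of this application provides a processing method for a distributed database, including: sorting information to be processed in a database cluster to generate a uniquely increasing sequence processing result; the information to be processed is information from different threads, all distributed data nodes in the database cluster hold the same sequence set, and the sequence set includes sequences.
An embodiment of this application provides a processing apparatus for a distributed database, including: an acquisition module configured to acquire information to be processed in a database cluster; and a processing module configured to sort the information to be processed and generate a uniquely increasing sequence processing result; the information to be processed is information from different threads, all distributed data nodes in the database cluster hold the same sequence set, and the sequence set includes sequences.
An embodiment of this application provides a network device, including: one or more processors; and a memory storing one or more computer programs which, when executed by the one or more processors, cause the one or more processors to implement the processing method for a distributed database in the embodiments of this application.
An embodiment of this application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the processing method for a distributed database in the embodiments of this application.
Further description of the above embodiments and other aspects of this application, and of their implementations, is provided in the brief description of the drawings, the detailed description, and the claims.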
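Brief Description of the Drawings
FIG. 1 shows a schematic flowchart of a processing method for a distributed database in an embodiment of this application.
FIG. 2 shows another schematic flowchart of a processing method for a distributed database in an embodiment of this application.
FIG. 3 shows a schematic structural diagram of a processing apparatus for a distributed database in an embodiment of this application.
FIG. 4 shows a composition block diagram of a processing system for a distributed database in an embodiment of this application.
FIG. 5 shows another composition block diagram of a processing system for a distributed database in an embodiment of this application.
FIG. 6a shows a schematic diagram of the processing flow when an abnormality occurs while the global sequence manager standby machine performs a full data backup in an embodiment of this application.
FIG. 6b shows a schematic flowchart of the processing performed when the global sequence manager host restores data in an embodiment of this application.
FIG. 7 shows a structural diagram of an exemplary hardware architecture of a computing device capable of implementing the methods and apparatuses according to embodiments of this application.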
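Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, embodiments of this application are described in detail below with reference to the drawings. It should be noted that, provided there is no conflict, the embodiments of this application and the features in those embodiments may be combined with one another arbitrarily.
In a distributed database, data resides on different shards and operations on each piece of data are isolated from one another, so the sequences corresponding to data on different shards are not unique. When a user searches for data across the whole database cluster, no unique sequence can be provided, which makes data search inefficient and the user experience poor.
FIG. 1 is a schematic flowchart of a processing method for a distributed database in an embodiment of this application; the method can be applied to a processing apparatus for a distributed database. As shown in FIG. 1, the processing method includes the following steps 101 and 102.
Step 101: acquire the information to be processed in the database cluster.
The information to be processed is information from different threads; all distributed data nodes in the database cluster hold the same sequence set, and the sequence set includes sequences.
It should be noted that a distributed database created in the database cluster may or may not have sequences. Each sequence corresponds to one distributed database, and each distributed database may have multiple sequences (for example, sequence 1, sequence 2, and so on). If a distributed database has no sequence, it cannot use sequence-related functions.
For example, an application sends multiple Structured Query Language (SQL) statements to be processed to the processing apparatus, which parses them with multiple different threads to obtain the information to be processed. The information to be processed includes multiple pieces of sequence information to be processed; sending one SQL statement to be executed to the corresponding database node for execution yields the processing result for one piece of sequence information to be processed.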
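Step 102: sort the information to be processed and generate a uniquely increasing sequence processing result.
It should be noted that the information to be processed may include sequence creation, sequence application, sequence deletion, and similar information, so that the processing apparatus can classify the information to be processed by sequence operation and then process it by category to generate a uniquely increasing sequence processing result.
In a specific implementation, step 102 may be implemented as follows. According to the arrival order of the information to be processed, perform any one of the following operations on it to generate a uniquely increasing sequence processing result: add the sequence corresponding to the information to be processed to the sequence set; delete the sequence corresponding to the information to be processed from the sequence set; or modify the sequence set according to the information to be processed.
The sequence processing result includes a sequence. A sequence is a number in the database cluster that auto-increments according to a given rule; because it auto-increments it never repeats, which guarantees uniqueness. A sequence can serve as a surrogate key for identifying data. It can also be used to record the most recently changed statements in the database: whenever a statement in the database changes (for example, an insert or delete statement), the sequence is updated with it, so updated statements can be filtered out by sequence. These uses are only examples and can be set according to the specific situation; other uses not described here also fall within the scope of protection of this application and are not repeated here.
For example, if first information to be processed arrives first and second information to be processed arrives later, the first is processed first (for example, it is assigned sequence 100) and the second is processed afterwards (for example, it is assigned sequence 101). The resulting sequence processing result is sequence 100 and sequence 101, which guarantees that the sequence is uniquely increasing and avoids two sequences with the value 100.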
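The arrival-order rule in the example above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the class name `SequenceWorker`, the FIFO queue, and the starting value 100 are hypothetical choices made to mirror the 100/101 example.

```python
import queue
import threading


class SequenceWorker:
    """Hypothetical single worker thread that serves one sequence set."""

    def __init__(self, start=100):
        self._next = start
        self._requests = queue.Queue()  # FIFO queue keeps arrival order
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            reply = self._requests.get()
            if reply is None:            # shutdown marker
                break
            reply.put(self._next)        # hand out the current value
            self._next += 1              # values are never reused

    def allocate(self):
        """Queue one request and wait for the value assigned to it."""
        reply = queue.Queue(maxsize=1)
        self._requests.put(reply)
        return reply.get()

    def close(self):
        self._requests.put(None)
        self._thread.join()


worker = SequenceWorker(start=100)
first = worker.allocate()    # request that arrives first
second = worker.allocate()   # request that arrives second
worker.close()
```

Because only one thread ever touches the counter, no lock is needed and two requests can never receive the same value, whichever caller thread they come from.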
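In this embodiment, processing the information to be processed in arrival order generates a uniquely increasing sequence processing result, guaranteeing that sequences are unique and auto-incrementing. Because the information to be processed comes from different threads, and all distributed data nodes in the database cluster hold the same sequence set, which includes sequences, data search is faster and the user experience better when a user searches the whole database cluster using the sequence processing result.
An embodiment of this application provides another possible implementation: before step 101, the processing method further includes the following steps 103 to 105.
Step 103: receive a database processing statement sent by an application.
Step 104: parse the database processing statement to obtain data definition language and data manipulation language.
Data Definition Language (DDL) is the part of Structured Query Language (SQL) responsible for defining data structures and database objects; DDL handles operations such as creating, modifying, deleting, indexing, and storing objects in the database, for example creating a database, a data table, a table index, or a view. DDL must be compiled by computer software into a format convenient for the computer to store, query, and operate on; the program that performs this conversion is called a schema compiler. Data Manipulation Language (DML) lets users perform basic operations on the database: DML handles inserting, modifying, and deleting data in the database, for example inserting a piece of data into an existing database or data table, deleting a piece of data from a data table, or modifying some of the data in a data table.
Step 105: analyze the data definition language and the data manipulation language and extract the information to be processed.
For example, analysis of the DDL and DML shows that the data table in the DDL corresponds to database node A and that the data in the DML statement also corresponds to database node A; the two pieces of information are then merged to obtain the information to be processed corresponding to database node A.
In a specific implementation, step 105 may be implemented as follows: analyze the data definition language and the data manipulation language to obtain N sequence processing requirements; batch the N sequence processing requirements to generate the information to be processed, where N is an integer greater than or equal to 1.
Batching the N sequence processing requirements reduces the number of processing rounds, avoids redundant messages and degraded database performance, and speeds up data processing.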
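A minimal Python sketch of the batching step, assuming each requirement carries its arrival order. The type names `SequenceRequirement` and `SequenceRequestMessage`, the field names, and the sequence names are hypothetical and only illustrate how N requirements could be folded into one message.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRequirement:
    arrival: int          # arrival order at the merge step (assumed field)
    sequence_name: str    # hypothetical sequence identifier
    operation: str        # e.g. "create", "apply", "delete"


@dataclass
class SequenceRequestMessage:
    items: list = field(default_factory=list)


def merge_requirements(requirements):
    """Batch N requirements into a single message, ordered by arrival."""
    ordered = sorted(requirements, key=lambda r: r.arrival)
    return SequenceRequestMessage(items=ordered)


reqs = [
    SequenceRequirement(2, "seq_order_id", "apply"),
    SequenceRequirement(1, "seq_order_id", "apply"),
    SequenceRequirement(3, "seq_user_id", "create"),
]
message = merge_requirements(reqs)  # one message instead of three
```

Sending one message for N requirements is what reduces the number of round trips; the arrival order travels with the message so the receiver can still process items in sequence.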
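In this embodiment, multiple different threads analyze the data definition language and the data manipulation language and extract the information to be processed, which is then batched; this reduces the number of processing rounds, avoids redundant messages and degraded database performance, and speeds up data processing.
An embodiment of this application provides another possible implementation: before step 101, the processing method further includes dividing the distributed data nodes into clusters according to their processing capability to obtain the database cluster.
In this embodiment, dividing the distributed data nodes into clusters by processing capability lets nodes with high processing capability handle more complex data and nodes with lower capability handle simpler data, which improves the working efficiency of the database. In addition, managing the distributed data nodes as database clusters isolates data at the cluster level, so that the sequence information on each database cluster is guaranteed to be uniquely increasing; when a user queries data in that cluster, the required data can be found quickly, improving the user experience.
An embodiment of this application provides another possible implementation: after step 102, the processing method further includes the following steps 106 and 107.
Step 106: according to the sequence processing result, send the database processing statements to the distributed data nodes for processing.
For example, the sequence processing result includes multiple sequences. The SQL statement to be processed that corresponds to sequence 1 is sent to the first distributed data node for processing, generating a first data processing result; the SQL statement corresponding to sequence 2 is sent to the second distributed data node, generating a second data processing result; ...; the SQL statement corresponding to sequence n is sent to the nth distributed data node, generating an nth data processing result. Multiple SQL statements to be processed can thus be handled in parallel, improving data processing efficiency.
Step 107: in response to the data processing result fed back by a distributed data node, forward the data processing result to the application.
For example, the first data processing result fed back by the first distributed data node is returned to the application through the 1st processing thread, the second data processing result fed back by the second distributed data node is returned through the 2nd processing thread, ..., and the nth data processing result fed back by the nth distributed data node is returned through the nth processing thread. Concurrent processing threads speed up data processing and improve the user experience.
In this embodiment, the sequence processing results are sent through different processing threads to the corresponding distributed data nodes for processing, and the data processing results fed back by the distributed data nodes are returned to the application through the corresponding processing threads. This improves data processing efficiency and lets the application obtain data processing results quickly, improving the user experience.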
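Steps 106 and 107 can be illustrated with a thread pool. The sketch below is an assumption-laden stand-in: `execute_on_node` is a hypothetical placeholder for the real network call to a distributed data node, and the SQL text and node numbers are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def execute_on_node(node_id, sequence, sql):
    """Hypothetical stand-in for sending one statement to one data node."""
    return {"node": node_id, "sequence": sequence, "sql": sql, "status": "ok"}


# (node, sequence, statement) triples built from the sequence processing result
assignments = [
    (1, 1, "INSERT INTO t VALUES (:seq, 'a')"),
    (2, 2, "INSERT INTO t VALUES (:seq, 'b')"),
    (3, 3, "INSERT INTO t VALUES (:seq, 'c')"),
]

# One worker per statement, so the statements run concurrently (step 106).
with ThreadPoolExecutor(max_workers=len(assignments)) as pool:
    futures = [pool.submit(execute_on_node, *a) for a in assignments]
    # Collect each node's result for forwarding to the application (step 107).
    results = [f.result() for f in futures]
```

Collecting the futures in submission order keeps each result aligned with the sequence that produced it, even though the nodes may finish in any order.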
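FIG. 2 is another schematic flowchart of a processing method for a distributed database in an embodiment of this application. As shown in FIG. 2, the processing method includes the following steps 201 to 203.
Step 201: acquire the information to be processed in the database cluster.
Step 202: a first global sequence manager sorts the information to be processed and generates a uniquely increasing sequence processing result.
It should be noted that steps 201 and 202 in this embodiment are similar to steps 101 and 102 in the previous embodiment and are not described again here.
Step 203: synchronize the sequence processing result to a second global sequence manager.
The second global sequence manager is the backup manager of the first global sequence manager, and the first global sequence manager includes processing threads.
For example, the second global sequence manager and the first global sequence manager are backup managers for each other. When the first global sequence manager fails, the second is used to process the information to be processed and generate a uniquely increasing sequence processing result; correspondingly, when the second global sequence manager fails, the first can be used to process the information to be processed and generate a uniquely increasing sequence processing result.
In some specific implementations, the global sequence manager may also be implemented as a cluster. For example, the global sequence manager includes one global sequence manager host and multiple global sequence manager standby machines, ensuring that several standby machines holding a complete backup of the sequence data can take over from the host when it fails, continue its work, and maintain normal use of the database sequence function.
In this embodiment, synchronizing the sequence processing result to the second global sequence manager backs up the result and avoids data loss. If the first global sequence manager fails, the second can take over, continue sorting the information to be processed, maintain normal use of the database sequence function, and improve the user experience.
In some specific implementations, after step 203, the processing method further includes performing a full backup of the data in the first global sequence manager at every preset time interval.
For example, the first global sequence manager performs a full backup of its data every 10 minutes, that is, writes the data to a disk file. Regular full backups of the data in the first global sequence manager keep the data safe.
In some specific implementations, after step 203, the processing method further includes: counting the number of pieces of information to be processed; and performing a full backup of the data in the first global sequence manager according to that number and a preset sequence threshold.
For example, the number of received pieces of information to be processed is counted; if it is greater than the preset sequence threshold (for example, 20,000), a full backup of the data in the first global sequence manager is required, to keep real-time data safe.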
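The two backup triggers described above (a fixed interval and a count threshold) can be combined in one small policy object. This is only a sketch: the class `BackupPolicy`, its method names, and the decision to reset the counter after each backup are assumptions, while the 10-minute interval and the 20,000 threshold come from the examples in the text.

```python
class BackupPolicy:
    """Hypothetical helper that decides when a full backup is due."""

    def __init__(self, interval_seconds=600, sequence_threshold=20000):
        self.interval_seconds = interval_seconds      # 10 minutes
        self.sequence_threshold = sequence_threshold  # 20,000 items
        self.last_backup_time = 0.0
        self.pending_count = 0

    def record_request(self, count=1):
        """Count information to be processed received since the last backup."""
        self.pending_count += count

    def should_backup(self, now):
        elapsed = now - self.last_backup_time
        return (elapsed >= self.interval_seconds
                or self.pending_count > self.sequence_threshold)

    def mark_backed_up(self, now):
        # Assumption: the counter restarts after every full backup.
        self.last_backup_time = now
        self.pending_count = 0


policy = BackupPolicy()
policy.mark_backed_up(now=0.0)
policy.record_request(20001)
due_by_count = policy.should_backup(now=60.0)      # threshold exceeded
policy.mark_backed_up(now=60.0)
due_right_after = policy.should_backup(now=61.0)   # nothing new yet
due_by_time = policy.should_backup(now=660.0)      # 10 minutes elapsed
```

Passing `now` in explicitly, rather than reading the clock inside the class, keeps the policy easy to test.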
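In some specific implementations, after the step of performing a full backup of the data in the first global sequence manager, the processing method further includes: when the first global sequence manager fails, replacing it with the second global sequence manager; when the first global sequence manager recovers from the failure, using the sequence data in the second global sequence manager as restoration data to restore the data in the first global sequence manager, obtaining restored sequence data; and persisting the restored sequence data.
It should be noted that persistence includes various database-related operations. For example, the restored sequence data is written to a disk file: if the first global sequence manager goes down, the second takes over and continues to serve the database; once the first recovers, it obtains the latest sequence information from the second and writes it to its own disk file, so that the restored sequence data is saved by the first global sequence manager as a disk file, improving disaster tolerance. If both the first and second global sequence managers then go down and the first recovers, the latest sequence information can be read from the disk file, guaranteeing that the sequence information remains uniquely increasing.
Persistence may also mean saving a domain object permanently to the database; updating the state of a domain object in the database; deleting a domain object from the database; loading a domain object from the database into memory by a specific identifier; loading one or more domain objects that satisfy specific query conditions from the database into memory; and so on. These are only examples of persistence and can be set according to the specific situation; other persistence approaches not described here also fall within the scope of protection of this application and are not repeated here.
Persisting the restored sequence data not only encapsulates data access details and offers object-oriented interface functions for most business logic, but also reduces the number of accesses to database data and speeds up application execution. Persistence code is highly reusable and can carry out most database operations; moreover, persistence does not depend on the underlying database or the upper-layer business logic, so when the database is replaced only the configuration file for that database needs to be modified.
Replacing the first global sequence manager with the second ensures that the information to be processed can still be handled when the first fails, avoiding a situation in which the database is unusable. When the first recovers from the failure, using the sequence data in the second as restoration data lets the first continue processing the information to be processed, ensures that data produced while the first was down is not lost, and keeps the data safe. Persisting the restored sequence data reduces the number of accesses to database data and speeds up application execution.
In some specific implementations, after the step of performing a full backup of the data in the first global sequence manager, the processing method further includes: when the first global sequence manager fails and has not been replaced by the second global sequence manager, obtaining the full backup data of a first time and the full backup data of a second time; calculating a first difference between a preset recovery time and the first time, and a second difference between the preset recovery time and the second time, where both the first time and the second time are earlier than the recovery time; if the second difference is determined to be smaller than the first difference, using the full backup data of the second time as restoration data to restore the data in the first global sequence manager, obtaining restored sequence data; and setting a starting sequence according to the maximum sequence value after data restoration and the preset sequence threshold.
Choosing the full backup closest to the preset recovery time as the restoration data lets the first global sequence manager restore its data to a usable state quickly and ensures data integrity; after restoration, setting the starting sequence according to the maximum restored sequence value and the preset sequence threshold avoids duplicate sequences.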
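The recovery rule can be sketched as two small functions. The function names, the use of minutes since midnight for timestamps, and the sample maximum value 5000 are assumptions for illustration; the times 12:06, 12:10 and 12:09 follow the FIG. 6b example, and adding the threshold to the restored maximum is one of the options the description gives for the starting value.

```python
def pick_backup(backup_times, recovery_time):
    """Return the backup time that is earlier than, and closest to, recovery_time."""
    earlier = [t for t in backup_times if t < recovery_time]
    if not earlier:
        raise ValueError("no full backup earlier than the recovery time")
    return max(earlier)  # smallest difference to the recovery time


def initial_sequence(restored_max, sequence_threshold):
    """Skip ahead by the threshold so values issued after the backup are not reused."""
    return restored_max + sequence_threshold


# Times expressed as minutes since midnight (an assumption of this sketch).
backups = [12 * 60 + 6, 12 * 60 + 10]   # full backups at 12:06 and 12:10
recovery = 12 * 60 + 9                  # recovery time 12:09
chosen = pick_backup(backups, recovery)
start = initial_sequence(restored_max=5000, sequence_threshold=20000)
```

A plausible reading of why the threshold is added: a full backup is triggered once the count passes the threshold, so roughly that many values may have been issued since the chosen backup, and jumping past them keeps the sequence free of duplicates at the cost of a gap.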
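The node device according to embodiments of this application is described in detail below with reference to the drawings. FIG. 3 is a schematic structural diagram of the processing apparatus for a distributed database according to an embodiment of this application. The processing apparatus can be implemented using a computing node that includes a global sequence manager. As shown in FIG. 3, the processing apparatus includes the following modules:
An acquisition module 301, configured to acquire the information to be processed in the database cluster; and a processing module 302, configured to sort the information to be processed and generate a uniquely increasing sequence processing result; the information to be processed is information from different threads, all distributed data nodes in the database cluster hold the same sequence set, and the sequence set includes sequences.
It should be noted that a sequence set, which includes sequences, is created in the database cluster, and all distributed data nodes in one database cluster hold the same sequence set. Each database cluster corresponds to one processing thread, and within one thread the information to be processed is processed in arrival order to obtain a uniquely increasing sequence processing result, guaranteeing that the sequence is uniquely increasing.
In this embodiment, the processing module processes the information to be processed in arrival order and generates a uniquely increasing sequence processing result, guaranteeing that sequences are unique and auto-incrementing. Because the information to be processed comes from different threads and all distributed data nodes in the database cluster hold the same sequence set, data search is faster and the user experience better when a user searches the whole database cluster using the sequence processing result.
It should be clear that this application is not limited to the specific configurations and processing described in the above embodiments and shown in the drawings. For convenience and brevity, detailed descriptions of known methods are omitted here; for the specific working processes of the modules described above, refer to the corresponding processes in the foregoing method embodiments, which are not repeated here.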
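FIG. 4 is a block diagram of one composition of a processing system for a distributed database according to an embodiment of the present application. The system includes an application 410, a computing node 420, a global sequence manager 430, and a database cluster 440. The computing node 420 includes a merge thread 422 and n processing threads, for example a 1st processing thread 4211, a 2nd processing thread 4212, ..., and an nth processing thread 421n, where n is an integer greater than or equal to 1. The global sequence manager 430 includes a global sequence manager primary host 431 and a global sequence manager standby host 432. The primary host 431 includes m processing threads, for example a 1st processing thread 4311, a 2nd processing thread 4312, ..., and an mth processing thread 431m, where m is an integer greater than or equal to 1. A sequence set 441 is created in the database cluster 440, and all distributed data nodes in the database cluster 440 hold the same sequence set 441.
It should be noted that the global sequence manager 430 may also be implemented as a cluster. For example, the global sequence manager includes one primary host and multiple standby hosts, so that several standby hosts holding complete backup copies of the sequence data can take over from the primary host when it fails and keep the database sequence function available.
The global sequence manager 430 is configured to process the sequences in the database cluster 440. With one primary host and multiple standby hosts, it forms a multi-replica, high-availability cluster architecture: when the primary host goes down, multiple standby hosts holding complete sequence replicas can take over and keep the database sequence function available. One global sequence manager can process the sequence information of one or more database clusters, which keeps processing performance high across multiple clusters while providing cluster-level isolation and good disaster tolerance.
The computing node 420 is configured to parse the SQL statements sent by the application 410 and to send sequence processing requests to the global sequence manager 430. After receiving the sequence processing result returned by the global sequence manager 430, the computing node 420 sends the to-be-processed SQL statements corresponding to the sequences in that result to the respective database nodes for processing. The computing node 420 establishes connections with both the database cluster 440 and the global sequence manager 430.
The database cluster 440 includes multiple distributed data nodes. Externally it is a single complete storage unit that provides highly reliable data services and guarantees data consistency. Each distributed data node is configured to execute the SQL statements sent by the computing node 420 and to return the data processing results to the computing node 420. Every distributed data node holds the same sequence set 441, and the sequence set 441 corresponds to one processing thread in the global sequence manager 430. Within that processing thread, the to-be-processed sequence information is handled and answered in the order in which the sequence processing request messages arrive, which guarantees that the sequence is unique and incrementing.
The global sequence manager 430 provides uniquely incrementing sequence information for one or more database clusters, guaranteeing that the sequences in each database cluster are uniquely incrementing.
It should be noted that the global sequence manager 430 handles only sequence-related information. It creates m processing threads (for example, the 1st processing thread 4311, the 2nd processing thread 4312, ..., and the mth processing thread 431m), and one processing thread serves the sequence set 441 in the database cluster 440; that is, a single processing thread in the global sequence manager 430 handles only the sequence information of the sequence set 441.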
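The idea of dedicating one processing thread to one sequence set, and serving requests strictly in arrival order, can be illustrated with a short sketch. It assumes a FIFO queue as the arrival-order mechanism and a simple integer counter as the sequence; the class name `SequenceWorker` and its methods are hypothetical and do not come from the application.

```python
import queue
import threading


class SequenceWorker:
    """One worker thread owns one sequence set and serves requests in FIFO order."""

    def __init__(self, start=1, step=1):
        self._next = start
        self._step = step
        self._requests = queue.Queue()  # FIFO queue preserves arrival order
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            reply = self._requests.get()
            if reply is None:  # shutdown marker
                break
            # Only this thread touches the counter, so no lock is needed here.
            value = self._next
            self._next += self._step
            reply.put(value)

    def next_value(self):
        """Called from any thread; blocks until the worker assigns a value."""
        reply = queue.Queue(maxsize=1)
        self._requests.put(reply)
        return reply.get()

    def close(self):
        self._requests.put(None)
        self._thread.join()
```

Because every request is funnelled through a single thread, callers on different threads can never receive the same value, which is the property the text attributes to processing by order of arrival.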
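The method by which the processing system for the distributed database handles SQL statements input by the application 410 includes steps S401 to S407.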
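Step S401: the application 410 sends multiple to-be-processed SQL statements to the computing node 420.
Step S402: the computing node 420 uses the 1st processing thread 4211, the 2nd processing thread 4212, ..., and the nth processing thread 421n to parse the to-be-processed SQL statements in turn, obtaining sequence information 1 to be processed, sequence information 2 to be processed, ..., and sequence information n to be processed. These items are then sent to the merge thread 422, which merges them, generates a sequence request message, and sends it to the global sequence manager 430.
The sequence request message includes the n items of sequence information to be processed and their order of arrival, that is, the order in which the items reach a given processing thread in the global sequence manager primary host 431.
Step S403: after receiving the sequence request message, the global sequence manager primary host 431 in the global sequence manager 430 filters it to obtain the set of to-be-processed sequence information that the distributed data nodes in the database cluster 440 need to process. It then uses the 1st processing thread 4311 to process the items in that set one by one, in their order of arrival, and generates uniquely incrementing sequences (for example, 1, 2, ..., k in turn), where k is an integer greater than or equal to 1, so that the sequences on the database cluster 440 are uniquely incrementing. Finally, it generates a synchronization message from the generated sequence 1, sequence 2, ..., sequence k and sends the synchronization message to the global sequence manager standby host 432.
Step S404: after receiving the synchronization message, the global sequence manager standby host 432 saves the sequence information generated in step S403 to its local disk and sends a synchronization-complete message to the global sequence manager primary host 431.
Step S405: in response to the synchronization-complete message returned by the global sequence manager standby host 432, the global sequence manager primary host 431 determines that the data has been synchronized to the standby host 432. It then generates a sequence response message from sequence 1, sequence 2, ..., sequence k generated in step S403 and sends it to the computing node 420.
Step S406: the computing node 420 combines each sequence in the sequence response message with its corresponding SQL statement and sends the result directly to the designated distributed data node for execution. After execution completes, the designated distributed data node returns the execution result to the computing node 420.
Because multiple distributed data nodes in the database cluster 440 process the to-be-processed SQL statements in parallel, data processing efficiency improves.
Step S407: the computing node 420 returns the data processing results reported by the database cluster 440 to the application 410.
In this embodiment, multiple processing threads in the computing node process the to-be-processed SQL statements sent by the application concurrently and generate a sequence request message that includes multiple items of sequence information to be processed, which improves data processing efficiency and the user experience. In addition, after the global sequence manager receives the sequence request message sent by the computing node, each processing thread handles the sequence information in the order in which the sequence request messages arrive, which guarantees that the sequence is unique and incrementing.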
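The ordering constraint in steps S403 to S405, where the primary host replies to the computing node only after the standby host has persisted the newly generated values, can be sketched as below. The classes `Standby` and `Primary` and their method names are illustrative assumptions; an in-memory list stands in for the standby's local disk, and no network transport is modelled.

```python
class Standby:
    """Stand-in for the standby host; a list plays the role of its local disk."""

    def __init__(self):
        self.persisted = []

    def sync(self, values):
        self.persisted.extend(values)
        return True  # represents the synchronization-complete message


class Primary:
    """Stand-in for the primary host serving one sequence set."""

    def __init__(self, standby, start=1):
        self._standby = standby
        self._next = start

    def handle_request(self, items):
        """Assign one sequence value per item, in arrival order (step S403)."""
        values = list(range(self._next, self._next + len(items)))
        # Steps S404 and S405: reply only after the standby acknowledges.
        if not self._standby.sync(values):
            raise RuntimeError("standby did not acknowledge the synchronization")
        self._next += len(items)
        return list(zip(items, values))
```

Advancing the counter only after the acknowledgement is a design choice of this sketch: if the synchronization fails, no values are considered issued.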
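FIG. 5 is a block diagram of another composition of the processing system for a distributed database according to an embodiment of the present application. As shown in FIG. 5, the system includes an application 510, a computing node 520, a first global sequence manager 530, a second global sequence manager 550, a first database cluster 540, and a second database cluster 560.
The first global sequence manager 530 includes a first global sequence manager primary host 531 and a first global sequence manager standby host 532. The first primary host 531 includes n processing threads, for example a 1st processing thread 5311, a 2nd processing thread 5312, ..., and an nth processing thread 531n, where n is an integer greater than or equal to 1. The second global sequence manager 550 includes a second global sequence manager primary host 551 and a second global sequence manager standby host 552. The second primary host 551 includes m processing threads, for example a 1st processing thread 5511, a 2nd processing thread 5512, ..., and an mth processing thread 551m, where m is an integer greater than or equal to 1. The computing node 520 includes a merge thread 522 and k processing threads, for example a 1st processing thread 5211, a 2nd processing thread 5212, ..., and a kth processing thread 521k, where k is greater than or equal to the sum of m and n. The first database cluster 540 includes a sequence set 541, and the second database cluster 560 includes a sequence set 561.
It should be noted that, in this embodiment, the first global sequence manager 530 processes only the sequence information of the first database cluster 540, and the second global sequence manager 550 processes only the sequence information of the second database cluster 560. The merge thread in the computing node 520 therefore classifies the to-be-processed sequence information sent by the application 510: based on the order of arrival of each to-be-processed sequence and on which database cluster must process the corresponding SQL statement, it assigns each to-be-processed sequence to the matching global sequence manager and distributes the sequences to the different global sequence managers accordingly.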
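The classification performed by the merge thread can be pictured as grouping items by target cluster while keeping their arrival order within each group. The sketch below assumes each pending item is tagged with the name of the cluster that will execute its SQL statement and that a routing table maps clusters to managers; `build_requests` and the string identifiers are hypothetical.

```python
from collections import defaultdict


def build_requests(pending, manager_for_cluster):
    """Group (cluster, item) pairs into one request per manager.

    `pending` is assumed to be in arrival order already, so appending keeps
    that order inside each per-manager request.
    """
    requests = defaultdict(list)
    for cluster, item in pending:
        requests[manager_for_cluster[cluster]].append(item)
    return dict(requests)
```

For example, with the routing `{"cluster-540": "gsm-530", "cluster-560": "gsm-550"}`, items destined for the first database cluster end up only in the request for the first global sequence manager, in the order they arrived.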
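For example, the method by which the processing system in this embodiment handles SQL statements input by the application 510 includes steps S501 to S509.
Step S501: the application 510 sends multiple to-be-processed SQL statements to the computing node 520.
Step S502: the computing node 520 uses the 1st processing thread 5211, the 2nd processing thread 5212, ..., and the kth processing thread 521k to parse the to-be-processed SQL statements in turn, obtaining sequence information 1 to be processed, sequence information 2 to be processed, ..., and sequence information k to be processed. These items are then sent to the merge thread 522, which classifies and merges them according to their order of arrival and the database cluster that handles the corresponding SQL statement, generating a first sequence request message and a second sequence request message. The first sequence request message is sent to the first global sequence manager 530, and the second sequence request message is sent to the second global sequence manager 550.
The first sequence request message includes n items of sequence information to be processed, and the second sequence request message includes m items. It should be noted that all n items in the first sequence request message belong to the first database cluster 540, and all m items in the second sequence request message belong to the second database cluster 560.
Step S503: after receiving the first sequence request message, the first global sequence manager primary host 531 in the first global sequence manager 530 dispatches the n items of sequence information one by one to the 1st processing thread 5311 for processing, generating sequences 1, 2, ..., n in turn, where n is an integer greater than or equal to 1, so that the sequences on the first database cluster 540 are uniquely incrementing. It then generates a first synchronization message from the generated sequence information and sends it to the first global sequence manager standby host 532. The first synchronization message includes sequence 1, sequence 2, ..., sequence n.
Step S504: after receiving the synchronization message, the first global sequence manager standby host 532 saves the sequence information generated in step S503 to its local disk and sends a synchronization-complete message to the first global sequence manager primary host 531.
Step S505: in response to the synchronization-complete message returned by the first global sequence manager standby host 532, the first global sequence manager primary host 531 determines that the data has been synchronized to the first standby host 532. It then generates a first sequence response message from sequence 1, sequence 2, ..., sequence n generated in step S503 and the identifiers of the distributed data nodes corresponding to each sequence, and sends it to the computing node 520.
It should be noted that, while steps S503 to S505 are being performed, the second global sequence manager 550 performs similar operations at the same time, as shown in steps S506 to S508.
Step S506: after receiving the second sequence request message, the second global sequence manager primary host 551 in the second global sequence manager 550 dispatches the m items of sequence information one by one to, for example, the 2nd processing thread 5512 for processing, generating sequences 1, 2, ..., m in turn, so that the sequences on the second database cluster 560 are uniquely incrementing. It then generates a second synchronization message from the generated sequence information and sends it to the second global sequence manager standby host 552. The second synchronization message includes sequence 1, sequence 2, ..., sequence m.
Step S507: after receiving the synchronization message, the second global sequence manager standby host 552 saves the sequence information generated in step S506 to its local disk and sends a synchronization-complete message to the second global sequence manager primary host 551.
Step S508: in response to the synchronization-complete message returned by the second global sequence manager standby host 552, the second global sequence manager primary host 551 determines that the data has been synchronized to the second standby host 552. It then generates a second sequence response message from sequence 1, sequence 2, ..., sequence m generated in step S506 and sends it to the computing node 520.
It should be noted that the sequences generated by the first global sequence manager 530 (sequence 1, sequence 2, ..., sequence n) guarantee that the sequences in the first database cluster 540 are uniquely incrementing, and the sequences generated by the second global sequence manager 550 (sequence 1, sequence 2, ..., sequence m) guarantee that the sequences in the second database cluster 560 are uniquely incrementing. The distributed database thus obtains sequence isolation at the database-cluster level while each sequence remains unique and incrementing.
Step S509: after receiving the first sequence response message and the second sequence response message, the computing node 520 combines each sequence in the first sequence response message with the to-be-processed SQL statements, generates a first data processing message, and sends it to the first database cluster 540 for processing. At the same time, it combines each sequence in the second sequence response message with the to-be-processed SQL statements, generates a second data processing message, and sends it to the second database cluster 560 for processing. After obtaining the data processing results returned by the distributed data nodes in the first database cluster 540 and the second database cluster 560, the computing node 520 forwards each data processing result to the application 510.
It should be noted that, because the first database cluster 540 and the second database cluster 560 are physically isolated, the sequences corresponding to the generated data processing results do not interfere with one another. This guarantees sequence isolation at the database-cluster level and that the sequences are uniquely incrementing.
In this embodiment, different global sequence managers process the sequence information of different database clusters, so sequences are isolated at the database-cluster level while the sequences inside each database cluster remain unique. Using multiple threads to process the to-be-processed information concurrently improves sequence processing efficiency and keeps the performance of the distributed database system high.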
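FIG. 6a is a schematic flow diagram of how the global sequence manager standby host handles an exceptional case when performing full backups of the data according to an embodiment of the present application. As shown in FIG. 6a, the global sequence manager standby host may perform a full backup once every preset interval (for example, 5 minutes), that is, write the data to a disk file. However, if, for example, sequence processing requests are received after 12:05 and before 12:10, the number of to-be-processed items is counted; when that number exceeds the preset sequence threshold (for example, a threshold of 10,000 items), say at 12:06, the standby host performs an additional full backup so that no sequence information is lost. When 12:10 arrives, the standby host performs the next scheduled full backup.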
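The two backup triggers described for FIG. 6a, a fixed interval and a count of to-be-processed items that exceeds the preset sequence threshold, can be combined into a single predicate. The following is a sketch under assumptions: times are plain seconds, the defaults mirror the 5-minute and 10,000-item examples in the text, and the name `should_back_up` is hypothetical. How the count is reset after a backup is not stated in the source and is left out.

```python
def should_back_up(now_s, last_backup_s, pending_count,
                   interval_s=300, threshold=10000):
    """Return True when a full backup is due.

    A backup is due when the preset interval has elapsed since the last one,
    or when the number of pending items exceeds the preset sequence threshold.
    """
    interval_elapsed = (now_s - last_backup_s) >= interval_s
    over_threshold = pending_count > threshold
    return interval_elapsed or over_threshold
```

In the 12:06 example, only one minute has passed since the 12:05 backup, yet the predicate is still true once more than 10,000 items have been counted.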
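FIG. 6b is a schematic flow diagram of data recovery by the global sequence manager primary host according to an embodiment of the present application. As shown in FIG. 6b, at 12:09, for example, the global sequence manager primary host has been repaired by maintenance staff and has returned to normal operation, so it needs to start a data recovery operation. As shown in FIG. 6a, the global sequence manager standby host performed full backups at both 12:06 and 12:10, so the primary host requests from the standby host the data backed up before the current recovery time and uses it as the backup data to restore its own data. The recovery includes steps S601 to S604.
Step S601: the global sequence manager primary host requests from the global sequence manager standby host the backup data taken at, for example, 12:06 as the recovery data for this restore.
It should be noted that the global sequence manager primary host selects the full backup that is earlier than, for example, 12:09 (the recovery time) and closest to 12:09 as the recovery data for this restore, to ensure the completeness and safety of the data.
Step S602: the global sequence manager primary host uses the backup data taken at, for example, 12:06 as the recovery data and restores its internal data.
Step S603: when the data recovery completes, the starting value of the sequence is set.
For example, the maximum value of the restored sequence is added to the preset sequence threshold to obtain the starting value of the sequence. Alternatively, the step size (that is, the interval between sequence values, for example 1 or 5) is first multiplied by the preset sequence threshold, and the product is then added to the maximum value of the restored sequence to obtain the starting value. These ways of setting the starting value are only examples; the starting value may be set according to the actual situation, and other ways of setting it that are not described here also fall within the scope of protection of the present application and are not detailed further.
Step S604: the backup data taken after, for example, 12:09 (the recovery time) is deleted to prevent duplicate data and improve storage utilization.
In this embodiment, the global sequence manager standby host performs a full backup of the sequence information at every preset interval so that no sequence information is lost. After the global sequence manager primary host recovers from a crash, it requests backup data from the standby host, using the full backup closest to the current point in time as the recovery data, so that it can quickly restore its data to a currently usable state and preserve data integrity. After the restore completes, the backup data taken after the recovery time is deleted to prevent duplicate data and improve storage utilization.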
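The two example formulas for the starting value in step S603 and the cleanup in step S604 can be written out as a small sketch. The function names and the representation of backup times as minutes since midnight are assumptions made for illustration. One plausible reading, not stated explicitly in the source, is that skipping ahead by the threshold covers values that may have been issued after the chosen backup, since an extra backup is triggered once the count exceeds the threshold.

```python
def starting_value(restored_max, threshold, step=None):
    """Return the first sequence value to issue after a restore.

    Without a step: restored_max + threshold.
    With a step:    restored_max + step * threshold.
    """
    if step is None:
        return restored_max + threshold
    return restored_max + step * threshold


def prune_backups(backup_times, recovery_time):
    """Keep only backups taken at or before the recovery time (step S604)."""
    return [t for t in backup_times if t <= recovery_time]
```

With a restored maximum of 5,000 and a threshold of 10,000, the first formula gives 15,000 and the second, with a step of 5, gives 55,000. With backups at minutes 725, 726 and 730 (12:05, 12:06, 12:10) and a recovery time of 729 (12:09), only the first two are kept.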
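FIG. 7 is a structural diagram of an exemplary hardware architecture of a computing device capable of implementing the method and apparatus according to embodiments of the present application.
As shown in FIG. 7, the computing device 700 includes an input device 701, an input interface 702, a central processing unit 703, a memory 704, an output interface 705, and an output device 706. The input interface 702, the central processing unit 703, the memory 704, and the output interface 705 are connected to one another through a bus 707. The input device 701 and the output device 706 are connected to the bus 707 through the input interface 702 and the output interface 705 respectively, and thereby to the other components of the computing device 700.
Specifically, the input device 701 receives input information from outside and passes it to the central processing unit 703 through the input interface 702. The central processing unit 703 processes the input information based on computer-executable instructions stored in the memory 704 to generate output information, stores the output information temporarily or permanently in the memory 704, and then passes it to the output device 706 through the output interface 705. The output device 706 outputs the output information to the outside of the computing device 700 for use by the user.
In some embodiments, the computing device shown in FIG. 7 may be implemented as a network device that includes: a memory configured to store a computer program; and a processor configured to run the computer program stored in the memory to perform the processing method for a distributed database described in the above embodiments.
Embodiments of the present application further provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements the processing method for a distributed database described in the above embodiments.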
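With the processing method and apparatus for a distributed database, the network device, and the computer-readable storage medium provided by embodiments of the present application, to-be-processed information in a database cluster is ordered to generate a uniquely incrementing sequence processing result, which guarantees the uniqueness and auto-increment property of the sequence. In addition, because the to-be-processed information comes from different threads and the database cluster includes distributed data nodes, making the sequence processing result usable only by the corresponding distributed data nodes guarantees the uniqueness of the sequence. This overcomes the low data search efficiency that arises in the related art when a distributed database cannot provide a unique sequence: when a user searches for data across the whole database cluster with the sequence processing result, the search is faster and the user experience improves.
In general, the various embodiments of the present application may be implemented in hardware or special-purpose circuits, software, logic, or any combination thereof. For example, some aspects may be implemented in hardware while other aspects may be implemented in firmware or software executable by a controller, microprocessor, or other computing device, although the present application is not limited thereto.
Embodiments of the present application may be implemented by a data processor of a mobile device executing computer program instructions, for example in a processor entity, or by hardware, or by a combination of software and hardware. The computer program instructions may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages.
A block diagram of any logic flow in the drawings of the present application may represent program steps, or interconnected logic circuits, modules, and functions, or a combination of program steps and logic circuits, modules, and functions. The computer program may be stored in a memory. The memory may be of any type suitable for the local technical environment and may be implemented using any suitable data storage technology, such as, but not limited to, read-only memory (ROM), random access memory (RAM), and optical memory devices and systems (digital versatile disc (DVD) or compact disc (CD)). Computer-readable media may include non-transitory storage media. The data processor may be of any type suitable for the local technical environment, such as, but not limited to, a general-purpose computer, a special-purpose computer, a microprocessor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a programmable logic device (FPGA), and a processor based on a multi-core processor architecture.
The foregoing has provided a detailed description of exemplary embodiments of the present application by way of exemplary and non-limiting examples. Considered together with the drawings and the claims, however, various modifications and adaptations of the above embodiments will be apparent to those skilled in the art without departing from the scope of the present application. The proper scope of the present application is therefore to be determined by the claims.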

Claims (14)

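  1. A processing method for a distributed database, comprising:
    ordering to-be-processed information in a database cluster to generate a uniquely incrementing sequence processing result, wherein the to-be-processed information comes from different threads, all distributed data nodes in the database cluster hold the same sequence set, and the sequence set includes sequences.
  2. The method according to claim 1, wherein ordering the to-be-processed information in the database cluster to generate the uniquely incrementing sequence processing result comprises:
    performing, according to the order of arrival of the to-be-processed information, any one of the following operations on the to-be-processed information to generate the sequence processing result:
    adding the sequence corresponding to the to-be-processed information to the sequence set;
    deleting the sequence corresponding to the to-be-processed information from the sequence set; and
    modifying the sequence set according to the to-be-processed information.
  3. The method according to claim 1, further comprising:
    before the step of ordering the to-be-processed information in the database cluster to generate the uniquely incrementing sequence processing result, receiving a database processing statement sent by an application;
    parsing the database processing statement to obtain a data definition language and a data manipulation language; and
    analyzing the data definition language and the data manipulation language to extract the to-be-processed information.
  4. The method according to claim 3, wherein analyzing the data definition language and the data manipulation language to extract the to-be-processed information comprises:
    analyzing the data definition language and the data manipulation language to obtain N sequence processing requirements;
    batch-processing the N sequence processing requirements to generate the to-be-processed information, wherein N is an integer greater than or equal to 1.
  5. The method according to claim 3, further comprising:
    after ordering the to-be-processed information in the database cluster to generate the uniquely incrementing sequence processing result, sending the database processing statement to the distributed data nodes for processing according to the sequence processing result; and
    in response to a data processing result returned by the distributed data nodes, forwarding the data processing result to the application.
  6. The method according to claim 1, further comprising:
    before the step of ordering the to-be-processed information in the database cluster to generate the uniquely incrementing sequence processing result, dividing the distributed data nodes into clusters according to the processing capability of the distributed data nodes to obtain the database cluster.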
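  7. The method according to any one of claims 1 to 6, wherein
    a first global sequence manager orders the to-be-processed information in the database cluster to generate the uniquely incrementing sequence processing result,
    and the method further comprises:
    after ordering the to-be-processed information in the database cluster to generate the uniquely incrementing sequence processing result, synchronizing the sequence processing result to a second global sequence manager, wherein the second global sequence manager is a backup manager of the first global sequence manager, and the first global sequence manager includes a processing thread.
  8. The method according to claim 7, further comprising:
    after synchronizing the sequence processing result to the second global sequence manager, performing a full backup of the data in the first global sequence manager at every preset interval.
  9. The method according to claim 7, further comprising:
    after synchronizing the sequence processing result to the second global sequence manager, counting the number of items of the to-be-processed information; and
    performing a full backup of the data in the first global sequence manager according to the number of items of the to-be-processed information and a preset sequence threshold.
  10. The method according to claim 8 or 9, further comprising:
    after performing the full backup of the data in the first global sequence manager,
    when the first global sequence manager fails, replacing the first global sequence manager with the second global sequence manager;
    when the first global sequence manager recovers from the failure, using the sequence data in the second global sequence manager as recovery data to restore the data in the first global sequence manager and obtain restored sequence data; and
    persisting the restored sequence data.
  11. The method according to claim 8 or 9, further comprising:
    after performing the full backup of the data in the first global sequence manager,
    when the first global sequence manager fails and the second global sequence manager has not been used to replace the first global sequence manager, obtaining full backup data taken at a first time and full backup data taken at a second time;
    calculating a first difference between a preset recovery time and the first time, and a second difference between the preset recovery time and the second time, wherein both the first time and the second time are earlier than the recovery time;
    if the second difference is determined to be smaller than the first difference, using the full backup data taken at the second time as recovery data to restore the data in the first global sequence manager and obtain restored sequence data; and
    setting a starting sequence according to the maximum sequence value after the data restore and a preset sequence threshold.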
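  12. A processing apparatus for a distributed database, comprising:
    an acquisition module, configured to acquire to-be-processed information in a database cluster;
    a processing module, configured to order the to-be-processed information and generate a uniquely incrementing sequence processing result, wherein the to-be-processed information comes from different threads, all distributed data nodes in the database cluster hold the same sequence set, and the sequence set includes sequences.
  13. A network device, comprising:
    one or more processors;
    a memory storing one or more computer programs that, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 11.
  14. A computer-readable storage medium storing a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 11.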
PCT/CN2021/103095 2020-06-29 2021-06-29 Processing method and apparatus for distributed database, network device, and computer-readable storage medium WO2022002044A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010604834.9 2020-06-29
CN202010604834.9A CN113934792B (zh) 2020-06-29 2020-06-29 Processing method and apparatus for distributed database, network device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022002044A1 (zh) 2022-01-06

Family

ID=79272984

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/103095 WO2022002044A1 (zh) 2020-06-29 2021-06-29 Processing method and apparatus for distributed database, network device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN113934792B (zh)
WO (1) WO2022002044A1 (zh)

Also Published As

Publication number Publication date
CN113934792B (zh) 2023-03-24
CN113934792A (zh) 2022-01-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21832778

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 16/05/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21832778

Country of ref document: EP

Kind code of ref document: A1