CN116028505A - Method for realizing high available sequence in distributed database - Google Patents

Method for realizing high available sequence in distributed database Download PDF

Info

Publication number
CN116028505A
CN116028505A CN202310139533.7A CN202310139533A CN116028505A CN 116028505 A CN116028505 A CN 116028505A CN 202310139533 A CN202310139533 A CN 202310139533A CN 116028505 A CN116028505 A CN 116028505A
Authority
CN
China
Prior art keywords
sequence
value
val
character string
length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310139533.7A
Other languages
Chinese (zh)
Inventor
乐磊
仇东标
张勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Focus Technology Co Ltd
Original Assignee
Focus Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Focus Technology Co Ltd filed Critical Focus Technology Co Ltd
Priority to CN202310139533.7A priority Critical patent/CN116028505A/en
Publication of CN116028505A publication Critical patent/CN116028505A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for realizing high available sequence in a distributed database, which is characterized in that a plurality of computing nodes in the distributed database directly provide sequence values and automatically update sequence caches; the two sequence libraries are used for providing sequence services at the same time, so that high availability of the sequence services is ensured; the uniqueness of sequences among different sequence libraries is realized through self algorithm rules so as to ensure the uniqueness of sequence values in the whole distributed system. The method and the device have the advantages that the application program developer does not need to pay attention to non-business problems such as acquisition of sequences, repetition of sequence values and the like, so that the working gravity center can be placed on own business; the distributed database can provide sequence values more efficiently, reliably and safely.

Description

Method for realizing high available sequence in distributed database
Technical Field
The invention belongs to the field of distributed databases, and particularly relates to a method for realizing a high-availability sequence in a distributed database.
Background
With the rapid development of computer technology and the increasing business fields of applications, more and more data is generated and recorded in databases. With the accumulation of time, the data volume of a single table in a database is more and more huge, and the read-write performance starts to be obviously reduced. Meanwhile, the traditional single-point single-machine deployment mode of the relational database does not solve the problem of high availability, and the requirements of business application on high performance, high availability and high concurrency of the database can not be met. Therefore, more and more enterprises in recent years are beginning to transform into using distributed databases. The distributed database is used for solving the problems of huge data of single table and reduced read-write performance, and one table in the original single-table relational database is split into a plurality of single-table relational databases by taking the record as a dimension. A distributed database supports higher concurrency in order to provide more powerful read-write performance, and multiple computing nodes are typically deployed in one distributed database for parsing SQL, executing SQL, and assembling SQL execution results. For the user, one of the computing nodes can be arbitrarily linked through the distributed database client to perform the related operation. In a stand-alone relational database, a list table may use a primary key sequence to generate a value unique within the list table to identify a record. In a distributed database, when a single table is split into different stand-alone relational databases, there is a need for a function that provides for maintaining unique sequence values across multiple single tables.
The traditional unique sequence value scheme is to independently deploy a set of systems outside the distributed database. The user needs to link the system to obtain independent sequence values, assemble SQL and send the independent sequence values to the distributed database, for example, a key is stored in redis to automatically increase continuously. However, the independent deployment of the systems means that additional software and hardware are required, additional development and maintenance personnel are required, and meanwhile, the linkage between the two systems is required to be maintained for a user, which clearly increases the complexity of the development of the application system and the work of the maintenance personnel.
In order to ensure higher concurrency, the traditional unique sequence value implementation mode is often realized in a memory mode, such as a snowflake algorithm, a specific algorithm is used for generating a never repeated value to serve as a sequence value by using a clock of a computer as a seed, and the unique sequence value implementation mode has the advantages of simplicity in implementation, high performance, low availability, and incapability of effectively solving the problem of machine clock callback and easiness in sequence value repetition. Memory-based approaches are often used in applications where the uniqueness requirements are not so stringent.
Disclosure of Invention
The invention aims to solve the defects and problems in the prior art, and in a distributed database, sequence values need to be considered in high availability, high performance and severe unrepeatability, so that the distributed database can provide the sequence values more efficiently, reliably and safely; the invention directly provides the sequence value by a plurality of computing nodes in the distributed database, and automatically updates the sequence cache; the two sequence libraries are used for providing sequence services at the same time, so that high availability of the sequence services is ensured; the uniqueness of the sequences among different sequence libraries is realized through a self-defined algorithm rule, so that the uniqueness of the sequence values in the whole distributed system is ensured. The application program developer does not need to pay attention to non-business problems such as acquisition of sequences, repetition of sequence values and the like, so that the center of gravity of work can be placed on own business.
In order to solve the technical problems, the invention provides a method for realizing a high-availability sequence in a distributed database, which specifically comprises the following steps:
step 1: respectively creating a sequence library in two independent databases; in order to ensure the stability of the sequence library, a maximum connection number allowed by the sequence library at the same time should be defined, and if the number of requests of the current computing node at the same time is greater than the maximum connection number set by the sequence library, the sequence library should refuse to receive new requests;
step 2: creating a sequence table in each of the two sequence libraries created in the step 1, wherein the field information of the sequence table comprises: a table main key, a sequence name, a sequence starting value, a sequence step length and the number of sequence values which a single computing node should buffer; the sequence names are used for distinguishing different sequences; the sequence starting value is an initial value when calculating the sequence, and each newly generated sequence is updated in the sequence table; the sequence step length is the increasing speed of the sequence value and is recorded as the difference value of every two continuous sequences;
step 3: creating a storage process in each of the two sequence libraries created in the step 1; the storage process is used for processing a program segment of specific complex logic, and receives a sequence name as an input parameter and outputs a character string as an output parameter; the storage process converts the received sequence name into uppercase, finds the corresponding sequence record in the sequence table according to the sequence name, updates the sequence starting value according to a preset algorithm, returns the character string with the appointed format if the updating is successful, and returns the preset character string if the updating is failed;
step 4: after the step 3 is completed, respectively creating sequence records in the two sequence tables; the method comprises the following steps: creating a sequence record with identical sequence names in the two sequence tables, wherein the sequence step sizes of the two sequence records with identical names are identical and are multiples of 10, the single digits of the initial values of the sequences are identical, but one of the tens digits is an odd number, and the other is an even number;
step 5: the computing node receives a request of a user for extracting a sequence, searches whether an available sequence exists in a local cache according to a sequence name, consumes a sequence value to return to the user if the available sequence exists, and sends the sequence name to a sequence library to acquire the sequence if the available sequence does not exist; the method comprises the following steps: the computing node confirms whether the sequence library is normal or not through heartbeat detection; if one sequence library is abnormal, generating sequences in the other sequence library, and if both sequence libraries are normal, randomly selecting one sequence for generating the sequences; if the sequence fails, reselecting another library generation sequence; if the heartbeat detection results of the two sequence libraries are abnormal, the computing node continuously waits until the sequence libraries are recovered to be normal; the computing node calls a storage process on the selected sequence library to obtain a return value;
step 6: after receiving the request of the computing node, the sequence library returns the character string with the appointed format to the computing node;
step 7: after the computing node acquires the character string in the appointed format returned by the sequence library, the computing node computes the sequence value and returns the sequence value to the user.
In the step 1, the single-point database is created on MySQL which is independently deployed; sequences are generated from either of two sequence libraries.
In the step 2, the name of the primary key of the table is id, the type is BIGINT type, the length is 20, and the self-growing attribute of the primary key of the single-machine database is used; the field seq_name is used for storing a sequence name, and is of a type VARCHAR and of a length 255; the field curr_val stores the initial value of the sequence, the type is BIGINT, and the length is 20; the field step_val is used for saving the step length of the sequence, and is of the type INT and of the length 11; the field cache_nums is used for storing the cache quantity of the computing nodes, the type is INT, and the length is 11; a unique index is created on the field seq_name.
In the step 3, the storage processes in the two sequence libraries are the same, and are written by using SQL, deployed on specific MySQL, and are named as focus_sequence_func, the type is FUNCTION, the storage process defines a receiving parameter, the type is VACHAR, the length is 255, the name is v_seq_name, the return type is VARCHAR, the length is 64, and the character string in the specified format is a character string composed of a sequence start value, a character string link symbol, a value to be added to the sequence start value, a character string link symbol and a sequence step length; the preset character string comprises-1 and-2; the preset algorithm is as follows: new sequence start value = current sequence start value + number of sequences that should be buffered by the compute node x sequence step size.
The specific steps in the step 3 comprise:
step 3-1: four variables are predefined, namely v_curr_val of long type, v_step_val of int type, v_increment of int type and v_update RowCount of int type; the four variables are respectively used for storing a sequence starting value, a sequence step length, a numerical value which needs to be added to the sequence starting value and the number of changed sequence records;
step 3-2: converting the input v_seq_name into capital letters, converting the seq_name in the table into capital letters, and inquiring two sequence records with the same value; and calculating a v_increment value, wherein the calculation formula is as follows: v_increment=cache_nums step_val; judging whether the v_cur_val value is empty or not, if so, returning to the character string-2, and if not, continuing to step 3-3;
step 3-3: calculating a new sequence starting value, wherein the calculation formula is as follows: the new v_curr_val=v_curr_val+v_increment, and the sequence record inquired in the step 3-2 is updated by using the new v_curr_val value; judging the number of updated sequence records, and if the number is 0 rows, returning to the character string-1; otherwise, returning to the character string with the specified format, wherein the character string format is as follows: v_curr_val value, english comma, v_increment value, english comma, v_step_val value.
In the step 5, after the computing node is started, respectively establishing heartbeat detection according to the address, the account number, the password and the two sequence libraries; initializing a Map data structure by a computing node, splicing # symbols by using the schema names of the distributed databases, splicing sequence names as keys, and using a sequence value queue as a value; the user requests the sequence value of the designated name from the computing node, and the computing node performs the following steps:
step 5-1: the computing node uses the key to query the Map data structure, if the obtained value is not empty, the step 5-2 is executed, if the obtained value is empty, a new empty sequence value queue is constructed and stored in the Map data structure, and the step 5-2 is executed continuously;
step 5-2: checking the sequence value queue obtained in the step 5-1, if the sequence value queue is not empty, extracting a specific sequence value from the sequence value queue and returning the specific sequence value to a requester, and if the sequence value queue is empty, executing the step 5-4; checking the length of the sequence value queue, and executing the step 5-4 if the length of the last sequence value queue is more than 100 and the length of the current sequence value queue is less than half of the last sequence value queue; if the length of the last sequence value queue is less than 100 and the length of the current sequence value queue is less than 50, executing the step 5-4; step 5-3 is executed if the sequence value queue length does not meet the two conditions;
step 5-3: the computing node finishes the operation of returning the sequence value, and enters the waiting for the next request;
step 5-4: calculating a node request sequence library to generate a new batch of sequence values; checking whether the two sequence libraries are normal, and if so, randomly calling one of the storage processes to obtain a returned result character string; if the character string is-1 or-2, repeating the step 5-4 to another sequence library; otherwise, analyzing the returned character string, and executing the step 7.
In the step 7, after receiving the character string in the specified format, the computing node analyzes and extracts a sequence start value, a sequence increment value and a sequence step length, the computing node computes the sequence value, returns one of the sequence values to the user, and the rest of the sequence values are cached in the local memory for the next use.
In the step 7, the calculating the sequence value by the calculating node includes: the first step, calculating the maximum value of the sequence, wherein the calculation formula is as follows: sequence maximum = sequence start value + sequence increase value; secondly, calculating specific sequence values one by one, wherein a calculation formula is as follows: sequence value = sequence start value + sequence step size; and thirdly, judging whether the sequence value of the second step is larger than the maximum value of the sequence of the first step, if so, ending the step of generating the sequence, and if so, repeating the second step.
Step 7 further comprises updating a local sequence buffer with the calculated sequence value; sequentially extracting v_curr_val, v_increment and v_step_val from the return values of the sequence library according to the rules in the step 3-3; successively increasing the value v_step_val based on the returned v_curr_val; after each value is added, if the new v_curr_val is larger than or equal to the sum of the returned v_curr_val and v_increment, ending the operation, and if the new v_curr_val is smaller than the sum of the returned v_curr_val and v_increment, putting the new v_curr_val and v_increment into the sequence value queue obtained in the step 5-2.
The invention has the beneficial effects that:
(1) The invention uses the single-machine relational database to persist the sequence information, thereby solving the problem that the reliability and durability can not be realized by using the memory to save the sequence in the traditional sequence;
(2) The invention ensures high reliability of sequence generation by using a plurality of sequence libraries, at least two sequence libraries;
(3) The invention designs a new sequence generation algorithm, ensures that the sequence value is never repeated, and avoids the problems of time rollback, overlong sequence value length and the like faced by the traditional time-based snowflake algorithm;
(4) The invention improves the concurrency of the sequence generating function and enhances the performance by caching the sequence values by a plurality of computing nodes and providing the sequence service for the user by the computing nodes.
Drawings
FIG. 1 is a flow chart of implementing a high availability sequence in a distributed database in an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating connection between a service system and a computing node in an embodiment of the present invention;
FIG. 3 is a timing diagram of a user request sequence and a compute node response in an embodiment of the invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, the following describes the initialization and use of sequential services in detail with reference to the accompanying drawings:
fig. 1 is a flowchart of a high-availability client load balancing method based on a database middleware cluster in an embodiment of the present invention, which specifically includes:
step 1: creating two sequence libraries; two single point databases are initialized as a sequence library. The method comprises the following steps: a database was created on each of the two mysqls. MySQL is a stand-alone relational database that is part of the underlying data store of the distributed database. The sequence library is used to hold basic information of sequences and is also used to generate specific sequence values. The initialization parameters include: sequence library name, sequence library character set, sequence library account number, sequence library password, maximum connection number. The maximum number of connections is used to define how many compute nodes a sequence pool remains connected to at most. If the number of the current computing nodes is greater than the set maximum number of connections, the new computing node request will be denied.
By using at least two (or more) sequence libraries, a high reliability of the sequence generation is ensured.
Step 2: sequence tables are created in the two sequence libraries, respectively. The method comprises the following steps: and respectively creating a sequence table in the two sequence libraries. The field information of the sequence table includes: the table primary key, the sequence name, the starting value of the sequence, the step size of the sequence, and the number of buffered sequence values. The sequence names are used to distinguish between different sequences. The sequence start value refers to a start value calculated by the generation algorithm every time a sequence is generated. Sequence step size refers to the rate of increase of the sequence value, i.e. the difference of every two consecutive sequences. The number of buffered sequence values refers to how many sequence values are generated in total each time a sequence value is generated.
The name of the main key of the table is id, the type is BIGINT type, the length is 20, and the self-growing attribute of the main key of the single-machine database is used; the field seq_name is used for storing a sequence name, and is of a type VARCHAR and of a length 255; the field curr_val stores the initial value of the sequence, the type is BIGINT, and the length is 20; the field step_val is used for saving the step length of the sequence, and is of the type INT and of the length 11; the field cache_nums is used for storing the cache quantity of the computing nodes, the type is INT, and the length is 11; a unique index is created on the field seq_name.
By using a stand-alone relational database to persist sequence information, the problem that the reliability and durability cannot be realized by using a memory to save the sequence in the traditional sequence is solved.
Step 3: stored procedures are created in the two sequence libraries, respectively. The method comprises the following steps: the stored procedure is a segment written using SQL, which can be deployed on specific MySQL, for processing program fragments of specific complex logic. It receives a parameter sequence name and outputs the result as a specific value. The above-mentioned storage procedure will first convert the received sequence name into uppercase, and then find the corresponding record in the sequence table according to the sequence name. And updating the sequence starting value according to a specific preset algorithm, returning a result according to a specified format if updating is successful, and returning a number-1 if updating is failed. The rules of the preset algorithm are as follows: new sequence start value = sequence current start value + number of sequence buffers sequence increase step size. The return format is as follows: the character string consists of a sequence start value, an English comma, a sequence increment value, an English comma and a sequence increment step length. The specific flow of the storage process comprises the following steps:
step 3-1: four variables are predefined, namely v_curr_val of long type, v_step_val of int type, v_increment of int type and v_update RowCount of int type; the four variables are respectively used for storing the initial value of the sequence, the step length of the sequence, the accumulated increment of the sequence and the changed sequence record quantity;
step 3-2: the input v_seq_name is converted into capital letters, the seq_name in the table is converted into capital letters, and two sequence records with the same value are queried. And calculating a v_increment value, wherein the calculation formula is as follows: v_increment=cache_nums step_val. Judging whether the v_cur_val value is empty or not, if so, returning to the character string-2, and if not, continuing to step 3-3;
step 3-3: calculating a new sequence starting value, wherein the calculation formula is as follows: new v_curr_val=v_curr_val+v_increment, and the sequence record queried in step 3-2 is updated with the new v_curr_val value. Judging the number of the update operation influence records, and if the number is 0 rows, returning to the character string-1; otherwise, returning to the character string with the specified format, wherein the character string format is as follows: v_curr_val value, english comma, v_increment value, english comma, v_step_val value.
The stored procedure code fragment used in step 3 is as follows:
Figure BDA0004087119270000071
by designing a new sequence generation algorithm, never repetition of sequence values is guaranteed, and the problems of time rollback, overlong sequence value length and the like faced by a traditional time-based snowflake algorithm are avoided.
Step 4: after the step 3 is completed, respectively creating sequence records in the two sequence tables according to actual needs. The method comprises the following steps: the sequence of the same name is to create a sequence record in each sequence table, the unique difference between the two records is that the ten digits of the initial value are even and odd, if one is 0, the other is 1, and the digits of other digits are consistent.
Step 5: and the computing node responds to a request of a user for acquiring the sequence, checks whether the memory has a sequence value cache, returns if the memory has the sequence value cache, and requests the sequence library if the memory does not have the sequence value cache.
The computing node receives a request of a user for extracting a sequence, firstly searches whether an available sequence exists in a local cache according to a sequence name, if so, consumes a value and returns the value to the user, and if not, sends the sequence name to a sequence library to acquire the sequence. More specifically, the method comprises the following steps: the computing node is linked to the two sequence libraries through account passwords; the computing node randomly selects one of the available sequence libraries; the computing node calls the storage process on the sequence library to acquire a return value.
The specific flow of the computing node comprises the following steps:
step 5-1: the calculation node uses the schema name, splices the # symbol, and splices the requested sequence name to form a key; the computing node uses the key to query the Map data structure, if the obtained value is not empty, the step 5-2 is executed, if the obtained value is empty, a new empty sequence value queue is constructed first, and the new empty sequence value queue is stored in the Map data structure, and the step 5-2 is executed continuously;
step 5-2: checking the sequence value queue obtained in the step 5-1, if the sequence value queue is not empty, extracting a specific sequence value from the sequence value queue and returning the specific sequence value to a requester, and if the sequence value queue is empty, executing the step 5-4. Checking the length of the sequence value queue, and executing the step 5-4 if the length of the last sequence value queue is more than 100 and the length of the current sequence value queue is less than half of the previous sequence value queue; if the length of the last sequence value queue is less than 100 and the length of the current sequence value queue is less than 50, executing the step 5-4. Step 5-3 is executed if the sequence value queue length does not meet the two conditions;
step 5-3: the computing node finishes the operation of returning the sequence value, and enters the waiting for the next request;
step 5-4: the compute node requests the sequence library to generate a new batch of sequence values. Checking whether the two sequence libraries are normal, and if so, randomly calling one of the storage processes to obtain a returned result character string; if the character string is-1, repeating the step 5-4 to another sequence library; otherwise, the returned character string is analyzed.
Step 6: the sequence library receives the compute node request, invokes the stored procedure to update the sequence value, and returns a specific result to the compute node. The method comprises the following steps: when the sequence library receives a sequence value of a specific sequence name requested by a computing node, a storage process in the sequence library is called, and the sequence name is used as a parameter to be transmitted in. After the sequence name is received in the storage process, the sequence name is converted into upper case, a corresponding record is found out from the sequence table, and a new sequence starting value is calculated according to a specified algorithm. The storage process transmits the calculated new sequence starting value, the sequence increment value and the sequence step length composition character string back to the calculation node.
Step 7: the calculation node analyzes the returned result, returns the returned result to the specific sequence value of the user, and caches the residual sequence value. The method comprises the following steps: after acquiring the return value of the sequence library, the computing node analyzes and extracts a sequence starting value, a sequence accumulated increasing value and a sequence step length, the computing node computes the sequence value according to a certain algorithm, returns the sequence value to a user, and the rest of the cache is reserved in a local memory for next use. The algorithm is divided into three steps: the first step is to calculate the maximum value of the sequences of the batch, specifically, the maximum value of the sequences=the initial value of the sequences+the cumulative increment value of the sequences; the second step is sequence value = sequence start value + sequence step size; and thirdly, judging whether the sequence value of the second step is larger than the maximum value of the sequence of the first step, if so, ending the step of generating the sequence, and if so, repeating the second step. The specific process of calculating the sequence value by the calculation node comprises the following steps: according to the rule in step 3-3, v_curr_val, v_increment and v_step_val are sequentially extracted. Successively increasing the value v_step_val based on the returned v_curr_val; after adding the value once, if the obtained new v_curr_val is larger than or equal to the sum of the returned v_curr_val and v_increment, ending the operation, and if the obtained new v_curr_val is smaller than the sum of the returned v_curr_val and v_increment, putting the value into the sequence value queue obtained in the step 5-2.
The code segments used for calculating the sequence values in step 7 are:
Figure BDA0004087119270000091
FIG. 2 is a schematic diagram of a distributed database for implementing high availability sequences in the distributed database, including computing clusters and storage clusters within the distributed database connected to users of the distributed database, in accordance with an embodiment of the present invention.
The distributed database users comprise personal users, business systems, third party management tools and other software or personnel; the computing cluster consists of at least 1 computing node; the storage cluster is composed of at least two sequence libraries (sequence library A and sequence library B), and each sequence library is provided with an independent sequence table and a storage process for updating the sequence.
For example, in a distributed database environment, the primary key of the table needs to be guaranteed to be globally unique, and meanwhile, due to the limitation of the field length of the primary key, a UUID algorithm commonly seen in the market cannot be stored in the table at all because of overlong fields, and at the moment, the primary key value can only be provided by means of the method.
For example, a common snowflake algorithm uses a time stamp as a seed when a sequence is generated, repeated sequences can appear once the time service time callback occurs, the sequence is directly destroyed, the unique characteristic is required to be kept forever, and huge disasters can be brought to users. The invention changes the initial value of the sequence in time by depending on the database, and can ensure that the sequence never grows forward and never repeats.
For example, a distributed high-availability sequence realized in a common number segment distribution manner in the market needs to additionally deploy a time service cluster and a sequence service cluster outside an original system, so that a user needs not only additional software and hardware resources, but also maintenance and availability between sequence services, and meanwhile, a new request has to be divided into two calls of different systems. The invention is naturally integrated in a distributed database, and the sequence can be directly used as the conventional oracle and mysql databases without any additional modification and learning for users.
At present, through more than two years of research and development experiments, an embodiment designed based on the idea of the scheme is comprehensively on line in core business of a company, and under the condition of business test, accidents related to sequences do not happen once, so that the access amount of tens of millions of TPS is strongly supported.
FIG. 3 is a timing flow chart of a user request sequence of a distributed database in an embodiment of the present invention, including a user, a computing node, and a sequence library, and the specific steps include:
step 301: a distributed database user requests a specific sequence;
step 302: the computing node checks the local sequence cache, and if the local sequence cache is available, the computing node returns a sequence value;
step 303: calculating a node request sequence library to generate a new sequence value;
step 304-1: the sequence library calls a storage process, and a sequence starting value is updated according to an algorithm;
step 304-2: the sequence library returns the execution result of the storage process to the computing node;
step 305-1: the computing node acquires a sequence library return result, analyzes and extracts a specific sequence value, and caches the specific sequence value in a memory of the computing node;
step 305-2: the computing node extracts a part of sequence values from the sequence buffer and returns the sequence values to the user.
The invention provides a method for generating a unique high-availability sequence value based on a single-machine relational database in a distributed database, wherein the single-machine relational database ensures that the sequence is never repeated, but the problem of high availability is not solved; the invention ensures that the service for generating the sequence is high in availability through the two sequence libraries, and the two sequence libraries can simultaneously provide service to the outside; the invention ensures that the sequence values in different sequence libraries never repeat in the mode that only ten digits are different through the same step length; the invention improves the sequence distribution performance and concurrency by a plurality of calculation nodes for caching the sequence values, and different calculation nodes can automatically supplement or discard the acquired sequence values according to the current caching quantity.
The invention has the beneficial effects that:
(1) The invention uses the single-machine relational database to persist the sequence information, thereby solving the problem that the persistence can not be realized by using the memory to save the sequence in the traditional sequence;
(2) The invention ensures high reliability of the sequence generating capability by using a plurality of sequence libraries, at least two sequence libraries;
(3) The invention designs a new sequence generation algorithm, ensures that the sequence value is never repeated, and avoids the problems of time rollback, overlong sequence value length and the like faced by the traditional time-based snowflake algorithm;
(4) The invention improves the concurrency of the sequence generating function and enhances the performance by caching the sequence values by a plurality of computing nodes and providing the sequence service for the user by the computing nodes.
The above embodiments are not intended to limit the present invention in any way, and all other modifications and applications of the above embodiments which are equivalent to the above embodiments fall within the scope of the present invention.

Claims (9)

1. A method for implementing a high availability sequence in a distributed database, comprising:
step 1: respectively creating a sequence library in two independent databases; in order to ensure the stability of the sequence library, a maximum connection number allowed by the sequence library at the same time should be defined, and if the number of requests of the current computing node at the same time is greater than the maximum connection number set by the sequence library, the sequence library should refuse to receive new requests;
step 2: creating a sequence table in each of the two sequence libraries created in the step 1, wherein the field information of the sequence table comprises: a table main key, a sequence name, a sequence starting value, a sequence step length and the number of sequence values which a single computing node should buffer; the sequence names are used for distinguishing different sequences; the sequence starting value is an initial value when calculating the sequence, and each newly generated sequence is updated in the sequence table; the sequence step length is the increasing speed of the sequence value and is recorded as the difference value of every two continuous sequences;
step 3: creating a storage process in each of the two sequence libraries created in the step 1; the storage process is used for processing a program segment of specific complex logic, and receives a sequence name as an input parameter and outputs a character string as an output parameter; the storage process converts the received sequence name into uppercase, finds the corresponding sequence record in the sequence table according to the sequence name, updates the sequence starting value according to a preset algorithm, returns the character string with the appointed format if the updating is successful, and returns the preset character string if the updating is failed;
step 4: after the step 3 is completed, respectively creating sequence records in the two sequence tables; the method comprises the following steps: creating a sequence record with identical sequence names in the two sequence tables, wherein the sequence step sizes of the two sequence records with identical names are identical and are multiples of 10, the single digits of the initial values of the sequences are identical, but one of the tens digits is an odd number, and the other is an even number;
step 5: the computing node receives a request of a user for extracting a sequence, searches whether an available sequence exists in a local cache according to a sequence name, consumes a sequence value to return to the user if the available sequence exists, and sends the sequence name to a sequence library to acquire the sequence if the available sequence does not exist; the method comprises the following steps: the computing node confirms whether the sequence library is normal or not through heartbeat detection; if one sequence library is abnormal, generating sequences in the other sequence library, and if both sequence libraries are normal, randomly selecting one sequence for generating the sequences; if the sequence fails, reselecting another library generation sequence; if the heartbeat detection results of the two sequence libraries are abnormal, the computing node continuously waits until the sequence libraries are recovered to be normal; the computing node calls a storage process on the selected sequence library to obtain a return value;
step 6: after receiving the request of the computing node, the sequence library returns the character string with the appointed format to the computing node;
step 7: after the computing node acquires the character string in the appointed format returned by the sequence library, the computing node computes the sequence value and returns the sequence value to the user.
2. A method of implementing a high availability sequence in a distributed database as claimed in claim 1, wherein: in the step 1, the single-point database is created on MySQL which is independently deployed; sequences are generated from either of two sequence libraries.
3. A method of implementing a high availability sequence in a distributed database as claimed in claim 2, wherein: in the step 2, the name of the primary key of the table is id, the type is BIGINT type, the length is 20, and the self-growing attribute of the primary key of the single-machine database is used; the field seq_name is used for storing a sequence name, and is of a type VARCHAR and of a length 255; the field curr_val stores the initial value of the sequence, the type is BIGINT, and the length is 20; the field step_val is used for saving the step length of the sequence, and is of the type INT and of the length 11; the field cache_nums is used for storing the cache quantity of the computing nodes, the type is INT, and the length is 11; a unique index is created on the field seq_name.
4. A method of implementing a high availability sequence in a distributed database as claimed in claim 3, wherein: in the step 3, the storage processes in the two sequence libraries are the same, and are written by using SQL, deployed on specific MySQL, and are named as focus_sequence_func, the type is FUNCTION, the storage process defines a receiving parameter, the type is VACHAR, the length is 255, the name is v_seq_name, the return type is VARCHAR, the length is 64, and the character string in the specified format is a character string composed of a sequence start value, a character string link symbol, a value to be added to the sequence start value, a character string link symbol and a sequence step length; the preset character string comprises-1 and-2; the preset algorithm is as follows: new sequence start value = current sequence start value + number of sequences that should be buffered by the compute node x sequence step size.
5. A method of implementing a high availability sequence in a distributed database as recited in claim 4, wherein:
the specific steps in the step 3 comprise:
step 3-1: four variables are predefined, namely v_curr_val of long type, v_step_val of int type, v_increment of int type and v_update RowCount of int type; the four variables are respectively used for storing a sequence starting value, a sequence step length, a numerical value which needs to be added to the sequence starting value and the number of changed sequence records;
step 3-2: converting the input v_seq_name into capital letters, converting the seq_name in the table into capital letters, and inquiring two sequence records with the same value; and calculating a v_increment value, wherein the calculation formula is as follows: v_increment=cache_nums step_val; judging whether the v_cur_val value is empty or not, if so, returning to the character string-2, and if not, continuing to step 3-3;
step 3-3: calculating a new sequence starting value, wherein the calculation formula is as follows: the new v_curr_val=v_curr_val+v_increment, and the sequence record inquired in the step 3-2 is updated by using the new v_curr_val value; judging the number of updated sequence records, and if the number is 0 rows, returning to the character string-1; otherwise, returning to the character string with the specified format, wherein the character string format is as follows: v_curr_val value, english comma, v_increment value, english comma, v_step_val value.
6. A method of implementing a high availability sequence in a distributed database as recited in claim 5, wherein: in the step 5, after the computing node is started, respectively establishing heartbeat detection according to the address, the account number, the password and the two sequence libraries; initializing a Map data structure by a computing node, splicing # symbols by using the schema names of the distributed databases, splicing sequence names as keys, and using a sequence value queue as a value; the user requests the sequence value of the designated name from the computing node, and the computing node performs the following steps:
step 5-1: the computing node uses the key to query the Map data structure, if the obtained value is not empty, the step 5-2 is executed, if the obtained value is empty, a new empty sequence value queue is constructed and stored in the Map data structure, and the step 5-2 is executed continuously;
step 5-2: checking the sequence value queue obtained in the step 5-1, if the sequence value queue is not empty, extracting a specific sequence value from the sequence value queue and returning the specific sequence value to a requester, and if the sequence value queue is empty, executing the step 5-4; checking the length of the sequence value queue, and executing the step 5-4 if the length of the last sequence value queue is more than 100 and the length of the current sequence value queue is less than half of the last sequence value queue; if the length of the last sequence value queue is less than 100 and the length of the current sequence value queue is less than 50, executing the step 5-4; step 5-3 is executed if the sequence value queue length does not meet the two conditions;
step 5-3: the computing node finishes the operation of returning the sequence value, and enters the waiting for the next request;
step 5-4: calculating a node request sequence library to generate a new batch of sequence values; checking whether the two sequence libraries are normal, and if so, randomly calling one of the storage processes to obtain a returned result character string; if the character string is-1 or-2, repeating the step 5-4 to another sequence library; otherwise, analyzing the returned character string, and executing the step 7.
7. A method of implementing a high availability sequence in a distributed database as recited in claim 6, wherein: in the step 7, after receiving the character string in the specified format, the computing node analyzes and extracts a sequence start value, a sequence increment value and a sequence step length, the computing node computes the sequence value, returns one of the sequence values to the user, and the rest of the sequence values are cached in the local memory for the next use.
8. A method of implementing a high availability sequence in a distributed database as recited in claim 7, wherein: in the step 7, the calculating the sequence value by the calculating node includes: the first step, calculating the maximum value of the sequence, wherein the calculation formula is as follows: sequence maximum = sequence start value + sequence increase value; secondly, calculating specific sequence values one by one, wherein a calculation formula is as follows: sequence value = sequence start value + sequence step size; and thirdly, judging whether the sequence value of the second step is larger than the maximum value of the sequence of the first step, if so, ending the step of generating the sequence, and if so, repeating the second step.
9. A method of implementing a high availability sequence in a distributed database as recited in claim 8, wherein: step 7 further comprises updating a local sequence buffer with the calculated sequence value; sequentially extracting v_curr_val, v_increment and v_step_val from the return values of the sequence library according to the rules in the step 3-3; successively increasing the value v_step_val based on the returned v_curr_val; after each value is added, if the new v_curr_val is larger than or equal to the sum of the returned v_curr_val and v_increment, ending the operation, and if the new v_curr_val is smaller than the sum of the returned v_curr_val and v_increment, putting the new v_curr_val and v_increment into the sequence value queue obtained in the step 5-2.
CN202310139533.7A 2023-02-20 2023-02-20 Method for realizing high available sequence in distributed database Pending CN116028505A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310139533.7A CN116028505A (en) 2023-02-20 2023-02-20 Method for realizing high available sequence in distributed database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310139533.7A CN116028505A (en) 2023-02-20 2023-02-20 Method for realizing high available sequence in distributed database

Publications (1)

Publication Number Publication Date
CN116028505A true CN116028505A (en) 2023-04-28

Family

ID=86091420

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310139533.7A Pending CN116028505A (en) 2023-02-20 2023-02-20 Method for realizing high available sequence in distributed database

Country Status (1)

Country Link
CN (1) CN116028505A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117251456A (en) * 2023-11-16 2023-12-19 广州市千钧网络科技有限公司 Main key value generation method and device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117251456A (en) * 2023-11-16 2023-12-19 广州市千钧网络科技有限公司 Main key value generation method and device, electronic equipment and storage medium
CN117251456B (en) * 2023-11-16 2024-03-08 广州市千钧网络科技有限公司 Main key value generation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109800222B (en) HBase secondary index self-adaptive optimization method and system
CN107247808B (en) Distributed NewSQL database system and picture data query method
US8140495B2 (en) Asynchronous database index maintenance
US9639542B2 (en) Dynamic mapping of extensible datasets to relational database schemas
CN107038222B (en) Database cache implementation method and system
US9149054B2 (en) Prefix-based leaf node storage for database system
US7890541B2 (en) Partition by growth table space
US7512597B2 (en) Relational database architecture with dynamic load capability
US20100257181A1 (en) Dynamic Hash Table for Efficient Data Access In A Relational Database System
EP2199935A2 (en) Method and system for dynamically partitioning very large database indices on write-once tables
CN111506621B (en) Data statistical method and device
US20120005158A1 (en) Reducing Contention of Transaction Logging in a Database Management System
CN108959538B (en) Full text retrieval system and method
US11526465B2 (en) Generating hash trees for database schemas
AU1472901A (en) System for managing rdbm fragmentations
CN116028505A (en) Method for realizing high available sequence in distributed database
US8489644B2 (en) System and method for managing virtual tree pages
US7822767B2 (en) Modeling and implementing complex data access operations based on lower level traditional operations
CN114556320A (en) Switching to a final consistent database copy
CN112181302A (en) Data multilevel storage and access method and system
CN110781205A (en) JDBC-based database direct-checking method, device and system
CN113127717A (en) Key retrieval method and system
EA027808B1 (en) Database management system
CN117632946A (en) Hierarchical B+ tree algorithm, device and computer storage medium based on dynamic prefix
CN114706832A (en) Data redistribution method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination