CN117828160A

CN117828160A - Data processing method and data processing device

Info

Publication number: CN117828160A
Application number: CN202311710711.3A
Authority: CN
Inventors: 罗旭; 胡彬; 范润泽; 李瑾林
Original assignee: Tianyi Cloud Technology Co Ltd
Current assignee: Tianyi Cloud Technology Co Ltd
Priority date: 2023-12-13
Filing date: 2023-12-13
Publication date: 2024-04-05

Abstract

The invention relates to the technical field of databases, and in particular to a data processing method and a data processing device. The data processing method comprises: when a data processing request is determined to be a data read request, polling a plurality of slave servers to determine one of them as a target slave server, and acquiring a first log sequence number stored in the target slave server; then acquiring a second log sequence number stored in a master server, and determining the synchronization delay difference between the master server and the target slave server from the first and second log sequence numbers; and, when the synchronization delay difference is less than or equal to a preset threshold and the polling count is less than or equal to a preset maximum polling count, stopping polling, distributing the data read request to the target slave server, and returning the query result obtained by the target slave server for the data read request. The returned query result thus meets preset requirements for both consistency and real-time performance, which ensures the accuracy of the query result.

Description

Data processing method and data processing device

Technical Field

The present invention relates to the field of database technologies, and in particular, to a data processing method and a data processing device.

Background

To improve the availability, scalability, and concurrency of databases, database clustering techniques have been developed. A database cluster generally includes a plurality of server nodes, each of which includes a master server and a plurality of slave servers. The master server and the slave servers store the same data and communicate and synchronize with each other over a network to maintain data consistency and availability.

To suit the read-heavy, write-light workloads of most application scenarios, the master server usually handles write operations while the slave servers handle read operations. However, because of synchronization delay or other factors between the master server and the slave servers, the data stored on a slave server may not be up to date. A query result obtained from a slave server may then be inconsistent with the data stored on the master server, which affects the accuracy of the query result.

Disclosure of Invention

The invention provides a data processing method and a data processing device to solve the technical defect in the prior art that a query result obtained from a slave server is inconsistent with the data stored on the master server.

In one aspect, the present invention provides a data processing method, applied to a distributed database cluster, where the distributed database cluster includes a plurality of server nodes, and each server node includes a master server and a plurality of slave servers; for any one of the server nodes, the data processing method includes:

in response to a received data processing request, when the data processing request is determined to be a data read request, performing the following steps in a loop:

a polling step of polling the plurality of slave servers, determining one slave server as the target slave server, and acquiring the first log sequence number stored in the target slave server determined in this poll;

a synchronization delay difference determining step of acquiring the second log sequence number stored in the master server, and determining the synchronization delay difference between the master server and the target slave server according to the first log sequence number and the second log sequence number;

a polling stop determining step of determining whether the synchronization delay difference is less than or equal to a preset threshold and whether the polling count is less than or equal to a preset maximum polling count; if so, stopping polling, distributing the data read request to the target slave server, and returning the query result obtained by the target slave server for the data read request;

if not, incrementing the polling count and repeating the polling step, the synchronization delay difference determining step, and the polling stop determining step until polling stops.

Optionally, the polling stop determining step further includes:

determining whether the synchronization delay difference is greater than the preset threshold and the polling count is equal to the preset maximum polling count; if so, stopping polling, distributing the data read request to the master server, and returning the query result obtained by the master server for the data read request.

Optionally, the data processing method further includes:

in response to a received data processing request, when the data processing request is determined to be a data write request, distributing the data write request to the master server and returning the second log sequence number stored in the master server.

Optionally, the determining the synchronization delay difference between the master server and the target slave server according to the first log sequence number and the second log sequence number includes:

the synchronization delay difference between the master server and the target slave server is determined by the following formula:

$$\Delta = M_{LSN} - S_{cur,LSN}$$

where Δ is the synchronization delay difference between the master server and the target slave server; M_LSN is the second log sequence number stored in the master server; and S_cur,LSN is the first log sequence number stored in the target slave server.

In another aspect, the present application further provides a data processing apparatus, which is applied to a distributed database cluster, where the distributed database cluster includes a plurality of server nodes, and each server node includes a master server and a plurality of slave servers; the data processing apparatus includes: a request receiving unit, a polling unit, a synchronization delay difference determining unit, a polling stop determining unit, and a polling control unit;

the request receiving unit is configured to, in response to a received data processing request, trigger the polling unit, the synchronization delay difference determining unit, the polling stop determining unit, and the polling control unit when the data processing request is determined to be a data read request;

the polling unit is configured to poll the plurality of slave servers, determine one slave server as the target slave server, and acquire the first log sequence number stored in the target slave server determined in this poll;

the synchronization delay difference determining unit is configured to acquire the second log sequence number stored in the master server and determine the synchronization delay difference between the master server and the target slave server according to the first log sequence number and the second log sequence number;

the polling stop determining unit is configured to determine whether the synchronization delay difference is less than or equal to a preset threshold and whether the polling count is less than or equal to a preset maximum polling count; if so, stop polling, distribute the data read request to the target slave server, and return the query result obtained by the target slave server for the data read request;

and the polling control unit is configured to, when the polling count is less than the preset maximum polling count and the synchronization delay difference is greater than the preset threshold, control the polling unit, the synchronization delay difference determining unit, and the polling stop determining unit to work in a loop until polling stops.

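Optionally, the polling stop determining unit is further configured to: determine whether the synchronization delay difference is greater than the preset threshold and the polling count is equal to the preset maximum polling count; if so, stop polling, distribute the data read request to the master server, and return the query result obtained by the master server for the data read request.

Optionally, the request receiving unit is further configured to, in response to a received data processing request, when the data processing request is determined to be a data write request, distribute the data write request to the master server and return the second log sequence number stored in the master server.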

Optionally, the synchronization delay difference determining unit is specifically configured to determine the synchronization delay difference by the following formula:

$$\Delta = M_{LSN} - S_{cur,LSN}$$

where Δ is the synchronization delay difference between the master server and the target slave server; M_LSN is the second log sequence number stored in the master server; and S_cur,LSN is the first log sequence number stored in the target slave server.

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a data processing method as described in any of the above when executing the program.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data processing method as described in any of the above.

The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a data processing method as described in any of the above.

When a data processing request is determined to be a data read request, the plurality of slave servers are first polled, one of them is determined as the target slave server, and the first log sequence number stored in the target slave server is acquired; the second log sequence number stored in the master server is then acquired, and the synchronization delay difference between the master server and the target slave server is determined from the first and second log sequence numbers. When the synchronization delay difference is less than or equal to the preset threshold and the polling count is less than or equal to the preset maximum polling count, the data held by the target slave server selected in this poll lags the master server by little or nothing. Polling is therefore stopped, the data read request is distributed to the target slave server, and the query result obtained by the target slave server is returned. This result is consistent with the data on the master server within the preset tolerance, which ensures the accuracy of the query result.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a first schematic flowchart of a data processing method according to an embodiment of the present invention;

FIG. 2 is a second schematic flowchart of a data processing method according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

To assist those skilled in the art in better understanding the technical solutions of the present application, some terms related to the present application are explained below.

Database cluster: database clustering refers to grouping together multiple database servers to collectively handle the workload of a database. It improves the performance, scalability, and reliability of databases by distributing data and workload across multiple servers. At the same time, the database cluster may extend the processing power and storage capacity of the system by adding more nodes. New nodes can balance the load and increase the throughput of the system to meet the ever-increasing data demands. In addition, database clusters provide high availability database services through redundant backup and failover mechanisms. When one node fails, other nodes can take over the work of the node, so that the continuity of the database is ensured. Common implementations of database clusters are master-slave replication, multi-master replication, and shared storage. The invention is mainly aimed at database clusters based on master-slave replication, which is the most common cluster configuration, wherein one server acts as a master server and is responsible for processing write operations, while the other servers act as slave servers and are responsible for processing read operations.

Database cluster proxy: a middle-tier component at the front end of a database cluster that manages and routes database requests. It acts as a proxy between the application and the database cluster and is responsible for forwarding requests, load balancing, failover, query optimization, and similar functions. Typical proxies include Pgpool and HAProxy.

Reading operation: refers to an operation of retrieving data from a database. It includes query statements (e.g., SELECT) and operations to read data. Read operations are typically used to obtain data for use by an application without modifying the data in the database.

Write operation: an operation that modifies data in the database, such as INSERT, UPDATE, and DELETE. Write operations are used to add new data to a database, update existing data, or delete data.

Synchronization delay: in database master-slave replication, factors such as network latency, server load, and the replication mechanism can prevent the data on a slave server from being immediately synchronized with the data on the master server, so the slave server's data lags behind the master server's by some amount of time.

LSN (Log Sequence Number): a unique identifier in a database system that identifies each log record in the transaction log. The LSN is typically a monotonically increasing number that records the order and position of each log record.

The present application mainly targets database cluster scenarios based on master-slave replication, in which the cluster includes one master server and a plurality of slave servers. The master server and the slave servers store the same data and communicate and synchronize over a network to maintain data consistency and availability. To suit read-heavy, write-light workloads, the master server typically handles write operations and the slave servers handle read operations. However, because of synchronization delay or other factors between the master server and the slave servers, the data on a slave server may not be up to date, so its query results may be inconsistent with those on the master server, which affects query accuracy. For example, right after a write operation is performed on the master server, the slave servers may not yet be fully synchronized; a query executed on a slave server may then reflect the old data state rather than the latest one. Such query inconsistencies can lead to data inconsistency and erroneous business decisions.

To overcome this drawback, the related art proposes configuring master-slave replication as synchronous streaming replication. This keeps the data on the master server and the slave servers synchronized in real time: after a write operation completes on the master server, the master must wait for the slave server to confirm that it has received and applied the change before proceeding to the next operation. Although this guarantees data synchronization between the master and slave servers, it has two drawbacks. First, because the master server must wait for the slave server's acknowledgment after each write, the master's response time increases; if the slave server processes slowly or the network latency is high, the master may wait a long time, degrading overall performance. Second, synchronous streaming replication creates a strong dependency between the master and slave servers: if the master server fails, the entire system may not work properly until the master recovers or a standby server is promoted to master.

To overcome these drawbacks, the application provides a data processing method. When a data processing request is determined to be a data read request, the plurality of slave servers are polled, one of them is determined as the target slave server, and the first log sequence number stored in it is acquired; the second log sequence number stored in the master server is then acquired, and the synchronization delay difference between the master server and the target slave server is determined from the two. When the synchronization delay difference is less than or equal to the preset threshold and the polling count is less than or equal to the preset maximum polling count, the data on the target slave server selected in this poll lags the master server by little or nothing, so polling is stopped, the data read request is distributed to the target slave server, and the query result it obtains is returned. This result is consistent with the data on the master server within the preset tolerance, which ensures query accuracy. In addition, the threshold for the synchronization delay difference can be adjusted dynamically to meet the requirements of different service scenarios.

It can be appreciated that the data processing method and device provided by the application can be embedded in the proxy layer of a database cluster architecture to give services more flexible control over query consistency. First, based on the synchronization delay difference between the latest log sequence numbers of the master server and a slave server, the method introduces a consistency tolerance factor, which can be adjusted dynamically to meet the query consistency strength required by different service scenarios. Second, building on the existing proxy's load-balancing polling algorithm, the invention introduces a maximum consistency polling count, whose value can be adjusted dynamically, subject to the required query consistency strength, to meet the query real-time requirements of different service scenarios. With these two parameters, the method and device balance query real-time performance against query consistency and distribute query requests across the database cluster reasonably, so that a server meeting the required query consistency strength is found.

It will be appreciated that a typical distributed database cluster includes a plurality of server nodes, each comprising a master server and a plurality of slave servers. Because the data read-write logic is the same for every server node, the data processing method is described below for a single server node.

The data processing method of the present invention is described below in connection with fig. 1-2 and the first embodiment.

FIG. 1 is a first schematic flowchart of a data processing method according to an embodiment of the present invention, and FIG. 2 is a second schematic flowchart of the data processing method. Referring to FIG. 1 and FIG. 2, the data processing method includes:

S101, in response to a received data processing request, when the data processing request is determined to be a data read request, cyclically performing the following steps.

In one embodiment, the proxy layer of the database cluster receives a data processing request from a client, usually in the form of an SQL statement, and parses the statement to determine the request type, generally either a data read request or a data write request. When the current request is determined to be a data read request, steps S102, S103, and S104 below are performed cyclically to complete it.

In one embodiment, in response to a received data processing request, when the data processing request is determined to be a data write request, the data write request is distributed to the master server and the second log sequence number stored in the master server is returned.
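The classification and routing described above can be sketched as follows. This is a minimal illustration, not the patent's parser: the keyword list, the names classify_request and route, and the rule that any unrecognised statement goes to the master are all assumptions made here for clarity.

```python
import re

# Hypothetical keyword list for illustration; a real proxy should use a full SQL parser.
READ_KEYWORDS = {"SELECT", "SHOW", "EXPLAIN"}

_LEADING_WORD = re.compile(r"\s*([A-Za-z]+)")


def classify_request(sql: str) -> str:
    """Classify a SQL statement as a 'read' or 'write' request by its first keyword."""
    match = _LEADING_WORD.match(sql)
    keyword = match.group(1).upper() if match else ""
    if keyword in READ_KEYWORDS:
        return "read"
    # INSERT/UPDATE/DELETE are writes; anything unrecognised (DDL, BEGIN, ...) is also
    # treated as a write so that it reaches the master server, the safe default.
    return "write"


def route(sql: str) -> str:
    """Writes go straight to the master; reads enter the slave-polling loop (S102-S105)."""
    return "poll_slaves" if classify_request(sql) == "read" else "master"
```

A keyword check like this misroutes statements such as SELECT ... FOR UPDATE, which is one reason production proxies parse the full statement.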

S102, polling the plurality of slave servers, determining one slave server as a target slave server, and acquiring a first log sequence number stored in the target slave server determined at this time.

The polling (round-robin) algorithm plays an important role in load balancing: it ensures that every server takes part in processing requests. It works by distributing requests to the slave servers one after another in a predefined order and then repeating the cycle. The invention therefore uses the polling algorithm to select a slave server S_cur as the target slave server for the current poll, and the target slave server S_cur returns its first log sequence number to the proxy layer.

Here, the first log sequence number is the latest replayed log sequence number of the target slave server selected by the polling algorithm.
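The polling step can be sketched as a round-robin cursor over the slave server set. This is a minimal sketch; the class name RoundRobinPoller and the choice to keep the cursor between read requests (as a load-balancing proxy normally would) are assumptions, not details given in the text.

```python
from typing import List


class RoundRobinPoller:
    """Hands out slave servers in a fixed cyclic order: S_1, S_2, ..., S_N, S_1, ...

    The cursor is kept between calls, so successive polls (and successive read
    requests) continue where the previous one stopped, spreading load evenly.
    """

    def __init__(self, slaves: List[str]) -> None:
        if not slaves:
            raise ValueError("the slave server set C must not be empty")
        self._slaves = list(slaves)
        self._cursor = 0

    def next_target(self) -> str:
        """Return the target slave server S_cur for this poll and advance the cursor."""
        target = self._slaves[self._cursor]
        self._cursor = (self._cursor + 1) % len(self._slaves)
        return target
```

With slaves ["S1", "S2", "S3"], four successive calls to next_target() yield S1, S2, S3, S1.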

Further, in a practical application scenario, one possible way to obtain the first log sequence number of a slave server S_cur is as follows: when the proxy layer polls S_cur, the database cluster proxy sends the query statement SELECT pg_last_wal_replay_lsn(); to S_cur, which returns its latest replayed log sequence number (for PostgreSQL) to the proxy layer.
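PostgreSQL returns an LSN as text in the form HI/LO, two hexadecimal numbers holding the upper and lower 32 bits of a byte position in the WAL. To compare the first and second log sequence numbers numerically, the proxy can convert that text to an integer, as in the sketch below. The function name parse_pg_lsn is illustrative; fetching the master's value with SELECT pg_current_wal_lsn(); is an assumption here, since the text does not say how the second log sequence number is read. PostgreSQL also offers pg_wal_lsn_diff() for computing the difference server-side.

```python
from typing import Optional


def parse_pg_lsn(text: Optional[str]) -> int:
    """Convert PostgreSQL's textual pg_lsn value ('HI/LO', both hex) to an integer byte position."""
    if text is None:
        # pg_last_wal_replay_lsn() returns NULL on a server started normally without recovery.
        raise ValueError("no replay LSN: server has not been in recovery")
    high, low = text.strip().split("/")
    # The high part holds the upper 32 bits and the low part the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)
```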

S103, a synchronization delay difference determining step, namely acquiring a second log sequence number stored in the master server, and determining the synchronization delay difference between the master server and the target slave server according to the first log sequence number and the second log sequence number.

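In one embodiment, the synchronization delay difference between the master server and the target slave server is determined by the following formula:

$$\Delta = M_{LSN} - S_{cur,LSN}$$

where M_LSN, the second log sequence number, is the latest log sequence number of the master server, and S_cur,LSN is the first log sequence number of the target slave server S_cur.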

In practical application, the proxy layer computes Δ with this formula to obtain the synchronization delay difference between the target slave server S_cur selected in the current poll and the master server. The larger Δ is, the more the data on the master server differs from that on S_cur; when Δ is 0, S_cur holds the same data as the master server M.
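Given both log sequence numbers as integers, Δ is a single subtraction. The sketch below assumes PostgreSQL-style LSNs, where the difference is a count of WAL bytes, and converts it to MB because the second embodiment's example expresses delays and V in MB. The function names and the clamp at zero are illustrative choices, not part of the described method.

```python
BYTES_PER_MB = 1024 * 1024


def sync_delay_difference(master_lsn: int, slave_lsn: int) -> int:
    """Delta = M_LSN - S_cur_LSN: bytes of WAL the target slave has not yet replayed."""
    # The slave LSN is sampled before the master LSN (S102 then S103), so Delta is
    # normally >= 0; clamp defensively in case the two samples are skewed.
    return max(0, master_lsn - slave_lsn)


def delay_in_mb(delta: int) -> float:
    """Express Delta in MB, the unit used for V in the second embodiment's example."""
    return delta / BYTES_PER_MB
```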

S104, a polling stop determining step: determining whether the synchronization delay difference is less than or equal to a preset threshold and the polling count is less than or equal to a preset maximum polling count; if so, stopping polling, distributing the data read request to the target slave server, and returning the query result obtained by the target slave server for the data read request.

In one embodiment, the threshold V of the synchronization delay difference may be set according to the level of consistency between the master server and the slave servers required by the current application scenario: the higher the required consistency, the smaller the threshold, and the lower the required consistency, the larger the threshold. The threshold V is also called the consistency tolerance factor (the term used hereinafter for convenience of description) and can be understood as a strength factor for consistency.

In one embodiment, the maximum polling count T may be preset. The polling count t reflects the real-time performance of the current query, and by evaluating both the synchronization delay difference Δ and the polling count t, the proxy layer balances query consistency against real-time performance.

For example, in a scenario with a high requirement on query real-time performance, the maximum polling count T may be set smaller, whereas in a scenario with a high requirement on query consistency, the consistency tolerance factor V (the threshold on the synchronization delay difference Δ) may be set smaller.

In one embodiment, when the synchronization delay difference is less than or equal to the preset threshold and the polling count is less than or equal to the preset maximum polling count, it is determined that both the data consistency requirement and the real-time requirement of the current query meet the preset requirements; polling is stopped, the data read request is distributed to the target slave server determined in the current poll, and the query result obtained by the target slave server is returned. In other words, if Δ ≤ V and t ≤ T, the current slave server S_cur satisfies the query real-time requirement (t ≤ T) and its query consistency strength meets the set consistency tolerance criterion (Δ ≤ V); the read request is therefore distributed to S_cur, the query result is returned, and the algorithm ends.

S105, if not, incrementing the polling count and repeating the polling step, the synchronization delay difference determining step, and the polling stop determining step until polling stops.

In one embodiment, because of the data transmission delay between the master server and the slave servers, the synchronization delay differences observed in the first few polls may not satisfy the preset consistency tolerance factor. As long as the polling count has not reached the maximum polling count T (that is, while the real-time requirement is still met), polling continues until both the synchronization delay difference and the polling count meet the requirements, and the query result is obtained from the target slave server so determined.

In one embodiment, it is determined whether the synchronization delay difference is greater than the preset threshold and the polling count is equal to the preset maximum polling count; if so, polling is stopped, the data read request is distributed to the master server, and the query result obtained by the master server is returned. In other words, if Δ > V and t = T, then although the real-time requirement is still met (t ≤ T), the query consistency strength of the current target slave server S_cur does not meet the set consistency tolerance criterion (Δ ≤ V), and this poll is the last polling opportunity (t = T). To meet the service's real-time requirement (T), this embodiment therefore preferably distributes the read statement directly to the master server M, returns the query result, and ends the algorithm, so that both the consistency and the real-time requirements of the data query are met.
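Steps S102 to S105, together with the fallback to the master server, can be combined into one loop, sketched below under stated assumptions. The two LSN lookups are passed in as callables (replay_lsn and master_lsn are hypothetical stand-ins for the PostgreSQL queries), the master is labelled "M", and the round-robin cursor is returned so the next read request can continue from it. The check t ≤ T always holds inside this loop; it is kept only to mirror S104.

```python
from typing import Callable, List, Tuple

MASTER = "M"


def choose_read_target(
    slaves: List[str],
    start: int,
    replay_lsn: Callable[[str], int],
    master_lsn: Callable[[], int],
    v: int,
    t_max: int,
) -> Tuple[str, int]:
    """Pick the server for one read request following S102-S105 plus the master fallback.

    Returns (server, next_start), where next_start is the round-robin cursor to
    use for the next read request.
    """
    if not slaves or t_max < 1:
        raise ValueError("need at least one slave and T >= 1")
    cursor = start
    t = 1                                        # polling count, initialised to 1
    while True:
        target = slaves[cursor % len(slaves)]    # S102: poll the next slave, S_cur
        cursor += 1
        first_lsn = replay_lsn(target)           # first log sequence number
        second_lsn = master_lsn()                # S103: second log sequence number
        delta = second_lsn - first_lsn           # synchronization delay difference
        if delta <= v and t <= t_max:            # S104: consistency and real-time both met
            return target, cursor % len(slaves)
        if t == t_max:                           # last chance failed: fall back to master
            return MASTER, cursor % len(slaves)
        t += 1                                   # S105: poll again
```

For example, with slave lags of 100, 5, and 0 LSN units behind the master, V = 10 and T = 3 select the second slave on the second poll, while T = 1 sends the read to the master after a single failed poll.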

It can be seen that the data processing method provided by the application introduces a consistency tolerance factor based on the synchronization delay difference between the latest log sequence numbers of the master server and the slave servers, so the query consistency strength can be adjusted dynamically to meet the consistency requirements of different service scenarios. In addition, it introduces a maximum polling count on top of the existing load-balancing algorithm; subject to the service's required query consistency strength, this count can be adjusted dynamically to meet the query real-time requirements of different service scenarios.

In addition, the data processing method provided by the application requires no additional hardware or middleware, can easily be embedded in the database cluster proxy layer, and is simple and efficient.

In order to more clearly describe the technical solution of the present application, the present application also provides the following second embodiment.

In general, to ensure high availability, scalability, and high concurrency, many businesses adopt a master-slave database cluster mode: one master database and several slave databases are deployed, data on the master database is synchronized to the slave databases, the master handles writes, and the slaves handle reads, sharing the request load. If the master server goes down, a slave server can be promoted to master to keep the service available. However, during business peaks, SQL write operations are frequent, data in the master database changes often, and the volume of data to be synchronized from the master server to the slave servers grows. A synchronization delay therefore arises between the master server and the slave servers, so data queried from the master server may differ from data queried from a slave server, which affects query accuracy. The invention provides a method for optimizing query consistency in a database cluster. Referring to FIG. 2, the master server is initialized as M, and the set of slave servers is C = {S_1, ..., S_i, S_{i+1}, ..., S_N}, where N is the number of slave server nodes.

In this embodiment, the technician first sets two custom parameters: the consistency tolerance factor V and the maximum consistency polling count T. The consistency tolerance factor V (0 ≤ V ≤ M_LSN) represents the degree of master-slave query inconsistency that can be tolerated; a larger V means a lower required query consistency strength, and V = 0 means strong consistency is required. The maximum consistency polling count T (T ≥ 1) is the maximum number of polls allowed for finding, in the slave server set C, a slave server that satisfies the consistency tolerance factor. The consistency polling count is initialized to t = 1 (t ≥ 1).
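The two user-set parameters and their stated ranges can be captured in a small configuration object, sketched below. The class and field names are hypothetical; the range checks are exactly those given in the paragraph above (0 ≤ V ≤ M_LSN, T ≥ 1), and t starts at 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyParams:
    """User-set parameters of the second embodiment (names are illustrative)."""

    v: int      # consistency tolerance factor V; 0 means strong consistency
    t_max: int  # maximum consistency polling count T; 1 means "answer immediately"

    def validate(self, master_lsn: int) -> None:
        """Check the ranges 0 <= V <= M_LSN and T >= 1 stated in the embodiment."""
        if not 0 <= self.v <= master_lsn:
            raise ValueError(f"V={self.v} must satisfy 0 <= V <= M_LSN={master_lsn}")
        if self.t_max < 1:
            raise ValueError(f"T={self.t_max} must satisfy T >= 1")

    @property
    def strong_consistency(self) -> bool:
        return self.v == 0


INITIAL_POLL_COUNT = 1  # t is initialised to 1 before the first poll
```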

In practical applications, different business scenarios may place different requirements on query consistency and real-time performance, depending on the nature of the service, the importance of the data and what users expect from the data. Some businesses, for example in finance and e-commerce, have very high requirements on query consistency and must guarantee the accuracy and consistency of data, but may have lower real-time requirements. Others, such as logistics tracking or real-time monitoring systems, must display information such as location and status in real time and therefore have very high real-time requirements, but lower requirements on query accuracy. A trade-off between the two must therefore be made when executing SQL query statements. In this embodiment, the consistency tolerance factor V is set according to the service's requirement on query consistency: a larger V means a weaker required consistency, and V = 0 means strong consistency. The maximum consistency polling count T is set according to the service's real-time requirement: a larger T means a lower real-time requirement, and T = 1 means the service has extremely high real-time requirements and the query result must be returned immediately. With these two parameters, a technician can balance real-time performance against consistency according to specific requirements and business scenarios. For example, suppose a database cluster has 1 master and 4 slaves, and testing shows that the maximum master-slave synchronization delay is typically around 10 MB.
Suppose the parameter sets (V, T) = (10, 1), (8, 2), (6, 3), (4, 4), (2, 5), (0, 6) are applied to the service in turn. Its requirement on query consistency then increases step by step, while its requirement on real-time performance decreases step by step. For example, (10, 1) tolerates query inconsistency to a large extent but requires very high query real-time performance, whereas (0, 6) guarantees strong query consistency with relaxed real-time requirements.
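The two parameters can be pictured as a small configuration object. The following Python sketch is illustrative only: the class name ConsistencyPolicy is hypothetical, and the text does not state units, so the sketch assumes V is given in MB (following the 10 MB example above) while LSN gaps are measured in bytes. It validates $V \ge 0$ and $T \ge 1$ and lists the six example presets; the upper bound $V \le M_{LSN}$ depends on runtime state and is left unchecked.

```python
from dataclasses import dataclass

MIB = 1024 * 1024  # assumption: V is expressed in MB, LSN gaps in bytes


@dataclass(frozen=True)
class ConsistencyPolicy:
    """Hypothetical holder for the (V, T) pair configured by the technician."""

    tolerance_mb: float  # consistency tolerance factor V, in MB (assumed unit)
    max_polls: int       # maximum consistency polling count T

    def __post_init__(self) -> None:
        # 0 <= V; the upper bound V <= M_LSN depends on runtime state and is not checked here.
        if self.tolerance_mb < 0:
            raise ValueError("V must be >= 0")
        if self.max_polls < 1:
            raise ValueError("T must be >= 1")

    @property
    def tolerance_bytes(self) -> int:
        """V converted to bytes so it can be compared with a byte-based LSN gap."""
        return int(self.tolerance_mb * MIB)

    @property
    def strong_consistency(self) -> bool:
        """V == 0 means the read must see exactly the master's data."""
        return self.tolerance_mb == 0


# Example presets from the text, from most real-time (10, 1) to most consistent (0, 6).
PRESETS = [ConsistencyPolicy(v, t) for v, t in [(10, 1), (8, 2), (6, 3), (4, 4), (2, 5), (0, 6)]]
```

Freezing the dataclass keeps a policy immutable once the proxy has loaded it, so concurrent request handlers cannot change V or T mid-decision.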

In addition, introducing the maximum consistency polling count T allows the service's requirement on query real-time performance to be adjusted dynamically. While the required consistency strength is met, reads are kept on the slave servers as far as possible, in line with the cluster's "master writes, slaves read" design, which improves the concurrency capability of the database cluster.

The specific implementation steps of the data processing method provided by the embodiment are as follows:

Step 201: the database cluster agent receives the data processing request sent by the client and parses its type.

In practice, a master-slave database cluster can be built in several ways. One is to use a cluster mode native to an open-source database. For example, in a MySQL cluster, the master server asynchronously sends the binlog to the slave servers; each slave server starts an I/O thread to receive the log and an SQL thread to replay it, thereby synchronizing master and slave data. As another example, in a PostgreSQL cluster based on streaming replication, the master server asynchronously starts a walsender process to send WAL records to the slave servers, which receive them with the walreceiver process and replay them with the startup process. Middleware can also be used to build a cluster from different databases. For example, the master server may be an Oracle database and the slave servers several MySQL databases: Oracle's data-change synchronization service and Kafka message queuing package changes to data in the Oracle master database into logs of a fixed format and send them to Kafka as messages; the slave servers consume these messages as Kafka consumers, unpack them, and apply the resulting data changes locally. In this embodiment, a PostgreSQL cluster is used as the example to illustrate the subsequent steps.

The database cluster agent usually sits above the database cluster as a proxy layer: it receives SQL statements from clients or applications on behalf of the underlying cluster and distributes them, and it usually also provides load balancing, failover and query optimization. The agent can easily determine the type of an SQL statement from its keywords: statements starting with SELECT are usually read operations, while statements containing UPDATE, INSERT or DELETE are usually write operations. Typical agents include Pgpool and HAProxy; note that this embodiment places no requirement on the choice of agent.
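As an illustration of keyword-based classification, the Python sketch below uses a hypothetical helper, classify_sql, which is not part of Pgpool, HAProxy or any other agent. It only inspects the leading keyword, so it deliberately ignores cases a production agent must handle, such as leading comments, WITH ... clauses, and SELECT ... FOR UPDATE.

```python
import re

WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE"}
READ_KEYWORDS = {"SELECT"}


def classify_sql(statement: str) -> str:
    """Return 'write', 'read' or 'other' based on the statement's first keyword.

    Simplified sketch: comments, CTEs (WITH ...) and SELECT ... FOR UPDATE
    are not handled.
    """
    match = re.match(r"\s*([A-Za-z]+)", statement)
    if not match:
        return "other"
    keyword = match.group(1).upper()
    if keyword in WRITE_KEYWORDS:
        return "write"  # step 2021: route to the master server M
    if keyword in READ_KEYWORDS:
        return "read"   # step 2022: poll the slave server set C
    return "other"
```

Statements classified as "other" (DDL, transaction control and so on) would most safely be sent to the master, although the text does not cover them.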

Step 202: the following corresponding processing manner is performed according to the type of the data processing request.

Step 2021: if the data processing request is a write operation statement, distribute it to the master server M, return the latest log sequence number in M (denoted $M_{LSN}$) to the proxy layer, and end the algorithm flow.

In practice, the "master writes, slaves read" rule of the database cluster should be followed, so a statement parsed as a write operation is distributed to the master server. To return the latest log sequence number of M ($M_{LSN}$) to the proxy layer, one possible implementation is as follows: each time an SQL statement is parsed as a write operation, the database cluster agent appends a query after it, select sent_lsn from pg_stat_replication; (for PostgreSQL), which queries the latest log sequence number $M_{LSN}$ sent by the master to the slaves and returns it to the proxy layer. The proxy layer can record the result in a preset memory area or in a cache such as Redis, so that after every write operation the latest log sequence number sent by the master to the standbys is queried and updated promptly.
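A minimal sketch of this write path in Python follows, assuming a hypothetical run_on_master callback that executes SQL on M and returns result rows as tuples of strings; an in-memory attribute stands in for the preset memory area or Redis. Because pg_stat_replication returns one row per connected standby, the sketch assumes the furthest-ahead sent_lsn is kept as $M_{LSN}$; the text does not specify which row is used.

```python
from typing import Callable, Optional, Sequence, Tuple

# Query appended after each write (sent_lsn column of pg_stat_replication, PostgreSQL 10+).
SENT_LSN_SQL = "select sent_lsn from pg_stat_replication;"

Row = Tuple[str, ...]


def lsn_to_int(lsn: str) -> int:
    """Convert pg_lsn text 'X/Y' (two hex numbers) to a byte position for comparison."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


class MasterLsnCache:
    """In-memory stand-in for the proxy-layer store of M_LSN (hypothetical)."""

    def __init__(self) -> None:
        self.m_lsn: Optional[str] = None

    def handle_write(self, sql: str, run_on_master: Callable[[str], Sequence[Row]]) -> None:
        run_on_master(sql)                  # step 2021: writes always go to the master M
        rows = run_on_master(SENT_LSN_SQL)  # one row per connected standby
        lsns = [row[0] for row in rows if row and row[0]]
        if lsns:
            # Assumption: keep the furthest-ahead sent position as M_LSN.
            self.m_lsn = max(lsns, key=lsn_to_int)
        # If no standby is connected, the previously cached value is kept.
```

Comparing LSNs numerically rather than as strings matters: "0/A0" sorts before "0/9F" as text but is the later position.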

Step 2022: if the data processing request is a read operation statement, poll the slave server set C using the database proxy's existing load-balancing polling algorithm, and obtain the latest replayed log sequence number of the target slave server $S_{cur}$ (denoted $S_{cur\_LSN}$).

Polling plays an important role in load balancing: it ensures that every server takes part in processing requests. Requests are distributed to the slave servers one by one in a predefined order, and the cycle then repeats. This embodiment therefore uses the polling algorithm to select one slave server $S_{cur}$ as the target slave server, obtains its latest replayed log sequence number (the first log sequence number), and returns it to the proxy layer.

Further, one possible implementation for obtaining this value from a slave server $S_{cur}$ is: when the proxy layer polls $S_{cur}$, the database cluster agent sends the query select pg_last_wal_replay_lsn(); to $S_{cur}$ (for PostgreSQL), which returns the latest log sequence number replayed by that slave server to the proxy layer.

Step 203: calculate the synchronization delay difference between M and $S_{cur}$: $\Delta_{LSN} = M_{LSN} - S_{cur\_LSN}$.

In practice, the proxy layer computes $\Delta_{LSN} = M_{LSN} - S_{cur\_LSN}$ to obtain the synchronization delay difference between the master server and the target slave server $S_{cur}$ selected by the current poll. The larger $\Delta_{LSN}$ is, the further the data on $S_{cur}$ lags behind the data on the master server; when $\Delta_{LSN}$ is 0, $S_{cur}$ holds the same data as the master server M.
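PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash (for example 16/B374D848), where the part before the slash is the high 32 bits of a 64-bit WAL byte position. The Python sketch below, with hypothetical helper names, computes $\Delta_{LSN}$ in bytes from two such strings; inside the database, the built-in pg_wal_lsn_diff() function returns the same byte difference.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert pg_lsn text 'XXXXXXXX/YYYYYYYY' to a 64-bit WAL byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def sync_delay_diff(m_lsn: str, s_cur_lsn: str) -> int:
    """Step 203: Delta_LSN = M_LSN - S_cur_LSN, in bytes.

    A negative value would mean the cached M_LSN is older than the
    slave's replay position (i.e. the cache is stale).
    """
    return lsn_to_int(m_lsn) - lsn_to_int(s_cur_lsn)
```

For example, sync_delay_diff("0/3000100", "0/3000060") gives 160 bytes, which the proxy then compares against V.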

Step 204: determine whether polling meets the stop condition, based on the preset consistency tolerance factor V and the maximum consistency polling count T. Specifically:

Step 2041: if the synchronization delay difference satisfies $\Delta_{LSN} \le V$ and $t \le T$, stop polling, distribute the read operation request to the target slave server $S_{cur}$, return the query result, and end the algorithm flow. This condition means that the current target slave server $S_{cur}$ meets the query real-time requirement ($t \le T$) and its query consistency also meets the configured tolerance ($\Delta_{LSN} \le V$), so the read operation request is served by $S_{cur}$.

Step 2042: if the synchronization delay difference satisfies $\Delta_{LSN} > V$ and $t < T$, set $t = t + 1$ and go to step 2022. This condition means that, while the query real-time requirement is still met ($t < T$), the query consistency of the current target slave server $S_{cur}$ does not meet the configured tolerance ($\Delta_{LSN} > V$). Because $t < T$, there is still an opportunity to poll the next target slave server $S_{cur+1}$, and steps 2022 to 204 are repeated.

Step 2043: if the synchronization delay difference satisfies $\Delta_{LSN} > V$ and $t = T$, distribute the data read request directly to the master server M, return the query result, and end the algorithm flow. This condition means that the query consistency of the current target slave server $S_{cur}$ does not meet the configured tolerance ($\Delta_{LSN} > V$) and this poll was the last opportunity ($t = T$). Therefore, to meet the service's real-time requirement expressed by T, a preferred scheme is to send the read operation statement directly to the master server M, return the query result, and end the algorithm flow.

In practice, the proxy layer makes this decision from the synchronization delay difference $\Delta_{LSN}$ and the poll count t; introducing the consistency tolerance factor V and the maximum consistency polling count T balances query consistency against real-time performance. Steps 2041, 2042 and 2043 can all be implemented in the proxy layer.
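The stop condition of steps 2041 to 2043 reduces to a pure function of $\Delta_{LSN}$, V, t and T. The Python sketch below uses hypothetical names (Decision, stop_decision) and assumes V and $\Delta_{LSN}$ are expressed in the same unit.

```python
from enum import Enum


class Decision(Enum):
    USE_SLAVE = "use_slave"    # step 2041: serve the read from S_cur
    POLL_NEXT = "poll_next"    # step 2042: t = t + 1, poll the next slave
    USE_MASTER = "use_master"  # step 2043: last chance used, fall back to M


def stop_decision(delta_lsn: int, v: int, t: int, max_polls: int) -> Decision:
    """Apply the stop condition of step 204 to one poll.

    delta_lsn: synchronization delay difference Delta_LSN = M_LSN - S_cur_LSN
    v:         consistency tolerance factor V (same unit as delta_lsn)
    t:         current poll count, 1 <= t <= max_polls
    max_polls: maximum consistency polling count T
    """
    if not 1 <= t <= max_polls:
        raise ValueError("poll count t must satisfy 1 <= t <= T")
    if delta_lsn <= v:
        return Decision.USE_SLAVE
    if t < max_polls:
        return Decision.POLL_NEXT
    return Decision.USE_MASTER
```

Keeping the decision separate from the I/O makes it easy to unit-test the three branches without a running cluster.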

Finally, this embodiment gives a slave server polling algorithm for the steps above:

S1: randomly select a server index i = random(1, N) with a random function;

S2: starting from $S_{i \bmod N}$, poll the slave servers in turn, i.e. $S_{i \bmod N}, S_{(i+1) \bmod N}, \ldots$, until a slave server that meets the required query consistency strength (i.e. the consistency tolerance factor proposed by the invention) is found; if none has been found when the poll count reaches t = T, polling ends. Note that the invention is not limited to this polling algorithm.
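Steps S1 and S2, combined with steps 203 and 204, can be sketched end to end in Python as follows. This is an illustration under stated assumptions: route_read, replay_lsn_of and the master label "M" are hypothetical, LSNs are already integers (for example converted from pg_lsn text), and indices are 0-based where the text uses random(1, N).

```python
import random
from typing import Callable, Optional, Sequence, Tuple

MASTER = "M"  # hypothetical label for the master server


def route_read(
    slaves: Sequence[str],
    m_lsn: int,
    replay_lsn_of: Callable[[str], int],
    v: int,
    max_polls: int,
    rng: Optional[random.Random] = None,
) -> Tuple[str, int]:
    """Pick the server for one read request; returns (server, polls used).

    slaves:        the slave server set C = {S_1, ..., S_N}
    m_lsn:         cached master LSN M_LSN as an integer byte position
    replay_lsn_of: hypothetical callback returning S_cur's replayed LSN
    v, max_polls:  consistency tolerance factor V and maximum polling count T
    """
    if not slaves:
        return MASTER, 0
    rng = rng or random.Random()
    n = len(slaves)
    start = rng.randrange(n)  # S1: random start index (0-based here, 1-based in the text)
    for t in range(1, max_polls + 1):
        s_cur = slaves[(start + t - 1) % n]       # S2: S_{i mod N}, S_{(i+1) mod N}, ...
        delta_lsn = m_lsn - replay_lsn_of(s_cur)  # step 203
        if delta_lsn <= v:                        # step 2041
            return s_cur, t
        # step 2042: otherwise t = t + 1 and poll the next slave
    return MASTER, max_polls  # step 2043: no slave qualified within T polls
```

If T exceeds N, the loop wraps around and re-polls slaves, whose replay positions may have advanced in the meantime; the random start spreads the first poll across slaves so that reads do not always land on the same node.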

It can be seen that the data processing method of this embodiment introduces a consistency tolerance factor on top of the synchronization delay difference between the latest log sequence numbers of the master and slave servers, so the strength of query consistency can be adjusted dynamically to meet the consistency requirements of different business scenarios. In addition, it adds a maximum polling count to the existing load-balancing algorithm; while the service's required consistency strength is met, this count can be tuned to service needs, meeting the real-time requirements of different business scenarios.

The data processing apparatus provided by the invention is described below; the data processing apparatus described below and the data processing method described above may be cross-referenced.

Fig. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention, referring to fig. 3, the data processing apparatus includes: a request receiving unit 301, a polling unit 302, a synchronization delay difference determining unit 303, a polling stop determining unit 304, and a polling control unit 305.

The request receiving unit 301 is configured to, in response to a received data processing request and when the request is determined to be a data read request, trigger the polling unit 302, the synchronization delay difference determining unit 303, the polling stop determining unit 304 and the polling control unit 305 to operate.

The polling unit 302 is configured to poll the plurality of slave servers, determine one of them as the target slave server, and obtain the first log sequence number stored in the target slave server determined in this poll.

The synchronization delay difference determining unit 303 is configured to obtain the second log sequence number stored in the master server and determine the synchronization delay difference between the master server and the target slave server from the first log sequence number and the second log sequence number.

The polling stop determining unit 304 is configured to determine whether the synchronization delay difference is less than or equal to a preset threshold and whether the polling count is less than or equal to a preset maximum polling count; if both hold, it stops polling, distributes the data read request to the target slave server, and returns the query result obtained by the target slave server for the data read request.

The polling control unit 305 is configured to, when the polling count is less than the preset maximum polling count and the synchronization delay difference is greater than the preset threshold, control the polling unit, the synchronization delay difference determining unit and the polling stop determining unit to run cyclically until polling stops.

In one embodiment, the polling stop determination unit 304 is further configured to:

In an embodiment, the request receiving unit 301 is further configured to:

In one embodiment, the synchronization delay difference determining unit 303 is specifically configured to:

Fig. 4 illustrates a physical schematic diagram of an electronic device, as shown in fig. 4, which may include: processor 410, communication interface (Communications Interface) 420, memory 430 and communication bus 440, wherein processor 410, communication interface 420 and memory 430 communicate with each other via communication bus 440. Processor 410 may invoke logic instructions in memory 430 to perform the data processing methods described above.

Further, the logic instructions in the memory 430 may be implemented as software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

In another aspect, the present invention also provides a computer program product comprising a computer program storable on a non-transitory computer readable storage medium, the computer program, when executed by a processor, being capable of performing the data processing method provided by the methods described above.

In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the data processing method provided by the above methods.

The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A data processing method, applied to a distributed database cluster, wherein the distributed database cluster comprises a plurality of server nodes, and each server node comprises a master server and a plurality of slave servers; characterized in that the data processing method comprises, for any server node, the following steps:

2. The data processing method according to claim 1, wherein the polling stop determination step further comprises:

3. The data processing method according to claim 1 or 2, characterized by further comprising:

4. The data processing method according to claim 1 or 2, wherein the determining a synchronization delay difference between the master server and the target slave server from the first log sequence number and the second log sequence number includes:

5. A data processing apparatus for use in a distributed database cluster, the distributed database cluster comprising a plurality of server nodes, each server node comprising a master server and a plurality of slave servers; characterized in that the data processing device comprises: a request receiving unit, a polling unit, a synchronization delay difference determining unit, a polling stop determining unit, and a polling control unit;

6. The data processing apparatus according to claim 5, wherein the polling stop determination unit is further configured to:

7. The data processing apparatus according to claim 5 or 6, wherein the request receiving unit is further configured to, in response to a received data processing request and when the request is determined to be a data write request, distribute the data write request to the master server and return the second log sequence number stored in the master server.

8. The data processing apparatus according to claim 5 or 6, wherein the synchronization delay difference determining unit is specifically configured to:

wherein $\Delta_{LSN} = M_{LSN} - S_{cur\_LSN}$; $\Delta_{LSN}$ is the synchronization delay difference between the master server and the target slave server; $M_{LSN}$ is the second log sequence number stored in the master server; and $S_{cur\_LSN}$ is the first log sequence number stored in the target slave server.

9. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the data processing method according to any one of claims 1 to 4.

10. A computer program product comprising a computer program which, when executed by a processor, implements the data processing method according to any one of claims 1 to 4.