CN113626517B - PostgreSQL database stream replication exception handling method and device - Google Patents

PostgreSQL database stream replication exception handling method and device Download PDF

Info

Publication number
CN113626517B
CN113626517B CN202110722991.4A CN202110722991A CN113626517B CN 113626517 B CN113626517 B CN 113626517B CN 202110722991 A CN202110722991 A CN 202110722991A CN 113626517 B CN113626517 B CN 113626517B
Authority
CN
China
Prior art keywords
standby
log
database
main
writing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110722991.4A
Other languages
Chinese (zh)
Other versions
CN113626517A (en
Inventor
孙颉
张闯
刘科
黄广奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202110722991.4A priority Critical patent/CN113626517B/en
Publication of CN113626517A publication Critical patent/CN113626517A/en
Application granted granted Critical
Publication of CN113626517B publication Critical patent/CN113626517B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1469Backup restoration techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/466Transaction processing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a postgreSQL database stream replication exception handling method and device, wherein the method comprises the following steps: in an asynchronous mode, before and after writing the pre-written log, the main stream replication controller respectively sends a user writing transaction command and a main pre-written log writing completion synchronous message to the standby stream replication controller, completes the pre-written log sending, and receives the standby pre-written log writing completion synchronous message returned by the standby stream replication controller; the master node crashes, only the master pre-write log is written to complete synchronization, but when no backup pre-write log is written to complete synchronization information, the backup stream copy controller synchronizes the master and backup data; in the synchronous mode, the main stream copy controller sends a handshake message and judges whether the standby node crashes or not according to whether a handshake message reply is returned or not; and when the standby nodes are crashed, selecting a change mode or constructing a confirmation message according to whether the main point is idle. The invention realizes the reliability of asynchronous mode data backup and the foolproof of the master node when the synchronous mode standby node crashes at the same time.

Description

PostgreSQL database stream replication exception handling method and device
Technical Field
The invention belongs to the technical field of database flow exception handling, and particularly relates to a PostgreSQL database flow replication exception handling method and device.
Background
PostgreSQL is an enterprise-level, cross-platform, highly customizable, powerful open-source database system. The reliability, stability, data consistency and the like of the PostgreSQL obtain the affirmation of the database industry, and the PostgreSQL is the only open source database system capable of realizing zero data loss. With the gradual promotion of PostgreSQL business, the database market is a powerful competitor to traditional large commercial databases such as the standard Oracle, mySQL, etc.
In a database system, data backup becomes an important backup means for ensuring data security; with the cluster distribution of the database servers, the database servers can work cooperatively, and when the main server fails, a plurality of computers are allowed to provide the same data.
PostgreSQL provides a streaming replication technique to synchronize data, i.e., a standby server can synchronize a primary server in real time. First, the primary and the secondary database servers use the TCP protocol provided by PostgreSQL to establish network connection. The server of the primary database immediately transmits the record to the backup database server in the form of a TCP data stream after the data changes and WAL records are generated, without waiting for the pre-written log file to be filled before transmission. The mechanism greatly reduces the data synchronization delay between the main server and the standby server and has higher efficient availability. The stream replication technique provides native technical support for hot standby, high availability, load balancing, etc. of the database. PostgreSQL streaming replication provides both asynchronous replication and synchronous replication.
PostgreSQL stream replication has the following problems:
PostgreSQL stream replication is by default asynchronous, and if the primary database server crashes, some committed transactions may not have been replicated to the backup database server, which may result in data loss.
And (II) synchronous replication can ensure that all modifications of a transaction can be transferred to one or more synchronous standby database servers. Each commit of a write transaction will wait until an acknowledgement is received indicating that the commit has been written to the transaction log on disk on both the primary database server and the backup database server when synchronous stream replication is requested. The method ensures that the WAL logs of the master and slave servers are returned successfully only after double-disk-drop, but increases the waiting time of communication confirmation back and forth between the master and slave database servers, and prolongs the response time of a foreground client side request transaction.
Under the condition of starting the synchronous replication mode, the state of the standby database node directly affects the main database node. If all the backup database nodes of the whole distributed cluster architecture crash and cannot be recovered at the same time, the master database node performs a write transaction operation, and at the moment, the master database node cannot receive the acknowledgement returned by any backup node, then all the operations on the master database node are suspended, and the master database node is not recovered until the backup database node recovers and returns the acknowledgement, and at the moment, if the master database node is an important service, the phenomenon that the suspension cannot be operated is a very fatal defect for users.
This is a deficiency of the prior art, and therefore, it is necessary to provide a method and apparatus for processing PostgreSQL database flow replication anomalies in order to address the above-mentioned deficiencies in the prior art.
Disclosure of Invention
Aiming at the defect that in the asynchronous replication mode of the existing PostgreSQL database flow replication in the prior art, if some submitted transactions are not replicated to a backup server yet in the event of crash of the main database node, the main database node is suspended, and in the event of crash of all backup database nodes in the synchronous replication mode, the invention provides a PostgreSQL database flow replication exception handling method and device, so as to solve the technical problems.
In a first aspect, the present invention provides a PostgreSQL database flow replication exception handling method, including the steps of:
s1, in a master-slave database node asynchronous mode, a master stream replication controller respectively sends a user writing transaction command message and a master pre-writing log writing completion synchronous message to a slave stream replication controller before and after controlling writing of a pre-writing log, and then controls a master log sending module to send the pre-writing log to a slave log receiving module and receives the slave pre-writing log writing completion synchronous message returned by the slave stream replication controller after the slave database node finishes the playback of the pre-writing log;
s2, when a main database node in an asynchronous mode crashes and a main pre-write log writing completion synchronous message exists, and no standby pre-write log writing completion synchronous message exists, the standby flow replication controller controls the standby database node to execute a user writing transaction operation command received by writing transaction operation, so that the user writing transaction operation command is consistent with the main database data;
s3, in a synchronous mode of the nodes of the main database and the standby database, the main flow replication controller sends handshake messages to each standby flow replication controller according to a set period, and judges whether the corresponding standby database node crashes or not according to whether handshake message replies are received within a preset time threshold;
s4, when the standby database nodes with the synchronous mode are crashed, judging whether to change the mode of the main database nodes or the standby database nodes or construct a confirmation message returned to the main log sending module by the standby log receiving module according to whether the main database nodes are idle or not.
Further, the step S1 further includes the following steps:
S1A, configuring a main stream replication controller at a main database node, configuring a standby stream replication controller at a standby database node, and setting each stream replication controller to acquire configuration information of the database node;
S1B, judging a current working mode according to configuration information of a database node acquired by a stream replication controller;
if the asynchronous mode is adopted, entering a step S1;
if the synchronous mode is adopted, the process proceeds to step S3. The flow replication controller reads the configuration information for mode determination in subsequent message processing and exception handling and primary database node and backup data node locatable.
Further, the database node configuration information includes master-slave configuration, asynchronous/synchronous mode, IP configuration and opposite IP of the node. The master-slave configuration of the node judges whether the node where the flow replication controller is located is a master database node or a standby database node, and the IP configuration and the opposite end IP are used for positioning the database node which is communicated with the current database node.
Further, the specific steps of step S1 are as follows:
s11, in an asynchronous mode of the master and standby database nodes, after receiving user writing transaction operation at the master database node, the master stream replication controller synchronously sends a user writing transaction command to the standby stream replication controller;
s12, the main stream replication controller controls the local write-in pre-write log, and after the local write-in pre-write log is written, a main write-in pre-write log write-in completion synchronous message is sent to the standby stream replication controller;
s13, the main stream replication controller controls the main log sending module to send a pre-written log to the standby log receiving module;
s14, the backup stream replication controller sends the pre-written log received by the backup log receiving module to the log playback module for playback, and after the log playback is completed, sends a write completion synchronization message of the backup pre-written log to the main stream replication controller. The user writing transaction command and the master pre-writing log writing completion synchronous message record the time node when the master database node writes the local pre-writing log, and the standby pre-writing log writing completion synchronous message records the time node when the standby database node completes log playback.
Further, in step S11, after the postgres module of the primary database node receives the user writing transaction operation in the primary and standby database node asynchronous mode, the primary flow replication controller is notified to synchronously send the user writing transaction command message to the standby flow replication controller;
in step S12, after sending a primary write completion synchronization message to the backup stream replication controller, the primary stream replication controller returns a notification of completion of the pre-write log in the primary database node to the user through the postgres module;
in step S14, after the backup stream replication controller sends the backup pre-write log writing completion synchronization message to the main stream replication controller, a log playback completion notification is returned to the main stream replication controller through the main log receiving module by the backup log receiving module. The postgres module is a module for exchanging a postgres database with a user, and is used for receiving the writing transaction operation of the user and returning a writing transaction completion notification to the user; the master log receiving module is started from the master database node and is mainly responsible for transmitting the pre-written log record generated by the master database node to a lower computer serving as a standby database node; the standby log receiving module automatically receives the pre-written log records sent by the master log sending module serving as an upper computer from the standby database node, and the corresponding master log sending module communicates with the standby log receiving module; the log playback module is started from the standby data node and is mainly responsible for recording the pre-written log received by the standby log receiving module in the standby database node for playback, so that the data synchronization between the main database node and the standby database node is achieved asynchronously. In the cascade mode, a standby database node may start three modules simultaneously, that is, it is not only responsible for receiving and playing back the pre-written log record of the upper computer, but also sends the pre-written log record to the lower computer.
Further, the specific steps of step S2 are as follows:
s21, judging whether the node of the main database crashes or not by the standby flow copy controller in the asynchronous mode of the node of the main database and the standby database;
if yes, go to step S22;
if not, returning to the step S21;
s22, checking a synchronous write transaction operation record of the standby flow copy controller and the main flow copy controller;
s23, when the main writing completion synchronous message exists, but no corresponding backup pre-writing log writing completion synchronous message exists, the backup stream copy controller executes a writing transaction operation command so as to keep the backup database node data and the main database node data consistent. When the original asynchronous replication mode is adopted, the postgres module completes local pre-written log writing in the main database, and when the transaction is submitted on the main database node, the backup database node does not need to wait for receiving the pre-written log and returns success when log playback is completed; when the main database node crashes, log writing of user writing transaction in the main database node is completed, and the backup database node does not receive the pre-written log synchronized by the main database node, so that the main and backup data are not synchronized.
Further, the specific steps of step S3 are as follows:
s31, in a master-standby database node synchronization mode, the master flow replication controller sends handshake messages to all standby flow replication controllers according to a set period;
s32, the main stream replication controller judges whether a standby stream replication controller exceeds a preset time threshold and does not answer a handshake message;
if yes, go to step S33;
if not, returning to the step S31;
s33, the main stream replication controller sends an alarm to the operation and maintenance management platform to prompt the occurrence of the abnormal breakdown of the node of the standby database where the corresponding standby stream replication controller is located. And timely discovering the abnormal standby database nodes through handshake messages.
Further, the specific steps of step S4 are as follows:
s41, judging that all standby database nodes collapse when the standby flow copy controllers in the synchronous mode exceed a preset time threshold and do not answer to handshake messages;
s42, the master stream replication controller judges whether the master database node has ongoing write transaction operation or not;
if not, go to step S43;
if yes, go to step S45;
s43, the master stream replication controller changes the node mode of the master database into an asynchronous mode;
s44, the main flow replication controller judges whether a standby database node is recovered from a crash state;
if yes, the master stream replication controller restores the master database node mode to a synchronous mode, and returns to the step S31;
if not, returning to the step S44;
s45, the main stream replication controller stops the operation that the main log sending module of the main database node sends the pre-written log to the standby log sending module according to the current running state, constructs a log playback completion notification returned to the main log sending module by the standby log receiving module, and returns to the step S43. When synchronously copying, the postgres module can send successful return information to the user after receiving the confirmation information returned by the main log sending module, but the state of the standby node directly influences the main database, if all the standby database nodes of the whole distributed cluster architecture crash and cannot be recovered at the same time, the main database node performs writing transaction operation, at the moment, the main database node cannot receive the confirmation of any standby database node reply, and then all the operations on the main database node are hung until the standby database node is recovered and the confirmation reply is returned; the master node database point returns to normal. The invention solves the problem of finding abnormality in time through handshake information and avoids the hanging up of the nodes of the main database by changing the synchronous mode to asynchronous mode.
In a second aspect, the present invention provides a PostgreSQL database flow replication exception handling apparatus, including:
the master stream replication controller is used for respectively transmitting a user writing transaction command and master writing completion synchronous information to the standby stream replication controller before and after controlling writing of the pre-writing log, and then controlling the master log transmitting module to transmit the pre-writing log to the standby log receiving module and receiving the standby pre-writing log writing completion synchronous information returned by the standby stream replication controller after the standby database node finishes the playback of the pre-writing log;
the asynchronous mode exception handling unit is used for controlling the standby database node to execute writing transaction operation when the main database node in an asynchronous mode crashes and a main pre-writing log writing completion synchronous message exists, but the standby pre-writing log writing completion synchronous message does not exist, and synchronizing the main database node data and the standby database node data;
the synchronous mode message processing unit is used for sending handshake messages to each standby flow replication controller according to a set period in a synchronous mode of the main and standby database nodes, and judging whether the corresponding standby database node crashes or not according to whether handshake message replies are received within a preset time threshold or not;
and the synchronous mode exception handling unit is used for judging whether the primary database node is idle or not to change the primary database node mode or constructing a confirmation message returned to the primary log sending module by the standby log receiving module when the standby database node in the synchronous mode is crashed.
Further, the method further comprises the following steps:
the stream replication controller configuration unit is used for configuring a main stream replication controller at a main database node, configuring a standby stream replication controller at a standby database node, and setting each stream replication controller to acquire configuration information of the database node where the stream replication controller is located;
and the current working mode judging unit is used for judging the current working mode according to the configuration information of the database node acquired by the stream replication controller.
Further, the asynchronous mode message processing unit includes:
the write transaction message sending subunit is used for synchronously sending the user write transaction command message to the standby stream replication controller after the main stream replication controller receives the user write transaction command at the main database node in the asynchronous mode of the main and standby database nodes;
the write-completion synchronous message sending subunit is used for controlling the local write-in pre-write log by the main stream replication controller, and sending the write-completion synchronous message of the main pre-write log to the standby stream replication controller after the write-in of the local pre-write log is completed;
the pre-write log issuing subunit is used for controlling the main stream replication controller to control the main log sending module to send the pre-write log to the standby log receiving module;
and the standby writing completion synchronous message sending subunit is used for sending the pre-written log received by the standby log receiving module to the log playback module by the standby stream replication controller for playback, and sending the standby pre-written log writing completion synchronous message to the main stream replication controller after the log playback is completed.
Further, the asynchronous mode exception handling unit includes:
an asynchronous mode main database node crash judging subunit, configured to judge whether a main database node crash occurs in the standby flow replication controller in the main standby database node asynchronous mode;
a synchronous write transaction operation record checking subunit, configured to check, by the standby flow replication controller, a synchronous write transaction operation record with the primary flow replication controller when a node crash of the primary database occurs;
and the data synchronization subunit is used for executing a write transaction operation command by the standby flow replication controller when the main pre-write log write completion synchronous message exists but the corresponding standby pre-write log write completion synchronous message does not exist, so as to keep the standby database node data and the main database node data consistent.
Further, the synchronization pattern message processing unit includes:
a handshake message sending subunit, configured to send handshake messages to all standby flow replication controllers according to a set period in a primary-standby database node synchronization mode;
a handshake message reply judging subunit, configured to determine whether a standby flow replication controller exists or not and does not reply to a handshake message when the standby flow replication controller exceeds a preset time threshold;
and the synchronous mode abnormality warning subunit is used for sending an alarm to the operation and maintenance management platform by the main stream replication controller when the standby stream replication controller exceeds a preset time threshold and does not answer a handshake message, and prompting the occurrence of abnormality of the crash of the standby database node where the corresponding standby stream replication controller is located.
Further, the synchronization pattern abnormality processing unit includes:
a standby database full-collapse judging subunit, configured to judge that all standby database nodes collapse when the standby flow replication controllers in the synchronous mode all exceed a preset time threshold and do not answer to the handshake message;
a main database node idle judging subunit, wherein the main stream replication controller judges whether the main database node has an ongoing writing transaction operation;
a mode changing subunit, configured to change the mode of the master database node to an asynchronous mode by the master replication controller when no ongoing write transaction operation exists at the master database node;
the standby database node recovery judging subunit is used for judging whether the standby database node is recovered from the crash state by the main stream replication controller;
a mode recovery subunit, configured to, when the standby database node recovers from the crash state, restore the primary database node mode to a synchronous mode by using the primary replication controller;
and the return message construction subunit is used for stopping the operation of sending the pre-written log to the standby log sending module by the main stream replication controller according to the current running state when the main database node has the running writing transaction operation, and constructing a log playback completion notification returned to the main log sending module by the standby log receiving module.
The invention has the advantages that,
the PostgreSQL database stream replication exception handling method and device provided by the invention realize that under an asynchronous replication mode, the data of some submitted transactions which are possibly not replicated to the standby database node are crashed in the main database node, log playback is performed in time, and the data of the main database node and the standby database node are synchronized; under the extreme condition that all standby database nodes crash simultaneously in a synchronous replication mode, the method prevents the master database node from hanging up to wait, and ensures that a user can normally perform subsequent operation.
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
It can be seen that the present invention has outstanding substantial features and significant advances over the prior art, as well as the benefits of its implementation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a PostgreSQL database flow replication exception handling method of the present invention.
Fig. 2 is a schematic flow chart of a PostgreSQL database flow replication exception handling method according to the present invention.
Fig. 3 is a schematic diagram of the PostgreSQL database asynchronous mode data process of the present invention.
Fig. 4 is a schematic diagram of a PostgreSQL database flow replication exception handling apparatus of the present invention.
In the figure, a 1-asynchronous mode message processing unit; 1.1-a write transaction message sending subunit; 1.2-write complete synchronization message sending subunit; 1.3-a pre-write log issuing subunit; 1.4-a write completion synchronization message sending subunit; 2-an asynchronous mode exception handling unit; 2.1-an asynchronous mode master database node crash determination subunit; 2.2-synchronous write transaction operation record checking subunit; 2.3-data synchronization subunit; a 3-synchronous mode message processing unit; 3.1-handshake message transmitting subunit; 3.2-handshake message reply judgment subunit; 3.3-sync pattern anomaly alert subunit; a 4-synchronous mode exception handling unit; 4.1-a database full crash determination subunit; 4.2-a main database node idle judgment subunit; 4.3-mode change subunit; 4.4-a standby database node recovery judgment subunit; a 4.5-mode recovery subunit; 4.6-a return message construction subunit; a 5-stream copy controller configuration unit; and 6, a current working mode judging unit.
Detailed Description
In order to make the technical solution of the present invention better understood by those skilled in the art, the technical solution of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
Example 1:
as shown in fig. 1, the present invention provides a PostgreSQL database flow replication exception handling method, which includes the following steps:
s1, in a master-slave database node asynchronous mode, a master stream replication controller respectively sends a user writing transaction command and a master writing completion synchronous message to a slave stream replication controller before and after writing a pre-writing log, and then controls a master log sending module to send the pre-writing log to a slave log receiving module and receives a slave pre-writing log writing completion synchronous message returned by the slave stream replication controller after the slave database node finishes the playback of the pre-writing log;
s2, when a main database node in an asynchronous mode crashes, the standby flow replication controller detects the state of the standby database node, and if a synchronous message of writing completion of a main pre-written log exists, but no synchronous message of writing completion of the standby pre-written log exists, the standby flow replication controller controls the standby database node to execute a user writing transaction operation command so as to keep consistent with the main database data;
s3, in a synchronous mode of the nodes of the main database and the standby database, the main flow replication controller sends handshake messages to each standby flow replication controller according to a set period, and judges whether the corresponding standby database node crashes or not according to whether handshake message replies are received within a preset time threshold;
s4, when the standby database nodes with the synchronous mode are crashed, judging whether to change the mode of the main database nodes or the standby database nodes or construct a confirmation message returned to the main log sending module by the standby log receiving module according to whether the main database nodes are idle or not.
Example 2:
as shown in fig. 2, the present invention provides a PostgreSQL database flow replication exception handling method, which is characterized by comprising the following steps:
S1A, configuring a main stream replication controller at a main database node, configuring a standby stream replication controller at a standby database node, and setting each stream replication controller to acquire configuration information of the database node; the database node configuration information comprises master-slave configuration, asynchronous/synchronous mode, IP configuration and opposite terminal IP of the node;
S1B, judging a current working mode according to configuration information of a database node acquired by a stream replication controller;
if the asynchronous mode is adopted, entering a step S1;
if the synchronous mode is the synchronous mode, entering step S3;
s1, in a master-slave database node asynchronous mode, a master stream replication controller respectively sends a user writing transaction command and a master writing completion synchronous message to a slave stream replication controller before and after writing a pre-writing log, and then controls a master log sending module to send the pre-writing log to a slave log receiving module and receives a slave pre-writing log writing completion synchronous message returned by the slave stream replication controller after the slave database node finishes the playback of the pre-writing log; the method comprises the following specific steps:
s11, in an asynchronous mode of the master and standby database nodes, after receiving user writing transaction operation at the master database node, the master stream replication controller synchronously sends a user writing transaction command to the standby stream replication controller;
s12, the main stream replication controller controls the local write-in pre-write log, and sends a main write-in completion synchronous message to the standby stream replication controller after the local write-in pre-write log is written;
s13, the main stream replication controller controls the main log sending module to send a pre-written log to the standby log receiving module;
s14, the backup stream replication controller sends the pre-written log received by the backup log receiving module to a log playback module for playback, and sends a write-completion synchronous message of the backup pre-written log to the main stream replication controller after the log playback is completed;
s2, when a main database node in an asynchronous mode crashes, the standby flow replication controller detects the state of the standby database node, and if a synchronous message of writing completion of a main pre-written log exists, but no synchronous message of writing completion of the standby pre-written log exists, the standby flow replication controller controls the standby database node to execute received user writing transaction operation, so that the state of the standby database node is consistent with that of the main database data; the method comprises the following specific steps:
s21, judging whether the node of the main database crashes or not by the standby flow copy controller in the asynchronous mode of the node of the main database and the standby database;
if yes, go to step S22;
if not, returning to the step S21;
s22, checking a synchronous write transaction operation record of the standby flow copy controller and the main flow copy controller;
s23, when a main pre-write log writing completion synchronous message exists, but no corresponding standby pre-write log writing completion synchronous message exists, the standby flow replication controller executes a writing transaction operation command so as to keep the standby database node data and the main database node data consistent;
s3, in a synchronous mode of the nodes of the main database and the standby database, the main flow replication controller sends handshake messages to each standby flow replication controller according to a set period, and judges whether the corresponding standby database node crashes or not according to whether handshake message replies are received within a preset time threshold; the method comprises the following specific steps:
s31, in a master-standby database node synchronization mode, the master flow replication controller sends handshake messages to all standby flow replication controllers according to a set period;
s32, the main stream replication controller judges whether a standby stream replication controller exceeds a preset time threshold and does not answer a handshake message;
if yes, go to step S33;
if not, returning to the step S31;
s33, the main stream replication controller gives an alarm to the operation and maintenance management platform to prompt the occurrence of an abnormality of breakdown of a node of a standby database where the corresponding standby stream replication controller is located;
s4, when the standby database nodes with the synchronous mode are crashed, judging whether to change the mode of the main database node or the standby database node or construct a confirmation message returned to the main log sending module by the standby log receiving module according to whether the main database node is idle or not; the method comprises the following specific steps:
s41, judging that all standby database nodes collapse when the standby flow copy controllers in the synchronous mode exceed a preset time threshold and do not answer to handshake messages;
s42, the master stream replication controller judges whether the master database node has ongoing write transaction operation or not;
if not, go to step S43;
if yes, go to step S45;
s43, the master stream replication controller changes the node mode of the master database into an asynchronous mode;
s44, the main flow replication controller judges whether a standby database node is recovered from a crash state;
if yes, the master stream replication controller restores the master database node mode to a synchronous mode, and returns to the step S31;
if not, returning to the step S44;
s45, the main stream replication controller stops the operation that the main log sending module of the main database node sends the pre-written log to the standby log sending module according to the current running state, constructs a log playback completion notification returned to the main log sending module by the standby log receiving module, and returns to the step S43.
In the above embodiment 2, as shown in fig. 3, in step S11, after the postgres module of the primary database node receives the user writing transaction operation in the primary and standby database nodes asynchronous mode, the primary database node notifies the primary flow replication controller to synchronously send the user writing transaction command to the standby flow replication controller;
in step S12, after sending a primary write completion synchronization message to the backup stream replication controller, the primary stream replication controller returns a notification of completion of the pre-write log in the primary database node to the user through the postgres module;
in step S14, after the backup stream replication controller sends the backup pre-write log writing completion synchronization message to the main stream replication controller, a log playback completion notification is returned to the main stream replication controller through the main log receiving module by the backup log receiving module.
Example 3:
as shown in fig. 4, the present invention provides a PostgreSQL database flow replication exception handling apparatus, comprising:
the asynchronous mode message processing unit 1 is used for respectively sending a user writing transaction command and a main writing completion synchronous message to the standby stream copy controller before and after the main stream copy controller controls writing of the pre-writing log in an asynchronous mode of the main and standby database nodes, controlling the main log sending module to send the pre-writing log to the standby log receiving module, and receiving the standby pre-writing log writing completion synchronous message returned by the standby stream copy controller after the standby database nodes complete the playback of the pre-writing log; the asynchronous mode message processing unit 1 includes:
the writing transaction message sending subunit 1.1 is configured to send a user writing transaction command synchronously to the standby stream replication controller after the main stream replication controller receives a user writing transaction operation at the main database node in an asynchronous mode of the main and standby database nodes;
a write-completion synchronization message sending subunit 1.2, configured to control the local write-pre-write log by using the master stream replication controller, and send a master write-completion synchronization message to the standby stream replication controller after the local write-pre-write log is written;
the pre-writing log issuing subunit 1.3 is used for controlling the main stream replication controller to control the main log sending module to send the pre-writing log to the standby log receiving module;
the standby writing completion synchronous message sending subunit 1.4 is used for sending the pre-written log received by the standby log receiving module to the log playback module by the standby stream replication controller for playback, and sending a standby pre-written log writing completion synchronous message to the main stream replication controller after the log playback is completed;
the asynchronous mode exception handling unit 2 is configured to, when a primary database node in an asynchronous mode crashes, detect a state of the primary database node by using the backup flow replication controller, and if there is a primary pre-write log writing completion synchronization message, but no backup pre-write log writing completion synchronization message, control the backup database node to execute a writing transaction operation, and synchronize primary and backup database node data; the asynchronous mode exception handling unit 2 includes:
an asynchronous mode master database node crash judging subunit 2.1, configured to judge whether a master database node crash occurs in the master-slave database node asynchronous mode by using the standby flow replication controller;
a synchronous write transaction operation record checking subunit 2.2, configured to check, by using the standby flow replication controller, a synchronous write transaction operation record with the primary flow replication controller when a node crash of the primary database occurs;
the data synchronization subunit 2.3 is configured to execute a write transaction operation command by using the standby flow replication controller when there is a master pre-write log write completion synchronization message, but no corresponding standby pre-write log write completion synchronization message, so as to keep the standby database node data and the master database node data consistent;
the synchronous mode message processing unit 3 is used for sending handshake messages to each standby flow replication controller according to a set period by the main flow replication controller in a synchronous mode of the nodes of the main and standby databases, and judging whether the corresponding standby database node crashes or not according to whether handshake message replies are received within a preset time threshold or not; the synchronous mode message processing unit 3 includes:
a handshake message sending subunit 3.1, configured to send handshake messages to all standby flow replication controllers according to a set period in a primary-standby database node synchronization mode;
a handshake message reply judgment subunit 3.2, configured to judge whether a standby flow replication controller exists or not and does not reply to a handshake message when the standby flow replication controller exceeds a preset time threshold;
the synchronous mode abnormal warning subunit 3.3 is used for sending an alarm to the operation and maintenance management platform by the main stream replication controller when the standby stream replication controller exceeds a preset time threshold and does not answer a handshake message, and prompting the occurrence of an abnormality of the crash of the standby database node where the corresponding standby stream replication controller is located;
the synchronization mode exception handling unit 4 is configured to, when the standby database nodes in the synchronization mode crash, determine to change the mode of the main database node or the standby database node according to whether the main database node is idle or not, or construct a confirmation message returned by the standby log receiving module to the main log sending module; the synchronous mode exception handling unit 4 includes:
a standby database full-collapse judging subunit 4.1, configured to judge that all standby database nodes collapse when the standby flow replication controllers in the synchronous mode all exceed a preset time threshold and do not answer to the handshake message;
the main database node idle judging subunit 4.2, the main stream replication controller judges whether the main database node has an ongoing writing transaction operation or not;
a mode change subunit 4.3, configured to change the master database node mode to an asynchronous mode by the master replication controller when there is no ongoing write transaction operation at the master database node;
a standby database node recovery judging subunit 4.4, configured to judge whether a standby database node is recovered from a crash state by using the primary flow replication controller;
a mode recovery subunit 4.5, configured to, when the standby database node recovers from the crash state, restore the primary database node mode to the synchronous mode by using the primary replication controller;
a return message constructing subunit 4.6, configured to stop, by the master journal sending module of the master database node according to the current running state, an operation of sending a pre-written journal to the standby journal sending module when there is an ongoing write transaction operation of the master database node, and construct a notification of completion of journal playback returned to the master journal sending module by the standby journal receiving module;
a flow replication controller configuration unit 5, configured to configure a main flow replication controller at a main database node, configure a standby flow replication controller at a standby database node, and set each flow replication controller to obtain configuration information of the database node where the flow replication controller is located;
and the current working mode judging unit 6 is used for judging the current working mode according to the configuration information of the database node acquired by the stream replication controller.
Although the present invention has been described in detail by way of preferred embodiments with reference to the accompanying drawings, the present invention is not limited thereto. Various equivalent modifications and substitutions may be made in the embodiments of the present invention by those skilled in the art without departing from the spirit and scope of the present invention, and it is intended that all such modifications and substitutions be within the scope of the present invention/be within the scope of the present invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. The postgreSQL database stream replication exception handling method is characterized by comprising the following steps of:
s1, in a master-slave database node asynchronous mode, a master stream replication controller respectively sends a user writing transaction command message and a master pre-writing log writing completion synchronous message to a slave stream replication controller before and after controlling writing of a pre-writing log, and then controls a master log sending module to send the pre-writing log to a slave log receiving module and receives the slave pre-writing log writing completion synchronous message returned by the slave stream replication controller after the slave database node finishes the playback of the pre-writing log;
s2, when a main database node in an asynchronous mode crashes, the standby flow replication controller detects the state of the standby database node, and if a synchronous message of writing completion of a main pre-written log exists, but the synchronous message of writing completion of the standby pre-written log does not exist, the standby flow replication controller controls the standby database node to execute a received user writing transaction operation command so as to keep consistent with the main database data;
s3, in a synchronous mode of the nodes of the main database and the standby database, the main flow replication controller sends handshake messages to each standby flow replication controller according to a set period, and judges whether the corresponding standby database node crashes or not according to whether handshake message replies are received within a preset time threshold;
s4, when the standby database nodes with the synchronous mode are crashed, judging whether to change the mode of the main database nodes or the standby database nodes or construct a confirmation message returned to the main log sending module by the standby log receiving module according to whether the main database nodes are idle or not.
2. The PostgreSQL database flow replication exception handling method of claim 1, further comprising, prior to step S1, the steps of:
S1A, configuring a main stream replication controller at a main database node, configuring a standby stream replication controller at a standby database node, and setting each stream replication controller to acquire configuration information of the database node;
S1B, judging a current working mode according to configuration information of a database node acquired by a stream replication controller;
if the asynchronous mode is adopted, entering a step S1;
if the synchronous mode is adopted, the process proceeds to step S3.
3. The PostgreSQL database flow replication exception handling method of claim 2, wherein the database node configuration information comprises a home node master-slave configuration, an asynchronous/synchronous mode, an IP configuration, and a peer IP.
4. The PostgreSQL database flow replication exception handling method of claim 1, wherein step S1 comprises the specific steps of:
s11, in an asynchronous mode of the master and standby database nodes, after receiving a user transaction writing command at the master database node, the master stream replication controller synchronously sends a transaction writing command message to the standby stream replication controller;
s12, the main stream replication controller controls the local write-in pre-write log, and after the local write-in pre-write log is written, a main write-in pre-write log write-in completion synchronous message is sent to the standby stream replication controller;
s13, the main stream replication controller controls the main log sending module to send a pre-written log to the standby log receiving module;
s14, the backup stream replication controller sends the pre-written log received by the backup log receiving module to the log playback module for playback, and after the log playback is completed, sends a write completion synchronization message of the backup pre-written log to the main stream replication controller.
5. The postgres database flow replication exception handling method of claim 4, wherein in step S11, after the postgres module of the master database node receives the user write transaction operation in the master-slave database node asynchronous mode, the master replication controller is notified to synchronously send a user write transaction command message to the slave replication controller;
in step S12, after sending a master pre-write log write completion synchronization message to the backup stream replication controller, the master replication controller returns a notification that the pre-write log is completed at the master database node to the user through the postgres module;
in step S14, after the backup database node log playback module completes writing the pre-written log, the backup stream replication controller sends a backup write pre-written log completion synchronization message to the primary stream replication controller.
6. The PostgreSQL database flow replication exception handling method of claim 4, wherein step S2 comprises the specific steps of:
s21, judging whether the node of the main database crashes or not by the standby flow copy controller in the asynchronous mode of the node of the main database and the standby database;
if yes, go to step S22;
if not, returning to the step S21;
s22, checking a synchronous write transaction operation record of the standby flow copy controller and the main flow copy controller;
s23, when the master pre-write log writing completion synchronous message exists, but no corresponding backup pre-write log writing completion synchronous message exists, the backup stream replication controller executes a writing transaction operation command so as to keep the backup database node data and the master database node data consistent.
7. The PostgreSQL database flow replication exception handling method of claim 1, wherein step S3 comprises the specific steps of:
s31, in a master-standby database node synchronization mode, the master flow replication controller sends handshake messages to all standby flow replication controllers according to a set period;
s32, the main stream replication controller judges whether a standby stream replication controller exceeds a preset time threshold and does not answer a handshake message;
if yes, go to step S33;
if not, returning to the step S31;
s33, the main stream replication controller sends an alarm to the operation and maintenance management platform to prompt the occurrence of the abnormal breakdown of the node of the standby database where the corresponding standby stream replication controller is located.
8. The PostgreSQL database flow replication exception handling method of claim 7, wherein step S4 comprises the specific steps of:
s41, judging that all standby database nodes collapse when the standby flow copy controllers in the synchronous mode exceed a preset time threshold and do not answer to handshake messages;
s42, the master stream replication controller judges whether the master database node has ongoing write transaction operation or not;
if not, go to step S43;
if yes, go to step S45;
s43, the master stream replication controller changes the node mode of the master database into an asynchronous mode;
s44, the main flow replication controller judges whether a standby database node is recovered from a crash state;
if yes, the master stream replication controller restores the master database node mode to a synchronous mode, and returns to the step S31;
if not, returning to the step S44;
s45, the main stream replication controller stops the operation that the main log sending module of the main database node sends the pre-written log to the standby log sending module according to the current running state, constructs a log playback completion notification returned to the main log sending module by the standby log receiving module, and returns to the step S43.
9. A PostgreSQL database flow replication exception handling apparatus, comprising:
the asynchronous mode message processing unit (1) is used for respectively sending a user writing transaction command and a master pre-writing log writing completion synchronous message to the standby flow copying controller before and after controlling writing of the pre-writing log in an asynchronous mode of the master and standby database nodes, controlling the master log sending module to send the pre-writing log to the standby log receiving module, and receiving the standby pre-writing log writing completion synchronous message returned by the standby flow copying controller after the standby database nodes complete the pre-writing log playback;
the asynchronous mode exception handling unit (2) is used for detecting the state of the standby database node when the main database node in an asynchronous mode crashes, and controlling the standby database node to execute a writing transaction operation by the standby flow replication controller to synchronize the main and standby database node data if the main pre-writing log writing completion synchronous message exists but the standby pre-writing log writing completion synchronous message does not exist;
the synchronous mode message processing unit (3) is used for sending handshake messages to each standby flow replication controller according to a set period in a synchronous mode of the nodes of the main database and the standby database, and judging whether the corresponding standby database node crashes or not according to whether handshake message replies are received within a preset time threshold or not;
and the synchronous mode exception handling unit (4) is used for judging whether to change the mode of the main database node or the standby log receiving module or constructing a confirmation message returned to the main log sending module by the standby log receiving module according to whether the main database node is idle or not when the standby database node with the synchronous mode is crashed.
10. The PostgreSQL database flow replication exception handling apparatus of claim 9, further comprising:
a stream replication controller configuration unit (5) for configuring a main stream replication controller at a main database node, configuring a standby stream replication controller at a standby database node, and setting each stream replication controller to obtain configuration information of the database node where the stream replication controller is located;
and the current working mode judging unit (6) is used for judging the current working mode according to the configuration information of the database node acquired by the stream replication controller.
CN202110722991.4A 2021-06-28 2021-06-28 PostgreSQL database stream replication exception handling method and device Active CN113626517B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110722991.4A CN113626517B (en) 2021-06-28 2021-06-28 PostgreSQL database stream replication exception handling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110722991.4A CN113626517B (en) 2021-06-28 2021-06-28 PostgreSQL database stream replication exception handling method and device

Publications (2)

Publication Number Publication Date
CN113626517A CN113626517A (en) 2021-11-09
CN113626517B true CN113626517B (en) 2023-08-15

Family

ID=78378510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110722991.4A Active CN113626517B (en) 2021-06-28 2021-06-28 PostgreSQL database stream replication exception handling method and device

Country Status (1)

Country Link
CN (1) CN113626517B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108376142A (en) * 2018-01-10 2018-08-07 北京思特奇信息技术股份有限公司 A kind of distributed memory database method of data synchronization and system
CN111382024A (en) * 2018-12-28 2020-07-07 中兴通讯股份有限公司 Method, device and system for monitoring master-slave copying delay of database

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108376142A (en) * 2018-01-10 2018-08-07 北京思特奇信息技术股份有限公司 A kind of distributed memory database method of data synchronization and system
CN111382024A (en) * 2018-12-28 2020-07-07 中兴通讯股份有限公司 Method, device and system for monitoring master-slave copying delay of database

Also Published As

Publication number Publication date
CN113626517A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
US7793060B2 (en) System method and circuit for differential mirroring of data
EP1639470B1 (en) Method, system and program for handling a failover to a remote storage location
JP4387707B2 (en) System and method for bi-directional fault detection at a clustering system site
KR20110044858A (en) Maintain data indetermination in data servers across data centers
CN105069160A (en) Autonomous controllable database based high-availability method and architecture
US20050071708A1 (en) Method, system, and program for recovery from a failure in an asynchronous data copying system
JP2007310701A (en) Database system, storage device, initial duplication method, and log application method
JP2003076592A (en) Data storage system
CN111327467A (en) Server system, disaster recovery backup method thereof and related equipment
US7797571B2 (en) System, method and circuit for mirroring data
CN115794499B (en) Method and system for dual-activity replication data among distributed block storage clusters
CN107357800A (en) A kind of database High Availabitity zero loses solution method
KR101605455B1 (en) Method for Replicationing of Redo Log without Data Loss and System Thereof
CN113254275A (en) MySQL high-availability architecture method based on distributed block device
CN112783694B (en) Long-distance disaster recovery method for high-availability Redis
CN112929438B (en) Business processing method and device of double-site distributed database
WO2021115043A1 (en) Distributed database system and data disaster backup drilling method
CN113626517B (en) PostgreSQL database stream replication exception handling method and device
CN116185697B (en) Container cluster management method, device and system, electronic equipment and storage medium
WO2023151443A1 (en) Synchronizing main database and standby database
JP2004272318A (en) System changeover system, processing method therefor, and processing program therefor
KR20100061983A (en) Method and system for operating management of real-time replicated database
CN112596945A (en) Disaster recovery method based on double main units
CN113467717B (en) Dual-machine volume mirror image management method, device and equipment and readable storage medium
CN117827544B (en) Hot backup system, method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant