CN113626517A - PostgreSQL database stream copy exception processing method and device - Google Patents

PostgreSQL database stream copy exception processing method and device Download PDF

Info

Publication number
CN113626517A
CN113626517A CN202110722991.4A CN202110722991A CN113626517A CN 113626517 A CN113626517 A CN 113626517A CN 202110722991 A CN202110722991 A CN 202110722991A CN 113626517 A CN113626517 A CN 113626517A
Authority
CN
China
Prior art keywords
standby
log
main
replication controller
write
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110722991.4A
Other languages
Chinese (zh)
Other versions
CN113626517B (en
Inventor
孙颉
张闯
刘科
黄广奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202110722991.4A priority Critical patent/CN113626517B/en
Publication of CN113626517A publication Critical patent/CN113626517A/en
Application granted granted Critical
Publication of CN113626517B publication Critical patent/CN113626517B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1469Backup restoration techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/466Transaction processing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a PostgreSQL database stream replication exception handling method and a PostgreSQL database stream replication exception handling device, wherein the method comprises the following steps: in the asynchronous mode, before and after the write-in of the pre-written log, the main stream replication controller respectively sends a user write transaction command and a main pre-written log write completion synchronization message to the standby stream replication controller, completes the transmission of the pre-written log, and receives the standby pre-written log write completion synchronization message returned by the standby stream replication controller; when the main node crashes and only the main pre-write log is written to complete synchronization but no backup pre-write log is written to complete synchronization information, the backup stream copy controller synchronizes the main data and the backup data; in the synchronous mode, the mainstream copy controller sends a handshake message and judges whether the standby node crashes or not according to whether a handshake message reply is returned or not; and when the standby nodes are crashed, selecting a change mode or constructing a confirmation message according to whether the main point is idle or not. The invention realizes the data backup reliability in the asynchronous mode and the fool-proofing of the main node when the backup node in the synchronous mode crashes simultaneously.

Description

PostgreSQL database stream copy exception processing method and device
Technical Field
The invention belongs to the technical field of database stream exception handling, and particularly relates to a PostgreSQL database stream copy exception handling method and device.
Background
PostgreSQL is a suite of enterprise-level, cross-platform, highly customizable, powerful open source database systems. The reliability, stability, data consistency and the like of the PostgreSQL are confirmed by the database industry, and the PostgreSQL is the only open-source database system capable of realizing zero data loss. With the gradual promotion of the business of PostgreSQL, the PostgreSQL becomes a powerful competitor to the traditional large business databases such as standard Oracle, MySQL and the like in the database market.
In a database system, data backup becomes an important backup means for ensuring data security; with the cluster distribution of the database servers, the database servers can work cooperatively, and when the main server fails, a plurality of computers are allowed to provide the same data.
PostgreSQL provides a stream replication technique to achieve synchronization of data, i.e., the standby server can synchronize the primary server in real time. Firstly, a network connection is established between a main database server and a standby database server by using a TCP protocol provided by PostgreSQL. The server in the primary database immediately transmits the record to the backup database server in a TCP data stream manner after the data changes and the WAL record is generated, without waiting until the pre-written log file is filled. The mechanism greatly reduces the data synchronization delay between the main server and the standby server and has more efficient availability. The stream replication technology provides native technical support for hot standby, high availability, load balancing and the like of the database. PostgreSQL stream replication provides both asynchronous replication and synchronous replication.
PostgreSQL stream replication has problems as follows:
PostgreSQL stream replication is asynchronous by default, and if the primary database server crashes, some committed transactions may not have been replicated to the backup database server, which may result in data loss.
Synchronous replication can ensure that all modifications of a transaction are transferred to one or more synchronous standby database servers. When a synchronous stream replication is requested, each commit of a write transaction will wait until an acknowledgement is received that the commit has been written to the transaction log on disk at both the primary and backup database servers. The method ensures that the WAL logs of the main and standby servers return to success only by double disk-dropping, but increases the waiting time of back-and-forth communication confirmation between the main and standby database servers and prolongs the response time of the foreground client for requesting transactions.
Under the condition of starting the synchronous replication mode, the state of the standby database node can directly influence the primary database node. If all the standby database nodes of the whole distributed cluster architecture crash simultaneously and cannot be recovered, the main database node performs write transaction operation, and the main database node cannot receive confirmation of any standby node reply at the moment, all the operations on the main database node are hung until the standby database nodes recover and return confirmation replies, the main database node recovers to be normal, and if the main database node is an important service, the phenomenon of the hang-up incapability is a fatal defect for users.
Therefore, it is very necessary to provide a method and an apparatus for handling a copy exception of a PostgreSQL database stream to overcome the above-mentioned drawbacks in the prior art.
Disclosure of Invention
The invention provides a method and a device for processing PostgreSQL database stream replication exception, aiming at the defects that when a main database node crashes, if some submitted transactions are not replicated to a backup server, data loss is caused, and under the condition that all backup database nodes crash in a synchronous replication mode, the main database node hangs down in the asynchronous replication mode of the existing PostgreSQL database stream replication in the prior art.
In a first aspect, the present invention provides a method for processing a copy exception of a PostgreSQL database stream, including the following steps:
s1, in an asynchronous mode of nodes of a main database and a standby database, a main stream replication controller respectively sends a user transaction writing command message and a main pre-written log writing completion synchronous message to a standby stream replication controller before and after controlling writing of a pre-written log, then controls a main log sending module to send the pre-written log to a standby log receiving module, and receives a standby pre-written log writing completion synchronous message returned by the standby stream replication controller after the backup database nodes complete playback of the pre-written log;
s2, when the main database node in the asynchronous mode crashes, the main pre-write log write-in completion synchronization message exists, but the backup pre-write log write-in completion synchronization message does not exist, the backup stream replication controller controls the backup database node to execute a user write transaction operation command received by the write transaction operation, and the consistency with the main database data is realized;
s3, in a master-backup database node synchronization mode, the main stream replication controller sends handshake messages to each backup stream replication controller according to a set period, and whether the corresponding backup database node is crashed or not is judged according to whether handshake message replies are received within a preset time threshold or not;
and S4, when all the standby database nodes in the synchronization mode are crashed, judging whether the master database node mode is changed or the standby log receiving module is constructed according to the idleness of the master database node or not, and returning a confirmation message to the master log sending module.
Further, step S1 is preceded by the following steps:
S1A, configuring a main stream replication controller at a main database node, configuring a standby stream replication controller at a standby database node, and setting each stream replication controller to acquire configuration information of the database node where the stream replication controller is located;
S1B, judging a current working mode according to configuration information of the database nodes acquired by the stream replication controller;
if the mode is asynchronous mode, go to step S1;
if the mode is the synchronous mode, the process proceeds to step S3. The flow replication controller reads the configuration information for mode judgment in subsequent message processing and exception handling and primary database node and backup data node location.
Further, the configuration information of the database node comprises master-slave configuration, asynchronous/synchronous mode, IP configuration and opposite terminal IP of the node. The node master-slave configuration judges whether the node where the stream replication controller is located is a master database node or a standby database node, and the IP configuration and the opposite end IP are used for positioning the database node communicated with the current database node.
Further, the step S1 specifically includes the following steps:
s11, in an asynchronous mode of the master database node and the standby database node, after receiving user write transaction operation at the master database node, the master stream replication controller synchronously sends a user write transaction command to the standby stream replication controller;
s12, the main stream replication controller controls local write-in of the pre-written log, and sends a main pre-written log write-in completion synchronization message to the standby stream replication controller after the local pre-written log write-in is completed;
s13, the main stream replication controller controls the main log sending module to send the pre-written log to the standby log receiving module;
s14, the standby stream replication controller sends the pre-written log received by the standby log receiving module to the log playback module for playback, and sends a synchronization message of writing completion of the standby pre-written log to the main stream replication controller after the log playback is completed. And the user writes a transaction command and writes a master pre-written log into a time node for completing the synchronization message recording of the master database node and writing a local pre-written log into the time node, and writes a backup pre-written log into a time node for completing the synchronization message recording of the backup database node and the log playback.
Further, in step S11, in the master/standby database node asynchronous mode, after the postgres module of the master database node receives the user write transaction operation, the master stream replication controller is notified to synchronously send a user write transaction command message to the backup stream replication controller;
in step S12, after sending the primary write completion synchronization message to the standby stream replication controller, the primary stream replication controller returns a completion notification of the pre-write log at the primary database node to the user through the postgres module;
in step S14, after the backup stream replication controller sends a backup pre-write log write completion synchronization message to the primary stream replication controller, a log playback completion notification is returned to the primary stream replication controller via the primary log receiving module via the backup log receiving module. The postgres module is a module for exchanging the postgres database with the user, and is used for receiving write transaction operation of the user and returning a write transaction completion notification to the user; the master log receiving module is started from a master database node and is mainly responsible for sending the pre-written log records generated by the master database node to a lower computer serving as a standby database node; the standby log receiving module is automatically controlled from the standby database node, and the corresponding main log sending module is communicated with the standby database node and is mainly responsible for receiving the pre-written log record sent by the main log sending module serving as the upper computer; the log playback module is started from the standby data node and is mainly responsible for recording the pre-written log received by the standby log receiving module in the standby database node for playback, and data synchronization between the main database node and the standby database node is achieved asynchronously. In the cascade mode, one standby database node may start three modules at the same time, namely, the standby database node is responsible for receiving and playing back the pre-written log record of the upper computer and continuously sending the pre-written log record to the lower computer.
Further, the step S2 specifically includes the following steps:
s21, in the master and standby database node asynchronous mode, judging whether the master database node is crashed or not by the standby flow replication controller;
if yes, go to step S22;
if not, returning to the step S21;
s22, the standby stream replication controller checks the synchronous write transaction operation record with the main stream replication controller;
and S23, when the main write-in completion synchronization message exists but the corresponding backup pre-write log write-in completion synchronization message does not exist, the backup stream replication controller executes a write transaction operation command to realize that the backup database node data is consistent with the main database node data. In the original asynchronous replication mode, the postgres module completes the writing of the local pre-written log in the main database, and returns success when the backup database node does not need to wait for receiving the pre-written log and completing log playback when the transaction is submitted on the node of the main database; when the main database node is crashed, the log writing completion of the user writing transaction in the main database node occurs, and the standby database node does not receive the pre-written log synchronized with the main database node, so that the main and standby data are not synchronized.
Further, the step S3 specifically includes the following steps:
s31, in a master-backup database node synchronization mode, a main stream replication controller sends handshake messages to all backup stream replication controllers according to a set period;
s32, the main stream replication controller judges whether a standby stream replication controller exceeds a preset time threshold and does not reply to a handshake message;
if yes, go to step S33;
if not, returning to the step S31;
and S33, the main stream replication controller sends an alarm to the operation and maintenance management platform to prompt that the backup database node where the corresponding backup stream replication controller is located is crashed abnormally. And discovering the abnormal standby database nodes in time through handshake messages.
Further, the step S4 specifically includes the following steps:
s41, when all standby flow replication controllers in the synchronous mode exceed a preset time threshold and do not reply handshake messages, judging that all standby database nodes are crashed;
s42, judging whether the main database node has ongoing write transaction operation by the main stream replication controller;
if not, go to step S43;
if yes, go to step S45;
s43, the master flow replication controller changes the master database node mode into an asynchronous mode;
s44, judging whether a standby database node is recovered from a crash state by the main stream replication controller;
if yes, the main stream replication controller restores the main database node mode to the synchronous mode, and returns to the step S31;
if not, returning to the step S44;
and S45, the master stream replication controller stops the master log sending module of the master database node from sending the pre-written log to the standby log sending module according to the current progress state, constructs a standby log receiving module, returns a log playback completion notice to the master log sending module, and returns to the step S43. When synchronous replication is carried out, the postgres module can send successful return information to a user only after receiving confirmation information returned by the main log sending module, but the state of the standby node directly affects the main database, if all the standby database nodes of the whole distributed cluster architecture crash simultaneously and cannot be recovered, the main database node carries out write transaction operation, at the moment, the main database node cannot receive confirmation of any standby database node reply, all the operations on the main database node are hung until the standby database node recovers and returns confirmation reply; the primary database point returns to normal. In the invention, the problem of timely abnormity discovery is solved through handshake messages, and the master database node is prevented from being hung up by changing a synchronous mode to be asynchronous.
In a second aspect, the present invention provides a PostgreSQL database stream copy exception handling apparatus, including:
the asynchronous mode message processing unit is used for respectively sending a user write transaction command and a main write completion synchronous message to the standby stream replication controller before and after the main stream replication controller controls the write of the pre-written log in the asynchronous mode of the nodes of the main and standby databases, then controlling the main log sending module to send the pre-written log to the standby log receiving module, and receiving the write completion synchronous message of the standby pre-written log returned by the standby stream replication controller after the backup database nodes complete the playback of the pre-written log;
the asynchronous mode exception handling unit is used for controlling the standby database node to execute write transaction operation and synchronize the data of the main and standby database nodes when the main database node in the asynchronous mode crashes and a main pre-write log write-in completion synchronization message exists but no standby pre-write log write-in completion synchronization message exists;
a synchronous mode message processing unit, configured to, in a master-backup database node synchronous mode, send a handshake message to each backup stream replication controller according to a set period by a master stream replication controller, and determine whether a corresponding backup database node crashes according to whether a handshake message reply is received within a preset time threshold;
and the synchronous mode exception processing unit is used for judging whether the master database node mode is changed or constructing a confirmation message returned by the master log receiving module to the master log sending module according to the condition that whether the master database node is idle or not when all the standby database nodes in the synchronous mode crash.
Further, still include:
the system comprises a flow replication controller configuration unit, a main flow replication controller configuration unit, a standby flow replication controller configuration unit and a flow replication controller configuration unit, wherein the flow replication controller configuration unit is used for configuring a main flow replication controller at a main database node, configuring a standby flow replication controller at a standby database node and setting each flow replication controller to acquire configuration information of the database node where the flow replication controller is located;
and the current working mode judging unit is used for judging the current working mode according to the configuration information of the database nodes acquired by the stream replication controller.
Further, the asynchronous mode message processing unit includes:
a write transaction message sending subunit, configured to, in an asynchronous mode of the master and standby database nodes, send a user write transaction command message to the standby stream replication controller synchronously after the master database node receives the user write transaction command;
a write completion synchronization message sending subunit, configured to control the local write-in pre-written log by the primary stream replication controller, and send a primary pre-written log write completion synchronization message to the backup stream replication controller after the local write-in of the pre-written log is completed;
the pre-written log issuing subunit is used for controlling the main log sending module to send the pre-written log to the standby log receiving module by the main stream replication controller;
and the backup write completion synchronization message sending subunit is used for sending the pre-written log received by the backup log receiving module to the log playback module for playback by the backup stream replication controller, and sending a backup pre-written log write completion synchronization message to the main stream replication controller after the log playback is completed.
Further, the asynchronous mode exception handling unit includes:
the backup flow replication controller is used for judging whether the primary database node is crashed or not in the primary and backup database nodes in the asynchronous mode;
the synchronous write transaction operation record checking subunit is used for checking the synchronous write transaction operation record of the standby stream replication controller and the main stream replication controller when the main database node crashes;
and the data synchronization subunit is used for executing a write transaction operation command by the standby stream replication controller when the main pre-write log write completion synchronization message exists but no corresponding standby pre-write log write completion synchronization message exists, so as to keep the standby database node data consistent with the main database node data.
Further, the synchronization mode message processing unit includes:
a handshake message sending subunit, configured to send handshake messages to all standby stream replication controllers according to a set period in a primary/standby database node synchronization mode;
a handshake message reply judgment subunit, configured to judge, by the primary stream replication controller, whether there is a handshake message that the backup stream replication controller does not reply after exceeding a preset time threshold;
and the synchronous mode abnormity warning subunit is used for sending a warning to the operation and maintenance management platform by the main flow replication controller when the standby flow replication controller does not reply the handshake message after exceeding the preset time threshold, and prompting the occurrence of abnormity of crash of the standby database node where the corresponding standby flow replication controller is located.
Further, the synchronous mode exception handling unit includes:
the backup database full crash judging subunit is used for judging that all backup database nodes crash when the backup stream replication controllers in the synchronous mode exceed the preset time threshold and do not reply the handshake message;
a main database node idle judging subunit, a main stream replication controller judging whether the main database node has ongoing write transaction operation;
the mode change subunit is used for changing the mode of the main database node into an asynchronous mode by the main flow replication controller when the main database node has no ongoing write transaction operation;
a backup database node recovery judging subunit, configured to judge, by the master stream replication controller, whether a backup database node recovers from the crash state;
the mode recovery subunit is used for recovering the mode of the main database node into a synchronous mode by the main stream replication controller when the standby database node is recovered from the crash state;
and the return message construction subunit is used for stopping the operation of sending the pre-written log to the standby log sending module by the main log sending module of the main database node according to the current progress state when the main database node has ongoing write transaction operation, and constructing a standby log receiving module to return a log playback completion notice to the main log sending module.
The beneficial effect of the invention is that,
the PostgreSQL database stream replication exception processing method and device provided by the invention have the advantages that in an asynchronous replication mode, when a main database node crashes, some submitted transactions may not be replicated to data of a standby database node, log playback is carried out in time, and the data of the main database node and the data of the standby database node are synchronized; under the extreme condition that all the standby database nodes are crashed simultaneously in the synchronous replication mode, the main database node is prevented from hanging up and waiting, and the user can be ensured to normally perform subsequent operation.
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
Therefore, compared with the prior art, the invention has prominent substantive features and remarkable progress, and the beneficial effects of the implementation are also obvious.
Drawings
In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present invention, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
Fig. 1 is a first flowchart of a processing method for exception handling of PostgreSQL database stream replication according to the present invention.
Fig. 2 is a schematic flow diagram of a processing method for processing a PostgreSQL database stream replication exception according to the present invention.
Fig. 3 is a schematic diagram of the asynchronous mode data process of the PostgreSQL database of the invention.
Fig. 4 is a schematic diagram of a PostgreSQL database stream copy exception handling device according to the present invention.
In the figure, 1-asynchronous mode message processing unit; 1.1-write transaction message sending subunit; 1.2-write completion synchronization message sending subunit; 1.3-pre-write log issuing subunit; 1.4-prepare the write completion synchronization message sending subunit; 2-asynchronous mode exception handling unit; 2.1-crash judgment subunit of main database node in asynchronous mode; 2.2-synchronous write transaction operation record checking subunit; 2.3-data synchronization subunit; 3-a synchronous mode message handling unit; 3.1-handshake messaging subunit; 3.2-handshake message reply judgment subunit; 3.3-synchronous mode exception warning subunit; 4-a synchronous mode exception handling unit; 4.1-prepare the database total crash judging subunit; 4.2-main database node idle judging subunit; 4.3-mode change subunit; 4.4-backup database node recovery judgment subunit; 4.5-mode recovery subunit; 4.6-return message construction subunit; 5-a stream replication controller configuration unit; 6-current working mode judging unit.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
as shown in fig. 1, the present invention provides a method for processing a copy exception of a PostgreSQL database stream, which includes the following steps:
s1, in an asynchronous mode of a master database node and a backup database node, a main stream replication controller respectively sends a user transaction writing command and a main writing completion synchronous message to a backup stream replication controller before and after controlling to write a pre-written log, then controls a main log sending module to send the pre-written log to a backup log receiving module, and receives a backup pre-written log writing completion synchronous message returned by the backup stream replication controller after the backup database node completes the playback of the pre-written log;
s2, when the main database node in the asynchronous mode crashes, the standby flow replication controller detects the state of the standby database node, if the main pre-write log write-in completion synchronization message exists but the standby pre-write log write-in completion synchronization message does not exist, the standby flow replication controller controls the standby database node to execute a user write transaction operation command, and the consistency with the main database data is realized;
s3, in a master-backup database node synchronization mode, the main stream replication controller sends handshake messages to each backup stream replication controller according to a set period, and whether the corresponding backup database node is crashed or not is judged according to whether handshake message replies are received within a preset time threshold or not;
and S4, when all the standby database nodes in the synchronization mode are crashed, judging whether the master database node mode is changed or the standby log receiving module is constructed according to the idleness of the master database node or not, and returning a confirmation message to the master log sending module.
Example 2:
as shown in fig. 2, the present invention provides a method for processing a copy exception of a PostgreSQL database stream, which is characterized by comprising the following steps:
S1A, configuring a main stream replication controller at a main database node, configuring a standby stream replication controller at a standby database node, and setting each stream replication controller to acquire configuration information of the database node where the stream replication controller is located; the database node configuration information comprises the node master-slave configuration, an asynchronous/synchronous mode, IP configuration and opposite terminal IP;
S1B, judging a current working mode according to configuration information of the database nodes acquired by the stream replication controller;
if the mode is asynchronous mode, go to step S1;
if the mode is the synchronous mode, go to step S3;
s1, in an asynchronous mode of a master database node and a backup database node, a main stream replication controller respectively sends a user transaction writing command and a main writing completion synchronous message to a backup stream replication controller before and after controlling to write a pre-written log, then controls a main log sending module to send the pre-written log to a backup log receiving module, and receives a backup pre-written log writing completion synchronous message returned by the backup stream replication controller after the backup database node completes the playback of the pre-written log; the method comprises the following specific steps:
s11, in an asynchronous mode of the master database node and the standby database node, after receiving user write transaction operation at the master database node, the master stream replication controller synchronously sends a user write transaction command to the standby stream replication controller;
s12, the main stream replication controller controls local write-in of the pre-written log, and sends a main write-in completion synchronization message to the standby stream replication controller after the local write-in of the pre-written log is completed;
s13, the main stream replication controller controls the main log sending module to send the pre-written log to the standby log receiving module;
s14, the standby stream replication controller sends the pre-written log received by the standby log receiving module to a log playback module for playback, and sends a standby pre-written log write-in completion synchronization message to the main stream replication controller after the log playback is completed;
s2, when the main database node in the asynchronous mode crashes, the standby flow replication controller detects the state of the standby database node, if the main pre-write log write-in completion synchronization message exists but the standby pre-write log write-in completion synchronization message does not exist, the standby flow replication controller controls the standby database node to execute the received user write transaction operation, and the consistency with the main database data is realized; the method comprises the following specific steps:
s21, in the master and standby database node asynchronous mode, judging whether the master database node is crashed or not by the standby flow replication controller;
if yes, go to step S22;
if not, returning to the step S21;
s22, the standby stream replication controller checks the synchronous write transaction operation record with the main stream replication controller;
s23, when the main pre-write log write-in completion synchronization message exists but no corresponding backup pre-write log write-in completion synchronization message exists, the backup stream replication controller executes a write transaction operation command to keep the backup database node data consistent with the main database node data;
s3, in a master-backup database node synchronization mode, the main stream replication controller sends handshake messages to each backup stream replication controller according to a set period, and whether the corresponding backup database node is crashed or not is judged according to whether handshake message replies are received within a preset time threshold or not; the method comprises the following specific steps:
s31, in a master-backup database node synchronization mode, a main stream replication controller sends handshake messages to all backup stream replication controllers according to a set period;
s32, the main stream replication controller judges whether a standby stream replication controller exceeds a preset time threshold and does not reply to a handshake message;
if yes, go to step S33;
if not, returning to the step S31;
s33, the main stream replication controller sends an alarm to the operation and maintenance management platform to prompt that the backup database node where the corresponding backup stream replication controller is located is crashed abnormally;
s4, when all the standby database nodes in the synchronization mode are crashed, judging whether the master database node mode is changed or a standby log receiving module is constructed according to the condition whether the master database node is idle or not, and returning a confirmation message to the master log sending module; the method comprises the following specific steps:
s41, when all standby flow replication controllers in the synchronous mode exceed a preset time threshold and do not reply handshake messages, judging that all standby database nodes are crashed;
s42, judging whether the main database node has ongoing write transaction operation by the main stream replication controller;
if not, go to step S43;
if yes, go to step S45;
s43, the master flow replication controller changes the master database node mode into an asynchronous mode;
s44, judging whether a standby database node is recovered from a crash state by the main stream replication controller;
if yes, the main stream replication controller restores the main database node mode to the synchronous mode, and returns to the step S31;
if not, returning to the step S44;
and S45, the master stream replication controller stops the master log sending module of the master database node from sending the pre-written log to the standby log sending module according to the current progress state, constructs a standby log receiving module, returns a log playback completion notice to the master log sending module, and returns to the step S43.
In the foregoing embodiment 2, as shown in fig. 3, in step S11, in the master-slave database node asynchronous mode, after the postgres module of the master database node receives a user write transaction operation, the master database node notifies the master stream replication controller to synchronously send a user write transaction command to the backup stream replication controller;
in step S12, after sending the primary write completion synchronization message to the standby stream replication controller, the primary stream replication controller returns a completion notification of the pre-write log at the primary database node to the user through the postgres module;
in step S14, after the backup stream replication controller sends a backup pre-write log write completion synchronization message to the primary stream replication controller, a log playback completion notification is returned to the primary stream replication controller via the primary log receiving module via the backup log receiving module.
Example 3:
as shown in fig. 4, the present invention provides a device for handling a copy exception of a PostgreSQL database stream, including:
an asynchronous mode message processing unit 1, configured to send a user write transaction command and a main write completion synchronization message to a standby stream replication controller before and after controlling writing of a pre-written log in an asynchronous mode at a master and standby database node, respectively, and then control a main log sending module to send the pre-written log to a standby log receiving module, and receive a standby pre-written log write completion synchronization message returned by the standby stream replication controller after completing playback of the pre-written log at the standby database node; the asynchronous mode message processing unit 1 includes:
a write transaction message sending subunit 1.1, configured to, in an asynchronous mode of the master and standby database nodes, send a user write transaction command to the standby stream replication controller synchronously after the master database node receives a user write transaction operation;
a write completion synchronization message sending subunit 1.2, configured to control the local write-in pre-written log by the primary stream replication controller, and send a primary write completion synchronization message to the standby stream replication controller after the local write-in pre-written log is completed;
the pre-written log issuing subunit 1.3 is used for controlling the main log sending module to send the pre-written log to the standby log receiving module by the main stream replication controller;
a spare write completion synchronization message sending subunit 1.4, configured to send, by the spare stream replication controller, the pre-written log received by the spare log receiving module to the log playback module for playback, and send, after the log playback is completed, a spare pre-written log write completion synchronization message to the main stream replication controller;
an asynchronous mode exception handling unit 2, configured to detect, by the backup stream replication controller, a backup database node state when a primary database node in an asynchronous mode crashes, and if there is a primary write-in log write-in completion synchronization message but there is no backup write-in log write-in completion synchronization message, control, by the backup stream replication controller, the backup database node to perform a write transaction operation to synchronize data of the primary and backup database nodes; the asynchronous mode exception processing unit 2 includes:
the primary database node crash judging subunit 2.1 of the asynchronous mode is used for judging whether the primary database node crash occurs or not in the primary and standby database node asynchronous mode by the standby stream replication controller;
a synchronous write transaction operation record checking subunit 2.2, configured to check, when a primary database node crashes, a synchronous write transaction operation record between the backup stream replication controller and the primary stream replication controller;
the data synchronization subunit 2.3 is configured to, when there is a primary write-in completion synchronization message for the pre-write log, but there is no corresponding secondary write-in completion synchronization message for the pre-write log, execute a write transaction operation command by the secondary stream replication controller, so as to keep the secondary database node data consistent with the primary database node data;
a synchronous mode message processing unit 3, configured to send, in a primary and secondary database node synchronous mode, a handshake message to each backup stream replication controller according to a set period by a primary stream replication controller, and determine whether a corresponding backup database node crashes according to whether a handshake message reply is received within a preset time threshold; the synchronization mode message processing unit 3 includes:
a handshake message sending subunit 3.1, configured to send handshake messages to all standby stream replication controllers according to a set period in a primary and standby database node synchronization mode;
a handshake message reply judgment subunit 3.2, configured to judge, by the primary stream replication controller, whether there is a handshake message that the backup stream replication controller does not reply after exceeding a preset time threshold;
a synchronous mode exception warning subunit 3.3, configured to send an alert to the operation and maintenance management platform by the primary stream replication controller when the standby stream replication controller does not reply the handshake message after exceeding the preset time threshold, so as to prompt an exception that a backup database node where the corresponding standby stream replication controller is located crashes;
the synchronous mode exception processing unit 4 is used for judging whether the master database node mode is changed or constructing a confirmation message returned by the master log sending module to the master log receiving module according to the condition that whether the master database node is idle or not when all the standby database nodes in the synchronous mode are crashed; the synchronous mode exception processing unit 4 includes:
a backup database full crash determining subunit 4.1, configured to determine that all backup database nodes crash when all backup stream replication controllers in the synchronization mode exceed a preset time threshold and do not reply to the handshake message;
a main database node idle judging subunit 4.2, a main stream replication controller judges whether the main database node has ongoing write transaction operation;
a mode change subunit 4.3, configured to change the mode of the master database node to an asynchronous mode by the master flow replication controller when there is no ongoing write transaction operation on the master database node;
a backup database node recovery judging subunit 4.4, configured to judge, by the master stream replication controller, whether a backup database node recovers from a crash state;
the mode recovery subunit 4.5 is used for recovering the mode of the main database node into a synchronous mode by the main flow replication controller when the standby database node is recovered from the crash state;
a return message construction subunit 4.6, configured to, when there is an ongoing write transaction operation in the master database node, stop, according to the current progress status, the master log sending module of the master database node from sending a pre-write log to the standby log sending module by the master replication controller, and construct a standby log receiving module to return a log playback completion notification to the master log sending module;
a stream replication controller configuration unit 5, configured to configure a main stream replication controller at a main database node, configure a standby stream replication controller at a standby database node, and set each stream replication controller to acquire configuration information of the database node where the stream replication controller is located;
and the current working mode judging unit 6 is used for judging the current working mode according to the configuration information of the database nodes acquired by the stream replication controller.
Although the present invention has been described in detail by referring to the drawings in connection with the preferred embodiments, the present invention is not limited thereto. Various equivalent modifications or substitutions can be made on the embodiments of the present invention by those skilled in the art without departing from the spirit and scope of the present invention, and these modifications or substitutions are within the scope of the present invention/any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A PostgreSQL database stream replication exception handling method is characterized by comprising the following steps:
s1, in an asynchronous mode of nodes of a main database and a standby database, a main stream replication controller respectively sends a user transaction writing command message and a main pre-written log writing completion synchronous message to a standby stream replication controller before and after controlling writing of a pre-written log, then controls a main log sending module to send the pre-written log to a standby log receiving module, and receives a standby pre-written log writing completion synchronous message returned by the standby stream replication controller after the backup database nodes complete playback of the pre-written log;
s2, when the main database node in the asynchronous mode crashes, the standby flow replication controller detects the state of the standby database node, if the main pre-written log write-in completion synchronization message exists but the standby pre-written log write-in completion synchronization message does not exist, the standby flow replication controller controls the standby database node to execute the received user write transaction operation command, and the consistency with the main database data is realized;
s3, in a master-backup database node synchronization mode, the main stream replication controller sends handshake messages to each backup stream replication controller according to a set period, and whether the corresponding backup database node is crashed or not is judged according to whether handshake message replies are received within a preset time threshold or not;
and S4, when all the standby database nodes in the synchronization mode are crashed, judging whether the master database node mode is changed or the standby log receiving module is constructed according to the idleness of the master database node or not, and returning a confirmation message to the master log sending module.
2. The PostgreSQL database stream copy exception handling method according to claim 1, wherein step S1 is preceded by the steps of:
S1A, configuring a main stream replication controller at a main database node, configuring a standby stream replication controller at a standby database node, and setting each stream replication controller to acquire configuration information of the database node where the stream replication controller is located;
S1B, judging a current working mode according to configuration information of the database nodes acquired by the stream replication controller;
if the mode is asynchronous mode, go to step S1;
if the mode is the synchronous mode, the process proceeds to step S3.
3. The PostgreSQL database stream replication exception handling method according to claim 2, wherein the database node configuration information includes a home node master-slave configuration, an asynchronous/synchronous mode, an IP configuration, and a peer IP.
4. The method for processing a copy exception of a PostgreSQL database stream according to claim 1, wherein step S1 comprises the following steps:
s11, in an asynchronous mode of the master database node and the standby database node, after receiving a user write transaction command at the master database node, the master stream replication controller synchronously sends a write transaction command message to the standby stream replication controller;
s12, the main stream replication controller controls local write-in of the pre-written log, and sends a main pre-written log write-in completion synchronization message to the standby stream replication controller after the local pre-written log write-in is completed;
s13, the main stream replication controller controls the main log sending module to send the pre-written log to the standby log receiving module;
s14, the standby stream replication controller sends the pre-written log received by the standby log receiving module to the log playback module for playback, and sends a synchronization message of writing completion of the standby pre-written log to the main stream replication controller after the log playback is completed.
5. The method for processing exception replication of PostgreSQL database flow according to claim 4, wherein in step S11, in the asynchronous mode of the primary and backup database nodes, after the postgres module of the primary database node receives the user write transaction operation, the primary stream replication controller is notified to synchronously send a user write transaction command message to the backup stream replication controller;
in step S12, after sending the main pre-write log write completion synchronization message to the standby stream replication controller, the main stream replication controller returns a pre-write log completion notification at the main database node to the user through the postgres module;
in step S14, after the backup database node log playback module completes writing the pre-written log, the backup stream replication controller sends a synchronization message for completing writing the pre-written log to the primary stream replication controller.
6. The method for processing the replication exception of the PostgreSQL database stream according to claim 4, wherein the step S2 is as follows:
s21, in the master and standby database node asynchronous mode, judging whether the master database node is crashed or not by the standby flow replication controller;
if yes, go to step S22;
if not, returning to the step S21;
s22, the standby stream replication controller checks the synchronous write transaction operation record with the main stream replication controller;
and S23, when the main pre-write log write-in completion synchronization message exists but no corresponding backup pre-write log write-in completion synchronization message exists, the backup stream replication controller executes a write transaction operation command to keep the backup database node data consistent with the main database node data.
7. The method for processing a copy exception of a PostgreSQL database stream according to claim 1, wherein step S3 comprises the following steps:
s31, in a master-backup database node synchronization mode, a main stream replication controller sends handshake messages to all backup stream replication controllers according to a set period;
s32, the main stream replication controller judges whether a standby stream replication controller exceeds a preset time threshold and does not reply to a handshake message;
if yes, go to step S33;
if not, returning to the step S31;
and S33, the main stream replication controller sends an alarm to the operation and maintenance management platform to prompt that the backup database node where the corresponding backup stream replication controller is located is crashed abnormally.
8. The method for processing a copy exception of a PostgreSQL database stream according to claim 7, wherein step S4 comprises the following steps:
s41, when all standby flow replication controllers in the synchronous mode exceed a preset time threshold and do not reply handshake messages, judging that all standby database nodes are crashed;
s42, judging whether the main database node has ongoing write transaction operation by the main stream replication controller;
if not, go to step S43;
if yes, go to step S45;
s43, the master flow replication controller changes the master database node mode into an asynchronous mode;
s44, judging whether a standby database node is recovered from a crash state by the main stream replication controller;
if yes, the main stream replication controller restores the main database node mode to the synchronous mode, and returns to the step S31;
if not, returning to the step S44;
and S45, the master stream replication controller stops the master log sending module of the master database node from sending the pre-written log to the standby log sending module according to the current progress state, constructs a standby log receiving module, returns a log playback completion notice to the master log sending module, and returns to the step S43.
9. A PostgreSQL database stream replication exception handling apparatus, comprising:
the asynchronous mode message processing unit (1) is used for respectively sending a user write transaction command and a main pre-write log write completion synchronous message to the standby stream replication controller before and after the main stream replication controller controls the write of the pre-write log in an asynchronous mode of the main and standby database nodes, then controlling the main log sending module to send the pre-write log to the standby log receiving module, and receiving the standby pre-write log write completion synchronous message returned by the standby stream replication controller after the backup database nodes complete the playback of the pre-write log;
the asynchronous mode exception handling unit (2) is used for detecting the state of a standby database node by a standby stream replication controller when the main database node in an asynchronous mode crashes, and controlling the standby database node to execute write transaction operation and synchronize the data of the main and standby database nodes if a main pre-write log write-in completion synchronization message exists but no standby pre-write log write-in completion synchronization message exists;
a synchronous mode message processing unit (3) for sending handshake messages to each backup stream copy controller according to a set period in a master-backup database node synchronous mode, and judging whether the corresponding backup database node is crashed according to whether handshake message replies are received within a preset time threshold;
and the synchronous mode exception processing unit (4) is used for judging whether the master and standby database node modes are changed or constructing a confirmation message returned to the master log sending module by the standby log receiving module according to the condition that whether the master database node is idle or not when all the standby database nodes in the synchronous mode crash.
10. The PostgreSQL database stream replication exception handling apparatus of claim 9, further comprising:
a stream replication controller configuration unit (5) for configuring a main stream replication controller at a main database node, configuring a standby stream replication controller at a standby database node, and setting each stream replication controller to acquire configuration information of the database node where the stream replication controller is located;
and the current working mode judging unit (6) is used for judging the current working mode according to the configuration information of the database nodes acquired by the stream replication controller.
CN202110722991.4A 2021-06-28 2021-06-28 PostgreSQL database stream replication exception handling method and device Active CN113626517B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110722991.4A CN113626517B (en) 2021-06-28 2021-06-28 PostgreSQL database stream replication exception handling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110722991.4A CN113626517B (en) 2021-06-28 2021-06-28 PostgreSQL database stream replication exception handling method and device

Publications (2)

Publication Number Publication Date
CN113626517A true CN113626517A (en) 2021-11-09
CN113626517B CN113626517B (en) 2023-08-15

Family

ID=78378510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110722991.4A Active CN113626517B (en) 2021-06-28 2021-06-28 PostgreSQL database stream replication exception handling method and device

Country Status (1)

Country Link
CN (1) CN113626517B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108376142A (en) * 2018-01-10 2018-08-07 北京思特奇信息技术股份有限公司 A kind of distributed memory database method of data synchronization and system
CN111382024A (en) * 2018-12-28 2020-07-07 中兴通讯股份有限公司 Method, device and system for monitoring master-slave copying delay of database

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108376142A (en) * 2018-01-10 2018-08-07 北京思特奇信息技术股份有限公司 A kind of distributed memory database method of data synchronization and system
CN111382024A (en) * 2018-12-28 2020-07-07 中兴通讯股份有限公司 Method, device and system for monitoring master-slave copying delay of database

Also Published As

Publication number Publication date
CN113626517B (en) 2023-08-15

Similar Documents

Publication Publication Date Title
US10713135B2 (en) Data disaster recovery method, device and system
US7793060B2 (en) System method and circuit for differential mirroring of data
EP2281240B1 (en) Maintaining data integrity in data servers across data centers
US7694177B2 (en) Method and system for resynchronizing data between a primary and mirror data storage system
US7437598B2 (en) System, method and circuit for mirroring data
CN106776121B (en) Data disaster recovery device, system and method
CN106815097A (en) Database disaster tolerance system and method
CN111327467A (en) Server system, disaster recovery backup method thereof and related equipment
CN105069160A (en) Autonomous controllable database based high-availability method and architecture
US7797571B2 (en) System, method and circuit for mirroring data
US7069317B1 (en) System and method for providing out-of-band notification of service changes
CN109117310A (en) Realize disaster tolerance system, the method and device of data backup
WO2021115043A1 (en) Distributed database system and data disaster backup drilling method
CN112181723B (en) Financial disaster recovery method and device, storage medium and electronic equipment
CN107357800A (en) A kind of database High Availabitity zero loses solution method
CN113254275A (en) MySQL high-availability architecture method based on distributed block device
KR101605455B1 (en) Method for Replicationing of Redo Log without Data Loss and System Thereof
CN112783694B (en) Long-distance disaster recovery method for high-availability Redis
CN112929438B (en) Business processing method and device of double-site distributed database
WO2023151443A1 (en) Synchronizing main database and standby database
CN113626517B (en) PostgreSQL database stream replication exception handling method and device
JP2004272318A (en) System changeover system, processing method therefor, and processing program therefor
KR20100061983A (en) Method and system for operating management of real-time replicated database
CN117827544B (en) Hot backup system, method, electronic device and storage medium
CN113467717B (en) Dual-machine volume mirror image management method, device and equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant