CN113986594A - Method, system, storage medium and server for real-time database fault recovery - Google Patents


Info

Publication number
CN113986594A
CN113986594A
Authority
CN
China
Prior art keywords
service process
service
shared memory
real
data
Prior art date
Legal status
Pending
Application number
CN202111264932.3A
Other languages
Chinese (zh)
Inventor
何清
王毅
王奕飞
谢贝贝
何新
Current Assignee
Xian Thermal Power Research Institute Co Ltd
Xian TPRI Power Station Information Technology Co Ltd
Original Assignee
Xian Thermal Power Research Institute Co Ltd
Xian TPRI Power Station Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xian Thermal Power Research Institute Co Ltd, Xian TPRI Power Station Information Technology Co Ltd filed Critical Xian Thermal Power Research Institute Co Ltd
Priority to CN202111264932.3A
Publication of CN113986594A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751 Error or fault detection not based on redundancy
    • G06F11/0754 Error or fault detection not based on redundancy by exceeding limits
    • G06F11/0757 Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5022 Mechanisms to release resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Retry When Errors Occur (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Hardware Redundancy (AREA)

Abstract

A method, a system, a storage medium and a server for real-time database fault recovery are provided. The method comprises the following steps: starting a watchdog service process of a real-time database and initializing a shared memory; starting and initializing the other service processes of the real-time database, which comprise a communication service process, a basic service process, a snapshot service process and a historical service process; the other service processes apply to the watchdog service process for the shared memory that holds hot-standby data, and read and write data in it; the other service processes periodically update their running state to the shared memory; if the running state of a service process has not been updated within the timeout period, that service process is restarted; and the restarted service process retrieves data from the shared-memory hosting process for field recovery. The invention achieves fast service restart and data recovery, and greatly improves the service stability and data security of the real-time database.

Description

Method, system, storage medium and server for real-time database fault recovery
Technical Field
The invention belongs to the technical field of real-time database development, and particularly relates to a method, a system, a storage medium and a server for real-time database fault recovery.
Background
The implementation basis of every country's manufacturing innovation strategy is the collection and feature analysis of industrial big data and the environment this builds for future manufacturing systems. The real-time database is a core service of industrial big data and the foundation of Industry 4.0.
Power generation enterprises need to collect massive real-time production data from industrial control systems such as DCS and auxiliary control systems and store it in a real-time database. The safety requirements on the data stored in the database are very high, and losing stored data because of software defects is not acceptable.
To improve the storage performance for massive real-time production data, a real-time database generally adopts a data caching strategy. As a result, data that has nominally been stored may not yet be truly archived; if the service process crashes at that moment, the data is permanently lost. Moreover, when a service of the real-time database crashes and is restarted, it may need to load a large amount of basic data from disk for initialization; the slow startup prevents the service from recovering quickly and causes a long service interruption.
Disclosure of Invention
The invention aims to provide a method, a system, a storage medium and a server for real-time database fault recovery that solve the problems of cached data loss and slow startup initialization after a service process of the real-time database crashes.
To achieve this aim, the invention adopts the following technical solutions:
in a first aspect, a method for real-time database failure recovery is provided, which includes the following steps:
starting a watchdog service process of a real-time database, and initializing a shared memory;
starting and initializing other service processes of a real-time database, wherein the other service processes of the real-time database comprise a communication service process, a basic service process, a snapshot service process and a historical service process;
the other service processes apply to the watchdog service process for the shared memory that holds hot-standby data, and read and write data in it;
the other service processes periodically update their process running state to the shared memory;
if the running state of a service process has not been updated within the timeout period, that service process is restarted;
and the restarted service process retrieves data from the shared-memory hosting process for field recovery.
As a preferred scheme of the method of the invention, the watchdog service process, on the one hand, monitors the other service processes and automatically restarts the corresponding service process if an abnormality or crash is found; on the other hand, it manages all the shared memory that holds hot-standby data, so that the shared memory is not reclaimed when another service process exits abnormally;
the watchdog service process can itself be monitored in the reverse direction and is restarted if an abnormality or crash is found.
As a preferred scheme of the method of the present invention, the communication service process forwards each API call to the corresponding service process for processing, and returns the response message to the API caller.
As a preferred scheme of the method, the basic service process loads the measuring point table and stores it in shared memory as a Hash table for the other service processes to query.
As a preferred scheme of the method of the present invention, the snapshot service process implements caching and compression of snapshot data. At least one snapshot of each measuring point is cached in the shared memory of the snapshot service process, and if the measuring point supports compression, the other data involved in the compression calculation is also cached there. When snapshot data of a measuring point is written, the original snapshot is directly overwritten if the point is compressed; if the point is not compressed, the snapshot is pushed to the history service process.
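The following is a minimal sketch of that write rule; the structure names (Snapshot, SnapshotSlot) and the push_to_history callback are illustrative assumptions, not taken from the patent:

    // Sketch of the snapshot write path described above (all names assumed):
    // an uncompressed point pushes every sample to the history service, while a
    // compressed point simply overwrites its cached snapshot in shared memory.
    struct Snapshot {
        long long timestamp_ms;
        double    value;
        int       quality;
    };

    struct SnapshotSlot {
        bool     compression_enabled;  // does this measuring point support compression?
        Snapshot cached;               // snapshot kept in the snapshot-service shared memory
    };

    void write_snapshot(SnapshotSlot& slot, const Snapshot& incoming,
                        void (*push_to_history)(const Snapshot&)) {
        if (!slot.compression_enabled) {
            push_to_history(incoming);  // uncompressed point: every sample becomes history
        }
        slot.cached = incoming;         // in both cases the cached snapshot is replaced
    }

In the patent the slot resides in the snapshot data cache shared memory block set, so the cached value survives a crash of the snapshot service process.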
As a preferred scheme of the method, the history service process archives and queries historical data. A historical data cache of one data page is kept for each measuring point and stored in the shared memory of the history service process. Snapshot data pushed by the snapshot service process is first written into this historical data cache; when the cache of a measuring point is full, it is archived into an archive file and then cleared so that it can continue to receive further historical data.
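A minimal sketch of this page-based history cache follows; the sample layout, the page capacity and the use of a plain stdio archive file are assumptions for illustration only:

    // Per-point history cache as described above (structure assumed): samples
    // fill a fixed-size page kept in shared memory; when the page is full it is
    // archived to a file and the cache is cleared to keep receiving data.
    #include <cstddef>
    #include <cstdio>

    struct HistorySample { long long timestamp_ms; double value; };

    constexpr std::size_t kPageCapacity = 1024;  // assumed page size in samples

    struct HistoryPage {
        std::size_t   count = 0;
        HistorySample samples[kPageCapacity];
    };

    void append_history(HistoryPage& page, const HistorySample& s, std::FILE* archive) {
        page.samples[page.count++] = s;
        if (page.count == kPageCapacity) {
            std::fwrite(page.samples, sizeof(HistorySample), page.count, archive);
            std::fflush(archive);        // page archived into the archive file
            page.count = 0;              // cache cleared, ready for more history data
        }
    }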
As a preferred scheme of the method of the present invention, the shared memory blocks are named, and shared memory opened with the same name refers to the same shared memory block;
when multiple service processes hold the same shared memory block, the block is not reclaimed as long as at least one service process has not released it, and the other service processes can reacquire access to the block.
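The behaviour described above matches what named shared memory provides on common platforms. The sketch below shows one way to attach such a block using POSIX shared memory; the block name, size and API choice are assumptions of the sketch rather than part of the patent:

    // The watchdog creates a named block and never unlinks it, so the block is
    // not reclaimed when a business process crashes, and the restarted process
    // can re-attach the same data simply by opening the same name.
    // (May require linking with -lrt on older Linux systems.)
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <cstddef>

    void* attach_block(const char* name, std::size_t size, bool create) {
        int fd = shm_open(name, O_RDWR | (create ? O_CREAT : 0), 0660);
        if (fd < 0) return nullptr;
        if (create && ftruncate(fd, static_cast<off_t>(size)) != 0) { close(fd); return nullptr; }
        void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);                        // the mapping stays valid after the fd is closed
        return p == MAP_FAILED ? nullptr : p;
    }

    // Watchdog:          attach_block("/rtdb_service_status.dat", 4096, true);
    // Business process:  attach_block("/rtdb_service_status.dat", 4096, false);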
In a second aspect, a system for real-time database failure recovery is provided, including:
the watchdog service process starting module is used for starting a watchdog service process of the real-time database and initializing the shared memory;
the other service process starting module is used for starting and initializing other service processes of the real-time database, wherein the other service processes of the real-time database comprise a communication service process, a basic service process, a snapshot service process and a historical service process;
the shared memory application module is used for the other service processes to apply to the watchdog service process for the shared memory that holds hot-standby data and to read and write data in it;
the process state updating module is used for the other service processes to periodically update their running state to the shared memory;
the process restarting module is used for restarting a service process if its running state has not been updated within the timeout period;
and the field recovery module is used for the restarted service process to retrieve data from the shared-memory hosting process for field recovery.
In a third aspect, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the method for real-time database failure recovery.
In a fourth aspect, a server is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method for real-time database failure recovery when executing the computer program.
Compared with the prior art, the first aspect of the invention has at least the following beneficial effects:
the real-time database service process stores data needing hot standby in a shared memory, and monitors the running states of all other service processes used for service processing in the real-time database through the watchdog service process, when a certain service process crashes, the watchdog service process restarts the crashed service process, and the data is quickly recovered through the hot standby shared memory, so that the risk of data loss caused by crash of the service process is avoided, and meanwhile, the initialization speed of the restarting service process can be accelerated to quickly recover the service. According to the invention, the watchdog service process and the business service process jointly hold the shared memory, so that the quick restart of the business service process during abnormal operation or breakdown is realized, and the operating memory data before restart can be recovered, thereby realizing quick service restart and data recovery, and greatly improving the service stability and data security of the real-time database.
It is understood that the beneficial effects of the second to fourth aspects can be seen from the description of the first aspect, and are not described herein again.
Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed for the embodiments or for the description of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a diagram illustrating the operational relationship between a watchdog service process and other service processes according to the present invention;
FIG. 2 is a flow chart of a method for real-time database failure recovery in accordance with the present invention;
FIG. 3 is a diagram illustrating a shared memory holding structure according to the present invention;
FIG. 4 is a diagram illustrating a naming scheme of a shared memory according to the present invention;
FIG. 5 is a flow chart of monitoring a service process according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the detailed description and the accompanying drawings. The invention is capable of other and different embodiments, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the features in the following embodiments and examples may be combined with each other as long as they do not conflict.
The invention provides a method for real-time database fault recovery. The real-time database is divided into different service processes according to function, including a watchdog service process, a communication service process, a basic service process, a snapshot service process and a history service process; cross-process communication and coordinated operation among these processes are realized through inter-process communication (IPC).
As shown in fig. 2, in one embodiment, the method of the present invention comprises the following steps:
step S11: starting the watchdog service process of the real-time database and initializing the shared memory blocks;
step S12: sequentially starting and initializing the other service processes of the real-time database;
step S13: each real-time database service process applies to the watchdog service process for the shared memory that holds hot-standby data and reads and writes data in it;
step S14: each real-time database service process periodically writes its running state into the service-process running-state area of the shared memory;
step S15: if the running state of a service process has not been reported within the timeout period, restarting the abnormal service process; otherwise, returning to step S14;
step S16: the restarted service process retrieves data from the shared-memory hosting process for fast field recovery (restoring the working state from before the crash), and the flow then returns to step S14.
FIG. 3 shows how the service processes of the real-time database hold the shared memory. The watchdog service process holds all shared memory blocks, which ensures that none of them is reclaimed by the system after another service process crashes. The communication service process holds the service running state shared memory block S21, used to update the running state of its own process. The basic service process holds the service running state shared memory block S21 for updating its own running state, and also holds the allocated counter shared memory block S22 and the measuring point data cache shared memory block set S23, which stores the attribute information of all measuring points. The snapshot service process holds the service running state shared memory block S21 for updating its own running state, and also holds the allocated counter shared memory block S22 and the snapshot data cache shared memory block set S24, which stores snapshot cache data. The history service process holds the service running state shared memory block S21 for updating its own running state, and also holds the allocated counter shared memory block S22 and the historical data cache shared memory block set S25, which stores historical cache data.
As shown in FIG. 4, the shared memory blocks held by the service processes of the real-time database are named according to a fixed rule, which lets the different service processes quickly obtain the shared memory blocks they need. The service running state shared memory block S31 and the allocated counter shared memory block S32 each exist only once globally, so they are given fixed names: rtdb_service_status.dat and rtdb_shared_counter.dat. The three shared memory block sets, namely the measuring point table shared memory block set S33, the snapshot data cache shared memory block set S34 and the historical data cache shared memory block set S35, each consist of multiple shared memory blocks of fixed size. For each set, the block names are composed of a prefix and a sequence number, so that once a service process knows the value of the shared memory block allocation counter, it can construct all the block names of that set, such as rtdb_snapshot_00000001.dat, rtdb_snapshot_00000002.dat, rtdb_snapshot_00000003.dat, and so on.
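A small helper illustrating this naming rule; the function and variable names are assumptions of the sketch:

    // Enumerate the block names of one shared memory set from its prefix and the
    // value of the allocation counter (e.g. prefix "rtdb_snapshot_").
    #include <cstdio>
    #include <string>
    #include <vector>

    std::vector<std::string> block_names(const std::string& prefix, unsigned allocated) {
        std::vector<std::string> names;
        for (unsigned i = 1; i <= allocated; ++i) {
            char buf[64];
            std::snprintf(buf, sizeof(buf), "%s%08u.dat", prefix.c_str(), i);
            names.emplace_back(buf);      // e.g. rtdb_snapshot_00000001.dat
        }
        return names;
    }

For example, block_names("rtdb_snapshot_", 3) yields rtdb_snapshot_00000001.dat, rtdb_snapshot_00000002.dat and rtdb_snapshot_00000003.dat.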
In step S11, the goal when starting the watchdog service process of the real-time database is to ensure that it is the first service process of the real-time database to start. After the watchdog service process starts, it first initializes the two global shared memory blocks, the service running state shared memory block S21 and the allocated counter shared memory block S22, named rtdb_service_status.dat and rtdb_shared_counter.dat (S31 and S32 in FIG. 4). The service running state shared memory block contains five data units, used to store the running states of the watchdog service process, the communication service process, the basic service process, the snapshot service process and the history service process; the running state information includes the running-state update timestamp of each service process and the working states of its important sub-modules and worker threads. The watchdog service process judges from this information whether the other service processes are running normally. The allocated counter shared memory block records the number of shared memory blocks already allocated to each of the three block sets, namely the measuring point table shared memory block set S23, the snapshot data cache shared memory block set S24 and the historical data cache shared memory block set S25. The required capacity of these three sets differs with the number of measuring points and cannot be allocated in full at once, so blocks are allocated in units of a fixed size, and additional blocks are requested whenever the allocated shared memory becomes insufficient.
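One possible in-memory layout of the two global blocks is sketched below; the field and type names are assumed for illustration and are not prescribed by the patent:

    // Layout of rtdb_service_status.dat (one running-state unit per service
    // process) and rtdb_shared_counter.dat (blocks already allocated per set).
    #include <cstdint>

    enum ServiceId { kWatchdog = 0, kComm, kBasic, kSnapshot, kHistory, kServiceCount };

    struct ServiceStatus {            // one of the five units in rtdb_service_status.dat
        int64_t  heartbeat_ms;        // running-state update timestamp
        uint32_t submodule_state;     // working state of important sub-modules
        uint32_t thread_state;        // working state of worker threads
    };

    struct ServiceStatusBlock {       // service running state shared memory block S21
        ServiceStatus status[kServiceCount];
    };

    struct AllocatedCounterBlock {    // allocated counter shared memory block S22
        uint32_t point_table_blocks;  // blocks allocated to the measuring point table set S23
        uint32_t snapshot_blocks;     // blocks allocated to the snapshot data cache set S24
        uint32_t history_blocks;      // blocks allocated to the historical data cache set S25
    };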
Referring to FIG. 1, in step S12, after the watchdog service process has started, if it detects that no other service process is running, it starts the basic service process S3, the history service process S5, the snapshot service process S4 and the communication service process S2 in sequence. In step S13, after each service process has started, it loads the corresponding data for initialization, specifically as follows:
the basic service process S3 loads the attribute information of the measure point from the measure point table file and stores the measure point attribute information in a shared memory block set applied from the watchdog service process in a Hash table form, on one hand, the snapshot service process S4 and the history service process S5 can quickly query the measure point attribute information across processes, on the other hand, the measure point table shared memory basic service process S3 and the watchdog service process S1 share, and after the basic service process S3 is crashed and restarted, the measure point data in the shared memory set can be directly obtained without secondary loading from the measure point table file again, so that the service recovery speed of the basic service process S3 is improved.
The history service process S5 allocates, from the shared memory block set applied for from the watchdog service process S1, a historical data cache block for each measuring point and uses it to cache the historical data of that point. When the history service process S5 crashes and restarts, the historical cache data of the measuring points in the shared memory set can be obtained directly, so the historical cache data from before the crash is recovered and no data is lost.
The snapshot service process S4 allocates, from the shared memory block set applied for from the watchdog service process S1, a snapshot cache data block for each measuring point and uses it to cache snapshot data. When the snapshot service process S4 crashes and restarts, the snapshot cache data of the measuring points in the shared memory set can be obtained directly, so the snapshot cache data from before the crash is recovered and no data is lost.
The communication service process S2 is initialized after all the other service processes have been initialized, and the startup of the whole real-time database is then complete.
In step S14, each service process periodically updates its running state to the service running state shared memory block S21; the running state information includes the running-state update timestamp of the service process and the working states of its important sub-modules and threads. When a process runs abnormally or crashes and exits, its running state stops being updated, so the watchdog service process can detect that the corresponding service process is running abnormally and perform service recovery.
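A minimal heartbeat loop for step S14 might look as follows, reusing the ServiceStatusBlock and ServiceId layout sketched under step S11; the update period and the now_ms() helper are assumptions of the sketch:

    // Each service process periodically stamps its own unit in the shared block;
    // a stale stamp is what the watchdog interprets as an abnormal or crashed process.
    #include <chrono>
    #include <cstdint>
    #include <thread>

    int64_t now_ms() {
        using namespace std::chrono;
        return duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count();
    }

    void heartbeat_loop(ServiceStatusBlock* shared, ServiceId self, int period_ms) {
        for (;;) {
            shared->status[self].heartbeat_ms = now_ms();   // refresh own running state
            std::this_thread::sleep_for(std::chrono::milliseconds(period_ms));
        }
    }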
As shown in FIG. 5, the watchdog service process checks the running state of the other service processes, and when a service process is found to be abnormal or crashed, a fast service-process restart and data recovery procedure is applied (a minimal code sketch of this monitoring loop follows the steps below). The procedure comprises the following steps:
step S41: the watchdog service process checks whether the target service process exists; if the process does not exist, jump to step S43, otherwise continue with step S42;
step S42: the watchdog service process checks the state update flag of the target service process in the service running state shared memory block S21; if the flag has not been updated within the timeout period, the service process exists but is running abnormally and must be forced to exit;
step S43: the watchdog service process restarts the target service process; after starting, the target service process first acquires the service running state shared memory block S21 and the allocated counter shared memory block S22, both of which are already held by the watchdog service process;
step S44: if the target service process is the basic service process, it acquires the measuring point table shared memory block set S23, completes the fast loading of the measuring point table shared memory and resumes service; otherwise, continue with the next step;
step S45: if the target service process is the snapshot service process, it acquires the snapshot data cache shared memory block set S24, completes the fast loading of the snapshot data cache shared memory and resumes service; otherwise, continue with the next step;
step S46: if the target service process is the history service process, it acquires the historical data cache shared memory block set S25, completes the fast loading of the historical data cache shared memory and resumes service; otherwise, continue with the next step;
step S47: if the target service process is the communication service process, it initializes the TCP network server and resumes service;
step S48: the fast recovery of the service process and its data is complete; jump back to step S41 and continue with the next round of service process monitoring.
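The sketch below condenses steps S41 to S48 into one monitoring loop; the process-control helpers (process_alive, force_kill, spawn) and the timeout value are assumptions, and ServiceStatusBlock, ServiceId and now_ms() are taken from the earlier sketches:

    // Watchdog monitoring loop over the business service processes (S41-S48).
    #include <chrono>
    #include <cstdint>
    #include <thread>

    bool process_alive(ServiceId id);   // assumed helper, e.g. kill(pid, 0) on the recorded pid
    void force_kill(ServiceId id);      // assumed helper: forcibly exit a hung process
    void spawn(ServiceId id);           // assumed helper: start the service executable again

    void monitor_loop(ServiceStatusBlock* shared, int64_t timeout_ms) {
        for (;;) {
            for (int id = kComm; id < kServiceCount; ++id) {          // watchdog checks the others
                ServiceId target = static_cast<ServiceId>(id);
                bool alive = process_alive(target);                   // S41
                bool stale = now_ms() - shared->status[id].heartbeat_ms > timeout_ms;
                if (alive && !stale) continue;                        // healthy, nothing to do
                if (alive && stale)  force_kill(target);              // S42: exists but abnormal
                spawn(target);                                        // S43: restart the process
                // S44-S47: on startup the restarted process re-attaches the running-state and
                // counter blocks, then its own cache block set, and resumes service.
            }
            std::this_thread::sleep_for(std::chrono::seconds(1));     // S48: next monitoring round
        }
    }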
If the watchdog service process itself crashes, steps S15 and S16 cannot be executed and the real-time database loses the corresponding functions. To avoid this situation, the method allows the business service processes to check the running state of the watchdog service process in the reverse direction; if the watchdog service process is found to run abnormally or to have crashed, the communication service process restarts it. After the watchdog service process restarts, it first acquires the service running state shared memory block S21 and the allocated counter shared memory block S22, then uses the counter shared memory block S22 to acquire the shared memory sets among the measuring point data cache shared memory block set S23, the snapshot data cache shared memory block set S24 and the historical data cache shared memory block set S25, and resumes service, which ensures that every process of the whole real-time database remains under monitoring.
In another embodiment, there is also provided a system for real-time database failure recovery, comprising:
the watchdog service process starting module is used for starting a watchdog service process of the real-time database and initializing the shared memory;
the other service process starting module is used for starting and initializing other service processes of the real-time database, wherein the other service processes of the real-time database comprise a communication service process, a basic service process, a snapshot service process and a historical service process;
the shared memory application module is used for the other service processes to apply to the watchdog service process for the shared memory that holds hot-standby data and to read and write data in it;
the process state updating module is used for the other service processes to periodically update their running state to the shared memory;
the process restarting module is used for restarting a service process if its running state has not been updated within the timeout period;
and the field recovery module is used for the restarted service process to retrieve data from the shared-memory hosting process for field recovery.
In another embodiment, a computer-readable storage medium is also provided, storing a computer program that, when executed by a processor, implements the method for real-time database failure recovery.
In another embodiment, a server is also provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method for real-time database failure recovery when executing the computer program.
Illustratively, the computer program may be partitioned into one or more modules/units, which are stored in a computer-readable storage medium and executed by the processor to perform the steps of the method for real-time database failure recovery of the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the server.
The server can be a notebook computer, a desktop computer, a cloud server and other computing devices. The server may include, but is not limited to, a processor, a memory. Those skilled in the art will appreciate that the server may also include more or fewer components, or some components in combination, or different components, e.g., the server may also include input output devices, network access devices, buses, etc.
The Processor may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The storage may be an internal storage unit of the server, such as a hard disk or the memory of the server. The memory may also be an external storage device of the server, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash memory card (Flash Card) provided on the server. Further, the memory may include both an internal storage unit of the server and an external storage device. The memory is used to store the computer readable instructions and the other programs and data needed by the server, and may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the method embodiment, and specific reference may be made to the part of the method embodiment, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and can implement the steps of the embodiments of the methods described above when the computer program is executed by a processor. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/terminal apparatus, a recording medium, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium. Such as a usb-disk, a removable hard disk, a magnetic or optical disk, etc.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method for real-time database failure recovery, comprising the steps of:
starting a watchdog service process of a real-time database, and initializing a shared memory;
starting and initializing other service processes of a real-time database, wherein the other service processes of the real-time database comprise a communication service process, a basic service process, a snapshot service process and a historical service process;
the other service processes apply to the watchdog service process for the shared memory that holds hot-standby data, and read and write data in it;
the other service processes periodically update their process running state to the shared memory;
if the running state of a service process has not been updated within the timeout period, that service process is restarted;
and the restarted service process retrieves data from the shared-memory hosting process for field recovery.
2. The method for real-time database failure recovery according to claim 1, wherein: on the one hand, the watchdog service process monitors the other service processes and automatically restarts the corresponding service process if an abnormality or crash is found; on the other hand, it manages all the shared memory that holds hot-standby data, so that the shared memory is not reclaimed when another service process exits abnormally;
the watchdog service process can itself be monitored in the reverse direction and is restarted if an abnormality or crash is found.
3. The method for real-time database failure recovery according to claim 1, wherein: the communication service process forwards each message to the corresponding service process for processing according to the API function called, and returns the response message to the API caller.
4. The method for real-time database failure recovery according to claim 1, wherein: the basic service process loads the measuring point table and stores it in memory in the form of a shared-memory Hash table for the other service processes to query.
5. The method for real-time database failure recovery according to claim 1, wherein: the snapshot service process implements caching and compression of snapshot data; at least one snapshot of each measuring point is cached in the shared memory of the snapshot service process, and if the measuring point supports compression, the other data involved in the compression calculation is also cached there; when snapshot data of a measuring point is written, the original snapshot is directly overwritten if the point is compressed, and the snapshot is pushed to the history service process if the point is not compressed.
6. The method for real-time database failure recovery according to claim 1, wherein: the history service process archives and queries historical data, and a historical data cache of one data page is kept for each measuring point and stored in the shared memory of the history service process; snapshot data pushed by the snapshot service process is first written into the historical data cache, and when the cache of a measuring point is full, it is archived into an archive file and then cleared so that it can continue to receive further historical data.
7. The method for real-time database failure recovery according to claim 1, wherein: the shared memory blocks are named, and shared memory opened with the same name refers to the same shared memory block;
when multiple service processes hold the same shared memory block, the block is not reclaimed as long as at least one service process has not released it, and the other service processes can reacquire access to the block.
8. A system for real-time database failure recovery, comprising:
the watchdog service process starting module is used for starting a watchdog service process of the real-time database and initializing the shared memory;
the other service process starting module is used for starting and initializing other service processes of the real-time database, wherein the other service processes of the real-time database comprise a communication service process, a basic service process, a snapshot service process and a historical service process;
the shared memory application module is used for the other service processes to apply to the watchdog service process for the shared memory that holds hot-standby data and to read and write data in it;
the process state updating module is used for the other service processes to periodically update their running state to the shared memory;
the process restarting module is used for restarting a service process if its running state has not been updated within the timeout period;
and the field recovery module is used for the restarted service process to retrieve data from the shared-memory hosting process for field recovery.
9. A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements a method for real-time database failure recovery as claimed in any one of claims 1 to 7.
10. A server comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor when executing the computer program implements a method for real-time database failure recovery as claimed in any one of claims 1 to 7.
CN202111264932.3A 2021-10-28 2021-10-28 Method, system, storage medium and server for real-time database fault recovery Pending CN113986594A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111264932.3A CN113986594A (en) 2021-10-28 2021-10-28 Method, system, storage medium and server for real-time database fault recovery

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111264932.3A CN113986594A (en) 2021-10-28 2021-10-28 Method, system, storage medium and server for real-time database fault recovery

Publications (1)

Publication Number Publication Date
CN113986594A true CN113986594A (en) 2022-01-28

Family

ID=79743648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111264932.3A Pending CN113986594A (en) 2021-10-28 2021-10-28 Method, system, storage medium and server for real-time database fault recovery

Country Status (1)

Country Link
CN (1) CN113986594A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114443340A (en) * 2022-01-29 2022-05-06 亿咖通(湖北)技术有限公司 Abnormal process processing method and device and server
CN115862208A (en) * 2022-11-30 2023-03-28 广州广电运通智能科技有限公司 Service processing method, equipment and storage medium for rail transit gate software
CN116055285A (en) * 2023-03-27 2023-05-02 西安热工研究院有限公司 Process management method and system of industrial control system
CN116055285B (en) * 2023-03-27 2023-06-16 西安热工研究院有限公司 Process management method and system of industrial control system

Similar Documents

Publication Publication Date Title
CN113986594A (en) Method, system, storage medium and server for real-time database fault recovery
CN107133234B (en) Method, device and system for updating cache data
CN113849339B (en) Method, device and storage medium for restoring running state of application program
RU2653254C1 (en) Method, node and system for managing data for database cluster
CN111125040B (en) Method, device and storage medium for managing redo log
CN111177143B (en) Key value data storage method and device, storage medium and electronic equipment
CN110196759B (en) Distributed transaction processing method and device, storage medium and electronic device
CN110019063B (en) Method for computing node data disaster recovery playback, terminal device and storage medium
CN111176584A (en) Data processing method and device based on hybrid memory
CN109726211B (en) Distributed time sequence database
CN111309548A (en) Timeout monitoring method and device and computer readable storage medium
WO2023197904A1 (en) Data processing method and apparatus, computer device and storage medium
CN117131014A (en) Database migration method, device, equipment and storage medium
CN111694806A (en) Transaction log caching method, device, equipment and storage medium
WO2023115935A1 (en) Data processing method, and related apparatus and device
CN115268767A (en) Data processing method and device
US9471409B2 (en) Processing of PDSE extended sharing violations among sysplexes with a shared DASD
CN114003612A (en) Processing method and processing system for abnormal conditions of database
CN113448758A (en) Task processing method and device and terminal equipment
US10866756B2 (en) Control device and computer readable recording medium storing control program
CN109857523B (en) Method and device for realizing high availability of database
CN112131433B (en) Interval counting query method and device
EP4123470A1 (en) Data access method and apparatus
CN115016740B (en) Data recovery method and device, electronic equipment and storage medium
US11334450B1 (en) Backup method and backup system for virtual machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination