CN101794242B - Fault-tolerant computer system data comparing method serving operating system core layer - Google Patents


Info

Publication number
CN101794242B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201010103349XA
Other languages
Chinese (zh)
Other versions
CN101794242A (en)
Inventor
张兴军
董小社
雷济凯
胡冰
王恩东
胡雷钧
孙江斌
张东
田佳
赵晓昳
伍卫国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong High-End Server & Storage Research Institute
Xian Jiaotong University
Original Assignee
Shandong High-End Server & Storage Research Institute
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong High-End Server & Storage Research Institute and Xian Jiaotong University
Priority to CN201010103349XA
Publication of CN101794242A
Application granted
Publication of CN101794242B
Legal status: Expired - Fee Related

Landscapes

  • Hardware Redundancy (AREA)

Abstract

The invention relates to a data comparison method for fault-tolerant computer systems that is implemented in the operating system kernel layer. A kernel daemon started in the Linux operating system executes the data comparator logic and provides a data comparison service to the dual modular redundant processes of the fault-tolerant computer system. An event list added to the kernel serves as the message channel, and the redundant processes and the data comparator cooperate in a producer-consumer manner: a redundant process packages the data to be written into a message packet and inserts it into the message list; the comparator removes the packet from the list, parses it according to the defined format, compares the data to be written by the redundant processes, and returns the result to the redundant process. Because the method is implemented in the kernel layer, it requires no custom hardware, is simple to implement and broadly applicable, and suits process-level dual modular redundancy fault-tolerant systems built on commodity hardware. All comparison logic completes automatically in the kernel layer without involvement of the application, so the method is transparent to applications.

Description

Fault-tolerant computer system data comparison method serving the operating system kernel layer
Technical field
The invention belongs to the computer field and relates to fault-tolerance and data comparison techniques, in particular to a fault-tolerant computer system data comparison method serving the operating system kernel layer.
Background
With the rapid development of computer and Internet technology, information technology has reached every aspect of society. Computer technology has greatly changed the way people live by improving work efficiency and promoting the exchange of information, but it has also made people increasingly dependent on it, and the failure of a single computer system can cause incalculable losses. For organizations that must guarantee information security and provide uninterrupted information services, for example in securities, manufacturing, communications, banking, and transportation, the reliability and continuity of business systems are especially important. How to improve the reliability and availability of computer systems, so that critical applications keep running and a sustainable virtuous cycle is achieved, has become a major issue in the information field. Fault-tolerant computers and related technologies arose in response to this demand; their fault-tolerant capability is used to avoid the large economic losses caused by server failures.
A fault-tolerant computer is a highly reliable, highly available computer built on redundant resources (hardware redundancy, time redundancy, information redundancy, software redundancy) through a well-designed architecture and under the effective management of system software. Fault detection is one of the key technologies of a fault-tolerant computer system, and comparing and voting on task data is the main means of discovering errors.
Data comparison and voting are done in two main ways: in hardware and in software. The hardware-based approach adds a comparison chip containing the comparison or voting logic to the system and compares or votes on all data that is about to be written out. It detects errors promptly, but the design is complex and the implementation cost is high. The software-based approach places comparison and voting points in library functions or in the application, and checks the intermediate results and the final output of a task for consistency. The system design is simple, but the approach is not transparent to applications and places an extra burden on programmers and users.
Summary of the invention
The object of the invention is to overcome the shortcomings and deficiencies of the prior art described above by providing a fault-tolerant computer system data comparison method serving the operating system kernel layer. The invention can compare the state and data results of redundant tasks in a fault-tolerant system for consistency, while recording the synchronization and comparison information of the redundant tasks.
To accomplish this, the invention adopts the following technical solution: a kernel-mode daemon, ft_syner, is created in the Linux operating system kernel to execute the comparator logic and provide a data comparison service to the redundant processes. When the redundant processes perform a write operation, each prepares its data to be written; the master process then packages the data to be written into a message packet, adds the packet to the communication channel between the redundant processes and the data comparator, and actively wakes the data comparator ft_syner to perform the comparison. After ft_syner completes the comparison, the redundant process obtains the result by checking the comparison-result field in the message packet.
The communication channel is implemented as follows: an event list, ft_syner_event_list, is created in the Linux operating system kernel, and the redundant processes and the data comparator communicate through it.
The data comparator and the redundant processes work in a producer-consumer manner: the redundant process arranges the data to be compared into a message packet according to the protocol format and links the packet into the event list ft_syner_event_list; the comparator removes the packet from the list, extracts the data from it, and performs the comparison.
The format of the message packet is:
typedef struct {
    struct list_head list;
    short ft_msg_type;
    struct task_struct *p1;
    struct task_struct *p2;
    void *master_data;
    long master_data_len;
    void *slave_data;
    long slave_data_len;
    short error;
} ft_syner_event_msg;
Here, list is the list head used to link the message packet into the event list. It uses list_head, the Linux kernel's generic linked-list structure, and packets are inserted and deleted with the kernel's list_add() and list_del();
ft_msg_type is the message type, indicating which system call the message packet originates from. The values are defined as follows:
#define FT_WRITE    1  // write() system call
#define FT_WRITEV   2  // writev() system call
#define FT_SEND     3  // send() system call
#define FT_SENDTO   4  // sendto() system call
#define FT_SENDMSG  5  // sendmsg() system call
The comparator determines from the value of ft_msg_type which system call produced the message packet; the definitions above can be extended or reduced as required;
p1 and p2 are pointers to the process control blocks of the redundant process pair that generated the message packet. p1 is the master process and p2 is the slave process; the comparator obtains the process control blocks of the pair through these two pointers;
master_data is a pointer to the data buffer of the master process in the redundant pair, and master_data_len is the buffer length;
slave_data is a pointer to the data buffer of the slave process in the redundant pair, and slave_data_len is the buffer length;
error records the comparison result: a value of 1 means the data is consistent, and a value of 0 means the data is inconsistent.
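The packet layout and the meaning of error can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel code of the invention: ft_msg_sketch and ft_compare_sketch are hypothetical names, the kernel-only members (list and the task_struct pointers p1 and p2) are left out, and treating buffers of different length as inconsistent is an assumption that the text does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Message types, with the values given in the text. */
enum { FT_WRITE = 1, FT_WRITEV = 2, FT_SEND = 3, FT_SENDTO = 4, FT_SENDMSG = 5 };

/* Hypothetical user-space stand-in for ft_syner_event_msg.
 * The list head and the two task_struct pointers are omitted. */
typedef struct {
    short ft_msg_type;        /* which write-type system call produced the packet */
    const void *master_data;  /* buffer prepared by the master process */
    long master_data_len;
    const void *slave_data;   /* buffer prepared by the slave process */
    long slave_data_len;
    short error;              /* 1 = consistent, 0 = inconsistent */
} ft_msg_sketch;

/* Compare the two buffers and record the result in msg->error.
 * Assumption: different lengths (or a negative length) count as inconsistent. */
short ft_compare_sketch(ft_msg_sketch *msg)
{
    assert(msg != NULL);
    if (msg->master_data_len < 0 ||
        msg->master_data_len != msg->slave_data_len) {
        msg->error = 0;
    } else if (msg->master_data_len == 0) {
        msg->error = 1;  /* two empty buffers are trivially equal */
    } else {
        msg->error = memcmp(msg->master_data, msg->slave_data,
                            (size_t)msg->master_data_len) == 0;
    }
    return msg->error;
}
```

A caller fills in the two buffer pointers and lengths, calls ft_compare_sketch(), and then reads error, which mirrors how the redundant process reads msg->error after the comparator has finished.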
The data comparator is the kernel-mode daemon ft_syner created in the Linux operating system kernel. It resides in the kernel, executes the comparator logic, and provides the data comparison service to the redundant processes. ft_syner waits while idle; it is woken by a redundant process when there is work, or wakes itself periodically. Each time it is woken, ft_syner traverses the event list ft_syner_event_list, removes and parses each message packet in the list, and compares the data. After parsing all the message packets currently in the event list, ft_syner calls sleep_on_timeout() to enter the waiting state. The wait period set in this function is 5 seconds; when the wait time expires, ft_syner is woken and begins the next traversal.
The redundant processes need an added master/slave attribute. When the redundant processes perform a write operation, each prepares its data to be written; the master process then packages the data to be written into a message packet, adds it to the event list ft_syner_event_list with list_add(&(msg->list), &ft_syner_event_list), and actively wakes the kernel-mode daemon ft_syner. After ft_syner completes the comparison, the redundant process obtains the result by checking msg->error.
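The master-side step described above can be sketched in user space as follows. This is an illustration under stated assumptions rather than the kernel code of the invention: struct list_node and list_add_sketch() are small re-implementations that mirror the insertion rule of the kernel's list_head and list_add(), ft_submit_sketch() and the wake_requested flag are hypothetical stand-ins for queuing the packet and waking ft_syner, and the FT_PENDING value is an assumption, since the text does not say what error holds before the comparison has run.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list shaped like the kernel's list_head. */
struct list_node { struct list_node *next, *prev; };

void list_init_sketch(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Same insertion rule as the kernel's list_add(): node goes right after head. */
void list_add_sketch(struct list_node *node, struct list_node *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

#define FT_WRITE   1
#define FT_PENDING (-1)  /* assumption: "not compared yet"; not in the source */

typedef struct {
    struct list_node list;
    short ft_msg_type;
    const void *master_data;
    long master_data_len;
    const void *slave_data;
    long slave_data_len;
    short error;
} ft_msg_sketch;

/* Hypothetical master-side step: fill the packet, queue it, request a wake-up. */
void ft_submit_sketch(struct list_node *event_list, ft_msg_sketch *msg, short type,
                      const void *master, long master_len,
                      const void *slave, long slave_len, int *wake_requested)
{
    assert(event_list != NULL && msg != NULL && wake_requested != NULL);
    msg->ft_msg_type = type;
    msg->master_data = master;
    msg->master_data_len = master_len;
    msg->slave_data = slave;
    msg->slave_data_len = slave_len;
    msg->error = FT_PENDING;
    list_add_sketch(&msg->list, event_list);  /* list_add(&(msg->list), &list) */
    *wake_requested = 1;                      /* stands in for waking ft_syner */
}
```

Because list_add() inserts directly after the list head, the most recently queued packet is the first one reached when walking forward from the head; the kernel's list_add_tail() would give first-in, first-out order instead.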
The data comparator of the invention provides the data comparison service as a kernel daemon. When a redundant process calls a write-related operation such as write(), writev(), send(), sendto(), or sendmsg(), it uses the comparator service to compare the data to be written. That is, the kernel daemon ft_syner is started in the Linux operating system and executes the data comparator logic, providing the comparison service to the dual modular redundant processes in the fault-tolerant computer system. The event list ft_syner_event_list added to the kernel serves as the message channel, and the redundant processes and the data comparator work in a producer-consumer manner: the redundant process packages the data to be written into a message packet and inserts it into the message list; the comparator removes and parses the packet, completes the comparison, and finally returns the result to the redundant process. The invention thus compares the data of redundant processes in a dual modular fault-tolerant system simply and reliably, in software, at the operating system kernel layer. Because the method is implemented in the kernel layer, it needs no custom hardware and suits process-level dual modular redundancy fault-tolerant systems based on commodity hardware architectures. All logic completes automatically in the kernel layer without involvement of the application, so the method is transparent to applications.
Description of the drawings
Fig. 1 is the workflow diagram of the data comparator of the invention;
Fig. 2 is a conceptual diagram of the interaction between the redundant processes and the data comparator of the invention.
Detailed description
The invention is described in further detail below with reference to the accompanying drawings.
The method of the invention is as follows:
A kernel-mode daemon, ft_syner, is created in the Linux operating system kernel to execute the comparator logic and provide a data comparison service to the redundant processes. When the redundant processes perform a write operation, each prepares its data to be written; the master process then packages the data to be written into a message packet, adds the packet to the communication channel between the redundant processes and the data comparator, and actively wakes the data comparator ft_syner to perform the comparison. After ft_syner completes the comparison, the redundant process obtains the result by checking the comparison-result field in the message packet.
The communication channel is implemented as follows: an event list, ft_syner_event_list, is created in the Linux operating system kernel, and the redundant processes and the data comparator communicate through it.
The data comparator and the redundant processes work in a producer-consumer manner: the redundant process arranges the data to be compared into a message packet according to the protocol format and links the packet into the event list ft_syner_event_list; the comparator removes the packet from the list, extracts the data from it, and performs the comparison.
The format of the message packet is:
typedef struct {
    struct list_head list;
    short ft_msg_type;
    struct task_struct *p1;
    struct task_struct *p2;
    void *master_data;
    long master_data_len;
    void *slave_data;
    long slave_data_len;
    short error;
} ft_syner_event_msg;
Here, list is the list head used to link the message packet into the event list. It uses list_head, the Linux kernel's generic linked-list structure, and packets are inserted and deleted with the kernel's list_add() and list_del();
ft_msg_type is the message type, indicating which system call the message packet originates from. The values are defined as follows:
#define FT_WRITE    1  // write() system call
#define FT_WRITEV   2  // writev() system call
#define FT_SEND     3  // send() system call
#define FT_SENDTO   4  // sendto() system call
#define FT_SENDMSG  5  // sendmsg() system call
The comparator determines from the value of ft_msg_type which system call produced the message packet; the definitions above can be extended or reduced as required;
p1 and p2 are pointers to the process control blocks of the redundant process pair that generated the message packet. p1 is the master process and p2 is the slave process; the comparator obtains the process control blocks of the pair through these two pointers;
master_data is a pointer to the data buffer of the master process in the redundant pair, and master_data_len is the buffer length;
slave_data is a pointer to the data buffer of the slave process in the redundant pair, and slave_data_len is the buffer length;
error records the comparison result: a value of 1 means the data is consistent, and a value of 0 means the data is inconsistent.
The data comparator is the kernel-mode daemon ft_syner created in the Linux operating system kernel. It resides in the kernel, executes the comparator logic, and provides the data comparison service to the redundant processes. ft_syner waits while idle; it is woken by a redundant process when there is work, or wakes itself periodically. Each time it is woken, ft_syner traverses the event list ft_syner_event_list, removes and parses each message packet in the list, and compares the data. After parsing all the message packets currently in the event list, ft_syner calls sleep_on_timeout() to enter the waiting state. The wait period set in this function is 5 seconds; when the wait time expires, ft_syner is woken and begins the next traversal.
The redundant processes need an added master/slave attribute. When the redundant processes perform a write operation, each prepares its data to be written; the master process then packages the data to be written into a message packet, adds it to the event list ft_syner_event_list with list_add(&(msg->list), &ft_syner_event_list), and actively wakes the data comparator ft_syner. After ft_syner completes the comparison, the redundant process obtains the result by checking msg->error.
The workflow of the data comparator shown in Fig. 1 is:
(1) The comparator process ft_syner calls spin_lock() to acquire the spin lock of the message list;
(2) Check whether the message list ft_syner_event_list is empty; if it is empty, go to (6), otherwise go to (3);
(3) Take a message packet msg from the list, parse it, and complete the data comparison;
(4) Delete the processed message packet from the message list with the kernel's list_del();
(5) If unprocessed message packets remain in the message list, go to (3); otherwise go to (6);
(6) Call spin_unlock() to release the spin lock of the message list;
(7) The comparator process ft_syner calls sleep_on_timeout() and goes to sleep;
(8) ft_syner is woken passively when the wait period expires, or actively by a redundant process;
(9) Check the flag finish: if finish is 1, a request to end the comparator service has been received and ft_syner terminates; if finish is 0, go to (1) for the next round of service.
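Steps (1) to (6) of this workflow can be sketched in user space as one traversal pass. The sketch rests on several assumptions and is not the kernel code of the invention: a C11 atomic_flag spin loop stands in for spin_lock() and spin_unlock(), the list handling is a small re-implementation that unlinks nodes the way list_del() does, and the names ft_queue_sketch, ft_queue_add_sketch, and ft_syner_pass_sketch are hypothetical. Steps (7) to (9), the timed sleep and the finish check, depend on kernel scheduling and are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

struct list_node { struct list_node *next, *prev; };

typedef struct {
    struct list_node list;   /* kept first so a node pointer can be cast back */
    const void *master_data;
    long master_data_len;
    const void *slave_data;
    long slave_data_len;
    short error;             /* 1 = consistent, 0 = inconsistent */
} ft_msg_sketch;

typedef struct {
    struct list_node head;   /* stands in for ft_syner_event_list */
    atomic_flag lock;        /* stands in for the list's spin lock */
} ft_queue_sketch;

void ft_queue_init_sketch(ft_queue_sketch *q)
{
    q->head.next = q->head.prev = &q->head;
    atomic_flag_clear(&q->lock);
}

/* Insert right after the head, as list_add() does. */
void ft_queue_add_sketch(ft_queue_sketch *q, ft_msg_sketch *m)
{
    m->list.next = q->head.next;
    m->list.prev = &q->head;
    q->head.next->prev = &m->list;
    q->head.next = &m->list;
}

/* One traversal pass; returns the number of packets handled. */
int ft_syner_pass_sketch(ft_queue_sketch *q)
{
    int handled = 0;
    assert(q != NULL);
    while (atomic_flag_test_and_set(&q->lock)) { }       /* (1) take the lock */
    while (q->head.next != &q->head) {                   /* (2), (5) list not empty */
        struct list_node *n = q->head.next;
        ft_msg_sketch *msg = (ft_msg_sketch *)n;         /* (3) take and parse */
        long len = msg->master_data_len;
        msg->error = len >= 0 && len == msg->slave_data_len &&
                     (len == 0 || memcmp(msg->master_data, msg->slave_data,
                                         (size_t)len) == 0);
        n->prev->next = n->next;                         /* (4) unlink, as list_del() */
        n->next->prev = n->prev;
        n->next = n->prev = NULL;
        handled++;
    }
    atomic_flag_clear(&q->lock);                         /* (6) release the lock */
    return handled;
}
```

In the described daemon, this pass would sit inside a loop that sleeps after each pass and exits once the finish flag is set.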
Fig. 2 shows the interaction between the redundant processes and the data comparator, using a write() operation as the example. When the redundant processes P1 and P2 perform write(), their data must be compared: the redundant process packages the data to be compared into the message packet msg, inserts it into the message list, and wakes the data comparator. The comparator removes the packet from the message list, parses it according to the format definition, compares the two copies of data in the packet for consistency, and stores the result in msg->error. Finally, the redundant process obtains the comparison result by checking the value of msg->error.

Claims (1)

1. A fault-tolerant computer system data comparison method serving the operating system kernel layer, characterized in that: first, a kernel-mode daemon ft_syner is created in the Linux operating system kernel, whose role is to execute the comparator logic and provide a data comparison service to redundant processes; second, after the redundant processes have each prepared their data to be written during a write operation, the master process packages the data to be written into a message packet, adds the packet to the communication channel between the redundant processes and the data comparator, and at the same time actively wakes the data comparator to perform the comparison; finally, the data comparator completes the comparison, and the redundant process obtains the result by checking the comparison-result field in the message packet; the communication channel is implemented as follows: an event list ft_syner_event_list is created in the Linux operating system kernel, and the redundant processes and the data comparator communicate through the event list ft_syner_event_list; the data comparator and the redundant processes work in a producer-consumer manner: the redundant process arranges the data to be compared into a message packet according to the protocol format and links the packet into the event list ft_syner_event_list, and the comparator removes the packet from the event list ft_syner_event_list, extracts the data from it, and performs the comparison; the data comparator is the kernel-mode daemon ft_syner created in the Linux operating system kernel, which resides in the operating system kernel, executes the comparator logic, and provides the data comparison service to the redundant processes; the kernel-mode daemon ft_syner waits while idle and is woken by a redundant process when there is work, or wakes itself periodically; each time it is woken, the kernel-mode daemon ft_syner traverses the event list ft_syner_event_list, removes and parses each message packet in the list, and compares the data; after parsing all the message packets currently in the event list, the kernel-mode daemon ft_syner calls the function sleep_on_timeout to enter the waiting state, the wait period set in this function being 5 seconds, and when the wait time expires the kernel-mode daemon ft_syner is woken and begins the next traversal; the redundant processes need an added master/slave attribute; when performing a write operation, the redundant processes each prepare their data to be written, the master process then packages the data to be written into a message packet, adds the packet to the event list ft_syner_event_list with the function list_add, and actively wakes the kernel-mode daemon ft_syner; after the kernel-mode daemon ft_syner completes the comparison, the redundant process obtains the comparison result by checking the value of msg->error.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010103349XA CN101794242B (en) 2010-01-29 2010-01-29 Fault-tolerant computer system data comparing method serving operating system core layer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010103349XA CN101794242B (en) 2010-01-29 2010-01-29 Fault-tolerant computer system data comparing method serving operating system core layer

Publications (2)

Publication Number Publication Date
CN101794242A CN101794242A (en) 2010-08-04
CN101794242B true CN101794242B (en) 2012-07-18

Family

ID=42586952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010103349XA Expired - Fee Related CN101794242B (en) 2010-01-29 2010-01-29 Fault-tolerant computer system data comparing method serving operating system core layer

Country Status (1)

Country Link
CN (1) CN101794242B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102323900B (en) * 2011-08-31 2014-03-26 国家计算机网络与信息安全管理中心 System fault tolerance mechanism based on dynamic sensing for many-core environment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6088817A (en) * 1993-11-26 2000-07-11 Telefonaktiebolaget Lm Ericsson Fault tolerant queue system
CN101000561A (en) * 2006-12-20 2007-07-18 中国电子科技集团公司第十四研究所 Implementing method of multi-machine fault-tolerance system kernel
CN101369241A (en) * 2007-09-21 2009-02-18 中国科学院计算技术研究所 Cluster fault-tolerance system, apparatus and method
CN101383690A (en) * 2008-10-27 2009-03-11 西安交通大学 Grid synchronization method for fault tolerant computer system based on socket

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Figure 1.

Also Published As

Publication number Publication date
CN101794242A (en) 2010-08-04

Similar Documents

Publication Publication Date Title
CN106850260A (en) Deployment method and device for a virtual resource management platform
US8990617B2 (en) Fault-tolerant computer system, fault-tolerant computer system control method and recording medium storing control program for fault-tolerant computer system
CN101383690B (en) Grid synchronization method for fault tolerant computer system based on socket
Bouteiller et al. Correlated set coordination in fault tolerant message logging protocols
CN107634855A (en) Dual-machine hot standby method for an embedded system
CN102591759B (en) Clock precision parallel simulation system for on-chip multi-core processor
CN103064770B (en) Dual-process redundancy transient fault tolerating method
CN109189860A (en) MySQL master-standby incremental synchronization method based on the Kubernetes system
TWI522794B (en) Energy-efficient nonvolatile microprocessor
US11656902B2 (en) Distributed container image construction scheduling system and method
CN104205755A (en) Method, device, and system for delaying packets during a network-triggered wake of a computing device
CN103455393A (en) Fault tolerant system design method based on process redundancy
Chen et al. Replication-based fault-tolerance for large-scale graph processing
Bouteiller et al. Correlated set coordination in fault tolerant message logging protocols for many‐core clusters
CN111221662B (en) Task scheduling method, system and device
CN101794242B (en) Fault-tolerant computer system data comparing method serving operating system core layer
CN112367186B (en) Fault protection method and device based on OpenStack bare metal
Camargos et al. Multicoordinated paxos
CN103593251A (en) Fault-tolerant system based on process redundancy and design method thereof
JP2010534888A (en) High integrity and high availability computer processing module
WO2023185335A1 (en) Crash clustering method and apparatus, electronic device and storage medium
CN106775964A (en) The operating system framework and method for scheduling task of time/event mixing triggering
EP4170519A1 (en) Data synchronization method and device
TW201113696A (en) Test method and tool for master-slave systems on multicore processors
Bowles et al. A formal model for integrating multiple views

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120718

Termination date: 20150129

EXPY Termination of patent right or utility model