CN112231179A - Member and task integrated management system - Google Patents


Info

Publication number
CN112231179A
Authority
CN
China
Prior art keywords
task
module
node
computer
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011227150.8A
Other languages
Chinese (zh)
Inventor
程俊强
段小虎
杨菊平
边庆
刘帅
陈益
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC filed Critical Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202011227150.8A priority Critical patent/CN112231179A/en
Publication of CN112231179A publication Critical patent/CN112231179A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3027 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/28 Supervision thereof, e.g. detecting power-supply failure by out of limits supervision
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3013 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is an embedded system, i.e. a combination of hardware and software dedicated to perform a certain function in mobile devices, printers, automotive or aircraft systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available

Abstract

The application provides a member and task integrated management system for a high-safety distributed computer system (101). The distributed computer system (101) comprises N computer nodes (103), and each computer node (103) comprises a bus network end node (104) and a member and task integrated management module (105). The bus network end node (104) is connected to the bus network system (102), communicates with it, and transmits member and task integrated management data packets using a time-triggered protocol. The member and task integrated management module (105) of each computer node (103) comprises: an integrated management core module (10), a local state monitoring module (20), a node task configuration detection module (30), an information receiving and checking decoding module (40), an information checking coding and sending module (50), a node fault safety module (60), an other-machine task state monitoring module (70) and a software interface module (80).

Description

Member and task integrated management system
Technical Field
The application relates to the field of high-reliability embedded computer design, in particular to a member and task integrated management system.
Background
In high-safety fields, the safety requirements of a computer system can be met only by a redundant computer system in which several computers execute the same task. The computers must vote mutually and judge faults, and a fault module carries each computer's own validity information and fault-judgment information between them, so the reliability of the fault module must be far higher than that of the computer system itself. The traditional channel-based fault module design suits redundant single-task computer systems in which computing, control and other resources are centralized and redundancy is managed by channel degradation. With technical progress, however, the embedded field is gradually adopting distributed computer systems, in which different computing and control resources are distributed across different computer nodes. Every computer node is able to execute tasks, but the safety requirement does not demand that all computer nodes execute the same task; instead, when a computer node fails, the other computer nodes are allowed to migrate its tasks once they detect the failure. The channel-based fault module design cannot be applied to such a distributed computer system. Computer node members and tasks must instead be managed together, on the basis of the computer node state information and task state information of every computer node, mainly for two reasons:
1. A channel-based fault module transmits state as discrete signals, which use simple circuits and are highly reliable. With n channels, the number of fault-module discrete signals required by a single channel is about 2 to the nth power, so the required resources grow exponentially with the number of channels. This approach can therefore be used only in centralized computer systems, where a redundant computer system based on channel degradation usually has 2 to 4 channels. In a distributed computer system, dozens of computer nodes may take part in the fault module; if state were transmitted in the same way, the number of transmission lines would grow exponentially, the reliability of the fault module and its circuits would fall far below that of the computer system, and the transmitted state could no longer be relied on to represent the input state. A member management and task management system suited to distributed computer systems must therefore adopt a new mechanism and guarantee its determinism and integrity.
2. A channel-based fault module has only one level, the computer channel: all computers run the same task, so only one level needs to be considered. Multitasking and reconstruction among tasks, however, are key characteristics of a distributed computer system, so a member management and task management system suited to it must adopt a new mechanism that supports multi-level integrated management of computer members and tasks.
Disclosure of Invention
To solve the above technical problems, the present application provides a member and task integrated management system. It allows computer nodes to exchange indication and approval information about computer node states and task states, adapts dynamically when computer nodes and tasks are reconstructed, and supports the member and task integrated management requirements of a high-safety distributed computer system with multiple computer nodes and multiple tasks.
The application provides a member and task integrated management system for a high-safety distributed computer system (101). The distributed computer system (101) comprises N computer nodes (103), where N is not less than 3, and each computer node (103) comprises a bus network end node (104) and a member and task integrated management module (105), wherein:
the bus network end node (104) of each computer node (103) is connected to the bus network system (102), communicates with it, and transmits member and task integrated management data packets using a time-triggered protocol;
the member and task integrated management module (105) of each computer node (103) comprises: the system comprises an integrated management core module (10), a local state monitoring module (20), a BIT test result detection module (21), a power supply monitoring result detection module (22), a task running state detection module (23), a node task configuration detection module (30), a node ID test module (31), a task ID test module (32), a node-task-cycle configuration table (33), an information receiving and checking decoding module (40), an information checking coding and sending module (50), a node fault safety module (60), an other machine task state monitoring module (70) and a software interface module (80).
Specifically, the member and task integrated management data packet includes a computer ID including a check bit, computer node state information, an ID of each task, state information of each task, a frame period count of each task, and a data load area check code.
Specifically, the integrated management core module (10) is configured to receive state information of a computer node and state information of each task that is run and sent by the local state monitoring module (20), a computer node task configuration detection result sent by the node task configuration detection module (30), and other computer node information sent by the information receiving and checking decoding module (40); generating state information of computer nodes and state information of each task of the computer nodes; simultaneously sending the state information of the computer nodes and the state information of each task of the computer nodes to a node failure safety module (60);
the other computer node information includes the computer node state information of the other computer nodes, the state information of the tasks they run, the state information that they indicate for this computer node, and the state information that they indicate for this computer node's tasks.
Specifically, the local state monitoring module (20) is configured to receive a BIT test result of the computer node state and a BIT test result of each task sent by the BIT test result detection module (21), a power supply monitoring test result of the computer node state and a power supply monitoring test result of each task sent by the power supply monitoring result detection module (22), a computer node operation state monitoring result sent by the task operation state detection module (23), and an operation state monitoring result of each task operated by the computer node; generating a computer node self state and a task state of the computer node; sending the self state and task state of the computer node to an integrated management core module (10);
the power supply monitoring result detection module (22) comprehensively obtains power supply monitoring test results of computer node states and power supply monitoring test results of tasks according to power supply monitoring test items, influence ranges and corresponding monitoring test results of all power supplies executed by software and hardware, and sends the power supply monitoring test results of the computer node states and the power supply monitoring test results of all tasks to the local state monitoring module (20);
the task running state detection module (23) comprises a frame counting accumulation judgment module and a timeout judgment module which are configured on all task sharing functional parts, and a respective frame counting accumulation judgment module and a respective timeout judgment module which are configured on each task independent functional part respectively, obtains a computer node running state monitoring result and a running state monitoring result of each task run by a computer node by adopting a frame counting accumulation judgment method and a timeout judgment method, and sends the computer node running state monitoring result and the running state monitoring result of each task run by the computer node to a local state monitoring module (20).
Specifically, the node task configuration detection module (30) takes the computer node ID and computer node ID state given by the node ID test module (31) and the task IDs and task ID states given by the task ID test module (32), reads from the node-task-period configuration table (33) all task IDs that correspond to the computer node ID, and matches them against the task IDs given by the task ID test module (32). A mismatch, an erroneous computer node ID state or an erroneous task ID state each constitutes a task configuration detection fault. The configuration detection fault result is sent to the integrated management core module (10), and the configuration detection fault result together with the task IDs is sent to the information receiving and checking decoding module (40). The node-task-period configuration table (33) stores the valid computer node ID of each computer node of the distributed computer system, the task IDs of all executable tasks configured for each computer node ID, the reconstruction priorities of the different tasks, the execution period of each task, and the like.
Specifically, the information receiving and checking decoding module (40) obtains the valid task IDs from the configuration detection fault result and the task IDs given by the node task configuration detection module (30). It screens the data load area of each member and task integrated management data packet that the bus network end node (104) receives over the time-triggered protocol, keeping only the computer node state information of other computer nodes that share a task ID with this computer node and the task state information for those shared task IDs. It then checks and decodes the screened information, performs matching detection against the node-task-period configuration table (33), and retains only correct information. From the retained information it obtains the computer node state information of the other computer nodes, the state information of the tasks they run, the state information that they indicate for this computer node, and the state information that they indicate for this computer node's tasks, and sends all of it to the integrated management core module (10). In addition, the module (40) looks up the task period of each valid task ID and uses it for timeout detection of the corresponding state information: data that does not arrive in accordance with the task period is regarded as invalid.
Specifically, the node failure safety module (60) receives the state of this computer node sent by the information checking coding and sending module (50). If an invalid state is received within the task period, or no information is received within the task period, it latches the fault state and sends a prohibition signal to the bus network end node (104), which prevents the bus network end node (104) from outputting information externally;
at the same time, the module (60) receives the task states of this computer node from the information checking coding and sending module (50). If an invalid state is received within the task period, or no information is received within the task period, it latches the task fault state and generates a task fault indication that disables the output interfaces in the computer related to that task.
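The latching behaviour described above can be sketched as follows. This is a minimal illustration in Python rather than the hardware logic the application has in mind; the class name `FailSafeLatch`, the millisecond time base and the use of `None` for "nothing received" are assumptions made for the example.

```python
class FailSafeLatch:
    """Latches a fault on an invalid state or on silence longer than one task period.

    Hypothetical sketch: one instance can guard the node-level bus output,
    and one instance per task can guard that task's output interface.
    """

    def __init__(self, period_ms):
        self.period_ms = period_ms
        self.last_valid_ms = 0
        self.latched = False

    def update(self, state_valid, now_ms):
        """state_valid is True or False for a received state, None if nothing arrived.

        Returns True while output is still permitted.
        """
        if self.latched:
            return False  # once latched, the sketch offers no path back
        if state_valid is True:
            self.last_valid_ms = now_ms
        elif state_valid is False or now_ms - self.last_valid_ms > self.period_ms:
            self.latched = True
        return not self.latched


node_latch = FailSafeLatch(period_ms=10)   # gates the bus network end node
task_latches = {0x10: FailSafeLatch(10)}   # gate per-task output interfaces
```

A valid state after the latch has tripped does not re-enable output, which mirrors the "locks the fault state" wording; how the latch is eventually cleared is not stated in the text and is left out here.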
Specifically, the other-machine task state monitoring module (70) contains a timeout detection module. If, within the specified time period, it receives from the software interface module (80) the software's indication of the node state and task state information of other computer nodes, it forwards that information to the information checking coding and sending module (50). If the software's indication of another computer node's state is not received from module (80) within the specified time period, the indicated state of that computer node is set to fault and sent to module (50). Likewise, if the software's indication of the state of a particular task on another computer is not received within the specified time period, the indicated state of that task is set to fault and sent to module (50).
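A minimal sketch of this timeout rule, assuming the software interface delivers each indication with a write timestamp. The function name `forward_indications`, the key layout and the numeric encoding of "fault" are hypothetical and only serve the illustration.

```python
FAULT = 0  # assumed encoding of "fault"; the text does not define state codes


def forward_indications(written, expected, now_ms, window_ms):
    """Build the indications to hand to the coding and sending module.

    written:  {key: (value, written_at_ms)} as last provided by software
    expected: keys for which an indication is required, for example
              ("node", 2) or ("task", 2, 0x10)
    Any expected key that software did not refresh within window_ms is
    replaced by FAULT; fresh values are forwarded unchanged.
    """
    out = {}
    for key in expected:
        entry = written.get(key)
        if entry is None or now_ms - entry[1] > window_ms:
            out[key] = FAULT
        else:
            out[key] = entry[0]
    return out
```

For example, with a 20 ms window at time 100, an indication written at 95 is forwarded as written, while one written at 50 or never written is forwarded as a fault.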
Specifically, the bus network system (102) supports data transmission using a time-triggered protocol, and may be implemented either as a bus or as a switched network.
In summary, according to the above-mentioned solution, the member and task integrated management module can complete indication and approval information of computer node states and task states between computer nodes, and can complete dynamic adaptation according to reconstruction of computer nodes, thereby supporting the requirements of high-security distributed computer system for integrated management of computer nodes, multitask members and tasks.
Drawings
Fig. 1 is a schematic composition diagram of a member and task integrated management system provided by the present invention.
Detailed Description
In high-safety fields, the safety requirements of a computer system can be met only by a redundant computer system in which several computers execute the same task, with mutual voting and fault judgment among the computers. In a combined high-safety computer architecture, computing, control and other resources are centralized, and the traditional channel-based fault module design suits such redundant single-task computer systems based on channel degradation. With technical progress, however, the high-safety embedded field is gradually adopting distributed computer systems, in which different computing and control resources are distributed across different computer nodes. Every computer node can execute multiple tasks, and different computer nodes execute the same or different tasks. To meet higher safety and availability requirements, when a computer node fails the other computer nodes are allowed to migrate its tasks once they detect the failure. The channel-based fault module design cannot be applied to a high-safety distributed computer system, mainly for two reasons:
1. A channel-based fault module transmits state as discrete signals, which use simple circuits and are highly reliable. With n channels, the number of fault-module discrete signals required by a single channel is about 2 to the nth power, so the required resources grow exponentially with the number of channels; resources obtained by stacking hardware cannot be reused, and very little information can be exchanged. In a distributed computer system, dozens of computer nodes may take part in the fault module. If state were transmitted in the same way, the number of transmission lines would grow exponentially at enormous cost, the reliability of the fault module and its circuits would fall far below that of the computer system, and the transmitted state could no longer be relied on to represent the input state.
2. A channel-based fault module has only one level, the computer channel, and all computers run the same task. Multitasking and task reconstruction are key characteristics of a distributed computer system, which therefore has at least two levels: the computer node and the tasks run by the computer node.
To address these problems, the invention provides a member and task integrated management module and control method suited to high-safety distributed systems. Using highly deterministic, high-integrity information transmission, it allows computer nodes to exchange indication and approval information about computer node states and task states, adapts dynamically when computer nodes and tasks are reconstructed, and achieves integrated member and task management for the multiple computer nodes and multiple tasks of the distributed computer system.
The member and task integrated management module is closely tied to the type of computer node and the type of task, and it uses a highly deterministic, high-integrity data packet format and data packet transmission mode. The distributed computer system may therefore also contain computers that have no member and task integrated management module, and the member and task integrated management modules of the other computer nodes are not affected by such a computer or its tasks. It follows that the member and task integrated management module suits not only high-safety distributed computer systems but also other types of distributed computer system that contain high-safety functions.
Technical scheme of the invention
The invention adopts bus network of distributed computer system, information transmission mode of member and task integrated management, member and task integrated management module of each computer node, etc. to form member and task integrated management module suitable for high-safety distributed computer system.
The computer nodes of the high-safety distributed computer system must be connected by a bus network system. That bus network system must support message transmission using a time-triggered protocol, and the bus network end node in each computer node must support the same protocol, so that member and task integrated management information is transmitted deterministically and with integrity. Depending on the classification used, the computer nodes can be divided into task-reconfigurable and task-non-reconfigurable computer nodes, or into single-task and multi-task computer nodes. Whatever its type, every computer node that participates in task voting and monitoring must contain a member and task integrated management module, and so must every computer node that participates in task migration.
The distributed computer system must define a node-task-cycle configuration table that covers all computer nodes in the system. The table contains the valid computer node ID of every computer node, the task IDs of all executable tasks configured for each computer node ID, the reconstruction priorities of the different tasks, the execution cycle of each task, check information, and the like. Each computer may store only the parts of the table that are relevant to its own computer node, or it may store the complete table.
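One possible in-memory shape for such a table is sketched below. The IDs, priorities and periods are invented example values, the convention that a lower number means a higher reconstruction priority is an assumption, and `local_view` is only one reading of what the "relevant parts" of the table might be (the entries of nodes that share a task with the local node).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskEntry:
    task_id: int
    reconstruction_priority: int  # assumed: lower value = higher priority
    period_ms: int


# Hypothetical table: valid node ID -> tasks that node is configured to execute.
NODE_TASK_CYCLE_TABLE = {
    0x01: (TaskEntry(0x10, 0, 10), TaskEntry(0x11, 1, 20)),
    0x02: (TaskEntry(0x10, 1, 10),),
    0x03: (TaskEntry(0x11, 0, 20), TaskEntry(0x12, 0, 50)),
}


def tasks_for_node(node_id):
    """Task IDs configured for a node; empty if the node ID is not valid."""
    return {e.task_id for e in NODE_TASK_CYCLE_TABLE.get(node_id, ())}


def period_of(node_id, task_id):
    """Execution period of a task on a node, or None if not configured."""
    for entry in NODE_TASK_CYCLE_TABLE.get(node_id, ()):
        if entry.task_id == task_id:
            return entry.period_ms
    return None


def local_view(node_id):
    """Reduced table: only nodes that share at least one task with node_id."""
    mine = tasks_for_node(node_id)
    return {
        n: entries
        for n, entries in NODE_TASK_CYCLE_TABLE.items()
        if mine & {e.task_id for e in entries}
    }
```

The check information mentioned in the text is omitted here; a stored table would carry it alongside the entries.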
A member and task integrated management data packet of the distributed computer system contains a packet header, a packet header check code, a data length check code, a data load area and a data packet check code. The data load area contains: the computer ID (including a check bit), computer node state information, the ID of each task, the state information of each task, the frame period count of each task, and a data load area check code. The frame period count in the data load area must be filled in by software; the hardware circuitry is not allowed to generate it automatically. The data packet must be transmitted periodically on the bus network using the time-triggered protocol. On reception, the bus network end node checks the header, data length, packet check code and other fields of the network protocol in use; if they are correct it passes the data load area to the member and task integrated management module of the computer node, and if not it discards the packet and records the error. On transmission, the end node takes the data load area received from the member and task integrated management module, adds the header, length, packet check code and other information needed to satisfy the bus protocol format, and then either sends the packet or discards and records it, depending on the enable signal from the member and task integrated management module.
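To make the data load area concrete, the sketch below packs and unpacks one with the fields listed above. Only the field list comes from the text; the field widths, big-endian byte order, the even-parity check bit on the computer ID and the use of CRC-32 as the data load area check code are all assumptions chosen for illustration. The surrounding header and packet check code added by the bus network end node are not modelled.

```python
import struct
import zlib


def with_parity(node_id7):
    """Append an even-parity check bit to a 7-bit ID (assumed scheme)."""
    parity = bin(node_id7).count("1") & 1
    return (node_id7 << 1) | parity


def parity_ok(id_byte):
    return bin(id_byte).count("1") % 2 == 0


def encode_payload(node_id7, node_state, tasks):
    """tasks: list of (task_id, task_state, frame_count) tuples.

    frame_count is supplied by the caller, reflecting the rule that
    software, not hardware, fills in the frame period count.
    """
    body = struct.pack(">BBB", with_parity(node_id7), node_state, len(tasks))
    for task_id, state, frame_count in tasks:
        body += struct.pack(">BBH", task_id, state, frame_count & 0xFFFF)
    return body + struct.pack(">I", zlib.crc32(body))


def decode_payload(data):
    """Return (node_id7, node_state, tasks), or None if any check fails."""
    if len(data) < 7:
        return None
    body, (crc,) = data[:-4], struct.unpack(">I", data[-4:])
    if zlib.crc32(body) != crc:
        return None
    id_byte, node_state, count = struct.unpack(">BBB", body[:3])
    if not parity_ok(id_byte) or len(body) != 3 + 4 * count:
        return None
    tasks = [
        struct.unpack(">BBH", body[3 + 4 * i: 7 + 4 * i]) for i in range(count)
    ]
    return id_byte >> 1, node_state, tasks
```

A receiver built this way discards a data load area with a bad check code or a bad ID check bit instead of trying to repair it, in line with the "discard and record" handling described in the text.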
Member and task integrated management data packets must be transmitted on the bus network using the time-triggered protocol. For each such packet received from the bus network, the bus network end node checks the header, data length, packet check code and other fields of the network protocol; if the check passes, it passes the data load area to the information receiving and checking decoding module, and otherwise it discards the packet and records the error. For each data load area received from the information checking coding and sending module, the end node adds the header, length, packet check code and other necessary information according to the member and task integrated management data packet format. If the node fault safety module permits sending, the packet is sent to the bus network according to the time-triggered protocol; otherwise it is discarded and recorded. The bus network system delivers every member and task integrated management data packet received from one bus network end node to all other bus network end nodes connected to the bus network, using the time-triggered protocol.
The member and task integrated management module located in the computer node can be divided into an integrated management core module, a local state monitoring module, a BIT test result detection module, a power supply monitoring result detection module, a task running state detection module, a node task configuration detection module, a node ID test module, a task ID test module, a node-task-period configuration table, an information receiving and inspection decoding module, an information inspection coding and sending module, a node fault safety module, an other computer task state monitoring module and a software interface module from module functions.
The comprehensive management core module comprehensively obtains the computer node state information of the computer node and the state information of each task of the computer node according to the state information of the computer node and the state information of each task operated, which are given by the local state monitoring module, the computer node task configuration detection result given by the node task configuration detection module, the computer node state information of other computer nodes, the state information of the tasks operated by the computer nodes of other computer nodes, the computer node state information indicating the computer node and the state information indicating the task of the computer node, which are given by the information receiving and checking decoding module, and simultaneously sends the computer node state information of the computer node and the state information of each task of the computer node to the node fault safety module.
The local state monitoring module comprehensively obtains the self state and task state of the computer node according to the BIT test result of the computer node state and the BIT test result of each task given by the BIT test result detection module, the power monitoring test result of the computer node state and the power monitoring test result of each task given by the power monitoring result detection module, the computer node running state monitoring result given by the task running state detection module and the running state monitoring result of each task running by the computer node, and the information is sent to the comprehensive management core module.
The BIT test result detection module comprehensively obtains BIT test results of computer node states and BIT test results of all tasks according to BIT test items executed by software and hardware, influence ranges (computer nodes, a certain task and a plurality of tasks) of the BIT test items and corresponding test results.
The power supply monitoring result detection module comprehensively obtains power supply monitoring test results of computer node states and power supply monitoring test results of all tasks according to power supply monitoring test items of all power supplies executed by software and hardware, influence ranges (a computer node, a certain task and a plurality of tasks) of the power supply monitoring test items and corresponding monitoring test results.
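The BIT and power supply result detection modules both reduce a list of test items, each with an influence range, to one node-level result and one result per task. A minimal sketch of that reduction follows; the function name `aggregate`, the `NODE` marker and the rule that a node-level failure also fails every task are assumptions, since the text only says the results are obtained "comprehensively".

```python
NODE = "node"  # marker: the test item affects the whole computer node


def aggregate(results, task_ids):
    """Reduce test items to (node_ok, {task_id: ok}).

    results: iterable of (passed, scope), where scope is NODE or a set of
             task IDs (a single task or several tasks).
    """
    node_ok = True
    task_ok = {t: True for t in task_ids}
    for passed, scope in results:
        if passed:
            continue
        if scope == NODE:
            node_ok = False
        else:
            for t in scope:
                if t in task_ok:
                    task_ok[t] = False
    if not node_ok:
        # Assumption: a node-level failure invalidates every task on the node.
        task_ok = {t: False for t in task_ok}
    return node_ok, task_ok
```

The same function can serve both modules: one call over the BIT test items and another over the power supply monitoring items.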
The task running state detection module is used for configuring a frame counting accumulation judgment module and a timeout judgment module for all task shared functional parts, respectively configuring respective frame counting accumulation judgment modules and timeout judgment modules for each task independent functional part, and comprehensively obtaining a computer node running state monitoring result and a running state monitoring result of each task run by the computer node by comprehensively adopting a frame counting accumulation judgment method and a timeout judgment method.
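The combination of frame count accumulation and timeout judgment might look like the sketch below. It assumes that a healthy part increments its frame count by exactly one per report and that the count wraps at a fixed modulus; the text does not give the exact accumulation rule, so both are assumptions, as are the names `RunStateMonitor` and `task_running_ok`.

```python
class RunStateMonitor:
    """Frame-count accumulation check plus timeout check for one monitored part."""

    def __init__(self, timeout_ms, modulus=1 << 16):
        self.timeout_ms = timeout_ms
        self.modulus = modulus
        self.last_count = None
        self.last_time_ms = None
        self.healthy = False

    def report(self, frame_count, now_ms):
        """Called when the monitored part publishes a new frame count."""
        if self.last_count is None:
            ok = True  # first report: nothing to compare against yet
        else:
            ok = frame_count == (self.last_count + 1) % self.modulus
        self.last_count, self.last_time_ms = frame_count, now_ms
        self.healthy = ok
        return ok

    def poll(self, now_ms):
        """Timeout judgment: unhealthy if no report arrived within timeout_ms."""
        if self.last_time_ms is None or now_ms - self.last_time_ms > self.timeout_ms:
            self.healthy = False
        return self.healthy


def task_running_ok(shared, own, now_ms):
    """A task is running correctly only if the shared part and its own part are."""
    return shared.poll(now_ms) and own.poll(now_ms)
```

One monitor instance would watch the functional part shared by all tasks and one instance would watch each task's independent part, so a stuck shared part marks every task as failed while a stuck task part affects only that task.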
The node task configuration detection module takes the computer node ID and computer node ID state given by the node ID test module and the task IDs and task ID states given by the task ID test module, reads from the node-task-period configuration table all task IDs that correspond to the computer node ID, and matches them against the task IDs given by the task ID test module. A mismatch, an erroneous computer node ID state or an erroneous task ID state each constitutes a task configuration detection fault. The configuration detection fault result is sent to the integrated management core module, and the configuration detection fault result together with the task IDs is sent to the information receiving and checking decoding module.
Based on the computer node ID input and the computer node ID check input information, the node ID test module verifies whether the computer node ID is correct, and sends the computer node ID and the computer node ID state to the node task configuration detection module.
Based on the task ID input and the task ID check input information, the task ID test module verifies whether the task ID is correct, and sends the task ID and the task ID state to the node task configuration detection module.
The information receiving and checking decoding module obtains the valid task IDs from the configuration detection fault result and the task ID given by the node task configuration detection module. It screens the data load area of the packets from other computer nodes that the comprehensive management core module receives using the time-triggered protocol, retaining only the computer node state information of other computer nodes whose task ID matches a task ID of this computer node, together with the task state information for that task ID. It then checks and decodes the screened information, performs matching detection against the node-task-period configuration table, and keeps only the correct information. From the retained information it assembles the computer node state information of the other computer nodes, the state information of the tasks run by those nodes, the state information with which those nodes indicate this computer node, and the state information with which they indicate the tasks of this computer node, and sends the result to the comprehensive management core module. In addition, the module looks up the task period of each valid task ID and performs timeout detection on the corresponding state information according to that period: data that is not received in accordance with the task period is regarded as invalid. The timeout threshold is set to a multiple (greater than or equal to 2) of the task period to preserve system availability. The module also checks the minor frame count of each received data packet, and data that does not follow the accumulation rule (see the information checking coding and sending module) is regarded as invalid.
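The screening and timeout rules of this module can be sketched as follows. The entry layout (dictionaries with `node_id` and `task_id` keys), the function names, and the millisecond time base are assumptions; check-code decoding is omitted because the code format is not given in the description.

```python
from typing import Dict, List, Set

# Hypothetical layout: node ID -> {task ID -> task period in ms}
ConfigTable = Dict[int, Dict[int, int]]


def timeout_for(period_ms: int, multiple: int = 2) -> int:
    """Timeout threshold as a multiple (>= 2) of the task period."""
    if multiple < 2:
        raise ValueError("multiple must be greater than or equal to 2")
    return period_ms * multiple


def screen_payload(
    entries: List[dict], own_task_ids: Set[int], config_table: ConfigTable
) -> List[dict]:
    """Keep only entries for tasks this node also runs and that match the table."""
    kept = []
    for entry in entries:
        # Screen by task ID: ignore tasks this node does not run.
        if entry["task_id"] not in own_task_ids:
            continue
        # Matching detection: the sender must be configured for that task.
        if entry["task_id"] not in config_table.get(entry["node_id"], {}):
            continue
        kept.append(entry)
    return kept


def is_stale(last_rx_ms: int, now_ms: int, period_ms: int, multiple: int = 2) -> bool:
    """True when no update arrived within the timeout threshold."""
    return now_ms - last_rx_ms > timeout_for(period_ms, multiple)
```

With a 20 ms task period and the minimum multiple of 2, state information is treated as timed out once more than 40 ms have passed since the last valid reception, so a single missed frame does not immediately invalidate the data.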
The information checking coding and sending module encodes, in the format of 91, the information with which this computer node indicates the member states and task states of other computer nodes, as given by the other machine task state monitoring module, together with the state information of this computer node and of its tasks given by the comprehensive management core module, and sends the encoded information to the comprehensive management core module 4. It also sends the node state information and task state information of this computer to the node fault safety module.
The node fault safety module has a timeout detection module and receives the state of this computer node from the information checking coding and sending module. If an invalid state is received within the specified time period, or no information is received within the specified time period, the node fault safety module latches the task fault state and sends a disable signal to the comprehensive management core module 4, prohibiting it from outputting information externally; otherwise the node fault safety module stays in the enabled state and allows the comprehensive management core module 4 to output information externally. Likewise, for the task state of this computer received from the information checking coding and sending module, if an invalid state is received within the specified time period or no information is received within the specified time period, the module latches the task fault state and generates a task fault indication that disables the output interfaces related to that task in the computer.
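The latching behaviour described above can be sketched as a small state holder. The class name `FailSafeLatch` is hypothetical, and the assumption that a latched fault never clears on its own is made here because the description gives no release condition.

```python
class FailSafeLatch:
    """Latches a fault on an invalid state or on a reception timeout."""

    def __init__(self, window_s: float, start_time: float = 0.0):
        self.window_s = window_s      # the specified time period
        self.last_rx = start_time     # time of the most recent reception
        self.latched = False

    def _check_timeout(self, now: float) -> None:
        # No information within the specified period: latch the fault.
        if now - self.last_rx > self.window_s:
            self.latched = True

    def on_state(self, valid: bool, now: float) -> None:
        """Process one received state report."""
        self._check_timeout(now)
        if not valid:
            self.latched = True       # invalid state: latch the fault
        self.last_rx = now

    def output_enabled(self, now: float) -> bool:
        """Outputs stay enabled only while no fault has been latched."""
        # Assumption: once latched, the fault is held until an external reset.
        self._check_timeout(now)
        return not self.latched
```

One latch instance would gate the node's external output, and one instance per task would gate the output interfaces associated with that task.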
The other machine task state monitoring module has a timeout detection module. If the software, through the software interface module, indicates the state and task state information of other computer nodes within the specified time period, the module forwards that information to the information checking coding and sending module. If the software does not indicate the state of another computer node within the specified time period, the indication for that computer node is set to fault and sent to the information checking coding and sending module. If the software does not indicate the state of a particular task of another computer within the specified time period, the indication for that task is set to fault and sent to the information checking coding and sending module.
The software interface module has a bus decoding module and a number of registers connected to the host bus interface. The registers cover the integrated management core module, the local state monitoring module, the BIT test result detection module, the power supply monitoring result detection module, the task running state detection module, the node task configuration detection module, the node ID test module, the task ID test module, the node-task-period configuration table, the information receiving and checking decoding module, the information checking coding and sending module, the node fault safety module, the other machine task state monitoring module and the other modules, so that software can read all states of the integrated management core module, the internally transmitted information, and the error record information.
In summary, the present invention provides a member and task integrated management module and control method suitable for a high-security distributed system. Without adding extra resource overhead, the member and task integrated management module of each computer node exchanges indication and approval information on computer node states and task states between computer nodes, and adapts dynamically when computer nodes and tasks are reconfigured, thereby meeting the member and task integrated management requirements of multiple computer nodes and multiple tasks in a high-security distributed computer system.

Claims (9)

1. A member and task integrated management system, wherein the high security distributed computer system (101) comprises N computer nodes (103), N being not less than 3, each computer node (103) comprising a bus network end node (104) and a member and task integrated management module (105), wherein:
the bus network end node (104) of each computer node (103) is connected with the bus network system (102) and is used for communicating with the bus network system (102) and carrying out member and task comprehensive management data packet transmission by adopting a time trigger protocol;
the member and task integrated management module (105) of each computer node (103) comprises: an integrated management core module (10), a local state monitoring module (20), a BIT test result detection module (21), a power supply monitoring result detection module (22), a task running state detection module (23), a node task configuration detection module (30), a node ID test module (31), a task ID test module (32), a node-task-period configuration table (33), an information receiving and checking decoding module (40), an information checking coding and sending module (50), a node fault safety module (60), an other machine task state monitoring module (70) and a software interface module (80).
2. The integrated member and task management system according to claim 1, wherein the integrated member and task management data includes a computer ID including a check bit, computer node status information, an ID of each task, status information of each task, a frame period count of each task, and a data payload area check code.
3. The system for the integrated management of members and tasks according to claim 1, wherein the integrated management core module (10) is configured to receive the state information of the computer nodes and the state information of each task running, which are sent by the local state monitoring module (20), the computer node task configuration detection result sent by the node task configuration detection module (30), and the information of other computer nodes sent by the information receiving and checking decoding module (40); generating state information of computer nodes and state information of each task of the computer nodes; simultaneously sending the state information of the computer nodes and the state information of each task of the computer nodes to a node failure safety module (60);
the other computer node information includes computer node state information of other computer nodes, state information of tasks run by the computer nodes, computer node state information indicating the computer node, and state information indicating the computer node tasks.
4. The member and task integrated management system according to claim 1, wherein the local state monitoring module (20) is configured to receive the BIT test result of the computer node state and the BIT test result of each task sent by the BIT test result detection module (21), the power supply monitoring test result of the computer node state and the power supply monitoring test result of each task sent by the power supply monitoring result detection module (22), and the computer node running state monitoring result and the running state monitoring result of each task run by the computer node sent by the task running state detection module (23); to generate the state of the computer node itself and the task states of the computer node; and to send the state of the computer node itself and the task states to the integrated management core module (10);
the power supply monitoring result detection module (22) comprehensively obtains power supply monitoring test results of computer node states and power supply monitoring test results of tasks according to power supply monitoring test items, influence ranges and corresponding monitoring test results of all power supplies executed by software and hardware, and sends the power supply monitoring test results of the computer node states and the power supply monitoring test results of all tasks to the local state monitoring module (20);
the task running state detection module (23) comprises a frame counting accumulation judgment module and a timeout judgment module which are configured on all task sharing functional parts, and a respective frame counting accumulation judgment module and a respective timeout judgment module which are configured on each task independent functional part respectively, obtains a computer node running state monitoring result and a running state monitoring result of each task run by a computer node by adopting a frame counting accumulation judgment method and a timeout judgment method, and sends the computer node running state monitoring result and the running state monitoring result of each task run by the computer node to a local state monitoring module (20).
5. The member and task integrated management system according to claim 1, wherein the node task configuration detection module (30), according to the computer node ID and computer node ID state given by the node ID test module (31) and the task ID and task ID state given by the task ID test module (32), reads all task IDs corresponding to the computer node ID from the node-task-period configuration table (33) and matches them against the task ID given by the task ID test module (32); a mismatch, an erroneous computer node ID state, or an erroneous task ID state indicates a task configuration detection fault; the configuration detection fault result is sent to the integrated management core module (10), and the configuration detection fault result and the task ID are sent to the information receiving and checking decoding module (40); and the node-task-period configuration table (33) stores the valid computer node ID of each computer node of the distributed computer system, the task IDs of all executable tasks configured for each computer node ID, the reconstruction priorities of the different tasks, the execution period of each task, and the like.
6. The member and task integrated management system according to claim 1, wherein the information receiving and checking decoding module (40) receives the configuration detection fault result and the task ID given by the node task configuration detection module (30) to obtain a valid task ID; screens the data load area of the member and task integrated management data packet received by the bus network end node (104) using the time-triggered protocol, retaining only the computer node state information of other computer nodes whose task ID is the same as the task ID of this computer node and the task state information of the same task ID; then checks and decodes the screened information, performs matching detection according to the node-task-period configuration table (33), and retains only the correct information; obtains, from the screened correct information, the computer node state information of other computer nodes, the state information of the tasks run by the other computer nodes, the computer node state information indicating this computer node, and the state information indicating the tasks of this computer node; and sends the obtained information to the integrated management core module (10); and meanwhile, the information receiving and checking decoding module (40) obtains the task period of the valid task ID according to the configuration detection fault result and the task ID, and performs timeout detection of the corresponding state information according to the task period, data that is not received in accordance with the task period being regarded as invalid data.
7. The integrated member and task management system according to claim 6, wherein the node failure security module (60) receives the state of the computer node transmitted by the information checking code and transmission module (50), latches the state of the task failure if an invalid state is received in the task period or no information is received in the task period, and transmits a disable signal to the bus network end node (104) to disable the bus network end node (104) from outputting information to the outside;
and meanwhile, according to the task state of the computer sent by the information checking coding and sending module (50), if an invalid state is received in the task period or no information is received in the task period, the task fault state is latched, and a task fault indication is generated to disable the output interfaces related to the task in the computer.
8. The member and task integrated management system according to claim 1, wherein the other machine task state monitoring module (70) has a timeout detection module; if the software, via the software interface module (80), indicates the state and task state information of other computer nodes within a specified time period, that information is sent to the information checking coding and sending module (50); and if the software, via the software interface module (80), does not indicate the state information of other computer nodes within the specified time period, the indication for the other computer nodes is set to fault and sent to the information checking coding and sending module (50).
9. The integrated member and task management system according to claim 1, wherein the bus network system (102) supports a time-triggered protocol for data transmission, and the bus network system (102) is implemented in a bus manner or a switching network manner.
CN202011227150.8A 2020-11-05 2020-11-05 Member and task integrated management system Pending CN112231179A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011227150.8A CN112231179A (en) 2020-11-05 2020-11-05 Member and task integrated management system

Publications (1)

Publication Number Publication Date
CN112231179A true CN112231179A (en) 2021-01-15

Family

ID=74122804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011227150.8A Pending CN112231179A (en) 2020-11-05 2020-11-05 Member and task integrated management system

Country Status (1)

Country Link
CN (1) CN112231179A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106293919A (en) * 2016-08-12 2017-01-04 中国航空工业集团公司西安飞行自动控制研究所 The built-in tasks dispatching device of a kind of Time Triggered and method
CN108183836A (en) * 2017-12-15 2018-06-19 中国航空工业集团公司西安飞行自动控制研究所 A kind of distributed synchronization bus network test system and its test method
CN109347703A (en) * 2018-11-21 2019-02-15 中国船舶重工集团公司第七六研究所 A kind of CPS node failure detection device and method
CN109684131A (en) * 2018-12-14 2019-04-26 中国航空工业集团公司西安航空计算技术研究所 A kind of mixed structure network fault tolerance system dynamic reconfiguration method based on table- driven

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
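Yang Junxiang et al.: "Research on a distributed integrated processing system platform based on the TTFC network", Aeronautical Computing Technique (《航空计算技术》), vol. 48, no. 5, pages 309 - 314 *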

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination