CN111610778B

CN111610778B - Self-adaptive monitoring system for improving stability of industrial control system

Info

Publication number: CN111610778B
Application number: CN202010578992.1A
Authority: CN
Inventors: 王丹; 张洲; 白轶; 罗钊航; 蒲学军; 邓江明; 王大秋; 唐华
Original assignee: Nuclear Power Institute of China
Current assignee: Nuclear Power Institute of China
Priority date: 2020-06-23
Filing date: 2020-06-23
Publication date: 2021-06-29
Anticipated expiration: 2040-06-23
Also published as: CN111610778A

Abstract

The invention discloses a self-adaptive monitoring system for improving the stability of an industrial control system, comprising a monitoring module, a positioning module, a decision module, a maintenance module and a general knowledge base. The monitoring module monitors the memory according to the industrial control system information provided by the general knowledge base and sends a memory anomaly notification to the positioning module when a memory anomaly is found. The positioning module screens suspicious data points according to the memory anomaly notification pushed by the monitoring module and the industrial control system information provided by the general knowledge base, proposes suggested schemes for the suspicious data points, and pushes them to the decision module. The decision module selects the best scheme of the current round from the suggested schemes pushed by the positioning module, according to the industrial control system information provided by the general knowledge base, and pushes it to the maintenance module. The maintenance module checks or recovers the industrial control system according to the best scheme pushed by the decision module. The invention can detect and repair memory faults and thereby improve the safety and stability of the industrial control system.

Description

Self-adaptive monitoring system for improving stability of industrial control system

Technical Field

The invention relates to the technical field of industrial control system testing, in particular to a self-adaptive monitoring system for improving the stability of an industrial control system.

Background

DRAM memory is an important component of industrial control systems, and physical damage to the memory can cause permanent hardware errors. For example, a specific bit in a DRAM chip may become permanently stuck at 0 or 1, leaving the system unrecoverable and causing a serious safety accident.

Industrial control systems have high requirements for real-time performance and stability. This applies in particular to industrial control systems in the field of nuclear technology, where shutdown maintenance and software updates are not suitable during operation, so DRAM faults may go unrepaired for a long time. At present, most schemes for improving the stability and safety of industrial control systems rely on means such as traffic monitoring, fuzz testing or access control, and pay little attention to monitoring the memory, so unrecoverable faults caused by permanent DRAM faults are difficult to detect.

Disclosure of Invention

The invention provides a self-adaptive monitoring system for improving the stability of an industrial control system. It can detect, locate and repair memory faults, thereby improving the fault tolerance, safety and stability of the industrial control system, keeping the equipment and network of the industrial control system running stably for longer, performing automatic troubleshooting and maintenance when hardware faults occur, and reducing security attacks on the system.

The invention is realized by the following technical scheme:

a self-adaptive monitoring system for improving the stability of an industrial control system comprises a monitoring module, a positioning module, a decision module, a maintenance module and a general knowledge base;

the monitoring module monitors the memory according to the industrial control system information provided by the general knowledge base, and sends a memory anomaly notification to the positioning module when a memory anomaly is found;

the positioning module screens suspicious data points according to the memory anomaly notification pushed by the monitoring module and the industrial control system information provided by the general knowledge base, proposes suggested schemes for the suspicious data points, and pushes them to the decision module;

the decision module selects the best scheme of the current round from the suggested schemes pushed by the positioning module, according to the industrial control system information provided by the general knowledge base, and pushes it to the maintenance module;

the maintenance module checks or recovers the industrial control system according to the best scheme pushed by the decision module;

the general knowledge base is a database that records the global information of the industrial control system.

Preferably, the information recorded by the general knowledge base of the present invention includes: a task instruction rule set, marked defective memory regions, and abnormal-memory maintenance log records.

Preferably, the monitoring module of the present invention comprises at least one data point monitor;

the data point monitor compares the data values of the running task of the industrial control system with the task instruction rule set provided by the general knowledge base; if a running task data point is determined to be an abnormal task instruction, it is marked as an abnormal instruction and pushed to the positioning module.

Preferably, the positioning module of the present invention comprises at least one data point verification processor;

the positioning module screens suspicious data points according to the memory anomaly notification pushed by the monitoring module and the information provided by the general knowledge base, so as to activate a data point verification processor; specifically, the suspicious data points are screened by the suspicious factor obtained from formula (1):

in the formula, α is the suspicious factor and represents the correlation between the currently running task and a task instruction in the general knowledge base;

InsOri_cur is the monitored task instruction, which is a binary sequence;

InsOri_abn is a similar suspicious task instruction, which is a binary sequence;

L_cur is the length of the monitored task instruction sequence InsOri_cur;

L_abn is the length of the similar suspicious task instruction sequence InsOri_abn;

Ins_cur is the subsequence of the monitored task instruction sequence that is most similar to the suspicious task instruction sequence;

Ins_abn is the subsequence of the similar task instruction sequence that is most similar to the current task instruction sequence;

L_sim is the maximum similar subsequence length:

if L_cur > L_abn, then L_sim = L_abn, Ins_abn = InsOri_abn, and Ins_cur is the segment cut from InsOri_cur that has the maximum similar subsequence length with InsOri_abn;

if L_cur < L_abn, then L_sim = L_cur, Ins_cur = InsOri_cur, and Ins_abn is the segment cut from InsOri_abn that has the maximum similar subsequence length with InsOri_cur;

if L_cur = L_abn, then L_sim = L_cur = L_abn, Ins_cur = InsOri_cur, and Ins_abn = InsOri_abn.
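The three length cases above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names (hamming, best_window, align) are hypothetical, bit sequences are represented as strings of "0" and "1", and "most similar" is taken to mean the smallest Hamming distance, which the text does not specify. The sketch derives Ins_cur, Ins_abn and L_sim only; it does not compute α, because formula (1) is not reproduced in the text.

```python
from typing import Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_window(long_seq: str, short_seq: str) -> str:
    """Segment of long_seq, as long as short_seq, that is closest to short_seq.

    Assumption: closeness is measured by Hamming distance.
    """
    n = len(short_seq)
    windows = (long_seq[i:i + n] for i in range(len(long_seq) - n + 1))
    return min(windows, key=lambda w: hamming(w, short_seq))


def align(ins_ori_cur: str, ins_ori_abn: str) -> Tuple[str, str, int]:
    """Return (Ins_cur, Ins_abn, L_sim) following the three length cases."""
    l_cur, l_abn = len(ins_ori_cur), len(ins_ori_abn)
    if l_cur > l_abn:
        # Shorter suspicious sequence is kept whole; cut a segment from the monitored one.
        return best_window(ins_ori_cur, ins_ori_abn), ins_ori_abn, l_abn
    if l_cur < l_abn:
        # Monitored sequence is kept whole; cut a segment from the suspicious one.
        return ins_ori_cur, best_window(ins_ori_abn, ins_ori_cur), l_cur
    # Equal lengths: both sequences are used unchanged.
    return ins_ori_cur, ins_ori_abn, l_cur
```

For instance, align("110100", "101") cuts the segment "101" out of the longer monitored sequence and reports L_sim = 3.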

Preferably, after the data point verification processor is activated, it proposes a scheme for investigating the suspicious data point. The whole checking process is carried out through an adaptive loop and is escalated step by step until the abnormal memory location is found or the function returns to normal.

Preferably, the data point verification processor of the present invention uses a state machine to implement the escalation. Its internal state changes only when the decision module actually decides to execute a given proposal; during this period, a higher-priority data point verification processor is allowed to change the global state, and if the change succeeds, the scheme level must be reset to 0.

Preferably, the suggested scheme of the data point verification processor of the present invention includes at least one of the following 5 levels:

Level 0: normal running state; no suspicious anomaly is found in the current environment, or the anomaly has been successfully handled;

Level 1: on the first memory anomaly notification, activate the hot backup device and advise the data point monitor to raise the monitoring level so as to obtain more detailed anomaly information;

Level 2: if the suspicious data point error still exists, recommend a memory test of the affected memory region;

Level 3: when the memory test detects faulty memory locations, recommend marking these regions as damaged memory that is no longer used; otherwise, recommend entering the safe state;

Level 4: after the faulty memory region is masked, the device should work normally; if it does not, recommend entering the safe state.

Preferably, the decision module of the present invention determines the priority of the proposed scheme by the following formula (2):

P_priority：＝αLevel_self (2)

in the formula, P_priority is the priority of the scheme;

α is the suspicious factor and represents the correlation between the currently running task and the task instruction in the general knowledge base;

Level_self is the scheme level at which the current suspicious data point is being examined.

Preferably, the maintenance module of the present invention takes the following measures according to the proposed scheme pushed by the decision module:

raise the monitoring level: activate the hot backup device and raise the monitoring level of the data point monitor;

test the memory: test the memory associated with the data points;

mark the memory: when a memory unit is found to be defective, mask the defective memory in the operating system, record it in the general knowledge base, and skip that memory region in later use;

safe state: if an error in the data point still occurs even after the memory has been masked, the safe state must be entered (the system is shut down) and the system is serviced by the responsible personnel;

lower the monitoring level: set the repaired device to hot-backup mode and lower the monitoring level.

Preferably, in the invention, after the faulty memory is successfully repaired, the related repair process is added to the general knowledge base to update the data model in the general knowledge base.

The invention has the following advantages and beneficial effects:

1. Compared with traditional techniques such as traffic monitoring, fuzz testing or access control, the invention uses memory data point monitoring to detect, locate and repair memory faults, in particular permanent memory faults. This improves the fault tolerance, safety and stability of the industrial control system, keeps its equipment and network running stably for longer, provides automatic troubleshooting and maintenance when hardware faults occur, and reduces security attacks on the system.

2. The monitoring system has a wide range of application: it can be used for automatic hardware troubleshooting and maintenance in nuclear reactor industrial control systems, which have high requirements for real-time performance, stability and safety, and also in other industrial control systems.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:

FIG. 1 is a schematic block diagram of the system of the present invention.

FIG. 2 is a diagram of a Modbus read command protocol format according to the present invention.

Detailed Description

Hereinafter, the terms "comprising" or "may include" used in various embodiments of the present invention indicate the presence of the disclosed function, operation or element, and do not exclude the addition of one or more further functions, operations or elements. Furthermore, as used in various embodiments of the present invention, the terms "comprises," "comprising," "includes," "including," "has," "having" and their derivatives are intended only to indicate a particular feature, number, step, operation, element, component, or combination of the foregoing, and should not be construed as excluding the existence of, or the possibility of adding, one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.

In various embodiments of the invention, the expression "or" or "at least one of A or/and B" includes any or all combinations of the words listed together. For example, the expression "A or B" or "at least one of A or/and B" may include A, may include B, or may include both A and B.

Expressions (such as "first", "second", and the like) used in various embodiments of the present invention may modify various constituent elements in various embodiments, but may not limit the respective constituent elements. For example, the above description does not limit the order and/or importance of the elements described. The foregoing description is for the purpose of distinguishing one element from another. For example, the first user device and the second user device indicate different user devices, although both are user devices. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of various embodiments of the present invention.

It should be noted that: if it is described that one constituent element is "connected" to another constituent element, the first constituent element may be directly connected to the second constituent element, and a third constituent element may be "connected" between the first constituent element and the second constituent element. In contrast, when one constituent element is "directly connected" to another constituent element, it is understood that there is no third constituent element between the first constituent element and the second constituent element.

The terminology used in the various embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the various embodiments of the invention. As used herein, the singular forms are intended to include the plural forms as well, unless the context clearly indicates otherwise. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments of the present invention.

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.

Example 1

The embodiment provides an adaptive monitoring system for improving the stability of an industrial control system.

The present embodiment defines the related technical terms:

Monitored task instruction sequence: a sequence of instructions that falls within the monitoring scope.

Current task instruction sequence: the sequence of instructions generated by the task currently being executed.

Task instructions in the general knowledge base: the knowledge base holds the instruction sequences corresponding to tasks, built up from accumulated knowledge of instruction analysis.

Suspicious task instruction: an instruction generated during task execution that, according to the knowledge in the knowledge base, cannot be accurately identified or characterized as an abnormal task instruction.

As shown in fig. 1, the monitoring system of the present embodiment includes a monitoring module, a positioning module, a decision module, a maintenance module, and a general knowledge base.

Monitoring module: consists of a data point monitor that finds and monitors specific anomalies. Its main functions are monitoring DRAM memory anomalies, raising the monitoring level and lowering the monitoring level. It monitors the memory according to the system information provided by the general knowledge base and pushes a message to the positioning module when a memory anomaly is found.

Positioning module: consists of a plurality of data point verification processors. The positioning module screens suspicious data points according to the memory anomaly notification pushed by the data point monitor and the information provided by the general knowledge base. It then activates a data point verification processor, which proposes a suggested scheme for the suspicious data point and pushes it to the decision module.

Decision module: consists of a scheme evaluation processor. Its main function is to evaluate the suggested schemes pushed by the positioning module according to the system information provided by the general knowledge base, and to select the best scheme of the current round for checking the abnormal memory location.

Maintenance module: checks or recovers the system according to the best scheme pushed by the decision module, so that the system gradually returns to normal.

General knowledge base: mainly records the global information of the system, including a task instruction rule set, marked defective memory regions, abnormal-memory maintenance log records and other related information.

Specifically, in this embodiment:

1. monitoring module

The monitoring module consists of a data point monitor. The data point monitor detects memory errors by comparison. Traffic-comparison analysis can find only abnormal traffic in communication data, whereas the memory monitoring method can also find permanent memory faults. All task instruction rules in the industrial control system are extracted by self-learning while the system runs, and a task instruction rule model is established and maintained in the general knowledge base. The data point monitor compares the data values of the running task with the task instruction model to find abnormal task instructions.

The data point monitor provides two monitoring levels so that the monitoring mechanism affects the production system as little as possible during normal operation. At the "low" monitoring level, the monitor runs once per fixed interval and simply compares the task input and output values, minimizing the impact on system tasks. At the "high" monitoring level, all available data points are compared, including those inside a task, which may consist of many function blocks with individual inputs, outputs and intermediate variables, so more detailed memory error information can be discovered. However, reading these data takes a long time, so the "high" monitoring level has a much larger impact on system performance and is used only when monitoring is escalated from the "low" level.
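The difference between the two monitoring levels can be illustrated with a minimal Python sketch. All names here (find_anomalies, io_points, the data point names) are hypothetical, and the task instruction model is reduced to a dictionary of expected values, which is a simplification rather than the structure used by the invention.

```python
from typing import Dict, List


def find_anomalies(observed: Dict[str, int], expected: Dict[str, int],
                   level: str, io_points: List[str]) -> List[str]:
    """Return the names of data points whose observed value differs from the model.

    level == "low":  compare only the task input and output values.
    level == "high": compare every available data point, including
                     intermediate variables inside the task.
    """
    names = io_points if level == "low" else sorted(expected)
    return [n for n in names if observed.get(n) != expected[n]]


# Hypothetical task with one input, one intermediate variable and one output.
expected = {"in": 1, "tmp": 7, "out": 3}
observed = {"in": 1, "tmp": 5, "out": 4}

low = find_anomalies(observed, expected, "low", ["in", "out"])
high = find_anomalies(observed, expected, "high", ["in", "out"])
print(low, high)
```

In this example the "low" level reports only the wrong output, while the "high" level also exposes the faulty intermediate variable, at the cost of reading more data points.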

When the data point monitor finds an abnormal task instruction, it sends a memory anomaly notification to the data point verification processor and at the same time starts the hot backup device, so that the system keeps running normally while the anomaly is investigated and resolved.

2. Positioning module

The positioning module is composed of a plurality of data point verification processors. The positioning module screens suspicious data points according to the memory anomaly notification pushed by the data point monitor and the information provided by the general knowledge base. It then activates a data point verification processor, which proposes a suggested scheme for the suspicious data points.

Screening suspicious data points: a task instruction of the industrial control system is a sequence of 0s and 1s. To find abnormal instructions, suspicious data points are screened on the basis of the correlation between the currently running task and the task instructions in the general knowledge base. This correlation is defined as the suspicious factor α and is calculated by formula (1):

InsOri_cur is the monitored task instruction, which is a binary sequence;

InsOri_abn is a similar suspicious task instruction, which is a binary sequence;

L_cur is the length of the monitored task instruction sequence InsOri_cur;

α is the suspicious factor used to evaluate whether a data point is suspicious; the user can configure a threshold for α.

Once activated, the data point verification processor proposes a scheme for investigating the suspicious data point. The whole checking process is implemented as an adaptive loop and is escalated step by step until the abnormal memory location is found or the function returns to normal. The escalation is implemented as a state machine whose internal state advances according to defined conditions. The internal state changes only if the decision step actually decides to execute a given recommendation. During this period, higher-priority data point verification processors are allowed to change the global state, and if the change succeeds, the scheme level must be reset to 0.

The following are the 5 levels of recommendations:

Level 0: normal running state; no suspicious anomaly is found in the current environment, or the anomaly has been successfully handled.

Level 1: on the first memory anomaly notification, the hot backup device is activated and the data point monitor is advised to raise the monitoring level in order to obtain more detailed anomaly information.

Level 2: if the suspicious data point error still exists, a memory test of the affected memory region is recommended.

Level 3: when the memory test detects faulty memory locations, it is recommended to mark these regions as damaged memory that is no longer used. Otherwise, it is recommended to enter the safe state.

Level 4: after the faulty memory region is masked, the device should work normally; if it does not, it is recommended to enter the safe state. This step waits for a success notification confirming that all data point verification processors report correct operation, and then resets the scheme level to 0.
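The escalation through these levels can be sketched as a small transition function. This is one possible reading of the level descriptions, not the state machine of the invention: the names next_level and SAFE_STATE are hypothetical, and the function is assumed to be called only after the decision module has actually executed the scheme for the current level.

```python
SAFE_STATE = "safe_state"


def next_level(current: int, error_persists: bool, faulty_found: bool = False):
    """Return the next scheme level (0-4) or SAFE_STATE.

    current:        level whose scheme was just executed
    error_persists: whether the suspicious data point error is still observed
    faulty_found:   whether the memory test (level 2) located faulty memory
    """
    if not error_persists:
        return 0                     # success: the scheme level is reset to 0
    if current == 2:                 # memory test has just run
        return 3 if faulty_found else SAFE_STATE
    if current >= 4:                 # masking did not restore normal operation
        return SAFE_STATE
    return current + 1               # 0 -> 1, 1 -> 2, 3 -> 4


# Walk through one full escalation that ends in recovery.
level = 0
trace = [level]
for persists, found in [(True, False), (True, False), (True, True),
                        (True, False), (False, False)]:
    level = next_level(level, persists, found)
    trace.append(level)
print(trace)
```

The trace shows the level rising from 0 to 4 while the error persists and returning to 0 once the device works again after masking.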

3. A decision module:

The decision module is composed of a scheme evaluation processor. Its main function is to evaluate the suggested schemes pushed by the positioning module and to select the best scheme of the current round for checking the abnormal memory location.

The evaluation criterion is defined as follows: given the efficiency and real-time requirements of an industrial control system, a dynamic parameter is introduced on top of the static priority given by formula (1).

P_priority：＝αLevel_self (2)

In the formula, P_priority is the priority of the scheme;

Level_self is the scheme level at which the current suspicious data point is being examined.

The embodiment selects the best scheme of the current round according to the above standard and pushes the best scheme to the maintenance module.
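A minimal Python sketch of this selection step follows. It rests on two assumptions that the text does not state explicitly: the juxtaposition of α and Level_self in formula (2) is read as a product, and the scheme with the smallest resulting value is treated as the best one, which is the reading that reproduces the choice made in Example 2. The names Proposal, priority and best_proposal are hypothetical.

```python
from typing import List, NamedTuple


class Proposal(NamedTuple):
    instruction: str   # similar instruction the proposal refers to
    alpha: float       # suspicious factor from formula (1)
    level: int         # scheme level, Level_self


def priority(p: Proposal) -> float:
    # Assumption: formula (2) is the product of alpha and Level_self.
    return p.alpha * p.level


def best_proposal(proposals: List[Proposal]) -> Proposal:
    # Assumption: the smallest value wins (consistent with Example 2).
    return min(proposals, key=priority)


# Values taken from Example 2; both proposals start at level 1.
candidates = [
    Proposal("010100320002", 0.015625, 1),
    Proposal("010300020002", 0.3125, 1),
]
chosen = best_proposal(candidates)
print(chosen.instruction)
```

With the values from Example 2, the proposal for similar instruction 1 (010100320002) is selected.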

4. Maintenance module

The maintenance module checks or recovers the system according to the scheme pushed by the decision module, so that the system is gradually recovered to be normal, and the measures taken according to the suggested scheme are as follows:

Raise the monitoring level: activate the hot backup device and raise the monitoring level of the data point monitor. When an error is discovered in the system, the hot backup system is activated immediately. For further analysis, the monitoring level is raised to obtain more detailed information about the fault.

Test the memory: test the memory associated with the data points. A memory test routine is executed on the memory location of the erroneous data point.

Mark the memory: when memory cells are found to be defective, the defective memory is masked in the operating system and recorded in the general knowledge base, and that memory region is skipped in later use.

Safe state: notify the upper layers and suspend the device. If an error in the data point still occurs even after the memory has been masked, the safe state must be entered (the system is shut down) and the system is serviced by the attendant.

Lower the monitoring level: the repaired device is set to hot-backup mode and the monitoring level is lowered.

5. General knowledge base

The general knowledge base mainly records the global information of the system, including a task instruction rule set, marked defective memory regions, anomaly records and other related information.
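The three kinds of records listed above could be held in a structure like the following. This is a sketch only: the class name KnowledgeBase, its field names and its methods are hypothetical, and a real knowledge base would be a persistent database rather than in-memory collections.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class KnowledgeBase:
    rules: Set[str] = field(default_factory=set)                     # task instruction rule set
    defective: List[Tuple[int, int]] = field(default_factory=list)   # (start, length) regions
    log: List[str] = field(default_factory=list)                     # anomaly/maintenance records

    def is_known(self, instruction: str) -> bool:
        """True if the instruction matches a rule in the rule set."""
        return instruction in self.rules

    def mark_defective(self, start: int, length: int) -> None:
        """Record a masked memory region and log the maintenance action."""
        self.defective.append((start, length))
        self.log.append(f"masked {length} bytes at {start:#x}")

    def is_usable(self, address: int) -> bool:
        """True if the address lies outside every marked defective region."""
        return not any(s <= address < s + n for s, n in self.defective)


kb = KnowledgeBase(rules={"010300020002"})
kb.mark_defective(0x1000, 16)
print(kb.is_known("010300320002"), kb.is_usable(0x1008), kb.log)
```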

Example 2

In this embodiment, the adaptive monitoring system provided in embodiment 1 is used to perform a test on an industrial control system based on a Modbus protocol and a Linux environment, and the specific process is as follows:

First, the task instruction set is analyzed to establish the general knowledge base. For example, in an industrial control environment based on the Modbus protocol, the format of a Modbus read instruction is shown in FIG. 2.

The general knowledge base is created from the instruction format shown in FIG. 2 and the instruction set of this industrial control environment.

When the device receives a task and writes the instruction 010300320002 to DRAM, the data point monitor compares the DRAM data point with the task instruction rule set in the general knowledge base. It finds that the task instruction newly written to DRAM matches no rule in the rule set of the general knowledge base: function code 03 is Read Holding Registers, and address 0032 is not a valid holding register address. The data point monitor pushes instruction 010300320002 to the positioning module as an abnormal instruction and activates the hot backup device so that the system can continue to operate normally.
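The rule check in this step can be sketched as follows. The field layout used here (unit address, function code, 16-bit starting address, 16-bit register quantity, no CRC) is the standard Modbus read request layout and is assumed to match FIG. 2. The names ReadRequest, parse_request and matches_rules, as well as the range of valid holding register addresses, are hypothetical.

```python
import struct
from typing import NamedTuple


class ReadRequest(NamedTuple):
    unit: int       # slave/unit address
    function: int   # function code
    address: int    # starting register address
    quantity: int   # number of registers to read


def parse_request(hex_frame: str) -> ReadRequest:
    """Split a 6-byte request (without CRC) into its big-endian fields."""
    return ReadRequest(*struct.unpack(">BBHH", bytes.fromhex(hex_frame)))


def matches_rules(req: ReadRequest, valid_holding: range) -> bool:
    """True if the request reads holding registers inside the valid range."""
    last = req.address + req.quantity - 1
    return (req.function == 0x03
            and req.address in valid_holding
            and last in valid_holding)


# Hypothetical rule: only holding registers 0x0000-0x000F exist on this device.
VALID = range(0x0000, 0x0010)
abnormal = parse_request("010300320002")
print(abnormal, matches_rules(abnormal, VALID))
```

Under this hypothetical rule, 010300320002 is rejected because address 0x0032 lies outside the valid range, while 010300020002 is accepted.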

After the positioning module receives the abnormal instruction, it screens out 2 similar instructions based on the general knowledge base and formula (1): instruction 1, 010100320002, and instruction 2, 010300020002. Calculated according to formula (1), the suspicious factor of instruction 1 is α₁ = 0.015625 and that of instruction 2 is α₂ = 0.3125. Two data point verification processors are activated; each proposes a suggested scheme according to the scheme level of the current suspicious data point and pushes it to the decision module. The initial schemes are all level 1.

According to formula (2), the scheme evaluation processor of the decision module evaluates similar instruction 1 as having the higher priority and pushes the suggested scheme for similar instruction 1 to the maintenance module. The maintenance module takes the corresponding measure: the monitoring level of the data point monitor is raised, and the input and output parameters of the detected instruction are examined. The detected instruction 010300320002 is found likely to be the similar instruction 010100320002 corrupted by a memory fault in the function code bits. The positioning module then escalates the current instruction to scheme level 2 and recommends a memory test.
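The relationship between the detected instruction and each similar instruction can be made concrete by XOR-ing the two frames byte by byte. The function name differing_bits is hypothetical; bit index 0 denotes the least significant bit of a byte.

```python
from typing import List, Tuple


def differing_bits(hex_a: str, hex_b: str) -> List[Tuple[int, int]]:
    """Return (byte_index, bit_index) pairs at which two equal-length frames differ."""
    a, b = bytes.fromhex(hex_a), bytes.fromhex(hex_b)
    if len(a) != len(b):
        raise ValueError("frames must have the same length")
    return [(i, bit)
            for i, (x, y) in enumerate(zip(a, b))
            for bit in range(8)
            if ((x ^ y) >> bit) & 1]


diff_1 = differing_bits("010300320002", "010100320002")  # similar instruction 1
diff_2 = differing_bits("010300320002", "010300020002")  # similar instruction 2
print(diff_1, diff_2)
```

The detected instruction differs from similar instruction 1 in a single bit of the function code byte (byte 1, bit 1), and from similar instruction 2 in two bits of the low address byte (byte 3, bits 4 and 5).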

The scheme evaluation processor evaluates the current scheme according to formula (2) and pushes scheme level 2 for similar instruction 1 to the maintenance module. On receiving scheme level 2, the maintenance module performs a memory test and finds a permanent memory fault in the second least significant bit of function code 0x03 (binary 00000011) of the DRAM instruction 010300320002: the bit is stuck at 1 and cannot be reset, which causes the instruction anomaly.
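How a memory test can expose such a stuck-at-1 bit is sketched below on simulated memory. Everything here is an illustration under assumptions: FaultyMemory is a hypothetical stand-in for real DRAM, check_region is a hypothetical name, and the four write/read-back patterns are a common simple choice rather than the test routine used in the embodiment.

```python
from typing import Dict, Optional


class FaultyMemory:
    """Simulated byte-addressable memory with optional stuck-at-1 bits."""

    def __init__(self, size: int, stuck_at_one: Optional[Dict[int, int]] = None):
        self._cells = bytearray(size)
        self._stuck = stuck_at_one or {}   # address -> mask of bits forced to 1

    def write(self, addr: int, value: int) -> None:
        # A stuck-at-1 bit stays set regardless of the value written.
        self._cells[addr] = (value | self._stuck.get(addr, 0)) & 0xFF

    def read(self, addr: int) -> int:
        return self._cells[addr]


def check_region(mem: FaultyMemory, start: int, length: int) -> Dict[int, int]:
    """Write test patterns, read them back, and return {address: mask of bad bits}."""
    bad = {}
    for addr in range(start, start + length):
        saved = mem.read(addr)
        mask = 0
        for pattern in (0x00, 0xFF, 0x55, 0xAA):
            mem.write(addr, pattern)
            mask |= mem.read(addr) ^ pattern
        mem.write(addr, saved)             # restore the original content
        if mask:
            bad[addr] = mask
    return bad


# Byte 1 holds the function code; its second least significant bit is stuck at 1.
mem = FaultyMemory(8, {1: 0b00000010})
mem.write(1, 0x01)                         # intended function code 0x01
faults = check_region(mem, 0, 8)
print(hex(mem.read(1)), faults)
```

In the simulation, writing function code 0x01 reads back as 0x03, and the test reports address 1 with bad-bit mask 0x02, mirroring the fault described above.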

The positioning module then escalates similar instruction 1 to scheme level 3 and recommends marking the memory. The scheme evaluation processor pushes the memory-masking scheme to the maintenance module, and the maintenance module uses the memmap mechanism of Linux to mark the faulty memory as a reserved area, so that it is not allocated in subsequent operation.

Once the faulty memory has been successfully repaired, the related repair process is added to the general knowledge base to facilitate later repairs, and the repaired device is reset to serve as the hot backup device.
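On Linux, memmap is a kernel command-line parameter; the form memmap=nn$ss marks nn bytes starting at physical address ss as reserved. The following Python sketch only builds such an argument for the page containing a faulty address. The function name memmap_reserve_arg, the 4 KiB page size and the example address are assumptions, and how the argument is applied (boot loader configuration, where the $ sign typically needs escaping) is outside the scope of the sketch.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def memmap_reserve_arg(fault_addr: int) -> str:
    """Build a memmap=nn$ss argument reserving the page that contains fault_addr."""
    if fault_addr < 0:
        raise ValueError("address must be non-negative")
    page_start = fault_addr - fault_addr % PAGE_SIZE
    return f"memmap={PAGE_SIZE:#x}${page_start:#x}"


# Hypothetical physical address of the faulty cell.
arg = memmap_reserve_arg(0x18690123)
print(arg)
```

For the hypothetical address 0x18690123 this yields memmap=0x1000$0x18690000, which reserves the whole page around the faulty cell.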

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A self-adaptive monitoring system for improving the stability of an industrial control system is characterized by comprising a monitoring module, a positioning module, a decision module, a maintenance module and a general knowledge base;

the positioning module comprises at least one data point verification processor;

InsOri_cur is the monitored task instruction, which is a binary sequence;

InsOri_abn is a similar suspicious task instruction, which is a binary sequence;

L_cur is the length of the monitored task instruction sequence InsOri_cur;

L_sim is the maximum similar subsequence length; if L_cur > L_abn, then L_sim = L_abn, Ins_abn = InsOri_abn, and Ins_cur is the segment cut from InsOri_cur that has the maximum similar subsequence length with InsOri_abn; if L_cur < L_abn, then L_sim = L_cur, Ins_cur = InsOri_cur, and Ins_abn is the segment cut from InsOri_abn that has the maximum similar subsequence length with InsOri_cur; if L_cur = L_abn, then L_sim = L_cur = L_abn, Ins_cur = InsOri_cur, and Ins_abn = InsOri_abn;

2. The adaptive monitoring system for improving industrial control system stability as claimed in claim 1, wherein the information recorded by the general knowledge base includes: a task instruction rule set, marked defective memory regions, and abnormal-memory maintenance log records.

3. An adaptive monitoring system for improving industrial control system stability according to claim 1, wherein said monitoring module comprises at least one data point monitor;

4. The adaptive monitoring system for improving the stability of the industrial control system according to claim 1, wherein after the data point verification processor is activated, it proposes a scheme for investigating the suspicious data point; the whole checking process is carried out through an adaptive loop and is escalated step by step until the abnormal memory location is found or the function returns to normal.

5. The adaptive monitoring system for improving industrial control system stability as claimed in claim 1, wherein the data point verification processor uses a state machine to implement the escalation; the internal state changes only when the decision module actually decides to execute a given recommendation, during which a higher-priority data point verification processor is allowed to change the global state, and if the change is successful, the scheme level must be reset to 0.

6. The adaptive monitoring system for improving industrial control system stability according to claim 1, wherein the recommended scheme of the data point verification processor comprises at least one of the following 5 levels:

level 1: on the first memory anomaly notification, activating the hot backup device and advising the data point monitor to raise the monitoring level so as to obtain more detailed anomaly information;

level 2: if the suspicious data point error still exists, recommending a memory test of the affected memory region;

level 3: when the memory test detects faulty memory locations, recommending that these regions be marked as damaged memory that is no longer used; otherwise, recommending entry into the safe state;

level 4: after the faulty memory region is masked, the device should work normally; if it does not, recommending entry into the safe state.

7. The adaptive monitoring system for improving industrial control system stability as claimed in claim 1, wherein the decision module determines the priority of the proposed solution by the following equation (2):

P_priority：＝αLevel_self (2)

in the formula, P_priority is the priority of the scheme;

Level_self is the scheme level at which the current suspicious data point is being examined.

8. The adaptive monitoring system for improving the stability of the industrial control system according to claim 1, wherein the maintenance module takes the following measures according to the proposed scheme pushed by the decision module:

testing the memory: testing the memory related to the data points;

safe state: if an error in the data point still occurs even after the memory has been masked, the safe state must be entered and the system is serviced by the responsible personnel;

9. The adaptive monitoring system for improving the stability of the industrial control system according to claim 1, wherein after the failed memory is successfully repaired, the relevant repairing process is added to the general knowledge base for updating the data model in the general knowledge base.