CN114676002A - PHM technology-based system operation and maintenance method and device

PHM technology-based system operation and maintenance method and device

Info

Publication number
CN114676002A
Authority
CN
China
Prior art keywords
maintenance
alarm information
script
basic data
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011550197.8A
Other languages
Chinese (zh)
Inventor
许建东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN202011550197.8A
Publication of CN114676002A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2263 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a system operation and maintenance method and device based on PHM (Prognostics and Health Management) technology, relates to the technical field of computers, and mainly aims to apply PHM technology to the operation and maintenance management of an IT (information technology) system and to improve the efficiency of operation and maintenance management for complex systems. The main technical scheme of the invention is as follows: acquiring operation and maintenance basic data of a target system according to preset operation and maintenance parameters; inputting the operation and maintenance basic data that conform to a preset type into a corresponding prediction model to obtain alarm information of the target system; matching the content of the alarm information against a knowledge graph of the target system to determine an operation and maintenance script for processing the alarm information; and calling the operation and maintenance script to process the system fault corresponding to the alarm information, and generating operation and maintenance processing information of the system fault.

Description

PHM technology-based system operation and maintenance method and device
Technical Field
The invention relates to the technical field of computers, in particular to a system operation and maintenance method and device based on PHM (Prognostics and Health Management) technology.
Background
PHM (Prognostics and Health Management) refers to a comprehensive technology that collects data from a system by using sensors; monitors, manages and evaluates the health state of the system by means of information technology and artificial intelligence reasoning algorithms; predicts faults before they occur; and provides maintenance and support suggestions or decisions by combining existing resource information. It integrates fault detection, fault isolation, health prediction, health evaluation and maintenance decision-making. At present, however, PHM technology is mainly used in industrial fields, for example in weapon systems, civil aircraft and high-speed rail.
As hardware and software have evolved, IT systems have become increasingly complex. In the actual operation and maintenance of an IT system, the most direct consequence of a complex business model (or system deployment structure) is that faults are difficult to locate and the cost of finding the root cause is high. As systems grow in scale and complexity and monitoring coverage becomes more complete, the number of monitoring indexes increases exponentially and the forms of the indexes diversify; monitoring rules set from experience are not fine-grained enough, so the false alarm rate and the missed alarm rate are high. After a fault occurs, its root cause can be determined only by examining a large number of alarms, which is extremely inefficient and greatly increases the fault recovery time. Therefore, existing IT systems need a more efficient operation and maintenance system for fault monitoring and handling.
Disclosure of Invention
In view of the above problems, the present invention provides a system operation and maintenance method and device based on the PHM technology, and mainly aims to implement operation and maintenance management on an IT system by means of the PHM technology, and improve operation and maintenance management efficiency on a complex system.
In order to achieve the purpose, the invention mainly provides the following technical scheme:
In a first aspect, the present invention provides a system operation and maintenance method based on PHM technology, including:
acquiring operation and maintenance basic data of a target system according to preset operation and maintenance parameters;
inputting the operation and maintenance basic data which accord with the preset type into a corresponding prediction model to obtain alarm information of a target system;
matching a knowledge graph of a target system according to the content of the alarm information, and determining an operation and maintenance script for processing the alarm information;
and calling the operation and maintenance script to process the system fault corresponding to the alarm information, and generating operation and maintenance processing information of the system fault.
Preferably, matching the knowledge graph of the target system according to the content of the alarm information and determining the operation and maintenance script for processing the alarm information includes:
extracting the equipment information of the target system contained in the alarm information;
acquiring the latest knowledge graph of the target system, wherein nodes of the knowledge graph carry corresponding operation and maintenance script tags;
matching the equipment information in the knowledge graph according to a preset rule, and determining the operation and maintenance script tag matched with the alarm information;
and determining the operation and maintenance script for processing the alarm information according to the operation and maintenance script tag.
Preferably, the determining, according to the operation and maintenance script tag, an operation and maintenance script for processing the alarm information includes:
when no operation and maintenance script tag is matched, generating visual alarm information in a preset format in a system operation and maintenance interface;
and when a plurality of operation and maintenance script tags are matched, determining the operation and maintenance script for processing the alarm information according to the priorities of the operation and maintenance scripts corresponding to the operation and maintenance script tags.
Preferably, the invoking the operation and maintenance script to process the system fault corresponding to the alarm information and generating the operation and maintenance processing information of the system fault includes:
monitoring the operation and maintenance basic data corresponding to the system fault, and judging whether the system fault is removed after the operation and maintenance script has run;
if the system fault is removed, generating an operation and maintenance processing log of the system fault;
and if not, generating visual fault alarm information in a preset format in the system operation and maintenance interface.
Preferably, the preset types of the operation and maintenance basic data include: periodic data, steady data and irregular data; the method further comprises:
identifying periodic operation and maintenance basic data by using a support vector machine algorithm;
identifying steady operation and maintenance basic data by using a clustering algorithm;
and identifying irregular operation and maintenance basic data by using a convolutional neural network.
Preferably, the method further comprises:
and training each prediction model by using operation and maintenance basic data of the corresponding preset type, so that the prediction model can predict system faults from the input operation and maintenance basic data.
Preferably, the method further comprises:
updating the health state data of the target system according to the operation and maintenance processing information of the system fault;
and updating the operation and maintenance parameters of the target system according to the health state data.
In a second aspect, the present invention provides a system operation and maintenance device based on PHM technology, where the device includes:
the acquisition unit is used for acquiring operation and maintenance basic data of the target system according to preset operation and maintenance parameters;
the prediction unit is used for inputting the operation and maintenance basic data which are in accordance with the preset type and obtained by the acquisition unit into a corresponding prediction model to obtain alarm information of a target system;
the determining unit is used for matching the knowledge graph of the target system according to the content of the alarm information obtained by the prediction unit, and determining an operation and maintenance script for processing the alarm information;
and the operation and maintenance unit is used for calling the operation and maintenance script determined by the determination unit to process the system fault corresponding to the alarm information and generate operation and maintenance processing information of the system fault.
Preferably, the determination unit includes:
the extraction module is used for extracting the equipment information of the target system contained in the alarm information;
the acquisition module is used for acquiring the latest knowledge graph of the target system, wherein nodes of the knowledge graph carry corresponding operation and maintenance script tags;
the matching module is used for matching the equipment information obtained by the extraction module in the knowledge graph obtained by the acquisition module according to a preset rule, and determining the operation and maintenance script tag matched with the alarm information;
and the determining module is used for determining the operation and maintenance script for processing the alarm information according to the operation and maintenance script tag determined by the matching module.
Preferably, the determining module is specifically configured to:
when no operation and maintenance script tag is matched, generating visual alarm information in a preset format in a system operation and maintenance interface;
and when a plurality of operation and maintenance script tags are matched, determining the operation and maintenance script for processing the alarm information according to the priorities of the operation and maintenance scripts corresponding to the operation and maintenance script tags.
Preferably, the operation and maintenance unit comprises:
the judging module is used for monitoring the operation and maintenance basic data corresponding to the system fault and judging whether the system fault is removed after the operation and maintenance script has run;
the generation module is used for generating an operation and maintenance processing log of the system fault if the judging module determines that the system fault is removed; and otherwise, generating visual fault alarm information in a preset format in the system operation and maintenance interface.
Preferably, the preset types of the operation and maintenance basic data include: periodic data, steady data and irregular data; the device further comprises: a classification unit, used for classifying the operation and maintenance basic data acquired by the acquisition unit according to the preset types, specifically by: identifying periodic operation and maintenance basic data by using a support vector machine algorithm; identifying steady operation and maintenance basic data by using a clustering algorithm; and identifying irregular operation and maintenance basic data by using a convolutional neural network.
Preferably, the apparatus further comprises:
and the training unit is used for training each prediction model by using operation and maintenance basic data of the corresponding preset type, so that the prediction model can predict system faults from the input operation and maintenance basic data.
Preferably, the apparatus further comprises: the updating unit is used for updating the health state data of the target system according to the operation and maintenance processing information of the system fault; and updating the operation and maintenance parameters of the target system according to the health state data.
In another aspect, the present invention also provides an apparatus comprising at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; the processor is used for calling the program instructions in the memory to execute the system operation and maintenance method based on PHM technology of the first aspect.
In another aspect, the present invention further provides a storage medium for storing a computer program, where the computer program, when running, controls the device in which the storage medium is located to execute the system operation and maintenance method based on PHM technology of the first aspect.
By means of the above technical scheme, the system operation and maintenance method and device based on PHM technology provide a data operation and maintenance flow suitable for an IT system, built on the operating framework of PHM technology. According to the embodiment of the invention, the specified operation and maintenance basic data are collected and classified according to preset types, and each type of data is input into its corresponding prediction model to predict whether the current IT system is at risk of a system fault; possible system faults are output in the form of alarm information. The alarm information is then matched with the knowledge graph of the target system to determine an operation and maintenance script capable of resolving the potential system fault, and the IT system is optimized by executing that script so that the potential fault does not occur. In this way, IT system faults are predicted and handled, and the operation and maintenance efficiency of complex IT systems is improved.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
Fig. 1 shows a flowchart of a system operation and maintenance method based on PHM technology according to an embodiment of the present invention;
Fig. 2 shows a flowchart of another system operation and maintenance method based on PHM technology according to an embodiment of the present invention;
Fig. 3 shows a schematic structural diagram of a system operation and maintenance device based on PHM technology according to an embodiment of the present invention;
Fig. 4 shows a schematic structural diagram of another system operation and maintenance device based on PHM technology according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The embodiment of the invention provides a system operation and maintenance method based on PHM (Prognostics and Health Management) technology, which applies fault prediction and health management to an IT (information technology) system under the PHM framework so that an IT system with a complex structure can be operated and maintained efficiently. The specific steps are shown in Fig. 1 and include:
101. Acquiring operation and maintenance basic data of the target system according to preset operation and maintenance parameters.
The operation and maintenance parameters in this embodiment are predefined before the operation and maintenance system is started, but their specific content is associated with the target system being operated and maintained; that is, the parameters may change as the functions and health state of the target system change. In other words, the initial values of the operation and maintenance parameters are preset, but they can be adjusted during operation and maintenance as the running state of the system changes.
The operation and maintenance basic data are collected mainly by collection agents in the target system, which collect the running state of the target system in real time and report the collected data. The operation and maintenance basic data include, but are not limited to: resource usage information of each physical device in the infrastructure layer (such as CPU utilization, memory occupancy, disk read-write rate and network link data rate); resource usage information of each host (physical machine or virtual machine) in the operating system layer (such as virtual machine memory occupancy), in particular shared sensitive resources (such as multi-level caches); logical resource information in the middleware layer (such as threads, queues and database connection pools); and application-related information in the application layer (such as response time, throughput, user access patterns and interaction behavior among application components).
The collected operation and maintenance basic data also need to be processed into a form or format that subsequent components can handle, so as to support data monitoring, fault prediction, and detection and diagnosis of the system state. This processing includes, but is not limited to, cleaning (filling in missing values and smoothing noise), integration (merging monitoring data from multiple nodes) and conversion (normalizing the data), as well as processing and storing historical data in order to build statistical models.
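The cleaning, integration and conversion operations named above can be illustrated with a minimal sketch. The source does not specify concrete algorithms, so the choices here (forward fill for missing values, a trailing moving average for smoothing, per-timestamp averaging for merging nodes, min-max scaling for normalization) and all function names are assumptions made for illustration only.

```python
from statistics import mean

def fill_gaps(samples):
    """Cleaning: replace missing readings (None) with the previous valid value."""
    filled, last = [], None
    for value in samples:
        if value is None:
            value = last  # stays None only while the series starts with a gap
        filled.append(value)
        last = value
    # Back-fill any leading gaps with the first valid reading (0.0 if none exists).
    first_valid = next((v for v in filled if v is not None), 0.0)
    return [first_valid if v is None else v for v in filled]

def smooth(samples, window=3):
    """Cleaning: smooth noise with a trailing moving average."""
    return [mean(samples[max(0, i - window + 1): i + 1]) for i in range(len(samples))]

def integrate(per_node_series):
    """Integration: merge equally long series from several nodes by averaging per timestamp."""
    return [mean(values) for values in zip(*per_node_series)]

def normalize(samples):
    """Conversion: min-max scale the series to the range [0, 1]."""
    low, high = min(samples), max(samples)
    if high == low:
        return [0.0 for _ in samples]
    return [(v - low) / (high - low) for v in samples]
```

A collection agent's raw CPU readings could, for instance, be passed through `fill_gaps`, then `smooth`, then `normalize` before being stored or handed to a prediction model.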
102. Inputting the operation and maintenance basic data conforming to a preset type into the corresponding prediction model to obtain alarm information of the target system.
After collection and processing, the operation and maintenance basic data are classified by data type, and data of the same type are input into the corresponding prediction model. Each prediction model is trained on data of its type and is used to predict the health state and faults of the target system.
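The routing of each data type to its own prediction model can be sketched as a simple registry lookup. The trained models themselves are not described in the source, so the threshold-based stand-ins, the limits and the names `make_threshold_model`, `MODELS` and `predict` below are hypothetical placeholders, not the patented models.

```python
def make_threshold_model(name, limit):
    """Build a stand-in 'prediction model' that alarms when the peak exceeds a limit."""
    def model(series):
        peak = max(series)
        if peak > limit:
            return {"model": name, "peak": peak}  # alarm information
        return None  # no fault predicted
    return model

# One model per preset data type; the limits are illustrative assumptions.
MODELS = {
    "periodic": make_threshold_model("periodic", 0.9),
    "steady": make_threshold_model("steady", 0.8),
    "irregular": make_threshold_model("irregular", 0.95),
}

def predict(data_type, series):
    """Send a classified series to the prediction model registered for its type."""
    model = MODELS.get(data_type)
    if model is None:
        raise ValueError(f"no prediction model registered for type {data_type!r}")
    return model(series)
```

Keeping the models in a dictionary keyed by type means a new data type only needs a new registry entry, which fits the one-model-per-type arrangement described above.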
The alarm information obtained through the prediction model in this embodiment is information about faults that, judging from the current operation and maintenance basic data, may occur during future operation of the target system. Alarms may be graded according to indexes such as the probability of occurrence and the severity of the fault, so that the operation and maintenance system can process the most serious alarm information first.
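A graded alarm of this kind might be represented as follows. The source names probability and severity as grading indexes but gives no formula, so the product score, the three level names, the 1-to-3 severity scale and the thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alarm:
    device_id: str
    probability: float  # predicted probability that the fault occurs, 0..1
    severity: int       # 1 (minor) to 3 (critical); this scale is an assumption

def score(alarm):
    """Combine the two grading indexes into one number (illustrative choice)."""
    return alarm.probability * alarm.severity

def level(alarm):
    """Map the score to an alarm level; thresholds are illustrative."""
    value = score(alarm)
    if value >= 2.0:
        return "critical"
    if value >= 1.0:
        return "major"
    return "minor"

def processing_order(alarms):
    """Most serious alarms first, so they are handled with priority."""
    return sorted(alarms, key=score, reverse=True)
```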
In this step, the prediction model can output health assessment data of the target system together with the alarm information, and operation and maintenance personnel can learn the real-time health state of the system by viewing the assessment data.
103. Matching the knowledge graph of the target system according to the content of the alarm information, and determining an operation and maintenance script for processing the alarm information.
The content of the alarm information in this step mainly includes information about the device in the target system that causes the system fault. The knowledge graph of the target system represents the topology among the devices in the target system, and each node in the knowledge graph represents a specific system device. By matching the content of the alarm information against the knowledge graph, the position of the faulty device in the target system can be located quickly and the information associated with that device can be acquired. Analyzing the associated information reveals the cause of the fault, and on that basis the embodiment of the invention can further look up a way of resolving the fault.
In this embodiment, system faults are handled by operation and maintenance scripts written in advance for different system faults. According to the specific system scenario, these scripts perform the corresponding response operations for different system faults, so as to prevent the faults from occurring or worsening. The purpose of this step is to use the knowledge graph to match the alarm information with the corresponding operation and maintenance script.
104. Calling the operation and maintenance script to process the system fault corresponding to the alarm information, and generating operation and maintenance processing information of the system fault.
It should be noted that the system fault in this embodiment may be a latent fault that has not yet occurred, that is, one predicted by the prediction model from the current operation and maintenance basic data. The operation and maintenance script is therefore used not only to resolve a specific fault but also to optimize the current running environment of the target system, so that once the operation and maintenance basic data change, the prediction model no longer outputs the corresponding alarm information. After the script has performed this optimization, operation and maintenance processing log information for the potential system fault is generated.
When a system fault occurs in the target system and the operation and maintenance script cannot be run, operation and maintenance personnel need to be prompted to handle it; in that case the generated operation and maintenance processing information is alarm information perceptible to the personnel, such as visual interface prompts or sound prompts.
Based on the implementation shown in Fig. 1, it can be seen that the system operation and maintenance method based on PHM technology provided in the embodiment of the present invention is a data operation and maintenance flow suitable for an IT system, built on the operating framework of PHM technology. According to the embodiment of the invention, the specified operation and maintenance basic data are collected and classified according to preset types, and each type of data is input into its corresponding prediction model to predict whether the current IT system is at risk of a system fault; possible system faults are output in the form of alarm information. The alarm information is then matched with the knowledge graph of the target system to determine an operation and maintenance script capable of resolving the potential system fault, and the IT system is optimized by executing that script so that the potential fault does not occur. In this way, IT system faults are predicted and handled, and the operation and maintenance efficiency of complex IT systems is improved.
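The overall flow of steps 101 to 104 can be summarized in a short sketch. Every collaborator is passed in as a plain callable because the source does not define concrete interfaces; the function name `run_cycle`, the parameter names and the `"needs_operator"` status are hypothetical.

```python
def run_cycle(collect, classify, models, match_script, run_script):
    """One pass through steps 101-104 for every collected data series."""
    outcomes = []
    for series in collect():                      # step 101: acquire basic data
        alarm = models[classify(series)](series)  # step 102: type-specific prediction
        if alarm is None:
            continue                              # no fault predicted for this series
        script = match_script(alarm)              # step 103: knowledge-graph matching
        if script is None:
            # No script can handle the alarm: hand it to the operators.
            outcomes.append({"alarm": alarm, "status": "needs_operator"})
        else:
            # Step 104: run the script and record the processing information.
            outcomes.append({"alarm": alarm, "status": run_script(script, alarm)})
    return outcomes
```

Passing the stages in as callables keeps the sketch independent of any particular collector, model or graph store, each of which the source leaves open.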
Further, a preferred embodiment of the present invention describes in detail, on the basis of Fig. 1, the classification of operation and maintenance basic data and the matching of operation and maintenance scripts within the overall operation and maintenance process. The specific steps are shown in Fig. 2 and include:
201. Acquiring operation and maintenance basic data of the target system according to preset operation and maintenance parameters.
202. Classifying the operation and maintenance basic data according to preset types.
Since most of the operation and maintenance basic data monitored in an IT system are time series data, the preset types are periodic data, steady data and irregular data. This step identifies each type with a different algorithm: for example, a Support Vector Machine (SVM) algorithm is used to identify periodic operation and maintenance basic data; a clustering algorithm (DBSCAN) is used to identify steady operation and maintenance basic data; and a Convolutional Neural Network (CNN) is used to identify irregular operation and maintenance basic data.
The classified operation and maintenance basic data are processed by the corresponding prediction models to obtain alarm information. In addition, the data can be merged with historical data of the same type and stored; after sampling, they can be used to train the prediction model of that type so as to improve its prediction accuracy.
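The source assigns the three types with an SVM, DBSCAN clustering and a CNN, but does not give enough detail (features, training data, architectures) to reproduce them. As a deliberately simpler stand-in, and not the method of the source, the sketch below separates the same three types with two plain statistics: a low coefficient of variation marks a series as steady, and a strong autocorrelation at some lag marks it as periodic. The thresholds and function names are assumptions.

```python
from statistics import mean, pstdev

def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        return 0.0
    num = sum((series[i] - mu) * (series[i + lag] - mu)
              for i in range(len(series) - lag))
    return num / denom

def classify(series, steady_cv=0.05, periodic_acf=0.5):
    """Label a time series as 'steady', 'periodic' or 'irregular' (heuristic stand-in)."""
    mu = mean(series)
    # Steady: the spread is small relative to the mean level.
    if pstdev(series) <= steady_cv * abs(mu):
        return "steady"
    # Periodic: the series correlates strongly with a shifted copy of itself.
    max_lag = len(series) // 2
    if any(autocorrelation(series, lag) >= periodic_acf
           for lag in range(2, max_lag + 1)):
        return "periodic"
    return "irregular"
```

For example, a series that repeats the pattern 0, 1, 0, -1 is labeled periodic because its autocorrelation at lag 4 is high, while a flat series with a single spike is labeled irregular.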
203. Inputting the operation and maintenance basic data conforming to a preset type into the corresponding prediction model to obtain alarm information of the target system.
The content of this step is the same as that of step 102 and is not repeated here.
204. Extracting the equipment information of the target system contained in the alarm information, and acquiring the latest knowledge graph of the target system.
This step has two parts. The first is to extract equipment information from the alarm information, that is, to identify from the alarm information the specific equipment at risk of a potential fault; the equipment information includes the identifier of the equipment and information about the specific components at risk. The second is to acquire the knowledge graph of the target system. Because the topology of the target system may change during operation, the knowledge graph is updated in real time as well. When acquiring the knowledge graph in this step, its version needs to be checked to ensure that the latest version is obtained, so that the current equipment topology of the target system is determined accurately.
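The version check described in this step could look like the following sketch. The source does not say how graphs are stored or versioned, so the integer version number, the in-memory `GraphStore` class and the `get_latest_graph` helper are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    version: int
    edges: dict = field(default_factory=dict)        # device id -> set of neighbor ids
    script_tags: dict = field(default_factory=dict)  # device id -> list of script tags

class GraphStore:
    """Hypothetical in-memory store that keeps the newest graph per target system."""
    def __init__(self):
        self._graphs = {}

    def publish(self, system, graph):
        current = self._graphs.get(system)
        if current is None or graph.version > current.version:
            self._graphs[system] = graph  # older versions never overwrite newer ones

    def latest_version(self, system):
        return self._graphs[system].version

    def fetch(self, system):
        return self._graphs[system]

def get_latest_graph(store, system, cached=None):
    """Reuse the cached graph only if its version is still the latest one."""
    if cached is not None and cached.version == store.latest_version(system):
        return cached
    return store.fetch(system)
```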
It should be noted that the knowledge graph in this embodiment also carries tags of the preset operation and maintenance scripts; each tag is attached to the corresponding node of the knowledge graph according to the fault that the operation and maintenance script can handle.
205. Matching the equipment information in the knowledge graph according to a preset rule, and determining the operation and maintenance script tag matched with the alarm information.
The preset rule in this step determines the matching range. Because the devices in the target system are interrelated, an alarm may name only one device even though the potential fault is actually caused by, or related to, other devices associated with it. That is, in certain scenarios the equipment information in the alarm information may need to match several associated devices rather than only the named device in the knowledge graph. The preset rule therefore defines these specific scenarios and how far the matching range is extended in each.
In addition, the operation and maintenance script tags carried by the knowledge graph are attached to nodes, but not every node carries a tag, and a node is not limited to a single tag. The matching in this step can therefore have three outcomes: no operation and maintenance script tag is matched, exactly one tag is matched, or a plurality of tags are matched.
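One way to picture the extended matching range is a breadth-first walk over the topology, limited to a number of hops set by the preset rule. The source does not define the rule format, so expressing it as a `max_hops` integer, and the adjacency-dictionary representation of the knowledge graph, are assumptions for this sketch.

```python
from collections import deque

def devices_in_range(edges, device_id, max_hops):
    """Breadth-first search: the alarmed device plus neighbors within max_hops."""
    seen = {device_id}
    queue = deque([(device_id, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the range allowed by the rule
        for neighbor in sorted(edges.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return seen

def match_script_tags(edges, script_tags, device_id, max_hops=0):
    """Collect the script tags on every node inside the matching range.

    The result may be empty, hold one tag, or hold several tags.
    """
    matched = []
    for node in sorted(devices_in_range(edges, device_id, max_hops)):
        for tag in script_tags.get(node, []):
            if tag not in matched:
                matched.append(tag)
    return matched
```

With `max_hops=0` only the device named in the alarm is considered; larger values widen the range to associated devices, which is how the three matching outcomes can arise.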
206. Determining the operation and maintenance script for processing the alarm information according to the operation and maintenance script tag.
Corresponding to the three matching results of the previous step, the operation and maintenance script is determined in this step in one of the following three ways:
When a unique operation and maintenance script tag is matched, the operation and maintenance script corresponding to that tag is directly determined to be the operation and maintenance script for processing the alarm information.
When no operation and maintenance script tag is matched, the potential system fault in the alarm information cannot be handled by the existing operation and maintenance scripts; the problem then needs to be reported to the operation and maintenance personnel and handled by them.
When a plurality of operation and maintenance script tags are matched, the corresponding operation and maintenance scripts may be called one by one, or the optimal script among them may be selected and run.
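The three cases can be sketched as a single selection function. This is a minimal sketch that takes the "select the optimal script" option and assumes "optimal" means highest priority, with a lower number meaning a higher priority; the `Script` type, the `registry` mapping from tag to script, and the script names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Script(NamedTuple):
    name: str
    priority: int  # assumption: a lower number means a higher priority


def determine_script(matched_tags: List[str],
                     registry: Dict[str, Script]) -> Optional[Script]:
    """Return the script to run, or None when the alarm has to be handed
    over to operation and maintenance personnel."""
    candidates = [registry[tag] for tag in matched_tags if tag in registry]
    if not candidates:
        # No tag matched: no existing script can handle this potential fault.
        return None
    # A unique match is returned as is; among several, the highest priority wins.
    return min(candidates, key=lambda script: script.priority)
```

A caller would treat a `None` result as the signal to notify operation and maintenance personnel instead of running anything automatically.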
207. Calling the operation and maintenance script to process the system fault corresponding to the alarm information, and generating operation and maintenance processing information of the system fault.
The operation and maintenance processing information of the system fault in this step is generated according to the real-time state of the target system after the operation and maintenance script has been executed. Therefore, after the operation and maintenance script is called and executed, the operation and maintenance basic data corresponding to the system fault also needs to be monitored to judge whether the system fault has been removed by running the script. If the system fault has been removed, an operation and maintenance processing log for the system fault is generated; if not, visual fault alarm information in a preset format is generated in the system operation and maintenance interface to prompt the operation and maintenance personnel to handle the alarm information.
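The run-then-verify flow of this step can be sketched as follows. This is an illustration under assumptions: the alarm is taken to carry a monitored `metric` and a `threshold` below which the fault counts as removed, and `run_script` and `read_metric` are hypothetical callables standing in for script execution and data collection.

```python
from typing import Callable


def process_alarm(alarm: dict,
                  run_script: Callable[[str], None],
                  read_metric: Callable[[str, str], float]) -> dict:
    """Run the script, then re-read the basic data to judge the outcome."""
    run_script(alarm["device_id"])
    value = read_metric(alarm["device_id"], alarm["metric"])
    fault_removed = value <= alarm["threshold"]

    record = {
        "device_id": alarm["device_id"],
        "metric": alarm["metric"],
        "value": value,
        "fault_removed": fault_removed,
    }
    # Fault removed: emit a processing log. Otherwise: emit a visual fault
    # alarm so that operation and maintenance personnel take over.
    record["type"] = "processing_log" if fault_removed else "visual_fault_alarm"
    return record
```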
208. Updating the operation and maintenance parameters of the target system according to the operation and maintenance processing information of the system fault.
After the operation and maintenance processing of the system fault is completed, the current operation and maintenance basic data of the target system is obtained according to the operation and maintenance processing information, and the health state of the target system is judged from this data. The operation and maintenance parameters of the target system are then updated according to the health state data, and the operation and maintenance basic data to be collected is updated in turn, which optimizes the efficiency of collecting data from the target system.
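One possible shape of this feedback loop is sketched below. The embodiment does not specify how the health state is computed or which parameters are adjusted, so everything here is an assumption made for illustration: the health score is taken as the fraction of metrics within their limits, and the only parameter adjusted is a collection interval `interval_s` bounded by `min_interval_s` and `max_interval_s`.

```python
from typing import Dict


def health_score(metrics: Dict[str, float], limits: Dict[str, float]) -> float:
    """Fraction of monitored metrics that are within their limits."""
    within = sum(1 for name, value in metrics.items() if value <= limits[name])
    return within / len(metrics)


def update_parameters(params: Dict[str, int], score: float) -> Dict[str, int]:
    """Collect more often when the system is unhealthy, less often when healthy."""
    updated = dict(params)
    if score < 0.5:
        updated["interval_s"] = max(params["min_interval_s"], params["interval_s"] // 2)
    elif score == 1.0:
        updated["interval_s"] = min(params["max_interval_s"], params["interval_s"] * 2)
    return updated
```

Shortening the interval for an unhealthy system and lengthening it for a healthy one is just one way in which updated parameters could make data collection more efficient.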
Further, as an implementation of the method embodiments shown in Figs. 1-2, an embodiment of the present invention provides a PHM technology-based system operation and maintenance device, which is used to implement operation and maintenance management of an IT system by using PHM technology and to improve the efficiency of operation and maintenance management of complex systems. This device embodiment corresponds to the foregoing method embodiments. For ease of reading, the details of the method embodiments are not repeated here, but it should be clear that the device in this embodiment can correspondingly implement all the contents of the foregoing method embodiments. As shown in Fig. 3, the device includes:
the acquisition unit 31 is used for acquiring operation and maintenance basic data of the target system according to preset operation and maintenance parameters;
the prediction unit 32 is configured to input the operation and maintenance basic data that is obtained by the acquisition unit 31 and conforms to a preset type into a corresponding prediction model to obtain alarm information of the target system;
the determining unit 33 is configured to determine an operation and maintenance script for processing the alarm information by matching the content of the alarm information obtained by the prediction unit 32 against the knowledge graph of the target system;
and the operation and maintenance unit 34 is configured to invoke the operation and maintenance script determined by the determining unit 33 to process the system fault corresponding to the alarm information, and generate operation and maintenance processing information of the system fault.
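The way the four units hand data to one another can be sketched as a small pipeline. This is only an illustrative composition: the class name `OpsDevice`, the method `run_once`, and the callable signatures are hypothetical, and each unit is represented by an injected function rather than a real implementation.

```python
from typing import Any, Callable, Optional


class OpsDevice:
    """Hypothetical wiring of units 31-34 as injected callables."""

    def __init__(self,
                 acquire: Callable[[], Any],
                 predict: Callable[[Any], Optional[dict]],
                 determine: Callable[[dict], Optional[str]],
                 operate: Callable[[dict, Optional[str]], dict]) -> None:
        self.acquire = acquire      # acquisition unit 31
        self.predict = predict      # prediction unit 32
        self.determine = determine  # determining unit 33
        self.operate = operate      # operation and maintenance unit 34

    def run_once(self) -> Optional[dict]:
        data = self.acquire()
        alarm = self.predict(data)
        if alarm is None:
            return None  # no potential fault predicted in this round
        script = self.determine(alarm)
        return self.operate(alarm, script)
```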
Further, as shown in fig. 4, the determining unit 33 includes:
an extracting module 331, configured to extract device information of the target system included in the alarm information;
an obtaining module 332, configured to obtain a latest knowledge graph of the target system, where nodes of the knowledge graph carry corresponding operation and maintenance script tags;
a matching module 333, configured to match, according to a preset rule, the device information obtained by the extracting module 331 in the knowledge graph obtained by the obtaining module 332, and determine an operation and maintenance script tag matched with the alarm information;
the determining module 334 is configured to determine, according to the operation and maintenance script tag determined by the matching module 333, an operation and maintenance script for processing the alarm information.
Further, the determining module 334 is specifically configured to:
when the operation and maintenance script tag is not matched, generating visual alarm information in a preset format in a system operation and maintenance interface;
and when a plurality of operation and maintenance script labels are matched, determining the operation and maintenance script for processing the alarm information according to the priority of the operation and maintenance script corresponding to the operation and maintenance script labels.
Further, as shown in fig. 4, the operation and maintenance unit 34 includes:
a determining module 341, configured to monitor operation and maintenance basic data corresponding to the system fault, and determine whether the system fault is resolved after the operation and maintenance script is run;
a generating module 342, configured to generate an operation and maintenance processing log of the system fault if the determining module 341 determines that the system fault has been removed, and otherwise to generate visual fault alarm information in a preset format in the system operation and maintenance interface.
Further, as shown in Fig. 4, the preset types of the operation and maintenance basic data include periodic data, stationary data, and irregular data. The device further includes a classification unit 35, configured to classify the operation and maintenance basic data acquired by the acquisition unit 31 according to the preset types, specifically by: obtaining periodic operation and maintenance basic data by using a support vector machine algorithm; obtaining stationary operation and maintenance basic data by using a clustering algorithm; and obtaining irregular operation and maintenance basic data by using a convolutional neural network.
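How classified data might be routed to the prediction model of its type can be sketched as follows. The sketch does not implement the support vector machine, clustering, or convolutional neural network classifiers; `classify` is a hypothetical stand-in for the classification unit, `models` is a hypothetical mapping from preset type to prediction model, and skipping data that fits no preset type is an assumption.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional

PRESET_TYPES = ("periodic", "stationary", "irregular")


def predict_alarms(samples: Iterable[Any],
                   classify: Callable[[Any], str],
                   models: Dict[str, Callable[[Any], Optional[dict]]]) -> List[dict]:
    """Send each sample to the prediction model that matches its preset type."""
    alarms: List[dict] = []
    for sample in samples:
        kind = classify(sample)
        if kind not in PRESET_TYPES:
            continue  # assumption: data of no preset type is not predicted on
        alarm = models[kind](sample)
        if alarm is not None:
            alarms.append(alarm)
    return alarms
```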
Further, as shown in fig. 4, the apparatus further includes:
and the training unit 36 is configured to train the prediction model by using different preset types of operation and maintenance basic data, so that the prediction model can perform system fault prediction on the input operation and maintenance basic data.
Further, as shown in fig. 4, the apparatus further includes: an updating unit 37, configured to update the health status data of the target system according to the operation and maintenance processing information of the system fault generated by the operation and maintenance unit 34; and updating the operation and maintenance parameters of the target system according to the health state data.
Further, an embodiment of the present invention further provides a processor, where the processor is configured to execute a program, where the program executes the system operation and maintenance method based on the PHM technology described in fig. 1-2 when running.
Further, an embodiment of the present invention further provides a storage medium, where the storage medium is used for storing a computer program, where the computer program controls, when running, a device in which the storage medium is located to execute the system operation and maintenance method based on the PHM technology, described in fig. 1-2 above.
Further, an embodiment of the present invention further provides an apparatus, where the apparatus includes at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; the processor is used for calling the program instructions in the memory to execute the system operation and maintenance method based on the PHM technology described in Figs. 1-2. The apparatus herein may be a server, a PC, a tablet (PAD), a mobile phone, etc.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
It will be appreciated that the relevant features of the method and apparatus described above are referred to one another. In addition, "first", "second", and the like in the above embodiments are used to distinguish the embodiments, and do not represent merits of the embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In addition, the memory may include forms of computer-readable media such as volatile memory, Random Access Memory (RAM), and/or non-volatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of computer-readable media such as volatile memory, Random Access Memory (RAM), and/or non-volatile memory such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A system operation and maintenance method based on PHM technology comprises the following steps:
acquiring operation and maintenance basic data of a target system according to preset operation and maintenance parameters;
inputting the operation and maintenance basic data which accord with the preset type into a corresponding prediction model to obtain alarm information of a target system;
according to the content of the alarm information, matching a knowledge graph of a target system, and determining an operation and maintenance script for processing the alarm information;
and calling the operation and maintenance script to process the system fault corresponding to the alarm information, and generating operation and maintenance processing information of the system fault.
2. The method of claim 1, wherein matching the knowledge graph of the target system according to the content of the alarm information and determining the operation and maintenance script for processing the alarm information comprises:
extracting the device information of the target system contained in the alarm information;
acquiring the latest knowledge graph of the target system, wherein nodes of the knowledge graph carry corresponding operation and maintenance script tags;
matching the device information in the knowledge graph according to a preset rule, and determining an operation and maintenance script tag matched with the alarm information;
and determining an operation and maintenance script for processing the alarm information according to the operation and maintenance script tag.
3. The method of claim 2, wherein determining an operation and maintenance script for processing the alarm information according to the operation and maintenance script tag comprises:
when the operation and maintenance script tag is not matched, generating visual alarm information in a preset format in a system operation and maintenance interface;
and when a plurality of operation and maintenance script labels are matched, determining the operation and maintenance script for processing the alarm information according to the priority of the operation and maintenance script corresponding to the operation and maintenance script labels.
4. The method of claim 1, wherein invoking the operation and maintenance script to process a system fault corresponding to the alarm information and generating operation and maintenance processing information of the system fault comprises:
monitoring operation and maintenance basic data corresponding to the system fault, and judging whether the system fault is removed after the operation and maintenance script is operated;
if the system fault is removed, generating an operation and maintenance processing log of the system fault;
and if not, generating visual fault alarm information in a preset format in the system operation and maintenance interface.
5. The method of claim 1, wherein the predetermined type of the operation and maintenance basic data comprises: periodic data, stationary data, irregular data; the method further comprises the following steps:
acquiring periodic operation and maintenance basic data by using a support vector machine algorithm;
acquiring stationary operation and maintenance basic data by using a clustering algorithm;
and acquiring irregular type operation and maintenance basic data by using a convolutional neural network.
6. The method of claim 5, further comprising:
and training the prediction model by using different preset types of operation and maintenance basic data so that the prediction model can predict the system fault of the input operation and maintenance basic data.
7. The method according to any one of claims 1-6, further comprising:
updating the health state data of the target system according to the operation and maintenance processing information of the system fault;
and updating the operation and maintenance parameters of the target system according to the health state data.
8. A PHM technology-based system operation and maintenance device, comprising:
the acquisition unit is used for acquiring operation and maintenance basic data of the target system according to preset operation and maintenance parameters;
the prediction unit is used for inputting the operation and maintenance basic data which are obtained by the acquisition unit and accord with the preset type into a corresponding prediction model to obtain alarm information of a target system;
the determining unit is used for determining an operation and maintenance script for processing the alarm information by matching the content of the alarm information obtained by the prediction unit against the knowledge graph of the target system;
and the operation and maintenance unit is used for calling the operation and maintenance script determined by the determining unit to process the system fault corresponding to the alarm information and generate operation and maintenance processing information of the system fault.
9. An apparatus comprising at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; the processor is used for calling the program instructions in the memory to execute the system operation and maintenance method based on PHM technology according to any one of claims 1-7.
10. A storage medium, configured to store a computer program, where the computer program controls, when executed, a device on which the storage medium is located to perform the system operation and maintenance method based on PHM technology according to any one of claims 1 to 7.
CN202011550197.8A 2020-12-24 2020-12-24 PHM technology-based system operation and maintenance method and device Pending CN114676002A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011550197.8A CN114676002A (en) 2020-12-24 2020-12-24 PHM technology-based system operation and maintenance method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011550197.8A CN114676002A (en) 2020-12-24 2020-12-24 PHM technology-based system operation and maintenance method and device

Publications (1)

Publication Number Publication Date
CN114676002A true CN114676002A (en) 2022-06-28

Family

ID=82069746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011550197.8A Pending CN114676002A (en) 2020-12-24 2020-12-24 PHM technology-based system operation and maintenance method and device

Country Status (1)

Country Link
CN (1) CN114676002A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116774569A (en) * 2023-07-25 2023-09-19 博纯材料股份有限公司 Artificial intelligence-based method and system for updating operation system of oxygen-argon separation equipment
CN116774569B (en) * 2023-07-25 2024-04-05 博纯材料股份有限公司 Artificial intelligence-based method and system for updating operation system of oxygen-argon separation equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination