CN112631818A - Operation and maintenance abnormity repair processing method and device, computer equipment and storage medium - Google Patents

Publication number
CN112631818A
Authority
CN
China
Prior art keywords
repair
repairing
suggestion
maintenance
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011554497.3A
Other languages
Chinese (zh)
Inventor
孔军龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202011554497.3A
Publication of CN112631818A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of monitoring, and in particular discloses a method and a device for repairing operation and maintenance abnormalities, computer equipment and a storage medium. The method comprises the following steps: acquiring abnormal alarm information generated during operation and maintenance; extracting a characteristic value corresponding to the abnormal alarm information; matching, from a preset operation and maintenance database, the repair suggestion that corresponds to the characteristic value and has the highest matching degree; and, after the abnormality reported by the abnormal alarm information is repaired according to the repair suggestion, obtaining a repair result, performing machine learning with the repair result, and updating the matching degree of the repair suggestion corresponding to the characteristic value in the preset operation and maintenance database. The invention can process abnormal alarm information automatically and shorten the time needed to resolve it.

Description

Operation and maintenance abnormity repair processing method and device, computer equipment and storage medium
Technical Field
The invention relates to the field of infrastructure operation and maintenance monitoring, and in particular to a method and a device for repairing operation and maintenance abnormalities, computer equipment and a storage medium.
Background
At present, open source monitoring products on the market, such as zabbix, argus and open-falcon, focus on discovering and monitoring abnormal operation and maintenance information. They generate alarm information for each abnormality but only forward that alarm information to monitoring personnel, who must then handle it themselves.
Disclosure of Invention
Therefore, it is necessary to provide a method, a device, computer equipment and a storage medium for repairing operation and maintenance abnormalities, which process abnormal alarm information automatically and shorten the time needed to resolve it.
A method for repairing and processing operation and maintenance abnormity comprises the following steps:
acquiring abnormal alarm information in the operation and maintenance process;
extracting a characteristic value corresponding to the abnormal alarm information;
matching a repair suggestion which corresponds to the characteristic value and has the highest matching degree from a preset operation and maintenance database;
and after abnormal alarm information in the operation and maintenance process is repaired according to the repair suggestion, a repair result is obtained, machine learning is carried out by using the repair result, and the matching degree of the repair suggestion in the preset operation and maintenance database corresponding to the characteristic value is updated.
An operation and maintenance abnormity repair processing device comprises:
the acquisition module is used for acquiring abnormal alarm information in the operation and maintenance process;
the extraction module is used for extracting a characteristic value corresponding to the abnormal alarm information;
the matching module is used for matching a repair suggestion which corresponds to the characteristic value and has the highest matching degree from a preset operation and maintenance database;
and the updating module is used for repairing abnormal alarm information in the operation and maintenance process according to the repair suggestion, acquiring a repair result, performing machine learning by using the repair result, and updating the matching degree of the repair suggestion in the preset operation and maintenance database corresponding to the characteristic value.
A computer device comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein the processor realizes the method for repairing the operation and maintenance abnormity when executing the computer program.
A computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the method for repairing and processing the operation and maintenance exception is implemented.
The method, the device, the computer equipment and the storage medium address the problem that current open source monitoring products cannot process abnormal alarm information automatically. Repair suggestions are stored in advance in the preset operation and maintenance database, so the repair suggestion with the highest matching degree can be determined from the characteristic values extracted from the abnormal alarm information, and the abnormality can be repaired automatically according to that suggestion, which shortens the time needed to resolve the alarm. In addition, the matching degree between the repair suggestion and the characteristic value in the preset operation and maintenance database is adjusted according to the repair result, and the ordering of the repair suggestions is adjusted according to the new matching degree, which further improves the repair rate of subsequent repair suggestions and the timeliness of resolving abnormal alarm information.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a schematic diagram of an application environment of a method for repairing an operation and maintenance exception according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for repairing an operation/maintenance exception according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a device for repairing an operation/maintenance exception according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method for repairing the operation and maintenance abnormity provided by the invention can be applied to the application environment shown in fig. 1, wherein the client communicates with the server through a network. The client may include, but is not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices, among others. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
In an embodiment, as shown in fig. 2, a method for repairing an operation and maintenance exception is provided, which is described by taking the application of the method to the server in fig. 1 as an example, and includes the following steps:
S10, obtaining abnormal alarm information in the operation and maintenance process;
Understandably, operation and maintenance refers to managing the system on which a new version runs after that version goes online. Specifically, an open source monitoring product on the market (such as zabbix, argus or open-falcon) performs the monitoring, and the abnormal alarm information it sends once an abnormality is detected is obtained. For example, a NetWorkTimeOut appears in the application log after the version goes online. The abnormal alarm information includes, but is not limited to, network timeouts, host performance changes and program bugs.
S20, extracting a characteristic value corresponding to the abnormal alarm information;
Understandably, feature engineering can be used to extract the characteristic value corresponding to the abnormal alarm information. The feature engineering may include principal component analysis, dictionary feature extraction, nonlinear dimensionality reduction, manifold learning and the like, so as to extract from the abnormal alarm information the words (which can be represented in numerical form) that characterize it. This embodiment can extract effective characteristic information from a large amount of abnormal alarm information, reduce the data volume, reduce the influence of noisy data in the abnormal alarm information on data processing, and improve the performance of the data processing system.
S30, matching a repair suggestion which corresponds to the characteristic value and has the highest matching degree from a preset operation and maintenance database;
Understandably, the preset operation and maintenance database is a knowledge base for the operation and maintenance monitoring field, built from actual operation and maintenance experience in managing massive data and from repair suggestions provided by operation and maintenance experts. One repair suggestion may correspond to at least one feature tag, and one feature tag may correspond to one characteristic value, so one repair suggestion may correspond to at least one characteristic value. The repair suggestions are sorted by matching degree in the preset operation and maintenance database, so each repair suggestion has a matching degree and a sequence number. For example, abnormal alarm information of NetWorkTimeOut appears in an application log after a version goes online; the extracted characteristic values are network timeout and newly added configuration, and the corresponding feature tags in the preset operation and maintenance database are the network timeout tag and the newly added configuration tag. The repair suggestions matched in the preset operation and maintenance database are then arranged as follows: (1) a repair suggestion hitting 2 feature tags with a matching degree of 80%: open the network firewall (specifically, an automatic firewall-opening ticket submission service is provided according to the newly added configuration); (2) a repair suggestion hitting 2 feature tags with a matching degree of 10%: repair the configuration error (specifically, correct the configuration); (3) a repair suggestion hitting 1 feature tag with a matching degree of 70%: repair the network timeout abnormality (specifically, report the network timeout abnormality to the network side). In this embodiment, the repair suggestion with the highest matching degree can be determined from the preset operation and maintenance database, and the current abnormal alarm information can be resolved quickly with that repair suggestion.
S40, repairing abnormal alarm information in the operation and maintenance process according to the repair suggestion, obtaining a repair result, performing machine learning by using the repair result, and updating the matching degree of the repair suggestion in the preset operation and maintenance database corresponding to the characteristic value.
Understandably, when the abnormal alarm information is a network timeout, the first-ranked repair suggestion may be to open the network firewall; when the abnormal alarm information is a host performance change, the first-ranked repair suggestion may be a version rollback. The repair result is either repair success or repair failure; specifically, after the repair suggestion is applied, it is checked whether the system still receives the abnormal alarm information, for example whether network connections still time out or whether host performance has recovered. Machine learning is used to learn whether a repair suggestion really repairs the abnormality, which provides data support for handling subsequent abnormal alarm information, fills in and improves the repair capability of the preset operation and maintenance database, improves the repair rate of the repair suggestions (primarily the first-attempt repair rate), and makes it as likely as possible that the repair suggestion with the highest matching degree results in repair success.
The embodiment of steps S10 to S40 addresses the problem that current open source monitoring products cannot process abnormal alarm information automatically. Because repair suggestions are stored in advance in the preset operation and maintenance database, the repair suggestion with the highest matching degree can be determined from the characteristic values extracted from the abnormal alarm information, and the abnormality can be repaired automatically according to that suggestion, which shortens the time needed to resolve the alarm. The matching degree between the repair suggestion and the characteristic value in the preset operation and maintenance database is also adjusted according to the repair result, so that the ordering follows the new matching degree, which improves the repair efficiency of subsequent repair suggestions and the timeliness of resolving abnormal alarm information.
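As an illustration only, steps S20 to S40 can be sketched in a few lines of Python. Everything named here is an assumption made for the sketch and does not come from the patent: the `FEATURE_DICTIONARY` keywords, the `Suggestion` record, and in particular the fixed-step update in `update_match_degree`, which merely stands in for the unspecified machine learning step. Dictionary lookup is used for S20 because it is one of the feature extraction options the description lists.

```python
from dataclasses import dataclass

# Hypothetical keyword dictionary: lowercase log substring -> characteristic value.
FEATURE_DICTIONARY = {
    "networktimeout": "network timeout",
    "new config": "newly added configuration",
}


@dataclass
class Suggestion:
    name: str
    tags: frozenset       # feature tags this suggestion is associated with
    match_degree: float   # matching degree in the range 0.0 .. 1.0


def extract_features(alarm_text):
    """S20: dictionary-based extraction of characteristic values."""
    lowered = alarm_text.lower()
    return {value for key, value in FEATURE_DICTIONARY.items() if key in lowered}


def best_suggestion(features, database):
    """S30: among suggestions sharing a tag with the features, pick the highest matching degree."""
    candidates = [s for s in database if s.tags & features]
    return max(candidates, key=lambda s: s.match_degree, default=None)


def update_match_degree(suggestion, success, step=0.05):
    """S40: placeholder update (assumed fixed step), clamped to [0, 1]."""
    delta = step if success else -step
    suggestion.match_degree = min(1.0, max(0.0, suggestion.match_degree + delta))
```

A caller would feed the text of an alarm (S10) into `extract_features`, apply the suggestion returned by `best_suggestion`, observe whether the alarm clears, and pass that outcome to `update_match_degree`.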
Further, the matching of the repair suggestion which corresponds to the characteristic value and has the highest matching degree from the preset operation and maintenance database includes:
querying feature tags of the hit number matched with the feature values and repair suggestions corresponding to the feature tags from the preset operation and maintenance database; one said feature value corresponding to one said feature tag; the hit number refers to the number of the feature tags corresponding to the feature values;
acquiring the matching degree of repair suggestions corresponding to the feature tags from the preset operation and maintenance database, and arranging the repair suggestions according to the hit number and the matching degree to obtain an ordering table;
and recording the repair suggestion ranked first in the ranking table as the repair suggestion which corresponds to the characteristic value and has the highest matching degree in the preset operation and maintenance database.
Understandably, the feature tags correspond to characteristic values and are stored in the preset operation and maintenance database. As described in step S30, one characteristic value may correspond to one feature tag, so the hit number indicates how many feature tags the extracted characteristic values match, that is, whether a characteristic value can be matched to a repair suggestion. For example, when two characteristic values match two feature tags, both feature tags correspond to the single repair suggestion of opening the network firewall. The matching degree is the probability that the repair suggestion corresponding to the characteristic value is suitable for handling the abnormality, and this probability is influenced by repair results. The initial matching degree of each repair suggestion is evaluated in advance in the preset operation and maintenance database according to the feature tags corresponding to the characteristic value; specifically, it is a preliminary evaluation based on actual operation and maintenance experience in managing massive data and on the repair suggestions provided by operation and maintenance experts. It should be noted that a characteristic value matching a feature tag does not mean that the repair suggestion corresponding to that feature tag is fully suitable for the abnormal alarm information corresponding to the characteristic value. The order in the sorting table is determined by the matching degree (sorted from high matching degree to low) and the hit number (sorted from high hit number to low). This embodiment determines, through sorting, the repair suggestion that best matches the abnormal alarm information, thereby shortening the time needed to resolve it.
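The sorting table can be sketched as follows. The sketch assumes that the hit number is the primary sort key and the matching degree the secondary key, which is what the three-suggestion example under step S30 implies (a 10% suggestion with 2 hits is listed before a 70% suggestion with 1 hit); the dictionary layout of the database entries is hypothetical.

```python
def build_sorting_table(feature_values, database):
    """Return (hit_number, match_degree, suggestion) rows, best candidate first.

    `database` is a list of dicts with the hypothetical keys
    'suggestion' (str), 'tags' (set of feature tags) and 'match_degree' (float).
    """
    rows = []
    for entry in database:
        hit_number = len(entry["tags"] & feature_values)
        if hit_number > 0:
            rows.append((hit_number, entry["match_degree"], entry["suggestion"]))
    # Assumed ordering: more hits first, then higher matching degree.
    rows.sort(key=lambda row: (row[0], row[1]), reverse=True)
    return rows


# The example from the description: network timeout plus newly added configuration.
database = [
    {"suggestion": "report network timeout to network side",
     "tags": {"network timeout"}, "match_degree": 0.70},
    {"suggestion": "correct the configuration",
     "tags": {"network timeout", "newly added configuration"}, "match_degree": 0.10},
    {"suggestion": "open the network firewall",
     "tags": {"network timeout", "newly added configuration"}, "match_degree": 0.80},
]
table = build_sorting_table({"network timeout", "newly added configuration"}, database)
```

The first row of `table` is the repair suggestion that would be recorded as the one with the highest matching degree.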
Further, after recording the repair suggestion sorted first in the sorted list as the repair suggestion corresponding to the feature value and having the highest matching degree in the preset operation and maintenance database, the method further includes:
monitoring a repair result of repairing the abnormal alarm information by using the repair suggestion ranked first in the ranking table in real time;
when the repairing result is that repairing is successful, the repairing is confirmed to be finished;
when the repair result is repair failure, confirming whether all repair suggestions in the sorting table are completely polled;
when the polling is not finished, repairing the abnormal alarm information by using the next repairing suggestion in the sequencing list, acquiring a repairing result corresponding to the next repairing suggestion, and confirming that the repairing is finished when the repairing result is repairing success;
when polling is completed, the repair is confirmed to be completed.
Understandably, when the characteristic values are matched against the preset operation and maintenance database, at least one repair suggestion corresponding to the characteristic values can be found, and the repair suggestions are ordered in the sorting table. Specifically, the abnormal alarm information may be handled by trying the repair suggestions one by one in the order of the sorting table, for example, as mentioned above: (1) opening the network firewall; (2) repairing the configuration error; (3) repairing the network timeout abnormality. By polling the repair suggestions in the order of the sorting table, this embodiment on the one hand maximizes the probability of finding a repair suggestion that resolves the abnormal alarm information, and on the other hand, because higher-ranked repair suggestions are tried first, reduces the number of polling rounds to some extent.
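The polling procedure above amounts to a loop over the sorting table that stops at the first successful repair. In this minimal sketch, `apply_repair` is a hypothetical callback that applies one suggestion and reports whether the alarm cleared; the patent does not prescribe such an interface.

```python
def poll_repairs(sorted_suggestions, apply_repair):
    """Try suggestions in sorted order until one succeeds.

    Returns (successful_suggestion_or_None, number_of_attempts).
    A result of None means polling finished without a successful repair.
    """
    attempts = 0
    for suggestion in sorted_suggestions:
        attempts += 1
        if apply_repair(suggestion):
            return suggestion, attempts
    return None, attempts
```

When `poll_repairs` returns `None`, every suggestion has been polled and the alarm would be handed over to the preset processing party.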
Further, after confirming that the repair is finished when the polling is finished, the method further includes:
reporting the abnormal alarm information to a preset processing party in a preset reporting mode;
when the preset processing party determines that the abnormal alarm information is a preset target problem according to the repair result of the repair failure, marking a new characteristic label for a characteristic value extracted from the abnormal alarm information in the preset operation and maintenance database; and when the new characteristic label is matched with the characteristic value corresponding to the abnormal alarm information again, directly reporting the abnormal alarm information to a preset processing party in a preset reporting mode.
Understandably, the preset target problem may be a network problem or another problem for which the preset operation and maintenance database cannot provide a repair suggestion. For example, the new feature tag for a network problem may be "service provider communication cable cut". In that case the new feature tag is marked on the characteristic value corresponding to the abnormal alarm information, and the preset processing party is contacted directly about the abnormal alarm information by mail or telephone. In this embodiment, the abnormal alarm information is reported directly to the preset processing party, and after the preset processing party has repaired the abnormality, the repair suggestion it used is associated with the new feature tag and stored in the preset operation and maintenance database.
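One possible way to realize the direct-report shortcut is a small registry keyed by the set of characteristic values. The class name, its methods and the `notify` callback (standing in for mail or telephone reporting) are all hypothetical; the sketch only shows that once a new feature tag is marked, later alarms with the same characteristic values bypass polling.

```python
class DirectReportRegistry:
    """Remembers characteristic-value sets that must go straight to the processing party."""

    def __init__(self, notify):
        self._tags = {}        # frozenset of characteristic values -> new feature tag
        self._notify = notify  # hypothetical callback: notify(alarm, tag)

    def mark_target_problem(self, feature_values, new_tag):
        """Called once the processing party confirms a preset target problem."""
        self._tags[frozenset(feature_values)] = new_tag

    def report_if_marked(self, feature_values, alarm):
        """Report directly and return True if the values carry a new feature tag."""
        tag = self._tags.get(frozenset(feature_values))
        if tag is None:
            return False
        self._notify(alarm, tag)
        return True
```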
Further, after confirming the repair is finished when the polling is finished, the method includes:
informing a preset processor to carry out repair processing, and acquiring a target repair suggestion corresponding to the successful repair processing of the preset processor;
re-extracting a target characteristic value corresponding to the abnormal alarm information according to a repair result of the repair failure, updating the matching degree of a target characteristic label and a target repair suggestion corresponding to the target characteristic value in the preset operation and maintenance database through machine learning, and updating the matching degree of the characteristic label and the repair suggestion corresponding to the characteristic value;
Understandably, when abnormal alarm information occurs that can only be resolved manually, a target repair suggestion is obtained from the preset processing party (an expert). For example, when the abnormal alarm information is a network problem, the target repair suggestion may be to collect the network operator's contact information, notify the operator promptly, add network monitoring, and the like. The matching degree between the target feature tag (extracted from the network problem) and the target repair suggestion in the preset operation and maintenance database then needs to be readjusted (the matching degree of a target repair suggestion newly added by the preset processing party can be raised, since a repair suggestion that comes from expert handling is more broadly applicable). This embodiment can adjust the matching degrees of the various repair suggestions and of the target repair suggestion corresponding to the characteristic values in the preset operation and maintenance database, which on the one hand adds a new repair scheme and on the other hand shortens the time needed to resolve abnormal alarm information.
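A rough sketch of recording an expert's target repair suggestion follows. The patent does not give numbers for the initial matching degree or for how much it is raised, so `initial` and `boost` are purely illustrative assumptions, as is the dictionary keyed by (tag set, suggestion).

```python
def record_expert_fix(database, target_tags, target_suggestion, initial=0.5, boost=0.1):
    """Add or strengthen the link between target feature tags and a target repair suggestion.

    `database` maps (frozenset_of_tags, suggestion) -> matching degree.
    `initial` and `boost` are assumed values, not taken from the source.
    """
    key = (frozenset(target_tags), target_suggestion)
    if key in database:
        # The expert fix worked again: raise the matching degree, capped at 1.0.
        database[key] = min(1.0, database[key] + boost)
    else:
        # First time this expert fix is seen: store it as a new repair scheme.
        database[key] = initial
    return database[key]
```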
Further, when the repair result is repair success, after the repair is confirmed to be finished, the method further includes:
and performing machine learning in the preset operation and maintenance database by using the repair result which is successfully repaired, and updating the matching degree between the repair suggestion corresponding to the repair result and the feature set where the feature label corresponding to the feature value is located.
Understandably, the feature tags are stored in feature sets. One feature set can correspond to one repair suggestion, so several feature tags correspond to one repair suggestion at the same time. For example, the feature tags in a feature set are CPU usage increase, memory utilization increase, application unavailability, and a version upgrade identified by correlation analysis, and the repair suggestion is version rollback. When the repair result is repair success, the matching degree between the repair suggestion in the preset operation and maintenance database and the feature set containing the feature tag may also be adjusted (when the matching degree of the repair suggestion in the preset operation and maintenance database is not 100%).
Further, the performing machine learning in the preset operation and maintenance database by using the repair result successfully repaired, and updating the matching degree between the repair suggestion corresponding to the repair result and the feature set where the feature label corresponding to the feature value is located includes:
in the preset operation and maintenance database, after the target parameter in a preset machine learning formula is adjusted according to a preset adjustment mode, calculating the feature set repair success rate of the feature set through the preset formula, and updating, according to the feature set repair success rate, the matching degree between the repair suggestion corresponding to the successful repair result and the feature tags corresponding to the characteristic values; the preset formula is: feature set repair success rate = number of successful repairs by the repair suggestion / number of times the feature set is matched; and the target parameter is the number of times the feature set is matched.
Understandably, the preset adjustment mode distinguishes strict matching from fuzzy matching: when the characteristic values strictly match the feature set, the number of times the feature set is matched is increased by 1; when the characteristic values only fuzzily match the feature set, the number is increased by 0 (left unchanged). The matching degree is updated through the feature set repair success rate whenever characteristic values are matched to the feature set in the future, and the continuously iterated feature set repair success rate serves as data support for subsequent machine learning; the preset operation and maintenance database is thus continuously enriched, improving the repair rate of the repair suggestions and the matching accuracy of the preset operation and maintenance database.
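The preset formula and the strict/fuzzy adjustment rule can be sketched as a small counter. The description only states how the match count changes; whether a successful repair after a fuzzy match counts toward the numerator is not stated, so this sketch assumes it does not, which keeps the rate between 0 and 1. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeatureSetStats:
    successes: int = 0   # repair suggestion repair success number
    matches: int = 0     # number of times of matching to the feature set

    def record(self, strict_match, repair_succeeded):
        """Apply the adjustment rule: strict match adds 1, fuzzy match adds 0."""
        if not strict_match:
            return  # assumption: fuzzy matches change neither counter
        self.matches += 1
        if repair_succeeded:
            self.successes += 1

    def success_rate(self):
        """Feature set repair success rate = successes / matches (None before any strict match)."""
        if self.matches == 0:
            return None
        return self.successes / self.matches
```

The value returned by `success_rate` would then be written back as the new matching degree for the repair suggestion associated with that feature set.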
Further, after the updating the matching degree corresponding to the repair suggestion in the preset operation and maintenance database after the machine learning of the repair result, the method further includes:
acquiring a weight item related to the use of the repair suggestion; the weight items comprise the repair result, the recovery time, the manual intervention rate, a new characteristic value and a target repair suggestion;
and calculating a total weight score of the repair suggestion according to the weight term and a weight calculation formula corresponding to the weight term, and adjusting the matching degree of the repair suggestion and the characteristic value in the preset operation and maintenance database according to the total weight score.
Understandably, each of the above weight items is given a weight proportion. After the weight items are normalized, a total weight score is calculated from the weight calculation formula and weight proportion of each item, and the matching degree is adjusted according to the total weight score so as to adjust the ordering of the repair suggestions in the preset operation and maintenance database.
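A weighted total score of this kind might look like the sketch below. The patent gives neither the weight proportions nor the per-item formulas, so the linear recovery-time normalization and the example weights are assumptions; only three of the five weight items are used because the description does not say how a new characteristic value or a target repair suggestion would be turned into a number.

```python
def normalize_recovery_time(seconds, worst_seconds):
    """Map a recovery time to [0, 1]; 1.0 means instant recovery (assumed linear scale)."""
    if worst_seconds <= 0:
        raise ValueError("worst_seconds must be positive")
    return max(0.0, 1.0 - seconds / worst_seconds)


def total_weight_score(normalized_items, weights):
    """Weighted average of normalized weight items, each expected in [0, 1]."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[name] * normalized_items[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical weight proportions and observations for one use of a repair suggestion.
weights = {"repair_result": 0.4, "recovery_time": 0.3, "manual_intervention": 0.3}
items = {
    "repair_result": 1.0,                                # repair succeeded
    "recovery_time": normalize_recovery_time(30, 120),   # recovered in 30 s, worst case 120 s
    "manual_intervention": 1.0 - 0.5,                    # 50% manual intervention rate
}
score = total_weight_score(items, weights)
```

A higher `score` would argue for raising the suggestion's matching degree and a lower one for reducing it; how the score maps onto the matching degree is left open in the description.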
In summary, the method for repairing operation and maintenance abnormalities provided above addresses the problem that current open source monitoring products cannot process abnormal alarm information automatically. Repair suggestions are stored in advance in the preset operation and maintenance database, so the repair suggestion with the highest matching degree can be determined from the characteristic values extracted from the abnormal alarm information, and the abnormality can be repaired automatically according to that suggestion, which shortens the time needed to resolve the alarm. The matching degree between the repair suggestion and the characteristic value in the preset operation and maintenance database is adjusted according to the repair result, and the ordering is adjusted according to the new matching degree, which further improves the repair rate of subsequent repair suggestions and the timeliness of resolving abnormal alarm information. In addition, the preset operation and maintenance database provides data support for machine learning in the monitoring field; through repeated learning the database is continuously enriched, improving the repair rate of the repair suggestions and the matching accuracy of the preset operation and maintenance database.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, an operation and maintenance abnormality repair processing device is provided, which corresponds one-to-one to the operation and maintenance abnormality repair processing method in the above embodiments. As shown in fig. 3, the device includes an acquisition module 11, an extraction module 12, a matching module 13, and an updating module 14. The functional modules are described in detail as follows:
an acquisition module 11, configured to acquire abnormal alarm information in the operation and maintenance process;
an extraction module 12, configured to extract a feature value corresponding to the abnormal alarm information;
a matching module 13, configured to match, from a preset operation and maintenance database, the repair suggestion that corresponds to the feature value and has the highest matching degree;
and an updating module 14, configured to obtain a repair result after the abnormal alarm information in the operation and maintenance process is repaired according to the repair suggestion, perform machine learning using the repair result, and update the matching degree between the repair suggestion and the feature value in the preset operation and maintenance database.
Further, the matching module comprises:
a query submodule, configured to query, from the preset operation and maintenance database, the feature tags matched by the feature values, the hit number of those feature tags, and the repair suggestions corresponding to the feature tags; each feature value corresponds to one feature tag, and the hit number is the number of feature tags matched by the feature values;
a ranking submodule, configured to obtain, from the preset operation and maintenance database, the matching degree of the repair suggestions corresponding to the feature tags, and to rank the repair suggestions according to the hit number and the matching degree to obtain a ranking table;
and a recording submodule, configured to record the repair suggestion ranked first in the ranking table as the repair suggestion that corresponds to the feature value and has the highest matching degree in the preset operation and maintenance database.
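The ranking performed by these submodules can be sketched as follows. The text says only that repair suggestions are arranged according to the hit number and the matching degree; sorting by hit number first and using the matching degree as a tie-breaker is an assumption, and the `Candidate` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    suggestion: str         # repair suggestion stored in the database
    hit_count: int          # number of feature tags matched by the feature values
    matching_degree: float  # stored matching degree for this suggestion


def build_ranking_table(candidates):
    """Rank by hit count, then matching degree, both descending (assumed order)."""
    return sorted(candidates,
                  key=lambda c: (c.hit_count, c.matching_degree),
                  reverse=True)


def best_suggestion(candidates):
    """Return the first-ranked candidate, or None when nothing matched."""
    table = build_ranking_table(candidates)
    return table[0] if table else None
```

A different priority, such as a weighted combination of the two values, would only require changing the sort key.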
Further, the operation and maintenance abnormality repair processing device further comprises:
a monitoring module, configured to monitor in real time the repair result of repairing the abnormal alarm information with the repair suggestion ranked first in the ranking table;
a first confirmation module, configured to confirm that the repair is completed when the repair result is a repair success;
a second confirmation module, configured to confirm, when the repair result is a repair failure, whether all the repair suggestions in the ranking table have been polled;
a third confirmation module, configured to, when the polling is not completed, repair the abnormal alarm information with the next repair suggestion in the ranking table, obtain the repair result corresponding to that repair suggestion, and confirm that the repair is completed when the repair result is a repair success;
and a fourth confirmation module, configured to confirm that the repair has ended when the polling is completed.
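The polling behavior of the confirmation modules amounts to trying each suggestion in rank order until one succeeds or the table is exhausted. The following is a minimal sketch; `repair_with_polling` and the `apply_repair` callback are hypothetical names, and how a repair is actually executed and judged successful is outside what the text specifies.

```python
def repair_with_polling(ranking_table, apply_repair):
    """Try suggestions in rank order.

    Returns (suggestion_that_succeeded_or_None, list_of_failed_suggestions).
    apply_repair is a caller-supplied callable returning True on success.
    """
    failed = []
    for suggestion in ranking_table:
        if apply_repair(suggestion):
            # Repair success: stop polling and confirm completion.
            return suggestion, failed
        # Repair failure: remember it and move to the next suggestion.
        failed.append(suggestion)
    # Every suggestion was polled without success; the repair ends here.
    return None, failed
```

Returning the list of failed suggestions alongside the outcome keeps the information that a later matching-degree update would need.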
Further, the operation and maintenance abnormality repair processing device further comprises:
a reporting module, configured to report the abnormal alarm information to a preset processing party in a preset reporting mode;
a marking module, configured to mark, in the preset operation and maintenance database, a new feature tag for the feature value extracted from the abnormal alarm information when the preset processing party determines, from the repair result of the repair failure, that the abnormal alarm information is a preset target problem; when a feature value corresponding to abnormal alarm information later matches the new feature tag, the abnormal alarm information is reported directly to the preset processing party in the preset reporting mode.
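The direct-reporting shortcut can be sketched as a small registry that remembers which feature values were marked as a preset target problem. The class name `EscalationRegistry`, the use of a frozen set of feature values as the lookup key, and the `report` callback are all assumptions; the source does not describe how the new feature tag is stored or matched.

```python
class EscalationRegistry:
    """Remembers feature values whose alarms must be reported directly."""

    def __init__(self, report):
        # report is a caller-supplied callable standing in for the
        # preset reporting mode (for example mail or instant message).
        self._report = report
        self._direct_report_keys = set()

    def mark_target_problem(self, feature_values):
        """Mark these feature values with the new 'report directly' tag."""
        self._direct_report_keys.add(frozenset(feature_values))

    def handle(self, alarm, feature_values):
        """Report directly if tagged; return True when automatic repair is skipped."""
        if frozenset(feature_values) in self._direct_report_keys:
            self._report(alarm)
            return True
        return False
```

With this shape, an alarm whose feature values were previously tagged bypasses the ranking table entirely and goes straight to the preset processing party.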
Further, the operation and maintenance abnormality repair processing device further comprises:
a target repair suggestion acquisition module, configured to notify the preset processing party to carry out repair processing, and to acquire the target repair suggestion used by the preset processing party when the repair processing succeeds;
a first matching degree updating module, configured to re-extract a target feature value corresponding to the abnormal alarm information according to the repair result of the repair failure, update, through machine learning, the matching degree between the target feature tag corresponding to the target feature value and the target repair suggestion in the preset operation and maintenance database, and update the matching degree between the feature tag corresponding to the feature value and the repair suggestion;
and a second matching degree updating module, configured to perform machine learning in the preset operation and maintenance database using the repair result of the successful repair, and update the matching degree between the repair suggestion corresponding to the repair result and the feature set containing the feature tag corresponding to the feature value.
Further, the second matching degree updating module includes:
an updating submodule, configured to adjust, in the preset operation and maintenance database, the target parameter of a preset machine learning formula according to a preset adjustment mode, then calculate the feature set repair success rate of the feature set with the preset formula, and update, according to the feature set repair success rate, the matching degree between the repair suggestion corresponding to the repair result of the successful repair and the feature tag corresponding to the feature value. The preset formula is: feature set repair success rate = number of successful repairs by the repair suggestion / number of times the feature set is matched. The target parameter is the number of times the feature set is matched.
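The preset formula is a simple ratio, and a minimal sketch of the bookkeeping it implies is given below. The text does not spell out the preset adjustment mode, so incrementing the match count by one on every match (and the success count by one on every successful repair) is an assumption, as are the class and method names.

```python
class FeatureSetStats:
    """Counters for one (feature set, repair suggestion) pair."""

    def __init__(self):
        self.match_count = 0    # target parameter: times the feature set was matched
        self.success_count = 0  # successful repairs by the repair suggestion

    def record(self, repair_succeeded):
        """Assumed adjustment mode: add one per match, one per success."""
        self.match_count += 1
        if repair_succeeded:
            self.success_count += 1

    def success_rate(self):
        """Feature set repair success rate = successes / matches."""
        if self.match_count == 0:
            return 0.0  # no data yet; avoids division by zero
        return self.success_count / self.match_count
```

The resulting rate lies between 0 and 1 and could be written back as the new matching degree, although the source leaves the exact mapping from success rate to matching degree open.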
Further, the operation and maintenance abnormality repair processing device further comprises:
a weight item acquisition module, configured to acquire the weight items related to the use of the repair suggestion; the weight items comprise the repair result, the recovery time, the manual intervention rate, a new feature value, and a target repair suggestion;
and an adjusting module, configured to calculate a total weight score of the repair suggestion according to each weight item and the weight calculation formula corresponding to that weight item, and to adjust, according to the total weight score, the matching degree between the repair suggestion and the feature value in the preset operation and maintenance database.
For specific limitations of the operation and maintenance abnormality repair processing device, reference may be made to the above limitations of the operation and maintenance abnormality repair processing method, and details are not repeated here. Each module in the operation and maintenance abnormality repair processing device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the data involved in the operation and maintenance abnormality repair processing method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the operation and maintenance abnormality repair processing method.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the method for repairing an operation and maintenance exception in the foregoing embodiments are implemented, for example, steps S10 to S40 shown in fig. 2. Alternatively, the processor, when executing the computer program, implements the functions of the modules/units of the operation and maintenance abnormality repair processing apparatus in the above-described embodiment, for example, the functions of the modules 11 to 14 shown in fig. 3. To avoid repetition, further description is omitted here.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program is executed by a processor to implement the steps of the operation and maintenance abnormality repairing method in the foregoing embodiments, such as steps S10 to S40 shown in fig. 2. Alternatively, the computer program, when executed by the processor, implements the functions of the modules/units of the operation and maintenance abnormality repair processing apparatus in the above-described embodiment, for example, the functions of the modules 11 to 14 shown in fig. 3. To avoid repetition, further description is omitted here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A method for repairing and processing operation and maintenance abnormity is characterized by comprising the following steps:
acquiring abnormal alarm information in the operation and maintenance process;
extracting a characteristic value corresponding to the abnormal alarm information;
matching a repair suggestion which corresponds to the characteristic value and has the highest matching degree from a preset operation and maintenance database;
and after abnormal alarm information in the operation and maintenance process is repaired according to the repair suggestion, a repair result is obtained, machine learning is carried out by using the repair result, and the matching degree of the repair suggestion in the preset operation and maintenance database corresponding to the characteristic value is updated.
2. The method for repairing the operation and maintenance abnormality according to claim 1, wherein the matching of the repair suggestion corresponding to the characteristic value and having the highest matching degree from the preset operation and maintenance database comprises:
querying feature tags of the hit number matched with the feature values and repair suggestions corresponding to the feature tags from the preset operation and maintenance database; one said feature value corresponding to one said feature tag; the hit number refers to the number of the feature tags corresponding to the feature values;
acquiring the matching degree of repair suggestions corresponding to the feature tags from the preset operation and maintenance database, and arranging the repair suggestions according to the hit number and the matching degree to obtain an ordering table;
and recording the repair suggestion ranked first in the ranking table as the repair suggestion which corresponds to the characteristic value and has the highest matching degree in the preset operation and maintenance database.
3. The method according to claim 2, wherein after recording the repair suggestion ranked first in the ranking table as the repair suggestion corresponding to the feature value and having the highest matching degree in the preset operation and maintenance database, the method further comprises:
monitoring a repair result of repairing the abnormal alarm information by using the repair suggestion ranked first in the ranking table in real time;
when the repairing result is that repairing is successful, the repairing is confirmed to be finished;
when the repair result is repair failure, confirming whether all repair suggestions in the sorting table are completely polled;
when the polling is not finished, repairing the abnormal alarm information by using the next repairing suggestion in the sequencing list, acquiring a repairing result corresponding to the next repairing suggestion, and confirming that the repairing is finished when the repairing result is repairing success;
when polling is completed, the repair is confirmed to be completed.
4. The method for repairing an operation and maintenance exception according to claim 3, wherein after confirming that the repair is completed when the polling is completed, the method further comprises:
reporting the abnormal alarm information to a preset processing party in a preset reporting mode;
when the preset processing party determines that the abnormal alarm information is a preset target problem according to the repair result of the repair failure, marking a new characteristic label for a characteristic value extracted from the abnormal alarm information in the preset operation and maintenance database; and when the new characteristic label is matched with the characteristic value corresponding to the abnormal alarm information again, directly reporting the abnormal alarm information to a preset processing party in a preset reporting mode.
5. The method for repairing an operation and maintenance exception according to claim 3, wherein after confirming that the repair is completed when the polling is completed, the method further comprises:
informing a preset processor to carry out repair processing, and acquiring a target repair suggestion corresponding to the successful repair processing of the preset processor;
re-extracting a target characteristic value corresponding to the abnormal alarm information according to a repair result of the repair failure, updating the matching degree of a target characteristic label and a target repair suggestion corresponding to the target characteristic value in the preset operation and maintenance database through machine learning, and updating the matching degree of the characteristic label and the repair suggestion corresponding to the characteristic value;
when the repair result is that the repair is successful, after the repair is confirmed to be finished, the method further includes:
and performing machine learning in the preset operation and maintenance database by using the repair result which is successfully repaired, and updating the matching degree between the repair suggestion corresponding to the repair result and the feature set where the feature label corresponding to the feature value is located.
6. The method according to claim 5, wherein the performing machine learning in the preset operation and maintenance database by using the repair result that is successfully repaired, and updating the matching degree between the repair suggestion corresponding to the repair result and the feature set where the feature label corresponding to the feature value is located, comprises:
in the preset operation and maintenance database, after the target parameter of a preset machine learning formula is adjusted according to a preset adjustment mode, calculating the feature set repair success rate of the feature set with the preset formula, and updating, according to the feature set repair success rate, the matching degree between the repair suggestion corresponding to the repair result of the successful repair and the feature label corresponding to the feature value; the preset formula is: feature set repair success rate = number of successful repairs by the repair suggestion / number of times the feature set is matched; and the target parameter is the number of times the feature set is matched.
7. The method for repairing the operation and maintenance abnormality according to claim 1, wherein after the updating the matching degree corresponding to the repair suggestion in the preset operation and maintenance database after the machine learning of the repair result, the method further comprises:
acquiring a weight item related to the use of the repair suggestion; the weight items comprise the repair result, the recovery time, the manual intervention rate, a new characteristic value and a target repair suggestion;
and calculating a total weight score of the repair suggestion according to the weight term and a weight calculation formula corresponding to the weight term, and adjusting the matching degree of the repair suggestion and the characteristic value in the preset operation and maintenance database according to the total weight score.
8. An operation and maintenance abnormality repair processing device, comprising:
the acquisition module is used for acquiring abnormal alarm information in the operation and maintenance process;
the extraction module is used for extracting a characteristic value corresponding to the abnormal alarm information;
the matching module is used for matching a repair suggestion which corresponds to the characteristic value and has the highest matching degree from a preset operation and maintenance database;
and the updating module is used for repairing abnormal alarm information in the operation and maintenance process according to the repair suggestion, acquiring a repair result, performing machine learning by using the repair result, and updating the matching degree of the repair suggestion in the preset operation and maintenance database corresponding to the characteristic value.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method for repairing an operation and maintenance exception according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements a method for repairing an operation and maintenance exception according to any one of claims 1 to 7.
CN202011554497.3A 2020-12-24 2020-12-24 Operation and maintenance abnormity repair processing method and device, computer equipment and storage medium Pending CN112631818A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011554497.3A CN112631818A (en) 2020-12-24 2020-12-24 Operation and maintenance abnormity repair processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011554497.3A CN112631818A (en) 2020-12-24 2020-12-24 Operation and maintenance abnormity repair processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112631818A true CN112631818A (en) 2021-04-09

Family

ID=75324568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011554497.3A Pending CN112631818A (en) 2020-12-24 2020-12-24 Operation and maintenance abnormity repair processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112631818A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254269A (en) * 2021-05-27 2021-08-13 山东英信计算机技术有限公司 Method, system, equipment and medium for repairing abnormal event of storage system
CN113704018A (en) * 2021-08-30 2021-11-26 平安普惠企业管理有限公司 Application operation and maintenance data processing method and device, computer equipment and storage medium
CN114968761A (en) * 2022-04-11 2022-08-30 杭州德适生物科技有限公司 Software operating environment safety supervision system based on internet
CN115150249A (en) * 2022-06-29 2022-10-04 济南浪潮数据技术有限公司 Storage system warning method, device, equipment and storage medium
WO2023061227A1 (en) * 2021-10-12 2023-04-20 华为技术有限公司 Database operation and maintenance method and apparatus

Similar Documents

Publication Publication Date Title
CN112631818A (en) Operation and maintenance abnormity repair processing method and device, computer equipment and storage medium
CN107992490A (en) A kind of data processing method and data processing equipment
CN113626241B (en) Abnormality processing method, device, equipment and storage medium for application program
CN115808911B (en) Industrial Internet of things regulation and control method and system for defective products generated in production line
CN111709794A (en) Abnormal order processing method and device, computer equipment and storage medium
CN113780506A (en) Production management method, system, equipment and storage medium based on active identification
CN110069364B (en) Spare part data error correction method and device, computer equipment and storage medium
CN111832943A (en) Hardware equipment fault management method and device, electronic equipment and storage medium
CN112000582A (en) Server-side automatic test early warning method, device, equipment and storage medium
CN110489142B (en) Evaluation method and device for equipment software upgrading, storage medium and terminal
CN115208914A (en) Industrial data acquisition and analysis system and method
US20200201706A1 (en) Recovery of application from error
CN107908525B (en) Alarm processing method, equipment and readable storage medium
CN115147236A (en) Processing method, processing device and electronic equipment
CN116245339A (en) Production monitoring method and device of server equipment, storage medium and electronic device
CN111752217B (en) Method and device for determining machine tool identification code, electronic equipment and storage medium
CN113342770A (en) Data analysis method and device, storage medium and electronic equipment
CN110716101B (en) Power line fault positioning method and device, computer and storage medium
CN111381932A (en) Method and device for triggering application program change, electronic equipment and storage medium
CN109299224A (en) Solution querying method based on Zabbix, device, computer equipment
CN115687020A (en) Closed-loop management method, system, device and storage medium for device
CN112907221B (en) Self-service method, device and system
CN112101722A (en) Spare part checking method and device for nuclear power station, computer equipment and storage medium
CN117349064A (en) Task abnormal interrupt processing method and device, storage medium and electronic equipment
CN115774628A (en) Batch error reporting processing method, batch error reporting processing device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination