CN115858311A

CN115858311A - Operation and maintenance monitoring method and device, electronic equipment and readable storage medium

Info

Publication number: CN115858311A
Application number: CN202310199516.2A
Authority: CN
Inventors: 闻军; 周峰; 李晓龙
Original assignee: Beijing Shenzhou Everbright Technology Co ltd
Current assignee: Beijing Shenzhou Everbright Technology Co ltd
Priority date: 2023-03-04
Filing date: 2023-03-04
Publication date: 2023-03-28

Abstract

The application relates to an operation and maintenance monitoring method and device, electronic equipment and a readable storage medium, in the technical field of computers. The method comprises: obtaining current operation data and historical operation data corresponding to target IT equipment; judging whether the current operation data is abnormal based on the current operation data and the historical operation data; when the current operation data is abnormal, determining current abnormal data based on the current operation data and the historical operation data; obtaining historical abnormal data, historical abnormal types and a first corresponding relation between the historical abnormal data and the historical abnormal types; determining the current abnormal type corresponding to the current abnormal data; obtaining the historical solution strategies corresponding to the respective historical abnormal types; determining a current solution strategy based on the current abnormal type, the historical abnormal types and the historical solution strategies; and sending a solution instruction to the target IT equipment based on the current solution strategy. The method can reduce the occurrence of IT equipment faults.

Description

Operation and maintenance monitoring method and device, electronic equipment and readable storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to an operation and maintenance monitoring method and apparatus, an electronic device, and a readable storage medium.

Background

With the development of science and technology, Internet Technology (IT) equipment plays an indispensable role in the daily office work of enterprises. Operation and maintenance is the means of ensuring that IT equipment runs normally. During operation and maintenance, IT equipment may fail, for example by crashing, and the failed equipment must then be repaired.

Because the service life of IT equipment is shortened once it has failed, reducing the occurrence of IT equipment faults is increasingly important.

Disclosure of Invention

In order to reduce the occurrence of IT equipment faults, the application provides an operation and maintenance monitoring method, an operation and maintenance monitoring device, electronic equipment and a readable storage medium.

The above object of the present invention is achieved by the following technical solutions:

In a first aspect, an operation and maintenance monitoring method is provided, where the method includes:

acquiring current operation data and historical operation data corresponding to target network technology IT equipment, wherein the historical operation data is not abnormal;

judging whether the current operation data is abnormal or not based on the current operation data and the historical operation data;

if the current operation data is abnormal, determining current abnormal data based on the current operation data and the historical operation data;

acquiring historical abnormal data, historical abnormal types and a first corresponding relation between the historical abnormal data and the historical abnormal types;

determining a current abnormal type corresponding to the current abnormal data based on the current abnormal data, the historical abnormal type and the first corresponding relation;

acquiring historical solution strategies corresponding to the historical exception types respectively;

and determining a current solution strategy based on the current exception type, the historical exception type and the historical solution strategy, and sending a solution instruction to target IT equipment based on the current solution strategy.

With this technical solution, current operation data and historical operation data corresponding to the target IT equipment are obtained, and whether the current operation data is abnormal is judged based on both. If the current operation data is abnormal, that is, the target IT equipment is behaving abnormally, the current abnormal data is determined based on the current operation data and the historical operation data. The historical abnormal data, the historical abnormal types and the first corresponding relation between them are then obtained, and the current abnormal type corresponding to the current abnormal data is determined based on the current abnormal data, the historical abnormal types and the first corresponding relation. Once the current abnormal type is determined, the historical solution strategies corresponding to the respective historical abnormal types are obtained, the current solution strategy is determined based on the current abnormal type, the historical abnormal types and the historical solution strategies, and a solution instruction is sent to the target IT equipment based on the current solution strategy. By matching a corresponding solution strategy to IT equipment that shows abnormal data, the current abnormal data is handled in time, which reduces the occurrence of IT equipment faults.

In one possible implementation, the current operation data includes current CPU load data and current memory load data, and the historical operation data includes: historical CPU load data and historical memory load data;

the judging whether the current operation data is abnormal or not based on the current operation data and the historical operation data comprises any one of the following items:

comparing the current CPU load data with historical CPU load data, and/or comparing the current memory load data with historical memory load data, and judging whether the current operation data is abnormal;

and inputting the current operation data into an abnormality recognition model, and judging whether the current operation data is abnormal or not, wherein the abnormality recognition model is obtained by training according to historical operation data.

In another possible implementation manner, the current operation data further includes: the current running time length;

the method further comprises the following steps:

determining a matching operation time length based on the current operation time length and a preset operation time length;

acquiring a first matching relation between the matching operation duration and preset CPU load data and a second matching relation between the matching operation duration and preset memory load data;

determining matched CPU load data corresponding to the current CPU load data and matched memory load data corresponding to the current memory load data based on the first matching relation and the second matching relation;

determining a preset use duration corresponding to the matched CPU load data as a first use duration, and determining a preset use duration corresponding to the matched memory load data as a second use duration;

determining a total usable duration based on the first use duration and the second use duration, and outputting a restart instruction based on the total usable duration, wherein the restart instruction is used to control a restart device to restart the target IT equipment.
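
The restart decision described above can be sketched as follows. This is only an illustration under stated assumptions: the preset tables, the rule for picking the matching operation duration, the use of the minimum of the two use durations as the total usable duration, and the restart threshold are all hypothetical, since the text does not specify them.

```python
from bisect import bisect_right

# Hypothetical preset tables (illustrative values only).
PRESET_RUN_HOURS = [24, 72, 168]  # preset operation durations, ascending
# Per matching operation duration: (load upper bound in %, preset use duration in hours).
CPU_TABLE = {24: [(50, 200), (80, 100), (100, 20)],
             72: [(50, 150), (80, 60), (100, 10)],
             168: [(50, 100), (80, 30), (100, 5)]}
MEM_TABLE = {24: [(60, 220), (90, 90), (100, 15)],
             72: [(60, 160), (90, 50), (100, 8)],
             168: [(60, 110), (90, 25), (100, 4)]}


def match_run_duration(current_hours):
    """Pick the largest preset duration not exceeding the current one (assumed rule)."""
    idx = bisect_right(PRESET_RUN_HOURS, current_hours) - 1
    return PRESET_RUN_HOURS[max(idx, 0)]


def preset_use_duration(table, run_hours, load_pct):
    """Return the preset use duration of the first load band that contains load_pct."""
    for upper_bound, use_hours in table[run_hours]:
        if load_pct <= upper_bound:
            return use_hours
    return table[run_hours][-1][1]


def should_restart(current_hours, cpu_pct, mem_pct, restart_below_hours=12):
    """Return (total usable duration, whether a restart instruction should be output)."""
    run = match_run_duration(current_hours)
    first = preset_use_duration(CPU_TABLE, run, cpu_pct)   # first use duration
    second = preset_use_duration(MEM_TABLE, run, mem_pct)  # second use duration
    total = min(first, second)  # assumption: the tighter limit wins
    return total, total < restart_below_hours
```

With these illustrative tables, a device that has run for 80 hours at 85% CPU load and 50% memory load matches the 72-hour preset and falls in the highest CPU band, so the sketch reports a short usable duration and requests a restart.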

In another possible implementation manner, the method further includes:

acquiring current time and use time corresponding to each IT device;

determining load testing time corresponding to each IT device based on the using time;

determining time difference values corresponding to the IT equipment respectively based on the current time and the load testing time;

acquiring the device types respectively corresponding to the IT devices;

and determining the IT equipment to be detected based on the equipment type, the time difference value and a preset weight.
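
One way the selection of IT equipment to be detected could work is sketched below. The per-type weights, the scoring rule (weight multiplied by time difference) and the score threshold are assumptions made for illustration; the load testing time is taken as an input because the text does not say how it is derived from the use time.

```python
from dataclasses import dataclass

# Hypothetical per-type weights; the text only says that a preset weight is used.
TYPE_WEIGHT = {"server": 1.0, "switch": 0.6, "workstation": 0.3}


@dataclass
class Device:
    name: str
    device_type: str
    load_test_time: float  # hours, on the same clock as current_time


def devices_to_detect(devices, current_time, score_threshold):
    """Return names of devices whose weighted time difference meets the threshold, highest first."""
    scored = []
    for d in devices:
        time_diff = current_time - d.load_test_time
        score = TYPE_WEIGHT[d.device_type] * time_diff
        if score >= score_threshold:
            scored.append((score, d.name))
    scored.sort(reverse=True)
    return [name for _, name in scored]
```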

In another possible implementation manner, the determining, based on the current abnormal data, the historical abnormal type, and the first corresponding relationship, a current abnormal type corresponding to the current abnormal data includes:

determining matching historical abnormal data based on the current abnormal data and the historical abnormal data;

and determining the current exception type from historical exception types based on the matching historical exception data and the first corresponding relation.
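
The two matching steps above can be illustrated with a nearest-record lookup. The sample records, the type names and the choice of Euclidean distance as the matching rule are assumptions; the text only requires that matching historical abnormal data be found and mapped to its type through the first corresponding relation.

```python
import math

# Hypothetical historical records: (abnormal data as (CPU %, memory %), abnormal type).
# Each pair stands for one entry of the first corresponding relation.
HISTORY = [
    ((95.0, 40.0), "cpu_overload"),
    ((30.0, 97.0), "memory_exhaustion"),
    ((92.0, 95.0), "resource_saturation"),
]


def current_abnormal_type(current, history=HISTORY):
    """Return the type of the historical record closest to the current abnormal data."""
    _, matched_type = min(history, key=lambda rec: math.dist(rec[0], current))
    return matched_type
```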

In another possible implementation manner, the determining a current solution policy based on the current exception type, a historical exception type, and the historical solution policy includes:

acquiring a historical exception grade and a second corresponding relation between the historical exception type and the historical exception grade;

determining a first matching level from the historical exception levels based on the current exception type, the historical exception type and a second corresponding relation;

acquiring a historical abnormal data range, and a third corresponding relation between the historical abnormal data range and the historical abnormal level;

determining a second matching level based on the current abnormal data, the historical abnormal data range, the first matching level and the third corresponding relation;

acquiring a fourth corresponding relation between the historical abnormal level and the historical solution strategy;

determining a current resolution policy based on the second matching level, the historical resolution policy, and the fourth correspondence.
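
A minimal sketch of the level-based strategy selection follows. The three tables stand in for the second, third and fourth corresponding relations and contain made-up values, and taking the more severe of the type-based level and the range-based level as the second matching level is an assumption, since the text does not state how the two are combined.

```python
# Hypothetical tables standing in for the corresponding relations.
TYPE_TO_LEVEL = {"cpu_overload": 1, "memory_exhaustion": 2}  # second relation
RANGE_TO_LEVEL = [((80.0, 90.0), 1),                          # third relation
                  ((90.0, 95.0), 2),
                  ((95.0, 100.0), 3)]
LEVEL_TO_STRATEGY = {1: "throttle_background_jobs",           # fourth relation
                     2: "restart_service",
                     3: "restart_device"}


def current_strategy(abnormal_type, abnormal_value):
    """Map an abnormal type and value to a strategy via the first and second matching levels."""
    first_level = TYPE_TO_LEVEL[abnormal_type]
    # Level implied by the first range containing the current abnormal value;
    # fall back to the first matching level when no range applies.
    range_level = next(
        (lvl for (low, high), lvl in RANGE_TO_LEVEL if low <= abnormal_value <= high),
        first_level,
    )
    # Assumption: the second matching level is the more severe of the two.
    second_level = max(first_level, range_level)
    return LEVEL_TO_STRATEGY[second_level]
```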

In another possible implementation manner, the determining whether the current operation data is abnormal based on the current operation data and the historical operation data further includes:

acquiring equipment parameters of target IT equipment, wherein the equipment parameters are parameters of each component of the target IT equipment;

establishing a target IT equipment model in a simulation environment based on the equipment parameters;

determining simulation test parameters of the target IT equipment based on the current operation data and the historical operation data;

acquiring a ratio of time progress in a simulation environment to time progress in a real environment, wherein the ratio of the time progress in the simulation environment to the time progress in the real environment is greater than 1;

and performing simulation operation on the target IT equipment based on the simulation test parameters and the progress ratio to obtain the fault time of the target IT equipment.
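
The role of the progress ratio can be shown with a toy simulation. The linear load-growth model, the failure threshold and every parameter name here are hypothetical stand-ins for the target IT equipment model and the simulation test parameters; only the requirement that the ratio be greater than 1 comes from the text.

```python
def simulate_fault_time(load_pct, growth_per_hour, fail_at_pct, progress_ratio, step_hours=1.0):
    """Advance a toy load model until it reaches fail_at_pct.

    Returns (simulated hours until the fault, real hours the simulation run would take).
    """
    if progress_ratio <= 1:
        raise ValueError("progress ratio must be greater than 1")
    if growth_per_hour <= 0:
        raise ValueError("toy model needs a positive growth rate")
    simulated_hours = 0.0
    while load_pct < fail_at_pct:
        load_pct += growth_per_hour * step_hours
        simulated_hours += step_hours
    # Simulated time runs progress_ratio times faster than real time.
    return simulated_hours, simulated_hours / progress_ratio
```

Because simulated time advances faster than real time, the fault time is known well before the real equipment would reach it, which is what leaves room to act in advance.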

In a second aspect, an operation and maintenance monitoring apparatus is provided, which includes:

a first acquisition module, configured to acquire current operation data and historical operation data corresponding to target network technology IT equipment, wherein the historical operation data is not abnormal;

the first judgment module is used for judging whether the current operation data is abnormal or not based on the current operation data and the historical operation data;

the first determining module is used for determining current abnormal data based on the current operating data and the historical operating data when the current operating data is abnormal;

the second acquisition module is used for acquiring historical abnormal data, historical abnormal types and a first corresponding relation between the historical abnormal data and the historical abnormal types;

a second determining module, configured to determine, based on the current abnormal data, the historical abnormal type, and the first corresponding relationship, a current abnormal type corresponding to the current abnormal data;

the third acquisition module is used for acquiring historical solution strategies corresponding to the historical exception types respectively;

and the third determining module is used for determining a current solution strategy based on the current exception type, the historical exception type and the historical solution strategy and sending a solution instruction to the target IT equipment based on the current solution strategy.

In one possible implementation manner, the current operation data includes current CPU load data and current memory load data, and the historical operation data includes: historical CPU load data and historical memory load data;

the first judgment module, when judging whether the current operation data is abnormal based on the current operation data and the historical operation data, is specifically configured to:

comparing the current CPU load data with the historical CPU load data, and/or comparing the current memory load data with the historical memory load data, and judging whether the current operation data is abnormal or not; or

inputting the current operation data into an abnormality recognition model, and judging whether the current operation data is abnormal or not, wherein the abnormality recognition model is obtained by training according to historical operation data.

the device further comprises: a fourth determining module, a fourth obtaining module, a fifth determining module, a sixth determining module and an output module, wherein,

the fourth determining module is used for determining the matching operation time length based on the current operation time length and the preset operation time length;

the fourth obtaining module is used for obtaining a first matching relation between the matching operation duration and the preset CPU load data and a second matching relation between the matching operation duration and the preset memory load data;

the fifth determining module is configured to determine, based on the first matching relationship and the second matching relationship, matching CPU load data corresponding to the current CPU load data and matching memory load data corresponding to the current memory load data;

the sixth determining module is configured to determine a preset usage duration corresponding to the matched CPU load data as a first usage duration, and determine a preset usage duration corresponding to the matched memory load data as a second usage duration;

the output module is used for determining the total usable time length based on the first usable time length and the second usable time length and outputting a restarting instruction based on the total usable time length, wherein the restarting instruction is used for controlling restarting equipment to restart target IT equipment.

In another possible implementation manner, the apparatus further includes: a using time obtaining module, a testing time determining module, a time difference value determining module, a type obtaining module and a determining module of the IT equipment to be detected,

the service time acquisition module is used for acquiring the current time and the service time corresponding to each IT device;

the test time determining module is used for determining load test time corresponding to each IT device based on the service time;

the time difference determining module is used for determining the time difference corresponding to each IT device based on the current time and the load testing time;

the type obtaining module is used for obtaining the equipment types corresponding to the IT equipment respectively;

and the determining module of the IT equipment to be detected is used for determining the IT equipment to be detected based on the equipment type, the time difference value and the preset weight.

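In another possible implementation manner, when determining the current anomaly type corresponding to the current anomaly data based on the current anomaly data, the historical anomaly type, and the first corresponding relationship, the second determining module is specifically configured to:

determining matching historical abnormal data based on the current abnormal data and the historical abnormal data;

and determining the current abnormal type from the historical abnormal types based on the matching historical abnormal data and the first corresponding relation.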

In another possible implementation manner, when determining the current resolution policy based on the current exception type, the historical exception type, and the historical resolution policy, the third determining module is specifically configured to:

acquiring a historical abnormal grade and a second corresponding relation between the historical abnormal type and the historical abnormal grade;

determining a first matching level from the historical exception levels based on the current exception type, the historical exception type and the second corresponding relation;

and determining the current solution strategy based on the second matching grade, the historical solution strategy and the fourth corresponding relation.

In another possible implementation manner, the apparatus further includes: an equipment parameter obtaining module, an establishing module, a test parameter determining module, a progress ratio obtaining module and a simulation module, wherein,

the equipment parameter acquisition module is used for acquiring equipment parameters of the target IT equipment, wherein the equipment parameters are parameters of each component of the target IT equipment;

the establishing module is used for establishing a target IT equipment model in a simulation environment based on the equipment parameters;

the test parameter determining module is used for determining simulation test parameters of the target IT equipment based on the current operation data and the historical operation data;

the progress ratio acquisition module is used for acquiring a progress ratio of the time progress in the simulation environment and the time progress in the real environment, and the progress ratio is greater than 1;

and the simulation module is used for carrying out simulation operation on the target IT equipment based on the simulation test parameters and the progress ratio to obtain the fault time of the target IT equipment.

In a third aspect, an electronic device is provided, which includes:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the operations corresponding to the operation and maintenance monitoring method shown in any possible implementation manner of the first aspect.

In a fourth aspect, a computer-readable storage medium is provided, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by a processor to implement the method for operation and maintenance monitoring as shown in any one of the possible implementation manners of the first aspect.

In summary, the present application includes at least one of the following beneficial technical effects:

The application provides an operation and maintenance monitoring method, an operation and maintenance monitoring device, an electronic device and a readable storage medium. Compared with the related technology, in the application, current operation data and historical operation data corresponding to a target IT device are obtained, and whether the current operation data is abnormal is judged based on both. If the current operation data is abnormal, that is, the target IT device is behaving abnormally, the current abnormal data is determined based on the current operation data and the historical operation data. The historical abnormal data, the historical abnormal types and the first corresponding relation between them are then obtained, and the current abnormal type corresponding to the current abnormal data is determined based on the current abnormal data, the historical abnormal types and the first corresponding relation. Once the current abnormal type is determined, the historical solution strategies corresponding to the respective historical abnormal types are obtained, the current solution strategy is determined based on the current abnormal type, the historical abnormal types and the historical solution strategies, and a solution instruction is sent to the target IT device based on the current solution strategy. By matching a corresponding solution strategy to an IT device that shows abnormal data, the current abnormal data is handled in time, which reduces the occurrence of IT device faults.

Drawings

Fig. 1 is a schematic flowchart of an operation and maintenance monitoring method according to an embodiment of the present application.

Fig. 2 is a schematic structural diagram of an operation and maintenance monitoring apparatus according to an embodiment of the present application.

Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The present application is described in further detail below with reference to figures 1-3.

The present embodiment only explains the present application and does not limit it. After reading this specification, those skilled in the art may modify the embodiment as needed without inventive contribution, and all such modifications are protected by patent law within the scope of the claims of the present application.

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

In addition, the term "and/or" herein describes only an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship, unless otherwise specified.

The embodiments of the present application will be described in further detail with reference to the drawings.

The embodiment of the application provides an operation and maintenance monitoring method executed by an electronic device, which may be a server or a terminal device. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services. The terminal device may be, but is not limited to, a smart phone, a tablet computer, a notebook computer or a desktop computer. The terminal device and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the embodiment of the application. As shown in Fig. 1, the method may include:

Step S101, acquiring current operation data and historical operation data corresponding to the target IT equipment.

The historical operation data contains no abnormality.

For the embodiment of the application, the electronic device may obtain current operation data corresponding to the target IT device in real time, may also obtain current operation data corresponding to the target IT device at preset time intervals, and may also obtain current operation data corresponding to the target IT device when a trigger instruction of a user is detected, which is not limited in the embodiment of the application.

For the embodiment of the present application, the historical operation data may be data from normal operation of the target IT device in the past. The historical operation data corresponding to the target IT device may be obtained from local storage, obtained from other devices, or input by the user, which is not limited in the embodiment of the present application.

In the application embodiment, after the electronic device acquires the current operation data and the historical operation data corresponding to the target IT device, the display device may display the current operation data and the historical operation data corresponding to the target IT device in real time, or display the current operation data and the historical operation data corresponding to the target IT device when a display instruction triggered by a user is detected, so that the user can grasp the operation condition of the target IT device in real time.

Step S102, judging whether the current operation data is abnormal or not based on the current operation data and the historical operation data.

For the embodiment of the application, whether the current operation data is abnormal is judged against an abnormality criterion, based on the current operation data and the historical operation data.

Step S103, if the current operation data is abnormal, determining the current abnormal data based on the current operation data and the historical operation data.

For the embodiment of the present application, after it is determined that the current operation data is abnormal, the abnormal portion of the current operation data, that is, the current abnormal data, is determined based on the current operation data and the historical operation data. For example, where the current operation data consists of several data items, data B among them may be determined to be the current abnormal data based on the current operation data and the historical operation data.

In the embodiment of the application, after the current abnormal data is determined, the display device may display the current abnormal data in real time, or may display the current abnormal data when a display instruction triggered by a user is detected.

Step S104, acquiring historical abnormal data, historical abnormal types and a first corresponding relation between the historical abnormal data and the historical abnormal types.

For the embodiment of the application, the historical exception type is a type corresponding to the historical exception data, and the historical exception type and the first corresponding relationship between the historical exception data and the historical exception type may be obtained from a local storage or from other devices.

For the embodiment of the present application, the historical abnormal data may be operation data of the target IT device when an abnormality occurs in the past month, or may also be operation data of the target IT device when an abnormality occurs in the past year, a specific time range is not limited in the embodiment of the present application, and the historical abnormal data may also be abnormal data corresponding to a device of the same type as the target IT device.

Step S105, determining a current abnormal type corresponding to the current abnormal data based on the current abnormal data, the historical abnormal type and the first corresponding relation.

For the embodiment of the application, the historical abnormal data is abnormal data that occurred while the equipment was running in the past, and the historical abnormal type is the abnormal type determined from the historical abnormal data. The current abnormal type corresponding to the current abnormal data is determined from the current abnormal data and the existing historical abnormal types.

In the above application embodiment, after the current exception type corresponding to the current exception data is determined, the display device may display the current exception type corresponding to the current exception data in real time, and may also display the current exception type corresponding to the current exception data when a display instruction triggered by a user is detected, which is not limited in this application embodiment.

Step S106, acquiring historical solution strategies corresponding to the historical abnormal types respectively.

For the embodiment of the present application, the history solution policy is determined according to the history exception type, the history solution policy is a policy for solving the history exception data, the history solution policy corresponding to each history exception type may be obtained in the local storage, the history solution policy corresponding to each history exception type may be obtained in other devices, and the history solution policy corresponding to each history exception type input by the user may also be obtained, which is not limited in the embodiment of the present application.

Step S107, determining a current solution strategy based on the current exception type, the historical exception type and the historical solution strategy, and sending a solution instruction to the target IT equipment based on the current solution strategy.

For the embodiment of the present application, after the historical solution strategies corresponding to the respective historical exception types are obtained, the historical solution strategy corresponding to the historical exception type that matches the current exception type is determined as the current solution strategy. For example, if the current exception type is type A, the matching historical exception type is historical exception type A, and the historical solution strategy corresponding to historical exception type A is strategy A1, then strategy A1 is the current solution strategy. A solution instruction is then generated based on the current solution strategy and sent to the target IT equipment so as to process its current abnormal data.
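
The lookup in step S107 might be sketched as follows, reusing the type A and strategy A1 example. The mapping contents, the JSON message layout and the function name are hypothetical; the text does not define the format of the solution instruction or how it is transmitted.

```python
import json

# Hypothetical mapping from historical exception types to historical solution strategies.
HISTORICAL_STRATEGIES = {"A": "A1", "B": "B1"}


def build_solution_instruction(device_id, current_type, strategies=HISTORICAL_STRATEGIES):
    """Return a JSON instruction for the matched strategy, or None if no type matches."""
    strategy = strategies.get(current_type)
    if strategy is None:
        return None
    return json.dumps({"device": device_id, "strategy": strategy}, sort_keys=True)
```

Returning None when no historical type matches leaves the handling of unknown exception types to the caller, which the text does not address.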

The embodiment of the application provides an operation and maintenance monitoring method. Compared with the related technology, in the embodiment of the application, current operation data and historical operation data corresponding to the target IT equipment are obtained, and whether the current operation data is abnormal is judged based on both. If the current operation data is abnormal, that is, the target IT equipment is behaving abnormally, the current abnormal data is determined based on the current operation data and the historical operation data. The historical abnormal data, the historical abnormal types and the first corresponding relation between them are then obtained, and the current abnormal type corresponding to the current abnormal data is determined based on the current abnormal data, the historical abnormal types and the first corresponding relation. Once the current abnormal type is determined, the historical solution strategies corresponding to the respective historical abnormal types are obtained, the current solution strategy is determined based on the current abnormal type, the historical abnormal types and the historical solution strategies, and a solution instruction is sent to the target IT equipment based on the current solution strategy. By matching a corresponding solution strategy to IT equipment that shows abnormal data, the current abnormal data is handled in time, which reduces the occurrence of IT equipment faults.

In another possible implementation manner of the embodiment of the present application, the current operation data includes load data of a current Central Processing Unit (CPU) and load data of a current memory, and the historical operation data includes: historical CPU load data and historical memory load data;

judging whether the current operation data is abnormal or not based on the current operation data and the historical operation data may specifically include: comparing the current CPU load data with the historical CPU load data, and/or comparing the current memory load data with the historical memory load data, and judging whether the current operation data is abnormal; or, inputting the current operation data into the abnormality recognition model, and judging whether the current operation data is abnormal. In this embodiment of the application, the current CPU load data is used to represent the busy degree of the target IT device, and the current memory load data is used to represent the usage amount of the memory. Current CPU load data or current memory load data that is too high may cause a failure of the target IT device. Because the CPU load data and the memory load data of each IT device during normal operation may be different, the average value of the historical CPU load data of the target IT device may be determined as the CPU load threshold of the target IT device during normal operation, and the average value of the historical memory load data of the target IT device may be determined as the memory load threshold of the target IT device during normal operation. The current CPU load data is compared with the CPU load threshold, and the current memory load data is compared with the memory load threshold; if the current CPU load data is greater than the CPU load threshold and/or the current memory load data is greater than the memory load threshold, the current operation data is abnormal. For example, if the CPU load threshold is determined to be 80% according to the historical CPU load data of the target IT device in the past month, and the current CPU load data is 85%, the current operation data is abnormal.

The abnormality recognition model is obtained by training according to historical abnormal data and historical operating data.

According to the embodiment of the application, the CPU load threshold value and the memory load threshold value corresponding to the target IT equipment are accurately determined through historical CPU load data and historical memory load data, and whether the current operation data is abnormal or not is accurately judged.
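The threshold comparison described above can be sketched as follows. The function names and sample values are illustrative assumptions; only the rule itself (mean of historical load as the threshold, abnormal when either current load exceeds its threshold) comes from the text.

```python
from statistics import mean
from typing import Sequence


def load_threshold(history: Sequence[float]) -> float:
    """Threshold for normal operation: the mean of the historical load values."""
    return mean(history)


def is_abnormal(current_cpu: float, current_mem: float,
                cpu_history: Sequence[float],
                mem_history: Sequence[float]) -> bool:
    """Abnormal when the CPU load or the memory load exceeds its own threshold."""
    cpu_over = current_cpu > load_threshold(cpu_history)
    mem_over = current_mem > load_threshold(mem_history)
    return cpu_over or mem_over
```

With a historical CPU mean of 80% and a current CPU load of 85%, `is_abnormal` reports an abnormality, which mirrors the worked example in the text.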

For the embodiment of the application, historical abnormal data and historical operating data can be input into the original abnormality recognition model for training to obtain the abnormality recognition model. The current operation data is then input into the abnormality recognition model to obtain a result indicating whether the current operation data is abnormal. Recognizing abnormality through the model in this way improves the speed of judging the abnormal condition of the current operation data of the target IT equipment.

In another possible implementation manner of the embodiment of the present application, the current operation data further includes: the current running time length;

the method may further comprise: determining a matching operation duration based on the current operation duration and a preset operation duration; acquiring a first matching relation between the matching operation duration and preset CPU load data and a second matching relation between the matching operation duration and preset memory load data; determining matched CPU load data corresponding to the current CPU load data and matched memory load data corresponding to the current memory load data based on the first matching relation and the second matching relation; determining a preset usable duration corresponding to the matched CPU load data as a first usable duration, and determining a preset usable duration corresponding to the matched memory load data as a second usable duration; and determining a total usable duration based on the first usable duration and the second usable duration, and outputting a restart instruction based on the total usable duration. In this embodiment of the application, the step of determining the first usable duration may be performed before, after, or simultaneously with the step of determining the second usable duration.

For the embodiment of the application, when the target IT device runs for a long time, the CPU load data and the memory load data increase further. The matched CPU load data is determined according to the current running duration and the current CPU load data, and the preset usable duration corresponding to the matched CPU load data is the first usable duration; the matched memory load data is determined according to the current running duration and the current memory load data, and the preset usable duration corresponding to the matched memory load data is the second usable duration. The average value of the first usable duration and the second usable duration may be determined as the total usable duration. When the total usable duration is reached, a restart instruction is output to restart the target IT device, so that the running load of the target IT device is reduced and the occurrence of IT device faults is reduced.

The restarting instruction is used for controlling the restarting equipment to restart the target IT equipment.

According to the embodiment of the application, the total usable duration of the target IT equipment is determined through the current CPU load data, the current memory load data and the current operation duration. When the total usable duration is reached, the target IT equipment is controlled to restart, which reduces the operation load of the target IT equipment, thereby reducing the probability of faults of the target IT equipment and improving the maintenance efficiency of the target IT equipment.
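One possible reading of the restart-timing steps above is sketched below. The text does not specify how the matching relations are stored, so the table layout, the nearest-preset matching rule, the load bands, and every numeric value here are hypothetical; only the final averaging step is stated in the text.

```python
from typing import Dict, List, Tuple

# (upper bound of load band in %, preset usable hours) -- hypothetical layout.
Bands = List[Tuple[float, float]]

PRESET_RUN_HOURS: List[float] = [24.0, 72.0, 168.0]  # hypothetical presets
CPU_TABLE: Dict[float, Bands] = {
    24.0: [(50.0, 240.0), (80.0, 120.0), (100.0, 48.0)],
    72.0: [(50.0, 200.0), (80.0, 96.0), (100.0, 36.0)],
    168.0: [(50.0, 160.0), (80.0, 72.0), (100.0, 24.0)],
}
MEM_TABLE: Dict[float, Bands] = {
    24.0: [(50.0, 220.0), (80.0, 110.0), (100.0, 40.0)],
    72.0: [(50.0, 180.0), (80.0, 90.0), (100.0, 30.0)],
    168.0: [(50.0, 140.0), (80.0, 60.0), (100.0, 20.0)],
}


def match_run_hours(current_run_hours: float) -> float:
    """Matching operation duration: the preset closest to the current one."""
    return min(PRESET_RUN_HOURS, key=lambda preset: abs(preset - current_run_hours))


def usable_hours(table: Dict[float, Bands], run_hours: float, load: float) -> float:
    """Preset usable duration of the first load band that contains `load`."""
    for upper_bound, hours in table[run_hours]:
        if load <= upper_bound:
            return hours
    raise ValueError(f"load {load} is outside every preset band")


def total_usable_hours(current_run_hours: float, cpu_load: float,
                       mem_load: float) -> float:
    """Average of the first (CPU) and second (memory) usable durations."""
    matched = match_run_hours(current_run_hours)
    first = usable_hours(CPU_TABLE, matched, cpu_load)
    second = usable_hours(MEM_TABLE, matched, mem_load)
    return (first + second) / 2
```

A scheduler could then output the restart instruction once the device has run for the returned number of hours.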

In another possible implementation manner of the embodiment of the present application, the method may further include: acquiring current time and use time corresponding to each IT device; determining load test time corresponding to each IT device based on the use time; determining time difference values corresponding to the IT equipment respectively based on the current time and the load testing time; acquiring the equipment types respectively corresponding to the IT equipment; and determining the IT equipment to be detected based on the equipment type, the time difference value and the preset weight. In the embodiment of the present application, the step of comparing the current CPU load data with the historical CPU load data, and/or the step of comparing the current memory load data with the historical memory load data, and determining whether the current operation data is abnormal may be performed before the step of obtaining the device types corresponding to the IT devices, respectively, or may be performed simultaneously with the step of obtaining the device types corresponding to the IT devices, respectively.

For the embodiment of the application, the usage traffic of the target IT equipment may suddenly increase in a future period of time, causing the IT equipment to fail and making it difficult to repair, so the IT equipment needs to be load tested before its usage traffic increases. The usage time corresponding to different IT devices may be different. To avoid performing a load test while an IT device is in use, which may cause a larger fault of the IT device, the load test time corresponding to each IT device is determined based on the usage time corresponding to that IT device, for example, if the service time of the IT device 1 is 8-00: 2:00.

For the embodiment of the application, after the load test time corresponding to each IT device is determined, the current time is obtained and the time difference between the current time and the load test time is calculated. The device type corresponding to each IT device is obtained, the test weight corresponding to each IT device is determined based on the device type, the time difference and the preset weight, and the IT device whose test weight is smaller than the weight threshold is determined as the IT device to be detected. This improves the detection efficiency of the IT devices and lets a user learn in advance of problems that occur when the load of an IT device is too large, so that the IT device can be maintained in advance, faults of the IT device are avoided, and the maintenance efficiency of the IT device is further improved.
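The selection of IT devices to be detected can be sketched as follows. The text does not give the formula that combines device type, time difference and preset weight, so the product used here is an assumption, as are the type names, weights, threshold and field names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Device:
    name: str
    device_type: str
    hours_until_test: float  # time difference between now and the load test time


# Hypothetical preset weight per device type and hypothetical threshold.
PRESET_TYPE_WEIGHT: Dict[str, float] = {"server": 0.5, "switch": 1.0}
WEIGHT_THRESHOLD = 2.0


def compute_weight(device: Device) -> float:
    """Assumed formula: preset weight of the device type times the time difference."""
    return PRESET_TYPE_WEIGHT[device.device_type] * device.hours_until_test


def devices_to_detect(devices: List[Device]) -> List[str]:
    """Devices whose test weight is smaller than the weight threshold."""
    return [d.name for d in devices if compute_weight(d) < WEIGHT_THRESHOLD]
```

Under this assumed formula, a smaller weight means the load test time is closer or the device type is given higher priority, which is consistent with selecting devices below the threshold.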

In another possible implementation manner of the embodiment of the application, determining the current exception type based on the current exception data, the historical exception data, the historical exception type, and the first corresponding relationship may specifically include: determining matching historical abnormal data based on the current abnormal data and the historical abnormal data; and determining the current exception type from the historical exception types based on the matching historical exception data and the first corresponding relation. In the embodiment of the application, the current abnormal data is matched against the historical abnormal data, and the historical abnormal data that is the same as or similar to the current abnormal data is determined as the matched historical abnormal data. The historical abnormal type corresponding to the matched historical abnormal data is determined based on the first corresponding relation between the historical abnormal data and the historical abnormal type, and that historical abnormal type is determined as the current abnormal type. The current abnormal type is thus determined more accurately through the historical abnormal types, and the current abnormal data can be conveniently processed according to the current abnormal type.
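A minimal sketch of the matching step follows. The text only says that the matched historical abnormal data is "the same as or similar to" the current abnormal data; representing each record as a numeric vector and taking the nearest record by Euclidean distance is an assumption, and the type labels are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One entry of the first corresponding relation:
# (historical abnormal data vector, historical abnormal type).
HistoricalRecord = Tuple[Sequence[float], str]


def match_exception_type(current: Sequence[float],
                         history: List[HistoricalRecord]) -> str:
    """Return the type of the historical record closest to the current data."""
    # Assumption: "similar" means smallest Euclidean distance.
    _, matched_type = min(history, key=lambda record: math.dist(current, record[0]))
    return matched_type
```

For instance, with vectors of (CPU load, memory load), a current reading near a past CPU overload record is assigned that record's type.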

In another possible implementation manner of the embodiment of the application, determining the current solution policy based on the current exception type, the historical exception type, and the historical solution policy may specifically include: acquiring a historical exception level and a second corresponding relation between the historical exception type and the historical exception level; determining a first matching level from the historical exception levels based on the current exception type, the historical exception type and the second corresponding relation; acquiring a historical abnormal data range and a third corresponding relation between the historical abnormal data range and the historical exception level; determining a matching historical exception level, that is, a second matching level, based on the current abnormal data, the historical abnormal data range, the first matching level and the third corresponding relation; acquiring a fourth corresponding relation between the historical exception level and the historical solution policy; and determining the current solution policy based on the second matching level, the historical solution policy and the fourth corresponding relation. In this embodiment of the present application, the second corresponding relation between the historical exception type and the historical exception level may be obtained from local storage, may be obtained from other devices, or may be input by the user, which is not limited in this embodiment of the present application.

For the embodiment of the present application, the current exception level is used to characterize the degree of abnormality of the current abnormal data. The same exception type may correspond to different exception levels, and the solution policies may differ accordingly. The historical exception levels corresponding to the historical exception type that matches the current exception type are determined as the first matching levels. For example, if the current exception type is a type A1 and, among the historical exception types, the type A1 is the type matched with the current exception type A1, then the level 1, the level 2 and the level 3 corresponding to the type A1 are the first matching levels. The historical abnormal data range that matches the size of the current abnormal data is determined based on the current abnormal data and the historical abnormal data ranges, the historical exception level corresponding to that range is determined as the matching historical exception level, that is, the second matching level, and the historical solution policy corresponding to the matching historical exception level is determined as the current solution policy. For example, if the historical abnormal data range 80% to 90% corresponds to the historical exception level 1, the range 90% to 95% corresponds to the level 2, the range 95% to 100% corresponds to the level 3, and the size of the current abnormal data is 85%, then the second matching level is level 1. The current solution policy is then determined from the historical solution policies based on the second matching level and the fourth corresponding relation between the historical exception level and the historical solution policy.

According to the embodiment of the application, the current solution strategy of the current abnormal data is accurately obtained by matching the current abnormal type with the historical abnormal type and matching the current abnormal level with the historical abnormal level, and the abnormal data is processed in time to reduce the occurrence of IT equipment faults.
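The level-based selection can be sketched as follows, using the ranges from the example above (80% to 90% is level 1, 90% to 95% is level 2, 95% to 100% is level 3). The policy names, the dictionary layout, and the handling of boundary values (both ends inclusive, lowest matching level wins) are assumptions that the text does not settle.

```python
from typing import Dict, List, Optional, Tuple

# Second corresponding relation: exception type -> candidate levels.
TYPE_TO_LEVELS: Dict[str, List[int]] = {"A1": [1, 2, 3]}
# Third corresponding relation: level -> (low %, high %) data range.
LEVEL_RANGES: Dict[int, Tuple[float, float]] = {
    1: (80.0, 90.0), 2: (90.0, 95.0), 3: (95.0, 100.0),
}
# Fourth corresponding relation: level -> policy (hypothetical policy names).
LEVEL_TO_POLICY: Dict[int, str] = {1: "policy-1", 2: "policy-2", 3: "policy-3"}


def resolve_policy(current_type: str,
                   current_value: float) -> Optional[Tuple[int, str]]:
    """Return (second matching level, current solution policy), or None."""
    first_matching_levels = TYPE_TO_LEVELS.get(current_type, [])
    for level in sorted(first_matching_levels):
        low, high = LEVEL_RANGES[level]
        # Assumption: ranges are inclusive; a boundary value takes the lower level.
        if low <= current_value <= high:
            return level, LEVEL_TO_POLICY[level]
    return None
```

For the example in the text, `resolve_policy("A1", 85.0)` selects level 1.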

In another possible implementation manner of the embodiment of the application, after determining whether the current operation data is abnormal based on the current operation data and the historical operation data, the method may further include: acquiring equipment parameters of the target IT equipment; establishing a target IT equipment model in a simulation environment based on the equipment parameters; determining simulation test parameters of the target IT equipment based on the current operation data and the historical operation data; acquiring a progress ratio of the time progress in the simulation environment to the time progress in the real environment, wherein the progress ratio is greater than 1; and performing simulation operation on the target IT equipment model based on the simulation test parameters and the progress ratio to obtain the fault time of the target IT equipment. In this embodiment of the present application, the equipment parameters of the target IT equipment may be obtained from local storage, may be obtained from other devices, or may be input by the user, which is not limited in this embodiment of the present application.

The device parameters are parameters of all parts of the target IT device.

For the embodiment of the application, in order to further determine the fault time of the target IT equipment so that the target IT equipment can be maintained in advance, the operation of the target IT equipment is simulated. A target IT equipment model is established in a simulation environment through the equipment parameters, the operation rule of the target IT equipment is obtained based on the historical operation data, and the current operation data is used as the parameter starting value of the target IT equipment, so that the simulation test parameters of the target IT equipment are determined according to the operation rule of the target IT equipment and the current operation data. The simulation operation is performed on the target IT equipment model based on the simulation test parameters and the progress ratio to obtain the fault time of the target IT equipment model, and the actual fault time of the target IT equipment is determined based on the fault time of the target IT equipment model and the progress ratio. In this way, a user can know the fault time of the target IT equipment in advance and maintain the IT equipment in advance, which reduces the occurrence of IT equipment faults.
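The conversion from simulated fault time to actual fault time can be sketched as follows. The text does not state the conversion explicitly; this sketch assumes that the model's fault time is measured as elapsed real time spent running the simulation, so that it is multiplied by the progress ratio. The function name and units are illustrative.

```python
def real_time_to_fault(sim_elapsed_hours: float, progress_ratio: float) -> float:
    """Scale the elapsed simulation run time up to real-world device time.

    Assumption: the simulated clock advances `progress_ratio` times faster
    than the real clock, so one hour of simulation covers `progress_ratio`
    hours of device operation.
    """
    if progress_ratio <= 1:
        raise ValueError("the progress ratio must be greater than 1")
    return sim_elapsed_hours * progress_ratio
```

For example, if the model fails after 2 hours of simulation at a progress ratio of 24, this reading predicts a fault after 48 hours of real operation.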

In another possible implementation manner of the embodiment of the present application, the method may further include: acquiring the items to be detected corresponding to the IT equipment respectively; determining the repetition value of each item to be detected, and determining the detection sequence corresponding to each item to be detected based on the repetition value. In this embodiment of the application, the step of obtaining the items to be detected corresponding to each IT device may be performed before the step of obtaining the current operation data and the historical operation data corresponding to the target IT device, may be performed after the step of obtaining the current operation data and the historical operation data corresponding to the target IT device, and may be performed simultaneously with the step of obtaining the current operation data and the historical operation data corresponding to the target IT device.

For the embodiment of the application, in order to ensure the maintenance efficiency of the IT equipment, the IT equipment may be tested, for example, by a reliability test, and an instruction needs to be sent to the IT equipment to test it. Because the types of IT equipment are different and the items to be detected are different, the repetition value of each item to be detected is determined based on the items to be detected corresponding to each IT device. For example, the test items of the IT device 1 include the item A1, the item A2 and the item C2, the test items of the IT device 2 include the items A2, B1, C1 and C2, and the test items of the IT device 3 include the items B1, B2 and C2; then the repetition value of the item A1 is 1, the repetition value of the item A2 is 2, the repetition value of the item B1 is 2, the repetition value of the item C1 is 1, and the repetition value of the item C2 is 3. The detection sequence of the items to be detected is determined based on the repetition value of each item: the items to be detected are sorted by repetition value, and the item to be detected with the highest repetition value is detected first. For example, when the repetition value of the item A1 is 1, the repetition value of the item A2 is 2, and the repetition value of the item C2 is 3, the detection sequence is the item C2, the item A2, and the item A1. By sorting the detection items, the same detection items are detected simultaneously, which saves the detection time of the IT equipment, improves the detection efficiency of the IT equipment, and helps prevent the IT equipment from breaking down.
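The repetition values and the detection sequence in the example above can be reproduced with a short sketch. The function names are illustrative, and breaking ties between equal repetition values alphabetically is an assumption, since the text does not say how ties are ordered.

```python
from collections import Counter
from typing import Dict, List


def repetition_values(items_by_device: Dict[str, List[str]]) -> Counter:
    """Count, for each item to be detected, how many devices require it."""
    return Counter(item
                   for items in items_by_device.values()
                   for item in set(items))


def detection_order(items_by_device: Dict[str, List[str]]) -> List[str]:
    """Items sorted by descending repetition value (ties: alphabetical)."""
    counts = repetition_values(items_by_device)
    return sorted(counts, key=lambda item: (-counts[item], item))


# Test items of IT devices 1 to 3, taken from the example in the text.
ITEMS_BY_DEVICE = {
    "device1": ["A1", "A2", "C2"],
    "device2": ["A2", "B1", "C1", "C2"],
    "device3": ["B1", "B2", "C2"],
}
```

Here the item C2 is required by all three devices, so it is detected first.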

The foregoing embodiments describe the operation and maintenance monitoring method from the perspective of a method flow. The following embodiments describe an operation and maintenance monitoring device from the perspective of virtual modules or virtual units, as detailed below.

The embodiment of the present application provides an operation and maintenance monitoring apparatus, as shown in fig. 2, the operation and maintenance monitoring apparatus 20 may specifically include: a first obtaining module 21, a first judging module 22, a first determining module 23, a second obtaining module 24, a second determining module 25, a third obtaining module 26 and a third determining module 27, wherein,

the first obtaining module 21 is configured to obtain current operation data and historical operation data corresponding to the target network technology IT device, where the historical operation data is not abnormal;

a first judging module 22, configured to judge whether the current operation data is abnormal based on the current operation data and the historical operation data;

a first determining module 23, configured to determine, when there is an abnormality in the current operating data, current abnormal data based on the current operating data and historical operating data;

a second obtaining module 24, configured to obtain historical exception data, a historical exception type, and a first corresponding relationship between the historical exception data and the historical exception type;

a second determining module 25, configured to determine, based on the current abnormal data, the historical abnormal type, and the first corresponding relationship, a current abnormal type corresponding to the current abnormal data;

a third obtaining module 26, configured to obtain history resolution policies corresponding to the history exception types, respectively;

and a third determining module 27, configured to determine a current solution policy based on the current exception type and the historical solution policy, and send a solution instruction to the target IT device based on the current solution policy.

In a possible implementation manner of the embodiment of the present application, the current operation data includes current CPU load data and current memory load data, and the historical operation data includes: historical CPU load data and historical memory load data;

the first determining module 22 is specifically configured to, when determining whether there is an abnormality in the current operation data based on the current operation data and the historical operation data:

the apparatus 20 further comprises: a fourth determining module, a fourth obtaining module, a fifth determining module, a sixth determining module and an output module, wherein,

the fourth acquisition module is used for acquiring a first matching relation between the matching operation duration and the preset CPU load data and a second matching relation between the matching operation duration and the preset memory load data;

a fifth determining module, configured to determine, based on the first matching relationship and the second matching relationship, matching CPU load data corresponding to the current CPU load data and matching memory load data corresponding to the current memory load data;

the sixth determining module is used for determining the preset usable duration corresponding to the matched CPU load data as the first usable duration and determining the preset usable duration corresponding to the matched memory load data as the second usable duration;

and the output module is used for determining the total usable duration based on the first usable duration and the second usable duration and outputting a restart instruction based on the total usable duration, wherein the restart instruction is used for controlling the restart equipment to restart the target IT equipment.

In another possible implementation manner of the embodiment of the present application, when determining the current solution policy based on the current exception type, the historical exception type, and the historical solution policy, the third determining module 27 is specifically configured to:

acquiring a historical abnormal data range, a third corresponding relation between the historical abnormal data range and the historical abnormal grade;

acquiring a fourth corresponding relation between the historical abnormal grade and the historical solution strategy;

In another possible implementation manner of the embodiment of the application, when determining the current exception type corresponding to the current exception data based on the current exception data, the historical exception type, and the first corresponding relationship, the second determining module 25 is specifically configured to:

and determining the current exception type from the historical exception types based on the matching historical exception data and the first corresponding relation.

In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: an equipment parameter obtaining module, an establishing module, a test parameter determining module, a progress ratio obtaining module and a simulation module, wherein,

the device parameter acquisition module is used for acquiring device parameters of the target IT device, and the device parameters are parameters of each component of the target IT device;

the establishing module is used for establishing a target IT equipment model in a simulation environment based on the equipment parameters;

In another possible implementation manner of the embodiment of the present application, the apparatus 20 further includes: a using time obtaining module, a testing time determining module, a time difference value determining module, a type obtaining module and a determining module of the IT equipment to be detected,

the test time determining module is used for determining load test time corresponding to each IT device based on the use time;

the type acquisition module is used for acquiring the equipment types respectively corresponding to the IT equipment;

and the to-be-detected IT equipment determining module is used for determining the to-be-detected IT equipment based on the equipment type, the time difference and the preset weight.

The embodiment of the application provides an operation and maintenance monitoring device. Compared with the related technology, in the embodiment of the application, the current operation data and the historical operation data corresponding to the target IT equipment are obtained, and whether the current operation data are abnormal is judged based on the current operation data and the historical operation data. If the current operation data are abnormal, that is, abnormal data occur in the target IT equipment, the current abnormal data are determined based on the current operation data and the historical operation data. After the current abnormal data are determined, the historical abnormal data, the historical abnormal types and the first corresponding relation between the historical abnormal data and the historical abnormal types are obtained, and the current abnormal type corresponding to the current abnormal data is determined based on the current abnormal data, the historical abnormal types and the first corresponding relation. After the current abnormal type is determined, the historical solution strategies respectively corresponding to the historical abnormal types are obtained, the current solution strategy is determined based on the current abnormal type, the historical abnormal types and the historical solution strategies, and a solution instruction is sent to the target IT equipment based on the current solution strategy. By matching a corresponding solution strategy to the IT equipment with abnormal data, the abnormal data are processed in time.

It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, reference may be made to the corresponding process in the foregoing method embodiment for a specific working process of the operation and maintenance monitoring apparatus described above, and details are not repeated herein.

An embodiment of the present application provides an electronic device. As shown in fig. 3, the electronic device 30 includes: a processor 301 and a memory 303. The processor 301 is coupled to the memory 303, for example via a bus 302. Optionally, the electronic device 30 may also include a transceiver 304. It should be noted that, in practical applications, the transceiver 304 is not limited to one, and the structure of the electronic device 30 does not constitute a limitation on the embodiment of the present application.

The processor 301 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor 301 may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein. The processor 301 may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

Bus 302 may include a path that transfers information between the above components. The bus 302 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 302 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 3, but this is not intended to represent only one bus or type of bus.

The Memory 303 may be a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic Disc storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.

The memory 303 is used for storing application program codes for executing the scheme of the application, and the processor 301 controls the execution. The processor 301 is configured to execute application program code stored in the memory 303 to implement the aspects illustrated in the foregoing method embodiments.

The electronic device includes, but is not limited to: mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (e.g., in-vehicle navigation terminals); fixed terminals such as digital TVs and desktop computers; and servers. The electronic device shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

The present application provides a computer-readable storage medium on which a computer program is stored; when the computer program runs on a computer, the computer is enabled to execute the corresponding content in the foregoing method embodiments. Compared with the prior art, in the embodiment of the application, the current operation data and the historical operation data corresponding to the target IT equipment are obtained, and whether the current operation data are abnormal is judged based on the current operation data and the historical operation data. If the current operation data are abnormal, that is, the target IT equipment has abnormal data, the current abnormal data are determined based on the current operation data and the historical operation data. After the current abnormal data are determined, the historical abnormal data, the historical abnormal types and the first corresponding relation between the historical abnormal data and the historical abnormal types are obtained, and the current abnormal type corresponding to the current abnormal data is determined based on the current abnormal data, the historical abnormal types and the first corresponding relation. After the current abnormal type is determined, the historical solving strategies respectively corresponding to the historical abnormal types are obtained, the current solving strategy is determined based on the current abnormal type, the historical abnormal types and the historical solving strategies, and a solving instruction is sent to the target IT equipment based on the current solving strategy. By matching a corresponding solving strategy to the IT equipment with abnormal data, the abnormal data are processed in time.

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily performed at the same time or in sequence; they may be performed at different times, and may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

The foregoing describes only some embodiments of the present application. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims

1. An operation and maintenance monitoring method, comprising:

acquiring current operation data and historical operation data corresponding to target network technology IT equipment, wherein the historical operation data is data without abnormality;

if the current operation data is abnormal, determining the current abnormal data based on the current operation data and the historical operation data;

2. The method of claim 1, wherein the current operating data comprises current CPU load data and current memory load data, and the historical operating data comprises: historical CPU load data and historical memory load data;

the judging whether the current operation data is abnormal or not based on the current operation data and the historical operation data comprises any one of the following steps:

comparing the current CPU load data with the historical CPU load data, and/or comparing the current memory load data with the historical memory load data, and judging whether the current operation data is abnormal or not;

3. The method of claim 2, wherein the current operational data further comprises: the current running time length;

the method further comprises the following steps:

and determining the total usable time length based on the first usable time length and the second usable time length, and outputting a restarting instruction based on the total usable time length, wherein the restarting instruction is used for controlling the restarting device to restart the target IT device.

4. The method of claim 2, further comprising:

acquiring current time and use time corresponding to each IT device;

determining load test time corresponding to each IT device based on the use time;

acquiring the device types respectively corresponding to the IT devices;

and determining the IT equipment to be detected based on the equipment type, the time difference value and the preset weight.

5. The method of claim 1, wherein determining a current anomaly type corresponding to current anomaly data based on the current anomaly data, historical anomaly types, and the first correspondence comprises:

6. The method of claim 5, wherein determining a current resolution policy based on the current anomaly type, the historical anomaly type, and the historical resolution policy comprises:

determining a current resolution strategy based on the second matching level, the historical resolution strategy, and the fourth correspondence.

7. The method of claim 1, wherein determining whether the current operating data is abnormal based on the current operating data and the historical operating data further comprises:

acquiring a progress ratio of time progress in a simulation environment to time progress in a real environment, wherein the progress ratio is greater than 1;

8. An operation and maintenance monitoring device, comprising:

a first acquisition module, configured to acquire current operation data and historical operation data corresponding to target network technology IT equipment, wherein the historical operation data is data without abnormality;

the first judging module is used for judging whether the current operating data is abnormal or not based on the current operating data and the historical operating data;

9. An electronic device, comprising:

one or more processors;

a memory;

one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the operation and maintenance monitoring method according to any one of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the operation and maintenance monitoring method according to any one of claims 1 to 7.