CN111694705A - Monitoring method, device, equipment and computer readable storage medium - Google Patents
- Publication number
- CN111694705A (CN201910199173.3A)
- Authority
- CN
- China
- Prior art keywords
- preset
- judging
- operation data
- data
- task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3051—Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention provides a monitoring method, a monitoring device, monitoring equipment and a computer-readable storage medium. The method comprises: respectively acquiring operation data of a first system and a second system that share resources; judging the operation data according to a preset judgment rule to determine whether the first system and the second system have faults; and taking corresponding measures according to the judgment result so that the first system and the second system operate normally. The operating states of the two resource-sharing systems can thus be monitored in real time, system problems can be found and solved as early as possible, and the operational safety of the systems is improved while cost is saved.
Description
Technical Field
The present invention relates to the field of data processing, and in particular, to a monitoring method, apparatus, device, and computer-readable storage medium.
Background
With the development of science and technology, electronic commerce has gradually entered users' lives. To support the service requirements of numerous users, existing e-commerce websites generally adopt a plurality of distributed systems, with different distributed systems processing different services. Because the services differ, the times at which the different distributed systems process services also differ. For example, in practical applications, a Kubernetes system can bear the main business of users' online shopping, while a Hadoop system performs operations such as cleaning, conversion and processing on mass data and generates the basic data required by systems such as search recommendation, artificial intelligence, unbounded retail and face recognition. Owing to users' shopping habits, the main load on the Kubernetes system falls between 9:00 and 24:00 each day. From 0:00 to 8:00, 80% of the resources of the Kubernetes system are idle, while the Hadoop system needs to provide data services 24 hours a day. Moreover, with the rapid development and expansion of services, the Hadoop system has more and more data to process, and huge funds are spent each year to expand the existing big data computing and storage capacity, which wastes resources.
To solve the above technical problem, the prior art proposes transferring services of the Hadoop system to the Kubernetes system for processing, so as to realize resource sharing.
However, when the method is used for service processing, the service and hardware conditions of the two systems cannot be monitored, so that the current health condition of the systems cannot be monitored in real time.
Disclosure of Invention
The invention provides a monitoring method, a monitoring device, monitoring equipment and a computer readable storage medium, which are used for solving the technical problems that the existing resource sharing method cannot monitor the service and hardware conditions of two systems of resource sharing, and further cannot monitor the current health condition of the systems in real time.
A first aspect of the present invention provides a monitoring method, including:
respectively acquiring running data of a first system and a second system for resource sharing;
judging the operating data according to a preset judgment rule to determine whether the first system and the second system have faults or not;
and taking corresponding measures according to the judgment result so as to enable the first system and the second system to normally operate.
Another aspect of the present invention provides a monitoring apparatus, comprising:
the acquisition module is used for respectively acquiring the running data of a first system and a second system for resource sharing;
the judging module is used for judging the operating data according to a preset judging rule so as to determine whether the first system and the second system have faults or not;
and the processing module is used for taking corresponding measures according to the judgment result so as to ensure that the first system and the second system operate normally.
Yet another aspect of the present invention provides monitoring equipment, comprising: a memory and a processor;
the memory is used for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to perform the monitoring method described above.
Yet another aspect of the present invention is to provide a computer-readable storage medium having stored therein computer-executable instructions for implementing the monitoring method as described above when executed by a processor.
The monitoring method, the monitoring device, the monitoring equipment and the computer readable storage medium respectively acquire the running data of a first system and a second system for resource sharing; judging the operating data according to a preset judgment rule to determine whether the first system and the second system have faults or not; and taking corresponding measures according to the judgment result so as to enable the first system and the second system to normally operate. Therefore, the running states of the two systems of resource sharing can be monitored in real time, the system problems can be found and solved as soon as possible, and the running safety of the system is improved on the basis of saving the cost.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art according to the drawings.
Fig. 1 is a schematic diagram of a network architecture on which the present invention is based;
Fig. 2 is a schematic flow chart of a monitoring method according to a first embodiment of the present invention;
Fig. 3 is a schematic flow chart of a monitoring method according to a second embodiment of the present invention;
Fig. 4 is a schematic flow chart of a monitoring method according to a third embodiment of the present invention;
Fig. 5 is a schematic flow chart of a monitoring method according to a fourth embodiment of the present invention;
Fig. 6 is a schematic flow chart of a monitoring method according to a fifth embodiment of the present invention;
Fig. 7 is a schematic flow chart of a monitoring method according to a sixth embodiment of the present invention;
Fig. 8 is a schematic flow chart of a monitoring method according to a seventh embodiment of the present invention;
Fig. 9 is a schematic flow chart of a monitoring method according to an eighth embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a monitoring device according to a ninth embodiment of the present invention;
Fig. 11 is a schematic structural diagram of a monitoring device according to a tenth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other examples obtained based on the examples in the present invention are within the scope of the present invention.
With the development of science and technology, electronic commerce has gradually entered users' lives. To support the service requirements of numerous users, existing e-commerce websites generally adopt a plurality of distributed systems, with different distributed systems processing different services. Because the services differ, the times at which the different distributed systems process services also differ, which results in resource waste. To solve this technical problem, the prior art proposes transferring services of the Hadoop system to the Kubernetes system for processing, so as to realize resource sharing. However, when this method is used for service processing, the service and hardware conditions of the two systems cannot be monitored, so the current health condition of the systems cannot be monitored in real time. To solve this technical problem, the invention provides a monitoring method, a monitoring device, monitoring equipment and a computer-readable storage medium.
It should be noted that the monitoring method, apparatus, device, and computer-readable storage medium provided in the present application may be applied in a scenario of testing application software in various scenarios.
Fig. 1 is a schematic diagram of the network architecture on which the present invention is based. As shown in Fig. 1, the network architecture at least includes: a monitoring device 1, a first system 2 and a second system 3. The monitoring device 1 is in communication connection with the first system 2 and the second system 3, respectively, so as to obtain the operation data of the first system 2 and the second system 3. The monitoring device 1 may be written in C/C++, Java, Shell, Python or a similar language; the first system 2 and the second system 3 may be server clusters in which a large amount of data is stored.
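Fig. 2 is a schematic flow chart of a monitoring method according to a first embodiment of the present invention. As shown in Fig. 2, the monitoring method includes:
Step 101, respectively acquiring operation data of a first system and a second system for resource sharing.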
The execution subject of the present embodiment is a monitoring device. In this embodiment, the first system and the second system may be distributed service systems or distributed databases. The first system and the second system can share resources, and specifically, the first system can deploy a task which is currently running to the second system to run, so that extra cost caused by capacity expansion can be avoided, and cost is saved. In order to ensure that both systems can operate stably, both systems need to be monitored. Specifically, the operation data of the first system and the second system may be acquired respectively. The operation data may include hardware operation data and software operation data.
Step 102, judging the operation data according to a preset judgment rule to determine whether the first system and the second system have faults.
In this embodiment, the monitoring device is preset with a plurality of judgment rules, so that the collected operation data of the first system and the second system can be judged according to the judgment rules, and whether the first system and the second system are in failure or not is determined. Specifically, the judgment rule may be preset and stored in the monitoring device, and as an implementable manner, the judgment rule may also be set by the user according to the current requirement, and the judgment rule set by the user is taken as the current preset judgment rule. Specifically, the monitoring device may interact with a user, and receive a determination rule input by the user from a preset interface.
Step 103, taking corresponding measures according to the judgment result so as to enable the first system and the second system to operate normally.
In this embodiment, after the operation data is determined according to the preset determination rule, it may be determined whether the first system and the second system are currently in failure. Therefore, in order to ensure that the first system and the second system can stably operate, corresponding measures can be taken according to the judgment result. Specifically, if it is detected that the current first system and the current second system are not in fault, the first system and the second system may be continuously monitored, and if it is detected that the current first system and the current second system are in fault, the fault may be processed according to a preset processing method, so that the first system and the second system can stably operate, and thus, the service processing efficiency can be improved.
For example, the first system may be a Hadoop system and the second system may be a Kubernetes system. The Kubernetes system bears the main business of users' online shopping, while the Hadoop system performs operations such as cleaning, conversion and processing on mass data to generate the basic data required by systems such as search recommendation, artificial intelligence, unbounded retail and face recognition. Owing to users' shopping habits, the main load on the Kubernetes system falls between 9:00 and 24:00 each day. From 0:00 to 8:00, 80% of the resources of the Kubernetes system are idle, while the Hadoop system needs to provide data services 24 hours a day. Moreover, with the rapid development and expansion of services, the Hadoop system has more and more data to process, and huge funds are spent each year to expand the existing big data computing and storage capacity, which wastes resources. Therefore, to save cost, some of the tasks on the Hadoop system can be deployed to the Kubernetes system for processing. To ensure that the two systems can operate stably, the operation data of the two systems can be collected respectively, the operation data of the Hadoop system and the Kubernetes system is judged according to a preset judgment rule to determine whether either system currently has a fault, and corresponding processing measures are taken according to the judgment result so that the Hadoop system and the Kubernetes system operate normally.
In the monitoring method provided by this embodiment, the operation data of the first system and the second system for resource sharing are obtained respectively; judging the operating data according to a preset judgment rule to determine whether the first system and the second system have faults or not; and taking corresponding measures according to the judgment result so as to enable the first system and the second system to normally operate. Therefore, the running states of the two systems of resource sharing can be monitored in real time, the system problems can be found and solved as soon as possible, and the running safety of the system is improved on the basis of saving the cost.
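As a non-limiting illustration, the acquire, judge and act steps of this embodiment might be sketched in Python as follows. All names (MonitoredSystem, monitor_once, the "storage_occupancy" key) and the sample figures are hypothetical and are not defined by the present application; the sketch only shows how preset rules could be applied to the operation data of both systems in one pass.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types: the application does not define a concrete API.
RunData = Dict[str, float]
Rule = Callable[[RunData], bool]  # returns True when the data indicates a fault


@dataclass
class MonitoredSystem:
    name: str
    collect: Callable[[], RunData]  # acquires the operation data
    remedy: Callable[[], None]      # preset measure taken when a fault is found


def monitor_once(systems: List[MonitoredSystem], rules: List[Rule]) -> Dict[str, bool]:
    """Run one acquire -> judge -> act pass; report which systems are faulty."""
    result: Dict[str, bool] = {}
    for system in systems:
        data = system.collect()                     # acquire operation data
        faulty = any(rule(data) for rule in rules)  # judge against preset rules
        if faulty:
            system.remedy()                         # take the corresponding measure
        result[system.name] = faulty
    return result


# Example with made-up numbers.
actions: List[str] = []
hadoop = MonitoredSystem("hadoop",
                         lambda: {"storage_occupancy": 0.95},
                         lambda: actions.append("expand hadoop"))
kubernetes = MonitoredSystem("kubernetes",
                             lambda: {"storage_occupancy": 0.40},
                             lambda: actions.append("expand kubernetes"))
rules: List[Rule] = [lambda d: d["storage_occupancy"] > 0.90]

status = monitor_once([hadoop, kubernetes], rules)
print(status, actions)
```

Passing the rules in as a list keeps them replaceable, which matches the description that judgment rules may be preset or supplied by the user.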
Fig. 3 is a schematic flow chart of a monitoring method according to a second embodiment of the present invention, where on the basis of any of the foregoing embodiments, the method includes:
Step 203, taking corresponding measures according to the judgment result so as to enable the first system and the second system to operate normally.
In this embodiment, in order to ensure that both the two systems can operate stably, the two systems need to be monitored to respectively acquire the operation data of the first system and the second system. Specifically, the operation data of the first system and the second system may be obtained through a preset call interface in the first system and the second system.
In the above example, the Resource Manager API provided by the Hadoop system may be called to obtain the storage resource and computing resource conditions of the Hadoop system. On the one hand, hardware operation data of the two systems can be obtained. Specifically, for the Hadoop system, the org.apache.hadoop.fs.FileSystem interface provided by Hadoop can be called to obtain the FsStatus object provided by FileSystem; the FsStatus.getCapacity() method is called to obtain the total space size; the FsStatus.getUsed() method is called to obtain the used space size; and the FsStatus.getRemaining() method is called to obtain the remaining space size, thereby obtaining the hardware operation data of the Hadoop system. Hardware operation data of the Kubernetes system can also be acquired. Specifically, the following interfaces opened by Kubernetes can be called: the Container Runtime Interface (CRI) to acquire computing resource information of the Kubernetes system; the Container Network Interface (CNI) to acquire network resource information; and the Container Storage Interface (CSI) to acquire storage resource information.
On the other hand, software operation data of the Hadoop system and the Kubernetes system can be obtained. It should be noted that a computing task of the Hadoop system follows the same computation process whether it runs on the Hadoop system or on the Kubernetes system. The only difference lies in how the computation results are stored: non-final result data generated in the computing task, such as intermediate data, transition data and temporary data, is stored in the container of the Kubernetes system, that is, in Docker local storage, and occupies Docker storage resources; the final result of the computing task must be stored and retained in the HDFS of the Hadoop system, which avoids data loss caused by resource recovery in the Kubernetes system. Therefore, the monitoring device can acquire all task data running in the Hadoop system and the Kubernetes system in the same way that it acquires the computing task data of the Hadoop system. Specifically, an interface preset in the Hadoop system may be called to obtain the task data running in the Hadoop system. Optionally, the operation data of currently running tasks may be obtained, task operation data may be obtained according to a task's start time and end time, or task operation data may be obtained according to a task identifier. The software operation data may also be obtained in other ways, which is not limited herein.
In the monitoring method provided by this embodiment, the operation data of the first system and the second system for resource sharing are respectively obtained through the preset calling interfaces in the first system and the second system, so that the operation data of the first system and the second system can be obtained, and a basis is provided for subsequent system maintenance.
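As a non-limiting illustration, the three optional ways of selecting task operation data described above (currently running tasks, tasks within a start and end time, and a task with a given identifier) might be sketched as follows. The TaskRecord shape and the function names are hypothetical; the records are assumed to have already been fetched through the preset interface, which is not modelled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskRecord:
    """Hypothetical shape of one task's operation data."""
    task_id: str
    start: float           # start time, seconds since the epoch
    end: Optional[float]   # None while the task is still running


def running_tasks(tasks: List[TaskRecord]) -> List[TaskRecord]:
    """Tasks that have not finished yet."""
    return [t for t in tasks if t.end is None]


def tasks_in_window(tasks: List[TaskRecord], start: float, end: float) -> List[TaskRecord]:
    """Finished tasks that started at or after `start` and ended at or before `end`."""
    return [t for t in tasks
            if t.end is not None and t.start >= start and t.end <= end]


def task_by_id(tasks: List[TaskRecord], task_id: str) -> Optional[TaskRecord]:
    """The task with the given identifier, or None if there is no such task."""
    return next((t for t in tasks if t.task_id == task_id), None)


# Example with made-up records.
records = [
    TaskRecord("clean-001", start=100.0, end=160.0),
    TaskRecord("convert-002", start=150.0, end=400.0),
    TaskRecord("process-003", start=300.0, end=None),
]
print([t.task_id for t in running_tasks(records)])
print([t.task_id for t in tasks_in_window(records, 90.0, 200.0)])
print(task_by_id(records, "convert-002"))
```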
Further, on the basis of any of the above embodiments, the method comprises:
respectively acquiring hardware operating data of a first system and a second system for resource sharing;
judging the operating data according to a preset judgment rule to determine whether the first system and the second system have faults or not;
and taking corresponding measures according to the judgment result so as to enable the first system and the second system to normally operate.
It can be understood that a hardware problem in the current system may slow operation and lower task completion efficiency. Therefore, the hardware operation data of the first system and the second system that share resources may be obtained respectively. The hardware operation data includes, but is not limited to, the total space size, the used space size and the remaining space size.
According to the monitoring method provided by the embodiment, the hardware running data of the first system and the second system for resource sharing are respectively obtained, so that a basis is provided for subsequent judgment and maintenance of the running state of the system, and the problem of low task completion efficiency caused by insufficient hardware resources is avoided.
Further, on the basis of any of the above embodiments, the method comprises:
respectively acquiring task operation data of a first system and a second system for resource sharing;
judging the operating data according to a preset judgment rule to determine whether the first system and the second system have faults or not;
and taking corresponding measures according to the judgment result so as to enable the first system and the second system to normally operate.
In this embodiment, since a software failure may cause a task to be in a halt state and unable to be completed, in order to ensure that two systems sharing resources can operate normally, the monitoring device may also monitor software operating states of the first system and the second system. Specifically, task operation data of the first system and the second system may be acquired respectively.
According to the monitoring method provided by the embodiment, the task operation data of the first system and the second system for resource sharing are respectively obtained, so that a basis is provided for subsequent judgment and maintenance of the system operation state, and the problem of low task completion efficiency caused by software failure is avoided.
Fig. 4 is a schematic flow chart of a monitoring method according to a third embodiment of the present invention, where on the basis of any one of the above embodiments, as shown in fig. 4, the method includes:
Step 301, respectively acquiring operation data of a first system and a second system for resource sharing;
Step 303, taking corresponding measures according to the judgment result so as to enable the first system and the second system to operate normally.
In this embodiment, in order to ensure that the first system and the second system sharing resources can stably operate, after the operation data of the first system and the second system are respectively obtained, the obtained operation data may be determined according to a preset period and a preset determination rule, so as to determine whether the first system and the second system are currently operating normally. The preset period may be a default period of the system, or may be determined by the operation and maintenance staff according to historical experience and current requirements, which is not limited herein. For example, if the task executed by the first system and the second system is more important, a shorter period may be set to determine the operating states of the first system and the second system, and if the task executed by the first system and the second system is less important, a longer period may be set to determine the operating states of the first system and the second system in order to save cost.
According to the monitoring method provided by the embodiment, the operation data is judged according to a preset judgment rule according to a preset period to determine whether the first system and the second system have faults, so that the operation states of the first system and the second system can be accurately determined, and the first system and the second system can be ensured to stably operate.
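As a non-limiting illustration, judging at a preset period might be sketched as follows. The function names and the period values of 60 and 600 seconds are assumptions made only for the example; the application leaves the period to a system default or to the operation and maintenance staff. The sleep function is injected so that the example can run without actually waiting.

```python
import time
from typing import Callable, List


def choose_period(important: bool) -> float:
    """Shorter period for more important tasks; the values are assumed, not prescribed."""
    return 60.0 if important else 600.0


def run_periodically(judge: Callable[[], bool], period_s: float, rounds: int,
                     sleep: Callable[[float], None] = time.sleep) -> List[bool]:
    """Call `judge` once per period for `rounds` rounds and collect the results."""
    results: List[bool] = []
    for i in range(rounds):
        results.append(judge())
        if i < rounds - 1:   # no need to wait after the last round
            sleep(period_s)
    return results


# Example: a fake sleep records the waits instead of blocking.
waits: List[float] = []
outcomes = run_periodically(lambda: False, choose_period(important=True),
                            rounds=3, sleep=waits.append)
print(outcomes, waits)
```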
Fig. 5 is a schematic flow chart of a monitoring method according to a fourth embodiment of the present invention, where on the basis of any of the foregoing embodiments, as shown in fig. 5, the method includes:
Step 404, taking corresponding measures according to the judgment result so as to enable the first system and the second system to operate normally.
In this embodiment, after the hardware operation data of the first system and the second system is obtained, the current storage resource utilization rates of the first system and the second system may be determined according to the hardware operation data, and it can be understood that if the current storage resource utilization rate is higher, the problems of a slower operation speed and a lower task completion efficiency may be caused, so that the current hardware operation data may be determined according to a preset determination rule. Specifically, the determination rule may be to determine whether the number of times that the occupancy rates of the storage resources of the first system and the second system continuously exceed the preset proportional threshold exceeds a preset first threshold, for example, if the number of times that the occupancy rates of the storage resources of the first system and/or the second system continuously exceed 90% exceeds three times, it may be determined that the first system and/or the second system currently has a fault. The ratio threshold and the first threshold may be default thresholds of the system, or may be determined by the operation and maintenance staff, and the present invention is not limited herein.
According to the monitoring method provided by the embodiment, the storage resource occupancy rates of the first system and the second system are calculated according to the hardware operation data of the first system and the second system, the storage resource occupancy rates are judged according to the preset judgment rule, and whether the times that the storage resource occupancy rates of the first system and the second system continuously exceed the preset proportional threshold exceeds the preset first threshold is determined, so as to determine whether the first system and the second system are in fault, so that whether the first system and the second system are in fault at present can be accurately determined, a basis is provided for subsequent maintenance, and further, the stable operation of the first system and the second system can be ensured on the basis of saving cost.
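As a non-limiting illustration, the judgment rule of this embodiment might be sketched as follows, using the example values from the description (a proportion threshold of 90% and a first threshold of three). The class and function names are hypothetical. The sketch assumes that "continuously" means consecutive samples, so a single sample at or below the proportion threshold resets the count.

```python
def occupancy(used: float, capacity: float) -> float:
    """Storage resource occupancy rate as a fraction of total capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return used / capacity


class ConsecutiveExceedRule:
    """Reports a fault when occupancy has exceeded `ratio_threshold`
    on more than `count_threshold` consecutive samples."""

    def __init__(self, ratio_threshold: float = 0.90, count_threshold: int = 3):
        self.ratio_threshold = ratio_threshold
        self.count_threshold = count_threshold
        self.streak = 0  # number of consecutive samples above the threshold

    def observe(self, ratio: float) -> bool:
        if ratio > self.ratio_threshold:
            self.streak += 1
        else:
            self.streak = 0  # assumption: any sample at or below the threshold resets the count
        return self.streak > self.count_threshold


# Example with made-up samples: the dip to 0.85 resets the count.
rule = ConsecutiveExceedRule()
samples = [0.95, 0.92, 0.91, 0.85, 0.93, 0.94, 0.95, 0.96]
verdicts = [rule.observe(s) for s in samples]
print(verdicts)
```

One rule instance would be kept per monitored system, so that the count for the first system is not affected by samples from the second system.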
Fig. 6 is a schematic flow chart of a monitoring method according to a fifth embodiment of the present invention, where on the basis of any of the foregoing embodiments, as shown in fig. 6, the method includes:
Step 501, respectively acquiring task operation data of a first system and a second system for resource sharing;
In this embodiment, after the task operation data of the first system and the second system is acquired, the current task completion rate and/or task completion time may be determined according to the task operation data. It can be understood that a task completion rate lower than a preset threshold indicates that the software of the current system has failed, as does a task completion time that exceeds a preset threshold. Therefore, the task completion rate and/or the task completion time of the first system and the second system can be determined according to the task operation data. The task completion rate is judged according to a preset judgment rule to determine whether it is lower than a preset second threshold, and/or the task completion time is judged according to the preset judgment rule to determine whether it exceeds a preset third threshold, so as to determine whether the first system and the second system have failed.
In the monitoring method provided by this embodiment, the task completion rates of the first system and the second system are determined according to the task operation data, the task completion rates are determined according to a preset determination rule, and whether the task completion rates of the first system and the second system are lower than a preset second threshold is determined, so as to determine whether the first system and the second system have a fault; and/or determining task completion time of the first system and the second system according to the task operation data, judging the task completion time according to a preset judgment rule, determining whether the task completion time of the first system and the second system exceeds a preset third threshold value or not, and determining whether the first system and the second system break down or not, so that whether the first system and the second system break down or not can be accurately determined, a basis is provided for subsequent maintenance, and stable operation of the first system and the second system can be guaranteed on the basis of saving cost.
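As a non-limiting illustration, the completion-rate and completion-time rules might be sketched as follows. The TaskRun shape, the function names, and the threshold values (a second threshold of 0.8 and a third threshold of 3600 seconds) are assumptions made only for the example; the application does not fix these values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskRun:
    """Hypothetical summary of one task's operation data."""
    task_id: str
    completed: bool
    duration_s: float  # time taken (or elapsed so far), in seconds


def completion_rate(runs: List[TaskRun]) -> float:
    """Fraction of tasks that completed; an empty list is treated as fully complete."""
    if not runs:
        return 1.0
    return sum(1 for r in runs if r.completed) / len(runs)


def software_fault(runs: List[TaskRun],
                   min_rate: float = 0.8,             # assumed second threshold
                   max_duration_s: float = 3600.0     # assumed third threshold
                   ) -> bool:
    """Fault if the completion rate is too low or any task ran too long."""
    rate_too_low = completion_rate(runs) < min_rate
    too_slow = any(r.duration_s > max_duration_s for r in runs)
    return rate_too_low or too_slow


# Example with made-up data.
healthy = [TaskRun("a", True, 120.0), TaskRun("b", True, 300.0)]
low_rate = healthy + [TaskRun("c", False, 50.0), TaskRun("d", False, 60.0)]
slow = [TaskRun("e", True, 4000.0)]
print(software_fault(healthy), software_fault(low_rate), software_fault(slow))
```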
Fig. 7 is a schematic flow chart of a monitoring method according to a sixth embodiment of the present invention, where on the basis of any of the foregoing embodiments, as shown in fig. 7, the method includes:
601, respectively acquiring hardware running data of a first system and a second system for resource sharing;
603, judging the storage resource occupancy rate according to a preset judgment rule, and determining whether the times that the storage resource occupancy rates of the first system and the second system continuously exceed a preset proportion threshold exceeds a preset first threshold so as to determine whether the first system and the second system have faults;
In this embodiment, if the problem of the current system is a hardware problem, it can be solved by expanding onto idle nodes. Specifically, if it is determined that the number of times that the storage resource occupancy rates of the first system and the second system continuously exceed the preset proportional threshold exceeds the preset first threshold, the idle cluster nodes in the current first system and second system are determined, and the current task is processed through both the currently running cluster nodes in the first system and the second system and the idle cluster nodes, which resolves the operation problem caused by the high storage resource occupancy rate. For example, all nodes under the Hadoop cluster can first be listed, the currently idle nodes are added to the Hadoop cluster nodes, and the Hadoop cluster configuration file is updated so that the newly added nodes can execute tasks. In addition, if any of the previously existing nodes is running inefficiently, that node can be restarted. The Resource Manager (RM) is then called to check whether the resource adjustment has taken effect. It should be noted that the storage resources may be understood as memory and hard disk, and the computing resources may be understood as the CPU. In essence, both storage and computing resources are located on a server (a host used for storage and computing). Because the two complement each other, both are added together when current resources are insufficient, and both are removed together when current resources are redundant.
In the monitoring method provided in this embodiment, if the number of times that the occupancy rates of the storage resources of the first system and the second system continuously exceed the preset proportional threshold exceeds the preset first threshold, the current idle cluster nodes in the first system and the second system are determined, and the current task is processed through the current running cluster nodes in the first system and the second system and the idle cluster nodes. Therefore, the self-healing of the system operation fault can be realized, and the stable operation of the system is guaranteed.
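The hardware rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: "continuously exceed" is read here as the run of most recent consecutive samples above the proportional threshold, the node names are plain strings, and all function names are hypothetical. No Hadoop API is called; updating the cluster configuration and verifying through the Resource Manager are outside the sketch.

```python
from typing import List, Sequence


def storage_occupancy(used_bytes: int, total_bytes: int) -> float:
    """Storage resource occupancy rate as a fraction of total capacity."""
    return used_bytes / total_bytes


def trailing_exceedances(rates: Sequence[float], ratio_threshold: float) -> int:
    """Count the most recent consecutive samples above the ratio threshold."""
    count = 0
    for rate in reversed(rates):
        if rate <= ratio_threshold:
            break
        count += 1
    return count


def needs_expansion(
    rates: Sequence[float], ratio_threshold: float, first_threshold: int
) -> bool:
    """True when the consecutive-exceedance count exceeds the first threshold."""
    return trailing_exceedances(rates, ratio_threshold) > first_threshold


def expand_with_idle_nodes(running: List[str], idle: List[str]) -> List[str]:
    """Return the node set for the current task: running nodes plus idle ones."""
    return list(running) + [node for node in idle if node not in running]
```

For example, with samples `[0.5, 0.9, 0.95, 0.92]`, a proportional threshold of 0.8, and a first threshold of 2, the last three samples exceed the threshold, so expansion is triggered.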
Fig. 8 is a schematic flow chart of a monitoring method according to a seventh embodiment of the present invention, where on the basis of any of the foregoing embodiments, as shown in fig. 8, the method includes:
701, respectively acquiring task running data of a first system and a second system for resource sharing;
705, judging the task completion time according to a preset judgment rule, and determining whether the task completion time of the first system and the second system exceeds a preset third threshold value to determine whether the first system and the second system have a fault;
707, if the task completion time of the first system and the second system exceeds a preset third threshold, sending a prompt message to the operation and maintenance personnel, so that the operation and maintenance personnel perform manual operation and maintenance according to the prompt message and the operation data.
In this embodiment, if it is detected that the task completion rates of the first system and the second system are lower than a preset second threshold and/or the task completion times of the first system and the second system exceed a preset third threshold, it is determined that the software of the system fails, and at this time, a prompt message needs to be sent to the operation and maintenance staff, where the prompt message includes failure time and failure details, so that the operation and maintenance staff can perform operation and maintenance in time according to the prompt message, and thus the first system and the second system can operate normally.
In the monitoring method provided by the embodiment, if the task completion rates of the first system and the second system are lower than a preset second threshold, prompt information is sent to operation and maintenance personnel, so that the operation and maintenance personnel perform manual operation and maintenance according to the prompt information and the operation data; and/or if the task completion time of the first system and the second system exceeds a preset third threshold, sending prompt information to operation and maintenance personnel so that the operation and maintenance personnel can carry out manual operation and maintenance according to the prompt information and the operation data. Therefore, the first system and the second system can be ensured to operate normally.
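A minimal sketch of assembling such a prompt message is given below. The embodiment states only that the message includes the failure time and failure details; the dictionary layout, the field names, and the function `build_prompt` are assumptions made for illustration, and the delivery channel to the operation and maintenance personnel is not modeled.

```python
from datetime import datetime, timezone
from typing import List, Optional


def build_prompt(
    system: str,
    completion_rate: float,
    completion_seconds: float,
    second_threshold: float,
    third_threshold: float,
    now: Optional[datetime] = None,
) -> Optional[dict]:
    """Return a prompt message if a software fault is detected, else None."""
    details: List[str] = []
    if completion_rate < second_threshold:
        details.append(
            f"task completion rate {completion_rate:.0%} "
            f"is below {second_threshold:.0%}"
        )
    if completion_seconds > third_threshold:
        details.append(
            f"task completion time {completion_seconds:.0f}s "
            f"exceeds {third_threshold:.0f}s"
        )
    if not details:
        return None  # no fault, so nothing is sent
    failure_time = now or datetime.now(timezone.utc)
    return {
        "system": system,
        "failure_time": failure_time.isoformat(),
        "failure_details": details,
    }
```

The optional `now` argument exists only so the timestamp can be fixed when the function is exercised in isolation.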
Fig. 9 is a schematic flow chart of a monitoring method according to an eighth embodiment of the present invention, where on the basis of any of the foregoing embodiments, as shown in fig. 9, the method further includes:
802, generating a cluster state diagram according to the operation data and a preset statistical template, so that the operation and maintenance personnel can timely know the operation states of the first system and the second system according to the cluster state diagram.
In this embodiment, a statistical template may be prestored in the monitoring device, where the statistical template includes, but is not limited to, a bar statistical graph, a sector statistical graph, a broken line statistical graph, a pictogram, and the like, so that after the operation data of the first system and the second system are collected, the operation data may be added to the statistical template to generate a cluster state diagram, so that an operation and maintenance person can visually determine the operation state of the current system.
According to the monitoring method provided by the embodiment, the cluster state diagram is generated according to the operation data and the preset statistical template, so that the operation and maintenance personnel can timely know the operation states of the first system and the second system according to the cluster state diagram, the operation and maintenance personnel can visually determine the operation state of the current system, and the user experience is improved.
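As a rough illustration of filling operation data into a statistical template, the sketch below renders a text-only bar chart of storage occupancy per system. The function name, the text layout, and the choice of a bar chart are assumptions; the embodiment does not specify how a template is stored or drawn.

```python
from typing import Mapping


def render_bar_chart(occupancy: Mapping[str, float], width: int = 20) -> str:
    """Render one text bar per system from occupancy rates in the range 0..1."""
    lines = []
    for name, rate in occupancy.items():
        clamped = min(max(rate, 0.0), 1.0)   # keep the bar inside the frame
        filled = round(clamped * width)
        bar = "#" * filled + "." * (width - filled)
        lines.append(f"{name:<8}|{bar}| {clamped:.0%}")
    return "\n".join(lines)
```

Calling `render_bar_chart({"first": 0.5, "second": 0.9})` yields one bar per system, which is enough to compare the two resource-sharing systems at a glance.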
Fig. 10 is a schematic structural diagram of a monitoring device according to a ninth embodiment of the present invention, and as shown in fig. 10, the monitoring device includes:
an obtaining module 91, configured to obtain operation data of a first system and a second system that perform resource sharing, respectively;
a judging module 92, configured to judge the operating data according to a preset judgment rule, so as to determine whether the first system and the second system have a fault;
a processing module 93, configured to take corresponding measures according to the judgment result, so that the first system and the second system operate normally.
The monitoring device provided in this embodiment respectively obtains the operation data of a first system and a second system that share resources; judges the operation data according to a preset judgment rule to determine whether the first system and the second system have a fault; and takes corresponding measures according to the judgment result so that the first system and the second system operate normally. In this way, the running states of the two resource-sharing systems can be monitored in real time, system problems can be found and resolved as early as possible, and the operational safety of the systems is improved while saving cost.
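The three-module structure can be sketched as a small class in which each module is a pluggable callable. The class name `MonitoringDevice`, the callable signatures, and the dictionary used for operation data are hypothetical and serve only to show how the obtaining, judging, and processing modules hand data to one another.

```python
from typing import Callable, Dict, Iterable

RunData = Dict[str, float]  # hypothetical shape of one system's operation data


class MonitoringDevice:
    """Wires an obtaining, a judging, and a processing module together."""

    def __init__(
        self,
        acquire: Callable[[str], RunData],
        judge: Callable[[RunData], bool],
        handle: Callable[[str, RunData], None],
    ) -> None:
        self.acquire = acquire   # obtaining module: fetch operation data
        self.judge = judge       # judging module: apply the preset rule
        self.handle = handle     # processing module: take measures on fault

    def run_once(self, systems: Iterable[str]) -> Dict[str, bool]:
        """Check each system once and return a fault flag per system."""
        results: Dict[str, bool] = {}
        for system in systems:
            data = self.acquire(system)
            faulty = self.judge(data)
            if faulty:
                self.handle(system, data)
            results[system] = faulty
        return results
```

Running `run_once` on a timer would correspond to judging the operation data at a preset period.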
Further, on the basis of any of the above embodiments, the obtaining module includes:
the first obtaining unit is used for respectively obtaining the running data of the first system and the second system for resource sharing through preset calling interfaces in the first system and the second system.
Further, on the basis of any of the above embodiments, the obtaining module includes:
and the second acquisition unit is used for respectively acquiring hardware operating data of the first system and the second system for resource sharing.
Further, on the basis of any of the above embodiments, the obtaining module includes:
and the third acquisition unit is used for respectively acquiring task operation data of the first system and the second system for resource sharing.
Further, on the basis of any of the above embodiments, the determining module includes:
and the first judgment unit is used for judging the operating data according to a preset judgment rule according to a preset period.
Further, on the basis of any of the above embodiments, the determining module includes:
the computing unit is used for computing the storage resource occupancy rates of the first system and the second system according to the hardware operation data of the first system and the second system;
and the second judging unit is used for judging the storage resource occupancy rate according to a preset judging rule and determining whether the number of times that the storage resource occupancy rates of the first system and the second system continuously exceed a preset proportional threshold value exceeds a preset first threshold value.
Further, on the basis of any of the above embodiments, the determining module includes:
a first determining unit, configured to determine task completion rates of the first system and the second system according to the task operation data;
the third judging unit is used for judging the task completion rate according to a preset judging rule and determining whether the task completion rates of the first system and the second system are lower than a preset second threshold value; and/or
the second determining unit is used for determining task completion time of the first system and the second system according to the task operation data;
and the fourth judging unit is used for judging the task completion time according to a preset judging rule and determining whether the task completion time of the first system and the second system exceeds a preset third threshold value.
Further, on the basis of any of the above embodiments, the processing module includes:
a third determining unit, configured to determine a currently idle cluster node in the first system and the second system if the number of times that the occupancy rates of the storage resources of the first system and the second system continuously exceed a preset proportional threshold exceeds a preset first threshold;
and the first processing unit is used for processing the current task through the currently running cluster nodes in the first system and the second system and the idle cluster nodes.
Further, on the basis of any of the above embodiments, the processing module includes:
the second processing unit is used for sending prompt information to operation and maintenance personnel if the task completion rates of the first system and the second system are lower than a preset second threshold value, so that the operation and maintenance personnel can carry out manual operation and maintenance according to the prompt information and the operation data; and/or
and the third processing unit is used for sending prompt information to the operation and maintenance personnel if the task completion time of the first system and the second system exceeds a preset third threshold value, so that the operation and maintenance personnel can carry out manual operation and maintenance according to the prompt information and the operation data.
Further, on the basis of any of the above embodiments, the apparatus further includes:
and the generating module is used for generating a cluster state diagram according to the operating data and a preset statistical template so that the operation and maintenance personnel can timely know the operating states of the first system and the second system according to the cluster state diagram.
Fig. 11 is a schematic structural diagram of a monitoring device provided in a tenth embodiment of the present invention, and as shown in fig. 11, the monitoring device includes: a memory 111, a processor 112;
the memory 111 is configured to store instructions executable by the processor 112;
wherein the processor 112 is configured to execute the monitoring method according to any of the above embodiments.
The present invention further provides a computer-readable storage medium in which computer-executable instructions are stored; when executed by a processor, the computer-executable instructions implement the monitoring method according to any one of the above embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (22)
1. A method of monitoring, comprising:
respectively acquiring running data of a first system and a second system for resource sharing;
judging the operating data according to a preset judgment rule to determine whether the first system and the second system have faults or not;
and taking corresponding measures according to the judgment result so as to enable the first system and the second system to normally operate.
2. The method of claim 1, wherein the obtaining the operation data of the first system and the second system for resource sharing respectively comprises:
and respectively acquiring the running data of the first system and the second system for resource sharing through preset calling interfaces in the first system and the second system.
3. The method of claim 1, wherein the obtaining the operation data of the first system and the second system for resource sharing respectively comprises:
hardware operation data of a first system and hardware operation data of a second system which share resources are respectively obtained.
4. The method of claim 1, wherein the obtaining the operation data of the first system and the second system for resource sharing respectively comprises:
task operation data of a first system and a second system for resource sharing are respectively obtained.
5. The method according to claim 1, wherein the determining the operation data according to a preset determination rule comprises:
and judging the operating data according to a preset judgment rule according to a preset period.
6. The method according to claim 3, wherein the determining the operation data according to a preset determination rule comprises:
calculating the storage resource occupancy rates of the first system and the second system according to the hardware operation data of the first system and the second system;
and judging the storage resource occupancy rate according to a preset judgment rule, and determining whether the number of times that the storage resource occupancy rates of the first system and the second system continuously exceed a preset proportional threshold value exceeds a preset first threshold value.
7. The method according to claim 4, wherein the determining the operation data according to a preset determination rule comprises:
determining task completion rates of the first system and the second system according to the task operation data;
judging the task completion rate according to a preset judgment rule, and determining whether the task completion rates of the first system and the second system are lower than a preset second threshold value; and/or
determining task completion time of the first system and the second system according to the task operation data;
and judging the task completion time according to a preset judgment rule, and determining whether the task completion time of the first system and the second system exceeds a preset third threshold value.
8. The method according to claim 6, wherein taking corresponding measures according to the judgment result comprises:
if the number of times that the occupancy rates of the storage resources of the first system and the second system continuously exceed a preset proportional threshold exceeds a preset first threshold, determining current idle cluster nodes in the first system and the second system;
and processing the current task through the current running cluster nodes in the first system and the second system and the idle cluster nodes.
9. The method according to claim 7, wherein taking corresponding measures according to the judgment result comprises:
if the task completion rates of the first system and the second system are lower than a preset second threshold value, sending prompt information to operation and maintenance personnel so that the operation and maintenance personnel can carry out manual operation and maintenance according to the prompt information and the operation data; and/or
and if the task completion time of the first system and the second system exceeds a preset third threshold, sending prompt information to operation and maintenance personnel so that the operation and maintenance personnel can carry out manual operation and maintenance according to the prompt information and the operation data.
10. The method according to any one of claims 1 to 9, wherein after the obtaining the operation data of the first system and the second system for resource sharing respectively, further comprises:
and generating a cluster state diagram according to the operation data and a preset statistical template so that the operation and maintenance personnel can timely know the operation states of the first system and the second system according to the cluster state diagram.
11. A monitoring device, comprising:
the acquisition module is used for respectively acquiring the running data of a first system and a second system for resource sharing;
the judging module is used for judging the operating data according to a preset judging rule so as to determine whether the first system and the second system have faults or not;
and the processing module is used for taking corresponding measures according to the judgment result so as to ensure that the first system and the second system operate normally.
12. The apparatus of claim 11, wherein the obtaining module comprises:
the first obtaining unit is used for respectively obtaining the running data of the first system and the second system for resource sharing through preset calling interfaces in the first system and the second system.
13. The apparatus of claim 11, wherein the obtaining module comprises:
and the second acquisition unit is used for respectively acquiring hardware operating data of the first system and the second system for resource sharing.
14. The apparatus of claim 11, wherein the obtaining module comprises:
and the third acquisition unit is used for respectively acquiring task operation data of the first system and the second system for resource sharing.
15. The apparatus of claim 11, wherein the determining module comprises:
and the first judgment unit is used for judging the operating data according to a preset judgment rule according to a preset period.
16. The apparatus of claim 13, wherein the determining module comprises:
the computing unit is used for computing the storage resource occupancy rates of the first system and the second system according to the hardware operation data of the first system and the second system;
and the second judging unit is used for judging the storage resource occupancy rate according to a preset judging rule and determining whether the number of times that the storage resource occupancy rates of the first system and the second system continuously exceed a preset proportional threshold value exceeds a preset first threshold value.
17. The apparatus of claim 14, wherein the determining module comprises:
a first determining unit, configured to determine task completion rates of the first system and the second system according to the task operation data;
the third judging unit is used for judging the task completion rate according to a preset judging rule and determining whether the task completion rates of the first system and the second system are lower than a preset second threshold value; and/or
the second determining unit is used for determining task completion time of the first system and the second system according to the task operation data;
and the fourth judging unit is used for judging the task completion time according to a preset judging rule and determining whether the task completion time of the first system and the second system exceeds a preset third threshold value.
18. The apparatus of claim 16, wherein the processing module comprises:
a third determining unit, configured to determine a currently idle cluster node in the first system and the second system if the number of times that the occupancy rates of the storage resources of the first system and the second system continuously exceed a preset proportional threshold exceeds a preset first threshold;
and the first processing unit is used for processing the current task through the currently running cluster nodes in the first system and the second system and the idle cluster nodes.
19. The apparatus of claim 17, wherein the processing module comprises:
the second processing unit is used for sending prompt information to operation and maintenance personnel if the task completion rates of the first system and the second system are lower than a preset second threshold value, so that the operation and maintenance personnel can carry out manual operation and maintenance according to the prompt information and the operation data; and/or
and the third processing unit is used for sending prompt information to the operation and maintenance personnel if the task completion time of the first system and the second system exceeds a preset third threshold value, so that the operation and maintenance personnel can carry out manual operation and maintenance according to the prompt information and the operation data.
20. The apparatus of any one of claims 11-19, further comprising:
and the generating module is used for generating a cluster state diagram according to the operating data and a preset statistical template so that the operation and maintenance personnel can timely know the operating states of the first system and the second system according to the cluster state diagram.
21. A monitoring device, comprising: a memory, a processor;
the memory is configured to store instructions executable by the processor;
wherein the processor is configured to perform the monitoring method of any one of claims 1-10.
22. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed by a processor, implement the monitoring method of any one of claims 1-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910199173.3A CN111694705A (en) | 2019-03-15 | 2019-03-15 | Monitoring method, device, equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111694705A true CN111694705A (en) | 2020-09-22 |
Family
ID=72475449
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910199173.3A Pending CN111694705A (en) | 2019-03-15 | 2019-03-15 | Monitoring method, device, equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111694705A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114118991A (en) * | 2021-11-12 | 2022-03-01 | 百果园技术(新加坡)有限公司 | Third-party system monitoring system, method, device, equipment and storage medium |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103324539A (en) * | 2013-06-24 | 2013-09-25 | 浪潮电子信息产业股份有限公司 | Job scheduling management system and method |
CN105718351A (en) * | 2016-01-08 | 2016-06-29 | 北京汇商融通信息技术有限公司 | Hadoop cluster-oriented distributed monitoring and management system |
CN106815119A (en) * | 2016-12-20 | 2017-06-09 | 曙光信息产业(北京)有限公司 | The hardware monitoring device of server |
CN106888254A (en) * | 2017-01-20 | 2017-06-23 | 华南理工大学 | A kind of exchange method between container cloud framework based on Kubernetes and its each module |
CN108255661A (en) * | 2016-12-29 | 2018-07-06 | 北京京东尚科信息技术有限公司 | A kind of method and system for realizing Hadoop cluster monitorings |
CN108881446A (en) * | 2018-06-22 | 2018-11-23 | 深源恒际科技有限公司 | A kind of artificial intelligence plateform system based on deep learning |
CN109117259A (en) * | 2018-07-25 | 2019-01-01 | 北京京东尚科信息技术有限公司 | Method for scheduling task, platform, device and computer readable storage medium |
CN109271233A (en) * | 2018-07-25 | 2019-01-25 | 上海数耕智能科技有限公司 | The implementation method of Hadoop cluster is set up based on Kubernetes |
CN109413125A (en) * | 2017-08-18 | 2019-03-01 | 北京京东尚科信息技术有限公司 | The method and apparatus of dynamic regulation distributed system resource |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105357038B (en) | Monitor the method and system of cluster virtual machine | |
CN108632365B (en) | Service resource adjusting method, related device and equipment | |
CN107016480B (en) | Task scheduling method, device and system | |
CN112579304A (en) | Resource scheduling method, device, equipment and medium based on distributed platform | |
CN111966289A (en) | Partition optimization method and system based on Kafka cluster | |
CN112527484A (en) | Workflow breakpoint continuous running method and device, computer equipment and readable storage medium | |
CN112380089A (en) | Data center monitoring and early warning method and system | |
CN113656252B (en) | Fault positioning method, device, electronic equipment and storage medium | |
CN112149975B (en) | APM monitoring system and method based on artificial intelligence | |
CN111694705A (en) | Monitoring method, device, equipment and computer readable storage medium | |
CN110209497B (en) | Method and system for dynamically expanding and shrinking host resource | |
CN110750425A (en) | Database monitoring method, device and system and storage medium | |
CN111104266A (en) | Access resource allocation method and device, storage medium and electronic equipment | |
CN115712521A (en) | Cluster node fault processing method, system and medium | |
CN113590287B (en) | Task processing method, device, equipment, storage medium and scheduling system | |
CN114706893A (en) | Fault detection method, device, equipment and storage medium | |
CN113656239A (en) | Monitoring method and device for middleware and computer program product | |
CN110493071B (en) | Message system resource balancing device, method and equipment | |
CN112000720A (en) | Management method and management system for database connection and database connection pool | |
CN117076185B (en) | Server inspection method, device, equipment and medium | |
CN115934479B (en) | Interface service control method, device, storage medium and equipment | |
CN116723111B (en) | Service request processing method, system and electronic equipment | |
CN116260703A (en) | Distributed message service node CPU performance fault self-recovery method and device | |
US20220164219A1 (en) | Processing system, processing method, higher-level system, lower-level system, higher-level program, and lower-level program | |
CN108234188B (en) | Service platform resource scheduling processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||