CN116484373B

CN116484373B - Abnormal process checking and killing method, system, device, computer equipment and storage medium

Info

Publication number: CN116484373B
Application number: CN202310509556.2A
Authority: CN
Inventors: 杨佳欢; 陈保文; 吴佳欢
Original assignee: Hexin Technology Co ltd; Hexin Technology Suzhou Co ltd
Current assignee: Hexin Technology Co ltd; Hexin Technology Suzhou Co ltd
Priority date: 2023-05-08
Filing date: 2023-05-08
Publication date: 2024-02-23
Anticipated expiration: 2043-05-08
Also published as: CN116484373A

Abstract

The application relates to the technical field of cluster operation and maintenance and discloses an abnormal process checking and killing method, system, device, computer equipment and storage medium. The method includes: acquiring user information of each process to be detected in a cluster; performing anomaly detection on each process to be detected to obtain an abnormal process; and, according to the user information of the abnormal process, sending the process information of the abnormal process to the corresponding user terminal so that the user terminal can select the abnormal process to be checked and killed. Because abnormal processes are detected automatically, they no longer need to be judged manually; and because the process information of the abnormal process is sent to the user terminal so that the abnormal process can be checked and killed selectively, an administrator does not have to check and kill the abnormal process, which greatly reduces the error rate of checking and killing abnormal processes.

Description

Abnormal process checking and killing method, system, device, computer equipment and storage medium

Technical Field

The application relates to the technical field of cluster operation and maintenance, in particular to an abnormal process checking and killing method, system, device, computer equipment and storage medium.

Background

A cluster is a loosely coupled multi-server processing system formed by a set of independent computer systems. Clusters are increasingly used by enterprises because of their high performance, availability, manageability, and cost-effectiveness. However, abnormal processes in a cluster occupy CPU resources of the servers, which wastes computing resources. The waste is especially serious for enterprises that run large-scale cluster jobs. Checking and killing abnormal processes in a cluster is therefore of great significance.

At present, abnormal processes in a cluster are mainly checked and killed by an administrator who manually logs in to the corresponding server and kills the abnormal processes found there. However, it is difficult to screen and judge a large number of processes manually without error, and processes are often killed by mistake, which affects engineering progress. Consequently, after an abnormal process is detected, the relevant user has to be notified manually to decide whether to end (kill) it, which greatly lengthens the time needed to check and kill abnormal processes. Moreover, abnormal processes are mainly checked and killed manually one by one, and for cluster jobs the administrator has to switch back and forth between server hosts, which greatly increases the administrator's workload and seriously reduces the efficiency of checking and killing abnormal processes.

Therefore, how to improve the efficiency of detecting abnormal processes in a cluster and reduce the error rate of checking and killing them has become an urgent problem to be solved.

Disclosure of Invention

In view of this, the embodiments of the present application provide a method, a system, an apparatus, a computer device, and a storage medium for checking and killing an abnormal process, so as to solve the problem of how to improve the efficiency of detecting abnormal processes in a cluster and reduce the error rate of checking and killing them.

In a first aspect, an embodiment of the present application provides an abnormal process checking and killing method, where the method includes:

acquiring user information of each process to be detected in the cluster;

performing anomaly detection on each process to be detected to obtain an abnormal process;

and according to the user information of the abnormal process, sending the process information of the abnormal process to the corresponding user terminal so that the user terminal can select the abnormal process to be checked and killed.

In the above technical solution, the user information of each process to be detected in the cluster is acquired automatically, and anomaly detection is performed automatically on each process to be detected to obtain the abnormal processes in the cluster. The user information of each abnormal process is then available, and the process information of the abnormal process is sent to the user terminal corresponding to that user information. Abnormal processes are thus detected automatically rather than judged manually, and the larger the cluster, the more pronounced the gain in detection efficiency. At the same time, because the process information of the abnormal process is sent automatically to the user terminal of the relevant user, the user can select, through the user terminal, which abnormal processes to check and kill; an administrator does not have to do so, which greatly reduces the error rate of checking and killing abnormal processes.

Optionally, after sending the process information of the abnormal process to the corresponding user terminal according to the user information of the abnormal process, so that the user terminal selects the abnormal process for killing, the method further includes:

receiving confirmation information returned by the user terminal, and extracting abnormal process checking and killing information from the confirmation information;

and when the abnormal process checking and killing information matches the process information of an abnormal process in the cluster, killing the abnormal process indicated by the abnormal process checking and killing information.

In the above technical solution, after the process information of the abnormal process is sent to the corresponding user terminal, abnormal process checking and killing information can be extracted from the confirmation information that the user feeds back through the user terminal; when this information matches the process information of an abnormal process in the cluster, the abnormal process it indicates is killed in the cluster. Using the checking and killing information carried in the confirmation information allows abnormal processes in the cluster to be selected accurately and killed automatically, which reduces the error rate and improves the efficiency of checking and killing abnormal processes.
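The matching step described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: a process is assumed to be identified by host name, user name, and process number (fields the description lists for the checking and killing information), and the names `ProcessKey` and `select_kill_targets` are hypothetical and do not come from the application.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ProcessKey(NamedTuple):
    """Hypothetical identifier of a process: host name, user name, process number."""
    host: str
    user: str
    pid: int


def select_kill_targets(
    requested: Iterable[ProcessKey],
    detected: Iterable[ProcessKey],
) -> Tuple[List[ProcessKey], List[ProcessKey]]:
    """Split the processes named in a confirmation into matched and unmatched.

    Only requests that match a process previously detected as abnormal are
    returned as kill targets; the rest can be reported back to the user.
    """
    detected_set = set(detected)
    matched: List[ProcessKey] = []
    unmatched: List[ProcessKey] = []
    for key in requested:
        (matched if key in detected_set else unmatched).append(key)
    return matched, unmatched
```

Keeping the unmatched entries separate, rather than silently dropping them, leaves room for a reminder to the user when a request does not correspond to any detected abnormal process.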

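Optionally, after sending the process information of the abnormal process to the corresponding user terminal according to the user information of the abnormal process, so that the user terminal selects the abnormal process for checking and killing, the method further includes:

receiving a kill instruction input by a target user through a control host in the cluster, and killing the abnormal process indicated by the kill instruction.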

In the above technical solution, after the process information of the abnormal process is sent to the corresponding user terminal so that the user terminal selects the abnormal process for checking and killing, the corresponding abnormal process may also be killed upon receiving a kill instruction input by the target user through a control host in the cluster. Abnormal processes in the cluster can thus be checked and killed in different ways, which improves the practicability and applicability of abnormal process checking and killing.

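Optionally, after sending the process information of the abnormal process to the corresponding user terminal according to the user information of the abnormal process, so that the user terminal selects the abnormal process for checking and killing, the method further includes:

monitoring whether there is an unkilled abnormal process that has remained unkilled beyond a preset time threshold;

when such an unkilled abnormal process is found, sending the process information of the unkilled abnormal process to the corresponding user terminal according to the user information of the unkilled abnormal process, so that the user selects, through the user terminal, the unkilled abnormal process for checking and killing.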

In the above technical solution, after the process information of the abnormal process is sent to the corresponding user terminal so that the user terminal selects the abnormal process for checking and killing, if an abnormal process is found to have remained unkilled beyond a preset time threshold, its process information is sent again to the corresponding user terminal according to its user information. This second reminder lets the user select the unkilled abnormal process for checking and killing through the user terminal, so that abnormal processes are not overlooked.
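The re-reminder check described above amounts to filtering the notified abnormal processes by elapsed time. The sketch below is a minimal illustration, assuming each notified process is tracked with the time of its last notification and a flag recording whether it has been killed; the names `AbnormalRecord` and `overdue_records` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AbnormalRecord:
    """Hypothetical bookkeeping entry for one notified abnormal process."""
    user: str
    pid: int
    notified_at: float  # time of the last notification, in seconds
    killed: bool = False


def overdue_records(
    records: Iterable[AbnormalRecord],
    now: float,
    threshold_seconds: float,
) -> List[AbnormalRecord]:
    """Return records that are still unkilled after the preset time threshold."""
    return [
        r for r in records
        if not r.killed and (now - r.notified_at) > threshold_seconds
    ]
```

The returned records are the ones for which process information would be sent to the user terminal a second time.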

Optionally, the user terminal is located in the external network, and killing the abnormal process indicated by the abnormal process checking and killing information when that information matches the process information of an abnormal process in the cluster includes:

when the abnormal process searching and killing information is matched with the process information of the abnormal process in the cluster, synchronizing the abnormal process searching and killing information from the external network to the internal network;

killing, in the intranet, the abnormal process indicated by the abnormal process checking and killing information.

In the above technical solution, when the user terminal is in the external network and the abnormal process checking and killing information matches the process information of an abnormal process in the cluster, the checking and killing information is synchronized from the external network to the internal network, and the corresponding abnormal process in the cluster is killed automatically in the internal network. The corresponding abnormal process does not have to be screened manually in the internal network before being killed, which reduces errors caused by human factors, improves work quality, and lightens the administrator's workload.

Optionally, performing anomaly detection on each process to be detected to obtain an abnormal process, including:

acquiring process information of each process to be detected in the cluster, wherein the process information comprises CPU utilization rate, memory usage amount, working time and associated terminals of the process to be detected;

determining, according to whether the process information meets at least two of the abnormal process judging conditions, whether the process to be detected corresponding to the process information is an abnormal process, so as to obtain the abnormal process;

the judging conditions comprise: the CPU utilization rate of the process to be detected is greater than a CPU utilization rate threshold; the memory usage amount is greater than a memory usage amount threshold; the working time is greater than a working time threshold; and the process to be detected has lost its association with the associated terminal.

In the above technical solution, abnormal processes are detected using preset abnormal process judging conditions. The judging conditions constrain only the CPU utilization rate, memory usage amount, working time, and associated terminal, which are the most important attributes of a process to be detected; no other, more complex data needs to be acquired or calculated, which speeds up the detection of abnormal processes.
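The "at least two of the judging conditions" rule can be sketched as follows. This is an illustrative sketch rather than the application's implementation; the class and field names, the units (percent, megabytes, seconds), and the threshold values used in the usage note are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ProcessInfo:
    """Hypothetical snapshot of one process to be detected."""
    cpu_percent: float
    memory_mb: float
    elapsed_seconds: float
    has_terminal: bool  # False when the process has lost its associated terminal


@dataclass
class Thresholds:
    """Hypothetical per-process thresholds (units are assumptions)."""
    cpu_percent: float
    memory_mb: float
    elapsed_seconds: float


def is_abnormal(p: ProcessInfo, t: Thresholds, min_hits: int = 2) -> bool:
    """Return True when at least `min_hits` of the four conditions hold."""
    hits = [
        p.cpu_percent > t.cpu_percent,          # CPU utilization above threshold
        p.memory_mb > t.memory_mb,              # memory usage above threshold
        p.elapsed_seconds > t.elapsed_seconds,  # working time above threshold
        not p.has_terminal,                     # association with terminal lost
    ]
    return sum(hits) >= min_hits
```

For instance, with an assumed working time threshold of one week, a process that has run for eight days and no longer has a terminal meets two conditions and is flagged, whereas a short-lived process that only exceeds the CPU threshold is not.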

In a second aspect, an embodiment of the present application provides an abnormal process checking and killing system, where the abnormal process checking and killing system includes an abnormal process detection module, an abnormal process processing module, an information reminding module, an information feedback module, and a data synchronization module; the abnormal process detection module and the abnormal process processing module are positioned in the intranet; the information reminding module and the information feedback module are positioned in the external network; the abnormal process detection module, the abnormal process processing module, the information reminding module and the information feedback module are all in communication connection with the data synchronization module;

The abnormal process detection module is used for:

acquiring user information of each process to be detected in the cluster;

transmitting user information of the abnormal process to a data synchronization module;

the data synchronization module is used for synchronizing the user information of the abnormal process from the intranet to the extranet;

the information reminding module is used for:

reading user information of an abnormal process from a data synchronization module;

according to the user information of the abnormal process, the process information of the abnormal process is sent to the corresponding user terminal, so that the user terminal selects the abnormal process for searching and killing;

the information feedback module is used for:

when the abnormal process searching and killing information is matched with the process information of the abnormal process in the cluster, the abnormal process searching and killing information is sent to a data synchronization module;

the abnormal process processing module is used for reading abnormal process searching and killing information from the data synchronization module and searching and killing the abnormal process indicated by the abnormal process searching and killing information.

In a third aspect, an embodiment of the present application provides an abnormal process checking and killing device, where the device includes:

The acquisition module is used for acquiring the user information of each process to be detected in the cluster;

the detection module is used for carrying out abnormal detection on each process to be detected to obtain an abnormal process;

and the sending module is used for sending the process information of the abnormal process to the corresponding user terminal according to the user information of the abnormal process so that the user terminal can select the abnormal process to be checked and killed.

In a fourth aspect, an embodiment of the present application provides a computer device including a processor, where the processor executes computer instructions so as to perform the abnormal process checking and killing method of the first aspect or any implementation corresponding to the first aspect.

In a fifth aspect, an embodiment of the present application provides a computer readable storage medium, where computer instructions are stored on the computer readable storage medium, and the computer instructions are configured to cause a computer to execute the abnormal process checking and killing method of the first aspect or any implementation corresponding to the first aspect.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of an abnormal process checking and killing system according to an embodiment of the present application;

FIG. 2 is a flow diagram illustrating an abnormal process checking and killing method according to some embodiments of the present application;

FIG. 3 is a flow diagram illustrating yet another abnormal process checking and killing method according to some embodiments of the present application;

FIG. 4 is a block diagram of an abnormal process checking and killing device according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present application.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.

It should be understood that, in the embodiments of the present application, an "indication" may be a direct indication, an indirect indication, or an indication of an association relationship. For example, "A indicates B" may mean that A indicates B directly, e.g., B can be obtained through A; it may mean that A indicates B indirectly, e.g., A indicates C and B can be obtained through C; or it may mean that there is an association between A and B.

In the description of the embodiments of the present application, the term "corresponding" may indicate that there is a direct or an indirect correspondence between the two, that there is an association between the two, or that the two stand in a relationship such as indicating and being indicated, or configuring and being configured.

In the embodiment of the present application, the "predefining" may be implemented by pre-storing corresponding codes, tables or other manners that may be used to indicate relevant information in devices (including, for example, terminal devices and network devices), and the specific implementation of the present application is not limited.

Fig. 1 is a schematic structural diagram of an abnormal process checking and killing system according to an embodiment of the present application. The abnormal process checking and killing system comprises an abnormal process detection module 110, an abnormal process processing module 120, an information reminding module 130, an information feedback module 140 and a data synchronization module 150; the abnormal process detection module 110 and the abnormal process processing module 120 are located in the intranet; the information reminding module 130 and the information feedback module 140 are located in the external network; the abnormal process detection module 110, the abnormal process processing module 120, the information reminding module 130 and the information feedback module 140 are all in communication connection with the data synchronization module 150.

The abnormal process detection module 110 is used for connecting a host of a computing node in the cluster and acquiring user information of each process to be detected in the cluster; performing anomaly detection on each process to be detected to obtain an anomaly process; the user information of the abnormal process is transmitted to the data synchronization module 150.

The data synchronization module 150 is configured to open a secure channel through the isolation barrier between the intranet and the extranet, mirror the data of the database in the intranet, and map that data to the extranet. Data in the extranet can likewise be pulled into the intranet for research and development use. In this way, the user information of the abnormal process can be synchronized from the intranet to the extranet.

The information reminding module 130 is used for notifying and reminding the user. Specifically, it reads the user information of the abnormal process from the data synchronization module 150 and, according to that user information, sends the process information of the abnormal process to the corresponding user terminal so that the user terminal can select the abnormal process to be checked and killed. Once the user has received the process information of the abnormal process through the user terminal, the information reminding module 130 has completed the notification and reminder. The process information of the abnormal process includes, but is not limited to, the host name, user name, process number, queue information, and user information corresponding to the abnormal process.

The information feedback module 140 is configured to process the information fed back by the user through the user terminal. Specifically, it receives the confirmation information returned by the user terminal and extracts abnormal process checking and killing information from the confirmation information; when the abnormal process checking and killing information matches the process information of an abnormal process in the cluster, it sends the abnormal process checking and killing information to the data synchronization module 150. The abnormal process checking and killing information includes, but is not limited to, the host name, user name, process number and queue information corresponding to the abnormal process that it indicates.

Optionally, the information feedback module 140 is further configured to send mismatch reminding information to the user terminal corresponding to the confirmation information when the abnormal process checking and killing information does not match the process information of any abnormal process in the cluster. The mismatch reminding information includes, but is not limited to, the mismatched host name, user name, process number, and queue information.

The data synchronization module 150 is configured to synchronize the received abnormal process killing information from the external network to the internal network.

The abnormal process processing module 120 is configured to read abnormal process killing information from the data synchronization module 150, and kill an abnormal process indicated by the abnormal process killing information.

Optionally, the abnormal process processing module 120 may be further configured to receive a kill instruction input by the target user through the control host in the cluster, and kill an abnormal process indicated by the kill instruction. The abnormal process in the cluster can be checked and killed through different operations, and the practicability and applicability of the abnormal process checking and killing system are improved.

Optionally, the abnormal process checking and killing system may further include a configuration module 160, where the configuration module 160 is communicatively connected to the abnormal process detection module 110. The configuration module 160 is used for storing the system configuration information required for the operation of the checking and killing system. The system configuration information includes, but is not limited to, the host names of the hosts in the cluster, queue information, and the process information to be detected in the cluster, where the process information to be detected includes, but is not limited to, the process number of a process to be detected, a CPU utilization threshold, a memory usage threshold, a working time threshold, and an associated terminal.

Optionally, the configuration module 160 may be further configured to receive a modification instruction input by the target user through the control host in the cluster, and to modify the process information of the process to be detected indicated by the modification instruction. Different processes to be detected can thus be handled individually, a one-size-fits-all approach is avoided, the flexibility of abnormal process detection is ensured, and the practicability and applicability of the abnormal process checking and killing system are improved.

Optionally, the abnormal process detection module 110 is further configured to: determine, by comparing the process information of each process to be detected with the system configuration file in the configuration module 160, whether the process information meets at least two of the abnormal process judgment conditions, so as to screen out abnormal processes; integrate the abnormal processes corresponding to the same user information to obtain integrated abnormal processes; and store the process information of the integrated abnormal processes in the data synchronization module 150. The abnormal process judgment conditions include: 1. whether the CPU usage rate of the process to be detected exceeds the CPU usage rate threshold of that process in the configuration module 160; 2. whether the memory usage of the process to be detected exceeds the memory usage threshold of that process in the configuration module 160; 3. whether the working time of the process to be detected, that is, the time elapsed since the process was started, exceeds the working time threshold of that process in the configuration module 160; 4. whether the process to be detected has lost its association with the terminal. For example, a process generated by a certain browsing command may be considered an abnormal process when it has been active for a week or longer and has lost its association with the terminal. Before integrating the abnormal processes under the same user information, the abnormal process detection module 110 also formats the process information of the abnormal processes so that it is in a unified format.

Optionally, the information reminding module 130 may be further configured to send an instruction for checking and/or killing the abnormal process to the corresponding user terminal according to the user information of the abnormal process, so that the user can copy and paste the instruction directly into the command line of the control host in the cluster and execute it. It is understood that the control host in the cluster may be any computer device having a function of managing the cluster.
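As an illustration of such ready-to-paste instructions, the sketch below builds one command that inspects a process and one that kills it. The application does not specify the form of the instruction; the sketch assumes that the control host can reach the target host over `ssh` and that the standard `ps` and `kill` utilities are available there. The function names are hypothetical.

```python
import shlex


def build_check_command(host: str, pid: int) -> str:
    """Build a command that shows the given process on the given host.

    Assumption: the control host reaches `host` via ssh.
    """
    return f"ssh {shlex.quote(host)} ps -o pid,user,etime,args -p {int(pid)}"


def build_kill_command(host: str, pid: int) -> str:
    """Build a command that asks the given process to terminate.

    Assumption: plain `kill` (default signal) is sufficient for the process.
    """
    return f"ssh {shlex.quote(host)} kill {int(pid)}"
```

Quoting the host name with `shlex.quote` and coercing the process number to an integer keeps the generated text safe to paste even if a field contains unexpected characters.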

Optionally, the information reminding module 130 may be further configured to monitor whether there is an unkilled abnormal process that has remained unkilled beyond a preset time threshold; when such an unkilled abnormal process is found, the process information of the unkilled abnormal process is sent to the corresponding user terminal according to the user information of the unkilled abnormal process, so that the user selects, through the user terminal, the unkilled abnormal process for checking and killing.

Optionally, the information reminding module 130 may be further configured to monitor whether an unkilled abnormal process is killed within a preset time span; when an abnormal process has not been killed within the preset time span, the process information of the unkilled abnormal process is sent to a control host in the cluster, so that a kill instruction input by the administrator through the control host can be received and the unkilled abnormal process indicated by the kill instruction can be killed.

Optionally, the abnormal process checking and killing system may store the log information generated during operation by the abnormal process detection module 110, the abnormal process processing module 120, the information reminding module 130, the information feedback module 140, the data synchronization module 150 and the configuration module 160 in a database, so as to facilitate later maintenance and troubleshooting.

Fig. 2 is a flowchart illustrating an abnormal process checking and killing method according to an embodiment of the present application. The method is performed by an abnormal process killing system, which may be an abnormal process killing system as shown in fig. 1. As shown in fig. 2, the abnormal process checking and killing method may include the steps of:

Step 201, obtaining user information of each process to be detected in the cluster.

The abnormal process checking and killing system (hereinafter referred to as the checking and killing system) is connected to each host under each computing node in the cluster, and can obtain, through an interface connected to the host, the process information of the processes to be detected in each host, where the process information includes the user information of the process to be detected. The user information may include, but is not limited to, the user's name, mailbox information, and social account. The checking and killing system may obtain the process information of the processes to be detected in each host periodically through the interface, or may obtain it through the interface upon receiving a process information acquisition instruction input by the target user through a control host in the cluster.
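The application does not specify the interface used to collect process information. As one possible illustration, the sketch below parses the text that a Linux host would print for `ps -eo pid,user,pcpu,rss,etimes,tty,args --no-headers` (procps-ng syntax) into per-process records. The choice of `ps`, the selected columns, and the function name `parse_ps_output` are assumptions made for this example only.

```python
from typing import Dict, List


def parse_ps_output(text: str) -> List[Dict[str, object]]:
    """Parse `ps -eo pid,user,pcpu,rss,etimes,tty,args --no-headers` output.

    Columns: process number, user name, CPU percent, resident memory in KiB,
    elapsed time in seconds, controlling terminal ("?" if none), command line.
    """
    records: List[Dict[str, object]] = []
    for line in text.splitlines():
        # The command line may contain spaces, so split into at most 7 fields.
        parts = line.strip().split(None, 6)
        if len(parts) < 7:
            continue  # skip blank or malformed lines
        pid, user, pcpu, rss, etimes, tty, args = parts
        records.append({
            "pid": int(pid),
            "user": user,
            "cpu_percent": float(pcpu),
            "memory_kb": int(rss),
            "elapsed_seconds": int(etimes),
            "has_terminal": tty != "?",
            "command": args,
        })
    return records
```

The user name obtained this way would still have to be mapped to contact details such as a mailbox; that lookup is outside the scope of the sketch.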

Step 202, performing anomaly detection on each process to be detected to obtain an anomaly process.

The checking and killing system automatically judges, from the process information of each process to be detected, whether that process is in an abnormal state, so as to obtain the abnormal processes among the processes to be detected. It will be appreciated that different abnormal states may be set for different processes to be detected, and each process to be detected may be configured with one or more abnormal states. For example, the checking and killing system may determine whether a downloading process is in an abnormal state according to whether the update frequency of the local directory corresponding to the downloading process exceeds a threshold value and/or whether its response time exceeds a preset value.

Step 203, according to the user information of the abnormal process, the process information of the abnormal process is sent to the corresponding user terminal, so that the user terminal selects the abnormal process for searching and killing.

According to the user information of the abnormal processes, the checking and killing system can integrate the process information of the abnormal processes that share the same user information, and send the integrated process information to the user account indicated by the user information through existing messaging equipment such as a mailbox server. A user terminal logged in with that user account receives the integrated process information, after which the user can select, through the user terminal, the abnormal processes to be checked and killed. When the checking and killing system is the system shown in fig. 1, the abnormal process detection module 110 executes steps 201 and 202 and integrates the process information of the abnormal processes under the same user information for sending to the information reminding module 130. The information reminding module 130 then sends the process information of the abnormal processes to the corresponding user terminal according to the user information of the abnormal processes, so that the user terminal selects the abnormal processes for checking and killing.
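Integrating abnormal processes under the same user information is essentially a group-by operation followed by building one message per user. The sketch below shows this under stated assumptions: each abnormal process is represented as a dictionary with `user`, `host`, `pid`, and `queue` keys, and the message layout is invented for illustration. Actual delivery (for example through a mailbox server) is not shown.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_user(abnormal_processes: List[dict]) -> Dict[str, List[dict]]:
    """Collect abnormal processes under the user they belong to."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for proc in abnormal_processes:
        grouped[proc["user"]].append(proc)
    return dict(grouped)


def format_notification(user: str, procs: List[dict]) -> str:
    """Build one plain-text message listing all abnormal processes of a user."""
    lines = [f"Abnormal processes for {user}:"]
    for p in procs:
        lines.append(f"  host={p['host']} pid={p['pid']} queue={p['queue']}")
    return "\n".join(lines)
```

Sending one integrated message per user, instead of one message per process, keeps the number of notifications proportional to the number of affected users.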

In summary, the user information of each process to be detected in the cluster is obtained automatically, and abnormality detection is performed automatically on each process to be detected, so as to obtain the abnormal processes in the cluster. The user information of each abnormal process is then used to send the process information of that abnormal process to the corresponding user terminal. Abnormal processes are thus detected automatically, manual judgment of abnormal processes is avoided, and the gain in detection efficiency becomes more pronounced as the cluster grows. In addition, because the process information of the abnormal process is automatically sent to the user terminal of the related user, the user can selectively check and kill the abnormal process through the user terminal. The administrator no longer checks and kills abnormal processes on the user's behalf, which greatly reduces the error rate of checking and killing abnormal processes.

Fig. 3 is a flowchart illustrating an abnormal process checking and killing method according to an embodiment of the present application. The method is performed by an abnormal process killing system, which may be an abnormal process killing system as shown in fig. 1. As shown in fig. 3, the abnormal process checking and killing method may include the steps of:

step 301, obtaining user information of each process to be detected in the cluster.

For details, refer to step 201 in the embodiment shown in fig. 2; they are not repeated here.

Optionally, in order to save computing resources and memory usage, obtaining the user information of each process to be detected in the cluster may include:

acquiring computing service state information of each host in a cluster; and screening out a target host for running the computing service according to the computing service state information.

The computing service state information is used for indicating whether the host runs the computing service or not; the checking and killing system can automatically inquire whether each host in the cluster operates the computing service through a specific system command, and obtain the computing service state information returned by each host in the cluster; the checking and killing system can judge whether the host computer runs the computing service according to the computing service state information, and if the host computer runs the computing service, the host computer is determined to be a target host computer.
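The host screening above can be sketched as follows. The text does not name the system command used to query a host, so the sketch takes the probe as an injected callable; `select_target_hosts` and `is_running` are hypothetical names, and treating an unreachable host as "not running a computing service" is an assumption of this sketch.

```python
from typing import Callable, Iterable, List


def select_target_hosts(hosts: Iterable[str],
                        is_running: Callable[[str], bool]) -> List[str]:
    """Return the hosts whose computing service is reported as running."""
    targets = []
    for host in hosts:
        try:
            if is_running(host):
                targets.append(host)
        except OSError:
            # Assumption: a host that cannot be queried is skipped.
            continue
    return targets
```

Restricting later steps to the target hosts is what saves computing resources and memory: hosts that run no computing service are never inspected process by process.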

And acquiring the user information of each process to be detected in the target host according to the read system configuration information.

The system configuration information comprises, but is not limited to, a host name queue of a host in the cluster and process information to be detected in the cluster, wherein the process information to be detected comprises, but is not limited to, a process number of a process to be detected, a CPU usage threshold, a memory usage threshold, a working time threshold and an associated terminal. After the target host is determined, the checking and killing system reads own system configuration information to obtain the process information to be detected in the cluster, and further extracts user information of a corresponding process to be detected from the target host according to the process information to be detected.
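As an illustration only, the system configuration information could be held in a small structure like the one below. The JSON layout, the key names and the class names are assumptions made for this sketch; the text specifies only which items the configuration contains, not how they are stored.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProcessRule:
    """Thresholds and associated terminal for one process to be detected."""
    cpu_threshold: float
    mem_threshold_mb: float
    work_time_threshold_s: float
    terminal: str


@dataclass(frozen=True)
class SystemConfig:
    hosts: List[str]               # host names of the hosts in the cluster
    rules: Dict[int, ProcessRule]  # keyed by process number


def load_config(text: str) -> SystemConfig:
    """Parse a (hypothetical) JSON configuration document."""
    raw = json.loads(text)
    rules = {int(pid): ProcessRule(**fields)
             for pid, fields in raw["processes"].items()}
    return SystemConfig(hosts=list(raw["hosts"]), rules=rules)
```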

Step 302, performing anomaly detection on each process to be detected to obtain an anomaly process.

Optionally, in order to speed up the detection of the abnormal process, step 302 may further include the following steps:

and acquiring process information of each process to be detected in the cluster.

The process information comprises CPU utilization rate, memory utilization amount, working time and associated terminals of the process to be detected. The checking and killing system extracts the process information of each process to be detected in the cluster through an interface connected with a host in the cluster.

And determining whether the process to be detected corresponding to the process information is an abnormal process according to whether the process information meets at least two of the abnormal process judging conditions, and obtaining the abnormal process.

The judging conditions are: the CPU utilization rate of the process to be detected is greater than a CPU utilization rate threshold; the memory usage is greater than a memory usage threshold; the working time is greater than a working time threshold; and the process to be detected has lost its connection with the associated terminal. The checking and killing system judges a process to be detected to be an abnormal process when its process information meets at least two of these four conditions. Because the preset judging conditions constrain only the CPU utilization rate, memory usage, working time and associated terminal, which matter most for the process to be detected, no other more complex data needs to be acquired or calculated, which speeds up the detection of abnormal processes.
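The "at least two of the judging conditions" rule can be expressed compactly. The sketch below is one possible reading of it; the class names, field names and units are illustrative assumptions, not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class ProcessInfo:
    cpu_percent: float
    mem_mb: float
    work_time_s: float
    terminal_connected: bool


@dataclass
class Thresholds:
    cpu_percent: float
    mem_mb: float
    work_time_s: float


def is_abnormal(info: ProcessInfo, limits: Thresholds, min_hits: int = 2) -> bool:
    """True when at least `min_hits` of the four judging conditions hold."""
    hits = [
        info.cpu_percent > limits.cpu_percent,
        info.mem_mb > limits.mem_mb,
        info.work_time_s > limits.work_time_s,
        not info.terminal_connected,  # process lost its associated terminal
    ]
    return sum(hits) >= min_hits
```

Requiring two conditions rather than one means that a process that is merely busy (high CPU only) or merely long-running is not flagged on that basis alone.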

Optionally, the abnormal process judging conditions may be stored in a system configuration file, and before step 302, the abnormal process checking and killing method may further include: receiving a modification instruction input by a target user through a control host in the cluster, and modifying the abnormal process judging conditions in the system configuration file as indicated by the modification instruction. In this way, different processes to be detected can be handled individually and a one-size-fits-all approach is avoided, which keeps abnormal process detection flexible and improves the practicability and applicability of the abnormal process checking and killing system.

Optionally, determining whether the process to be detected corresponding to the process information is an abnormal process according to whether the process information meets at least two of the abnormal process judging conditions, and obtaining the abnormal process may include: comparing the process information of the process to be detected with the system configuration file, and judging whether the process information of the process to be detected meets at least two of abnormal process judging conditions or not so as to screen out abnormal processes; carrying out format processing on the process information of the abnormal process to unify the format of the process information of the abnormal process; and integrating the abnormal processes corresponding to the same user information to obtain the integrated abnormal processes.
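The format processing and integration steps above might look like the following sketch. It assumes, purely for illustration, that raw records arrive with CPU values such as `"97.5%"` and memory values such as `"2048K"` or `"2G"`; these input formats and every name in the code are hypothetical.

```python
def parse_memory_mb(value: str) -> float:
    """Convert strings such as '2048K', '512M' or '2G' to megabytes."""
    units = {"K": 1 / 1024, "M": 1, "G": 1024}
    value = value.strip().upper()
    if value[-1] in units:
        return float(value[:-1]) * units[value[-1]]
    return float(value)  # assumption: bare numbers are already in MB


def normalize(record: dict) -> dict:
    """Bring one raw abnormal-process record into a unified format."""
    return {
        "user": record["user"].strip(),
        "host": record["host"].strip().lower(),
        "pid": int(record["pid"]),
        "cpu_percent": float(str(record["cpu"]).rstrip("%")),
        "mem_mb": parse_memory_mb(str(record["mem"])),
    }


def integrate_by_user(records) -> dict:
    """Group normalized abnormal processes under the same user information."""
    grouped = {}
    for rec in map(normalize, records):
        grouped.setdefault(rec["user"], []).append(rec)
    return grouped
```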

Step 303, according to the user information of the abnormal process, the process information of the abnormal process is sent to the corresponding user terminal, so that the user terminal selects the abnormal process for searching and killing.

For details, refer to step 203 in the embodiment shown in fig. 2; they are not repeated here.

When the killing system is the killing system shown in fig. 1, steps 301 to 302 may be performed by the abnormal process detection module 110. After the abnormal process is obtained by the abnormal process detection module 110, the process information of the abnormal process can also be sent to the data synchronization module 150 by the abnormal process detection module 110; the process information of the abnormal process is synchronized from the intranet to the extranet through the data synchronization module 150. Reading the process information of the abnormal process from the data synchronization module 150 through the information reminding module 130 so as to obtain the user information of the abnormal process; and sending the process information of the abnormal process to the corresponding user terminal so that the user terminal can select the abnormal process to be checked and killed.

And step 304, receiving confirmation information returned by the user terminal, and extracting abnormal process checking and killing information from the confirmation information.

After the process information of the abnormal process is sent to the user terminal by the checking and killing system, the user selects the abnormal process needing to be checked and killed through the user terminal, and the user returns confirmation information to the checking and killing system through the user terminal. The checking and killing system receives the confirmation information returned by the user terminal, and extracts abnormal process checking and killing information from the confirmation information, wherein the abnormal process checking and killing information comprises, but is not limited to, a host name, a user name, a process number and queue information corresponding to the abnormal process indicated by the abnormal process checking and killing information.
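The application does not specify how the confirmation information is encoded. As a sketch only, assume the user terminal returns a JSON document with a `kill` list whose entries carry the host name, user name, process number and queue; the extraction step could then be written as below. The payload layout and all names are hypothetical.

```python
import json
from typing import List, NamedTuple


class KillRequest(NamedTuple):
    host: str
    user: str
    pid: int
    queue: str


def extract_kill_requests(confirmation: str) -> List[KillRequest]:
    """Pull the abnormal-process kill information out of a confirmation."""
    payload = json.loads(confirmation)
    return [
        KillRequest(item["host"], item["user"], int(item["pid"]), item["queue"])
        for item in payload.get("kill", [])
    ]
```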

In step 305, when the abnormal process killing information matches with the process information of the abnormal process in the cluster, the abnormal process indicated by the abnormal process killing information is killed.

The checking and killing system compares the abnormal process indicated by the abnormal process checking and killing information with the process information of the same abnormal process in the cluster. When the two are consistent, the abnormal process checking and killing information matches the process information of the abnormal process in the cluster, and the abnormal process indicated by the abnormal process checking and killing information is checked and killed. Because the checking and killing information is extracted from the confirmation information that the user feeds back through the user terminal, abnormal processes in the cluster are selected accurately and killed automatically, which reduces the error rate of checking and killing abnormal processes and improves its efficiency.

Optionally, when the abnormal process searching and killing information is not matched with the process information of the abnormal process in the cluster, sending mismatch reminding information to the user terminal corresponding to the confirmation information. The mismatch alert information includes, but is not limited to, a mismatch hostname, a user name, a process number, and queue information.
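The matching check and the mismatch reminder can be sketched as a partition of the requested kills. The sketch assumes both the kill information and the cluster's abnormal-process records are dictionaries with `host`, `user`, `pid` and `queue` keys, and that "matching" means all four fields agree; the text lists these fields but does not spell out the comparison, so this is one plausible reading.

```python
def split_by_match(requests, cluster_abnormal):
    """Partition kill requests into matched ones and mismatched ones."""
    known = {(p["host"], p["user"], p["pid"], p["queue"])
             for p in cluster_abnormal}
    matched, mismatched = [], []
    for req in requests:
        key = (req["host"], req["user"], req["pid"], req["queue"])
        (matched if key in known else mismatched).append(req)
    return matched, mismatched
```

Entries in `matched` would go on to be checked and killed, while entries in `mismatched` would be reported back to the user terminal that sent the confirmation.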

Further, when the abnormal process searching and killing information is matched with the process information of the abnormal process in the cluster, the abnormal process searching and killing information is synchronized from the external network to the internal network; in the intranet, the abnormal progress indicated by the information is checked and killed.

The user terminal is in the external network and the cluster is in the internal network, and the two networks are isolated from each other and cannot communicate directly. Therefore, when the abnormal process searching and killing information matches the process information of the abnormal process in the cluster, the information is first synchronized from the external network to the intranet, and the abnormal process it indicates is then searched and killed in the cluster in the intranet. Because this synchronization is performed automatically, the abnormal processes to be killed do not need to be screened manually in the intranet, which reduces errors caused by human factors, improves working quality, and lightens the administrator's workload.

Optionally, in order to improve the practicability and applicability of the abnormal process searching and killing, after the user receives the process information of the abnormal process through the user terminal, the user terminal can reply the confirmation information to the searching and killing system to search and kill the related abnormal process, and can log in a control host in the cluster to search and kill the abnormal process on the control host. Specifically, the searching and killing system receives a searching and killing instruction input by a target user through a control host in the cluster, and searches and kills an abnormal process indicated by the searching and killing instruction. After the process information of the abnormal process is sent to the corresponding user terminal so that the user terminal selects the abnormal process for killing, the corresponding abnormal process can be killed by receiving a killing command input by a target user in a control host in the cluster. The method realizes the automatic checking and killing of the abnormal process in the cluster by different methods, and improves the practicability and applicability of checking and killing the abnormal process.

Optionally, in order to prevent missing of the abnormal process searching and killing, further improve accuracy of the abnormal process searching and killing, after step 305, the abnormal process searching and killing method may further include:

monitoring whether an unchecked abnormal process which exceeds a preset time threshold and is unchecked exists; when the existence of the unchecked abnormal process is monitored, the process information of the unchecked abnormal process is sent to the corresponding user terminal according to the user information of the unchecked abnormal process, so that a user terminal user selects the unchecked abnormal process to check and kill.

The preset time threshold may be set as needed and is not particularly limited in the embodiments of the present application. After the process information of the abnormal processes in the cluster is sent to the corresponding user terminals, the checking and killing system monitors in real time whether the cluster contains an unchecked abnormal process, that is, an abnormal process that has exceeded the preset time threshold without being checked and killed. When such a process is detected, the checking and killing system sends its process information again to the corresponding user terminal according to its user information, so that the user can select it for checking and killing through the user terminal. This second reminder prevents abnormal processes from being missed.

Optionally, in order to improve the killing rate of abnormal processes, after the process information of the unchecked abnormal process is sent to the corresponding user terminal according to its user information so that the user of the user terminal selects the unchecked abnormal process for checking and killing, the abnormal process checking and killing method may further include: monitoring whether the unchecked abnormal process is checked and killed within a preset time span; and, when the unchecked abnormal process is not checked and killed within the preset time span, sending the process information of the unchecked abnormal process to a control host in the cluster, so as to receive a killing instruction input by the administrator through the control host and kill the unchecked abnormal process indicated by the killing instruction. The preset time span may be any manually set time span and is not particularly limited in the embodiments of the present application. Handing an abnormal process that remains unkilled after the reminder over to the administrator at the control host improves the killing rate of abnormal processes.
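The two-stage follow-up described above (a second reminder to the user, then escalation to the control host) reduces to a small decision function. The sketch below uses plain numeric timestamps in seconds; the function name, the parameter names and the three return labels are illustrative assumptions.

```python
def next_action(notified_at, now, remind_after, escalate_after, reminded_at=None):
    """Decide what to do with an abnormal process that is still running.

    Returns 'wait', 'remind' (notify the user terminal again) or
    'escalate' (send the process information to the control host).
    """
    if reminded_at is None:
        # Stage 1: preset time threshold since the first notification.
        return "remind" if now - notified_at > remind_after else "wait"
    # Stage 2: preset time span since the second reminder.
    return "escalate" if now - reminded_at > escalate_after else "wait"
```

For example, with a one-hour threshold and a two-hour span, a process first reported at time 0 would trigger a reminder once more than 3600 seconds have passed, and would be escalated once more than 7200 seconds have passed since that reminder.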

When the killing system is the killing system shown in fig. 1, the killing system information feedback module 140 executes step 304, and when the abnormal process killing information matches the process information of the abnormal process in the cluster, sends the abnormal process killing information to the data synchronization module 150; the data synchronization module 150 synchronizes the abnormal process killing information from the external network to the internal network so that the abnormal process processing module 120 kills the abnormal process indicated by the abnormal process killing information.

In summary, after the process information of the abnormal process in the cluster is sent to the corresponding user terminal, the abnormal process searching and killing information can be extracted from the confirmation information returned by the user terminal. When the abnormal process killing information matches the process information of the abnormal processes in the cluster, the abnormal processes indicated by the abnormal process killing information are killed automatically. Sending the process information of the abnormal process to the user terminal notifies the corresponding user automatically and removes the need for manual participation in checking and killing abnormal processes. Because abnormal processes are checked and killed selectively and accurately on the basis of the confirmation message returned by the user through the user terminal, the rate of mistaken kills is reduced while the efficiency of checking and killing abnormal processes is improved.

Fig. 4 is a schematic structural diagram of an abnormal process checking and killing apparatus according to an embodiment of the present application. The apparatus is used to implement the above embodiments and preferred embodiments, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. As shown in fig. 4, the abnormal process checking and killing apparatus includes:

An obtaining module 401, configured to obtain user information of each process to be detected in the cluster;

the detection module 402 is configured to perform anomaly detection on each process to be detected, so as to obtain an abnormal process;

and the sending module 403 is configured to send process information of the abnormal process to a corresponding user terminal according to the user information of the abnormal process, so that the user terminal selects the abnormal process for killing.

In some optional embodiments, the abnormal process checking and killing device further includes:

the receiving module is used for receiving the confirmation information returned by the user terminal and extracting abnormal process searching and killing information from the confirmation information;

and the searching and killing module is used for searching and killing the abnormal process indicated by the abnormal process searching and killing information when the abnormal process searching and killing information is matched with the process information of the abnormal process in the cluster.

In some optional embodiments, the killing module is further configured to receive a killing instruction input by the target user through the control host in the cluster, and kill an abnormal process indicated by the killing instruction.

the monitoring module is used for monitoring whether an unchecked abnormal process which exceeds a preset time threshold and is unchecked exists;

And the sending module is also used for sending the process information of the unchecked abnormal process to the corresponding user terminal according to the user information of the unchecked abnormal process when the unchecked abnormal process is monitored, so that a user of the user terminal selects the unchecked abnormal process to be checked and killed.

In some optional embodiments, the sending module is further configured to synchronize the abnormal process killing information from the external network to the internal network when the abnormal process killing information matches with process information of an abnormal process in the cluster;

the checking and killing module is also used for checking and killing abnormal processes indicated by the abnormal process checking and killing information in the intranet.

In some alternative embodiments, the detection module is further configured to:

The abnormal process checking and killing device in this embodiment is presented in the form of functional units, where a unit refers to an ASIC circuit, a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the above functions.

Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.

The embodiment of the application also provides computer equipment, which is provided with the abnormal process checking and killing device shown in the figure 4.

Referring to fig. 5, fig. 5 is a schematic structural diagram of a computer device according to an alternative embodiment of the present application. As shown in fig. 5, the computer device includes: one or more processors 10, memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 5.

The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.

Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform the methods shown in implementing the above embodiments.

The memory 20 may include a storage program area and a storage data area. The storage program area may store an operating system and at least one application program required for functions; the storage data area may store data created through use of the computer device, and the like. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.

The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.

The embodiments of the present application also provide a computer-readable storage medium. The method according to the embodiments described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code that is originally stored in a remote storage medium or a non-transitory machine-readable storage medium, downloaded over a network, and stored in a local storage medium, so that the method described herein can be processed by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor, controller or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.

Although embodiments of the present application have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the application, and such modifications and variations are within the scope defined by the appended claims.

Claims

1. An abnormal process checking and killing method, which is characterized by comprising the following steps:

acquiring user information of each process to be detected in the cluster;

according to the user information of the abnormal process, the process information of the abnormal process is sent to a corresponding user terminal, so that the user terminal selects the abnormal process for searching and killing;

the obtaining the user information of each process to be detected in the cluster comprises the following steps:

acquiring computing service state information of each host in a cluster; screening out a target host for running the computing service according to the computing service state information, wherein the computing service state information is used for indicating whether the host runs the computing service;

acquiring user information of each process to be detected in a target host according to the read system configuration information, wherein the system configuration information comprises a host name queue of the host in the cluster and process information to be detected in the cluster;

2. The method according to claim 1, wherein after transmitting the process information of the abnormal process to the corresponding user terminal according to the user information of the abnormal process so that the user terminal selects the abnormal process for killing, the method further comprises:

and receiving a killing instruction input by a target user through a control host in the cluster, and killing an abnormal process indicated by the killing instruction.

3. The method according to any one of claims 1 to 2, wherein after sending the process information of the abnormal process to the corresponding user terminal according to the user information of the abnormal process, so that the user terminal selects the abnormal process for killing, the method further comprises:

when the presence of the unchecked abnormal process is monitored, the process information of the unchecked abnormal process is sent to the corresponding user terminal according to the user information of the unchecked abnormal process, so that a user terminal user selects the unchecked abnormal process to be checked and killed.

4. The method according to claim 1, wherein the user terminal is in an external network, and when the abnormal process killing information matches with the process information of the abnormal process in the cluster, the step of killing the abnormal process indicated by the abnormal process killing information includes:

and in the intranet, searching and killing the abnormal process indicated by the abnormal process searching and killing information.

5. The method according to claim 1, wherein performing anomaly detection on the processes to be detected to obtain anomaly processes includes:

determining whether a process to be detected corresponding to the process information is an abnormal process according to whether the process information meets at least two of abnormal process judging conditions, and obtaining the abnormal process;

6. The abnormal process checking and killing system is characterized by comprising an abnormal process detection module, an abnormal process processing module, an information reminding module, an information feedback module and a data synchronization module; the abnormal process detection module and the abnormal process processing module are positioned in the intranet; the information reminding module and the information feedback module are positioned in the external network; the abnormal process detection module, the abnormal process processing module, the information reminding module and the information feedback module are all in communication connection with the data synchronization module;

the abnormal process detection module is used for:

acquiring user information of each process to be detected in the cluster;

transmitting the user information of the abnormal process to the data synchronization module;

the data synchronization module is used for synchronizing the user information of the abnormal process from an intranet to an extranet;

the information reminding module is used for:

reading user information of the abnormal process from the data synchronization module;

the information feedback module is used for:

when the abnormal process checking and killing information matches the process information of the abnormal process in the cluster, sending the abnormal process checking and killing information to the data synchronization module;

the abnormal process processing module is used for reading the abnormal process checking and killing information from the data synchronization module and checking and killing the abnormal process indicated by the abnormal process checking and killing information;

the abnormal process detection module is further used for:

acquiring user information of each process to be detected in a target host according to the read system configuration information, wherein the system configuration information comprises a host name queue of the hosts in the cluster and process information of the processes to be detected in the cluster.
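
One way to picture the configuration-driven collection is the minimal sketch below. The configuration keys `host_names` and `process_names`, and the injected `list_processes` callable that returns `(pid, user, command)` tuples for a host, are assumptions made for the example; the claim does not define the configuration format or how a host is queried.

```python
from collections import deque


def collect_user_info(config, list_processes):
    """Walk the host name queue and record the owner of each watched process.

    config:         dict with "host_names" (ordered hosts) and
                    "process_names" (commands to be detected)
    list_processes: callable(host) -> iterable of (pid, user, command)
    """
    hosts = deque(config["host_names"])
    watched = set(config["process_names"])
    result = []
    while hosts:
        host = hosts.popleft()  # take the next target host from the queue
        for pid, user, command in list_processes(host):
            if command in watched:
                result.append((host, pid, user))
    return result
```

Passing `list_processes` in as a parameter keeps the sketch independent of any particular remote-execution mechanism.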

7. An abnormal process checking and killing device, characterized in that the device comprises:

the sending module is used for sending the process information of the abnormal process to the corresponding user terminal according to the user information of the abnormal process so that the user terminal can select the abnormal process to be checked and killed;

the acquisition module is further used for:

8. A computer device, comprising:

a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the abnormal process checking and killing method of any one of claims 1 to 5.

9. A computer-readable storage medium having stored thereon computer instructions for causing a computer to execute the abnormal process checking and killing method according to any one of claims 1 to 5.