CN115442270A - Full-stack high-performance computing cluster monitoring system - Google Patents


Info

Publication number
CN115442270A
Authority
CN
China
Prior art keywords
unit
monitoring
performance
data
module
Prior art date
Legal status
Pending
Application number
CN202211073239.2A
Other languages
Chinese (zh)
Inventor
王玲
Current Assignee
Nanjing Xinyida Computing Technology Co ltd
Original Assignee
Nanjing Xinyida Computing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Xinyida Computing Technology Co ltd
Priority to CN202211073239.2A
Publication of CN115442270A
Legal status: Pending (current)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H04L63/1433 Vulnerability analysis
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Environmental & Geological Engineering (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a full-stack high-performance computing cluster monitoring system comprising a monitoring module, a performance testing module, a mining-program cleaning module and a data information security defense module. The monitoring module collects and aggregates data from each computing node and normalizes it, providing auxiliary monitoring of the high-performance computing (HPC) cluster application currently running and improving the accuracy with which its running state is judged. The performance testing module determines a test platform, deploys the system, tests system performance, deploys the application, tests the application and analyzes the resulting data. Because node data are collected, aggregated and normalized before being used to monitor the running application, the running state of HPC cluster applications can be judged more accurately, which markedly improves their controllability and stability.

Description

Full-stack high-performance computing cluster monitoring system
Technical Field
The invention belongs to the technical field of monitoring systems, and particularly relates to a full-stack high-performance computing cluster monitoring system.
Background
Many modern projects require developers to master multiple technologies, which reduces communication costs and makes it possible to close the development loop despite limited resources. Full-stack capability is highly valuable to a business: it strongly influences overall planning, the evaluation and selection of technical solutions, and the locating and solving of problems. In addition, for start-ups that lack a complete range of specialists, full-stack staff can solve many kinds of problems and independently cover several roles, saving costs and accelerating business development in the early stage.
Traditional high-performance clusters are expensive to purchase and have long lead times. The advantages of full-stack high-performance computing are: HPC resources can be obtained instantly; multiple billing modes (per machine-hour, monthly, quarterly and yearly) are supported, saving customers money; massive elastic compute and storage capacity follows the peaks and troughs of the business so that computing tasks finish quickly; a range of computing resources, such as the latest Intel and AMD platform CPUs, the latest V100/P100 GPUs and FPGAs, easily meets the latest application requirements; industry solutions provide convenient SaaS application integration; and operations are completed through a graphical interface, so that users can concentrate on application innovation.
Existing full-stack high-performance computing cluster monitoring systems still have problems: the running state of a traditional HPC cluster application is difficult to judge, and the low accuracy of that judgment reduces the controllability and stability of the application. In addition, discovering hidden (cryptocurrency) mining programs on an HPC cluster and deleting and cleaning them up is a problem that existing monitoring systems still need to solve.
Disclosure of Invention
The present invention aims to provide a full-stack high-performance computing cluster monitoring system that solves the problems set forth in the background art.
To achieve this purpose, the invention provides the following technical solution: a full-stack high-performance computing cluster monitoring system comprising a monitoring module, a performance testing module, a mining-program cleaning module and a data information security defense module;
the monitoring module collects and aggregates data from each computing node, normalizes the data and provides auxiliary monitoring of the high-performance computing cluster application currently running, improving the accuracy with which the running state of the application is judged;
the performance testing module tests the performance state of the system and quickly characterizes the application software by determining a test platform, deploying the system, testing system performance, deploying the application, testing the application and analyzing the data;
the mining-program cleaning module improves the conventional process for cleaning up (cryptocurrency) mining programs by combining open-source tools with its own purpose-written monitoring scripts, so that hidden mining programs on the high-performance computing cluster system, and the network forwarding methods they use, can be found and cleaned up quickly;
the data information security defense module secures communication between the internal and external networks through a firewall and performs content filtering and intrusion protection; it implements active defense of information following the principles of defense in depth, layering and active security defense, and monitors each node while defending, thereby preventing intrusion and the spread of viruses and improving information security.
Preferably, the monitoring system further comprises a base layer, an intermediate layer and an application layer. The base layer comprises the monitoring host and the underlying resources, namely CPU, memory, network throughput, hard disk I/O and hard disk usage; the intermediate layer comprises Nginx, Redis, MQ (message queue), MySQL and Tomcat; and the application layer covers HTTP access throughput, response time, return codes, call-chain analysis, performance bottlenecks and client-side monitoring.
Preferably, the monitoring system further comprises a log system that stores the data of the base layer, intermediate layer and application layer; the log system formats the log data, standardizes the format of the monitoring data and performs unified log analysis.
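The patent does not specify a log schema. As a minimal sketch, assuming a hypothetical unified record with timestamp, layer, source, metric and value fields, the log system's formatting step could map records from different collectors onto one JSON-lines format. The field names, the input aliases and the function names below are illustrative, not taken from the patent.

```python
import json
from dataclasses import dataclass, asdict

# Layer names taken from the text; "middle" is accepted as an alias for "intermediate".
LAYERS = {"base", "intermediate", "application"}


@dataclass
class MetricRecord:
    timestamp: float  # Unix epoch seconds
    layer: str        # "base", "intermediate" or "application"
    source: str       # host or component, e.g. "node01", "redis", "nginx"
    metric: str       # metric name, e.g. "cpu_percent", "http_response_ms"
    value: float


def normalize_record(raw: dict) -> MetricRecord:
    """Map a raw collector record onto the unified schema.

    The input field aliases ("ts"/"timestamp", "host"/"component",
    "name"/"metric") are hypothetical examples of heterogeneous collectors.
    """
    layer = str(raw.get("layer", "")).lower()
    if layer == "middle":
        layer = "intermediate"
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    timestamp = raw["ts"] if "ts" in raw else raw["timestamp"]
    source = raw["host"] if "host" in raw else raw["component"]
    metric = raw["name"] if "name" in raw else raw["metric"]
    return MetricRecord(
        timestamp=float(timestamp),
        layer=layer,
        source=str(source),
        metric=str(metric),
        value=float(raw["value"]),
    )


def to_log_line(record: MetricRecord) -> str:
    """Serialize one record as a single JSON line with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

Writing one JSON object per line lets records from all three layers be searched and aggregated with the same log-analysis tooling; choosing JSON lines is itself an assumption of this sketch.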
Preferably, the monitoring module comprises an acquisition unit, a data processing unit, a training unit and an anomaly prediction unit. The acquisition unit collects data from each computing node; the data processing unit applies threshold preprocessing and normalization to the data; the training unit trains a deep LSTM (long short-term memory) network on the preprocessed and normalized data; and the anomaly prediction unit feeds individual preprocessed and normalized samples into the trained LSTM network to predict anomalies in the high-performance computing cluster application.
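The patent names threshold preprocessing, normalization and an LSTM but gives no parameters. The sketch below covers only the data side, under stated assumptions: clipping samples to a threshold band (an assumed meaning of "threshold preprocessing"), min-max normalization, building sliding windows as LSTM training pairs, and flagging an anomaly when the prediction error exceeds a tolerance (an assumed decision rule). The LSTM itself, which would normally come from a deep-learning framework, is omitted, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def threshold_preprocess(values: Sequence[float], low: float, high: float) -> List[float]:
    """Clip raw samples into [low, high] (one possible reading of 'threshold preprocessing')."""
    return [min(max(v, low), high) for v in values]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale values into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def sliding_windows(series: Sequence[float], window: int) -> List[Tuple[List[float], float]]:
    """Build (input window, next value) pairs, the usual supervised layout for an LSTM."""
    return [(list(series[i:i + window]), series[i + window]) for i in range(len(series) - window)]


def is_anomalous(predicted: float, observed: float, tolerance: float) -> bool:
    """Flag a sample whose observed value deviates from the prediction by more than tolerance."""
    return abs(predicted - observed) > tolerance
```

The sketch handles one metric series at a time; a per-node model would more likely stack several normalized metrics into each window.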
Preferably, the mining-program cleaning module cleans up (cryptocurrency) mining programs through the following steps:
S1: determine whether a mining program exists in the computing node cluster;
S2: obtain the process ID (PID) of the mining program: determine whether the mining program hides its PID; if not, obtain the PID directly; if so, find the PID of the hidden mining program using an open-source tool;
S3: using the PID, query the interacting communication nodes that can connect to the Internet, inspect those nodes and shut down the mining program's data flow.
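Steps S2 and S3 can be sketched for a Linux node as follows. The patent does not name the open-source tool; this sketch assumes a cross-view check of the kind performed by hidden-process finders such as unhide: PIDs from a directory listing of /proc are compared with PIDs found by opening /proc/<pid>/status path by path, since a readdir-level hook hides a process from the first view but not necessarily from the second (kernel-level rootkits can defeat both). It then maps a suspect process's socket inodes to remote peers listed in /proc/net/tcp. All function names are hypothetical; IPv6 (/proc/net/tcp6) and actually shutting down the data flow are left out.

```python
import os
import socket
import struct
from typing import Iterable, List, Optional, Set, Tuple


def listed_pids(proc_root: str = "/proc") -> Set[int]:
    """PIDs visible through a directory listing of /proc (the view ps relies on)."""
    return {int(name) for name in os.listdir(proc_root) if name.isdigit()}


def tgid_from_status(status_text: str) -> Optional[int]:
    """Return the Tgid field of a /proc/<pid>/status file, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("Tgid:"):
            return int(line.split()[1])
    return None


def probed_pids(max_pid: int, proc_root: str = "/proc") -> Set[int]:
    """Open /proc/<pid>/status path by path and keep thread-group leaders only.

    Non-leader thread IDs are reachable by path but never listed, so they
    must be filtered out to avoid false positives.
    """
    found = set()
    for pid in range(1, max_pid + 1):
        try:
            with open(os.path.join(proc_root, str(pid), "status")) as fh:
                if tgid_from_status(fh.read()) == pid:
                    found.add(pid)
        except OSError:
            continue  # no such process, or it exited in the meantime
    return found


def find_hidden_pids(probed: Iterable[int], listed: Iterable[int]) -> List[int]:
    """S2: PIDs that exist when probed directly but are missing from the listing."""
    return sorted(set(probed) - set(listed))


def socket_inode(fd_link: str) -> Optional[str]:
    """readlink() of /proc/<pid>/fd/<n> yields 'socket:[<inode>]' for sockets."""
    prefix = "socket:["
    if fd_link.startswith(prefix) and fd_link.endswith("]"):
        return fd_link[len(prefix):-1]
    return None


def decode_ipv4(field: str) -> Tuple[str, int]:
    """Decode a /proc/net/tcp 'ADDR:PORT' hex field.

    The kernel prints the address as a native-endian 32-bit integer, so it is
    packed back in native order; the port is printed in host order.
    """
    hex_ip, hex_port = field.split(":")
    ip = socket.inet_ntoa(struct.pack("=I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def established_peers(proc_net_tcp: str, inodes: Set[str]) -> List[Tuple[str, int]]:
    """S3: remote endpoints of ESTABLISHED (state 01) sockets owned by the given inodes."""
    peers = []
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) < 10:
            continue
        remote, state, inode = cols[2], cols[3], cols[9]
        if state == "01" and inode in inodes:
            peers.append(decode_ipv4(remote))
    return peers
```

On a live node one would compare probed_pids(max_pid) with listed_pids() (the upper bound can be read from /proc/sys/kernel/pid_max), repeat the comparison to rule out short-lived processes, and collect socket inodes by calling os.readlink on each /proc/<pid>/fd entry. Blocking the resulting flows is left to the operator's existing firewall tooling.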
Preferably, the data information security defense module comprises an active defense unit, a protocol analysis unit, a firewall unit and a monitoring unit. The active defense unit implements active defense of information following the principles of defense in depth, layering and active security defense; the protocol analysis unit supports the monitoring unit and the firewall unit; the firewall unit secures communication between the internal and external networks; and the monitoring unit performs content filtering and intrusion protection.
Preferably, the active defense unit comprises a security early-warning unit, a security protection unit, a security monitoring unit, a security response unit, a system recovery unit and a security counterattack unit. The security early-warning unit comprises a vulnerability early-warning unit, a behavior early-warning unit and an attack-trend early-warning unit: the vulnerability early-warning unit gives users an opportunity to apply patches, and the behavior and attack-trend early-warning units predict attack behavior in the network by observing abnormal network traffic. The security protection unit provides network virus protection and Trojan removal and prevents network Trojans and viruses from spreading; the security monitoring unit performs data mining using software- or hardware-based association rule analysis; the security response unit blocks security threats to the defended system; the system recovery unit backs up resource information by online incremental backup; and the security counterattack unit strikes back at the attack source.
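The behavior and attack-trend early-warning units are described only as observing abnormal network traffic. A minimal sketch, assuming a z-score rule over recent per-interval traffic volumes (a swapped-in technique that the patent does not specify), might look like this; traffic_warning and z_limit are hypothetical names.

```python
import statistics
from typing import Sequence


def traffic_warning(history: Sequence[float], current: float, z_limit: float = 3.0) -> bool:
    """Return True when the current traffic sample is abnormal relative to the baseline.

    Assumption: "abnormal flow" is modeled as a z-score above z_limit against
    the recent history, using the population standard deviation.
    """
    if len(history) < 2:
        raise ValueError("need at least two baseline samples")
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # A perfectly flat baseline: any change at all counts as abnormal.
        return current != mean
    return abs(current - mean) / stdev > z_limit
```

A fixed z-score threshold is the simplest baseline rule; traffic with strong daily cycles would need a seasonal baseline instead.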
Preferably, the monitoring system calculates the HTTP access throughput from the real-time data of the decision factors, using the following formula:
(Formula image BDA0003830122590000041)
where P is the decision coefficient, e_i is the real-time value of decision factor i, e_i,max is the current upper limit of decision factor i, and W_i is the weight of decision factor i.
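The formula itself survives only as an image reference. Judging from the symbol definitions, one plausible reading is a weighted sum of normalized factors, $P = \sum_i W_i \, e_i / e_{i,\max}$. This form is an assumption, and the sketch below implements only that reading; decision_coefficient is a hypothetical name.

```python
from typing import Sequence


def decision_coefficient(e: Sequence[float], e_max: Sequence[float], w: Sequence[float]) -> float:
    """Assumed form: P = sum_i W_i * e_i / e_i_max."""
    if not (len(e) == len(e_max) == len(w)):
        raise ValueError("e, e_max and w must have the same length")
    total = 0.0
    for value, upper, weight in zip(e, e_max, w):
        if upper <= 0:
            raise ValueError("each upper limit e_i_max must be positive")
        total += weight * (value / upper)  # factor's share of its current limit, weighted
    return total
```

If the weights sum to 1 and every e_i stays within its limit, this reading keeps P in [0, 1], which makes it easy to compare against a fixed decision threshold.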
Preferably, the performance testing module determines the number of parallel threads by the following formula:
(Formula image BDA0003830122590000042)
where P is the number of parallel threads, X is the input data volume and S is the preset data-processing speed of a single thread.
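Again only the image reference remains. Assuming S is the amount of data one thread processes within the target processing window, a natural reading is $P = \lceil X / S \rceil$, clamped to at least one thread. This reading is an assumption, and the cap parameter max_threads is a hypothetical addition.

```python
import math
from typing import Optional


def parallel_threads(x: float, s: float, max_threads: Optional[int] = None) -> int:
    """Assumed form: P = ceil(X / S), at least 1 and optionally capped."""
    if s <= 0:
        raise ValueError("single-thread speed S must be positive")
    p = max(1, math.ceil(x / s))  # round up so that no input data is left unassigned
    if max_threads is not None:
        p = min(p, max_threads)  # e.g. cap at the number of available cores
    return p
```

Rounding up rather than down guarantees that P threads at speed S can cover all X units of input within the window.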
Preferably, the data information security defense module further evaluates and scores network security from the collected information, using a Bayesian network machine-learning algorithm based on an existing data set and an established risk evaluation model, through the following steps:
Step 1, define the classification levels: there are five levels, A, B, C, D and E, where level A represents the highest degree of security protection and level E the lowest. By Bayes' theorem, the probability that the collected data belong to a given level is:
$$P(C=c \mid X=x) = \frac{P(C=c)\,P(X=x \mid C=c)}{\sum_{k} P(C=k)\,P(X=x \mid C=k)}$$
where the vector X is the collected event set and c and k each denote a specific risk level; P(C=c | X=x) is the conditional (posterior) probability of risk level c given the collected event set, P(C=c) is the prior probability of risk level c, P(X=x | C=c) is the probability of the collected events given level c, and the denominator is the overall (marginal) probability of the collected events, obtained by summing over all levels k;
Step 2, following the naive Bayes approach, assume that the features of each dimension of the feature vector X are mutually independent given the risk level, with no dependence between features, which gives:
$$P(X=x \mid C=c) = \prod_{j=1}^{n} P(X_j = x_j \mid C=c)$$
where the vector X is the set of all collected events, x_j is a specific event element (the j-th component of X) and n is the number of elements;
Step 3, substitute the formula of Step 2 into the formula of Step 1 to obtain the class probability of an unknown sample with feature vector X:
$$P(C=c \mid X=x) = \frac{P(C=c)\prod_{j=1}^{n} P(X_j = x_j \mid C=c)}{\sum_{k} P(C=k)\prod_{j=1}^{n} P(X_j = x_j \mid C=k)}$$
The level with the highest such probability is assigned to the unknown sample with feature vector X and is taken as the current risk level of the system's network security.
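Steps 1 to 3 can be sketched as a categorical naive Bayes classifier. The patent mentions a Bayesian network, but the steps describe naive Bayes, which is what this sketch implements. Laplace smoothing (alpha) is an added assumption that keeps an unseen feature value from zeroing a level's probability; the function names and example features are hypothetical, and only levels that occur in the training data are scored.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple

LEVELS = ("A", "B", "C", "D", "E")  # A = highest protection, E = lowest


def train(samples: Sequence[Tuple[Sequence[Hashable], str]]):
    """Count priors and per-feature likelihoods from (features, level) pairs."""
    class_counts: Counter = Counter()
    feature_counts: Dict[Tuple[str, int], Counter] = defaultdict(Counter)
    feature_values: Dict[int, set] = defaultdict(set)
    for features, level in samples:
        if level not in LEVELS:
            raise ValueError(f"unknown level {level!r}")
        class_counts[level] += 1
        for j, value in enumerate(features):
            feature_counts[(level, j)][value] += 1
            feature_values[j].add(value)
    return class_counts, feature_counts, feature_values, len(samples)


def posterior(model, features: Sequence[Hashable], alpha: float = 1.0) -> Dict[str, float]:
    """P(C=c | X=x) for every trained level, following Step 3 with Laplace smoothing."""
    class_counts, feature_counts, feature_values, n = model
    log_scores = {}
    for c, count_c in class_counts.items():
        score = math.log(count_c / n)  # log prior P(C=c)
        for j, value in enumerate(features):
            k_j = len(feature_values[j] | {value})  # distinct values of feature j
            count = feature_counts[(c, j)][value]
            score += math.log((count + alpha) / (count_c + alpha * k_j))
        log_scores[c] = score
    # Step 3 denominator: normalize over levels (shift by the max for numerical stability).
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def risk_level(model, features: Sequence[Hashable]) -> str:
    """Return the level with the highest posterior probability."""
    post = posterior(model, features)
    return max(post, key=post.get)
```

Working in log space and normalizing at the end computes the same ratio as the Step 3 formula while avoiding floating-point underflow when n is large.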
Compared with the prior art, the invention has the beneficial effects that:
(1) By providing the monitoring module, which collects and aggregates data from each computing node and normalizes it to provide auxiliary monitoring of the high-performance computing cluster application currently running, the invention improves the accuracy with which the running state of high-performance computing cluster applications is judged and markedly improves their controllability and stability.
(2) By providing the performance testing module, the invention can test the performance state of the system and quickly characterize the application software by determining a test platform, deploying the system, testing system performance, deploying the application, testing the application and analyzing the data.
(3) By providing the mining-program cleaning module, which improves the conventional process for cleaning up (cryptocurrency) mining programs by combining open-source tools with purpose-written monitoring scripts, the invention can quickly find and clean up hidden mining programs on the high-performance computing cluster system and the network forwarding methods they use. This safeguards the security of the high-performance computing cluster, reduces wasted resources and improves the stability of cluster operation.
(4) By providing the data information security defense module, the invention achieves active defense of information security and protects the interior of the system from attack, with strong defensive performance and high equipment stability; it also monitors each node while defending, preventing intrusion and the spread of viruses and improving information security.
Drawings
FIG. 1 is a block diagram of the present invention;
FIG. 2 is a block diagram of a monitoring module according to the present invention;
FIG. 3 is a block diagram of the data information security defense module of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
Referring to FIGS. 1-3, the present invention provides the following technical solution: a full-stack high-performance computing cluster monitoring system comprising a monitoring module, a performance testing module, a mining-program cleaning module and a data information security defense module;
the monitoring module collects and aggregates data from each computing node, normalizes the data and provides auxiliary monitoring of the high-performance computing cluster application currently running, improving the accuracy with which the running state of the application is judged;
the performance testing module tests the performance state of the system and quickly characterizes the application software by determining a test platform, deploying the system, testing system performance, deploying the application, testing the application and analyzing the data;
the mining-program cleaning module improves the conventional process for cleaning up (cryptocurrency) mining programs by combining open-source tools with its own purpose-written monitoring scripts, so that hidden mining programs on the high-performance computing cluster system, and the network forwarding methods they use, can be found and cleaned up quickly;
the data information security defense module secures communication between the internal and external networks through a firewall and performs content filtering and intrusion protection; it implements active defense of information following the principles of defense in depth, layering and active security defense, and monitors each node while defending, thereby preventing intrusion and the spread of viruses and improving information security.
In this embodiment, preferably, the monitoring system further comprises a base layer, an intermediate layer and an application layer. The base layer comprises the monitoring host and the underlying resources, namely CPU, memory, network throughput, hard disk I/O and hard disk usage; the intermediate layer comprises Nginx, Redis, MQ (message queue), MySQL and Tomcat; and the application layer covers HTTP access throughput, response time, return codes, call-chain analysis, performance bottlenecks and client-side monitoring.
In this embodiment, preferably, the monitoring system further comprises a log system that stores the data of the base layer, intermediate layer and application layer; the log system formats the log data, standardizes the format of the monitoring data and performs unified log analysis.
In this embodiment, preferably, the monitoring module comprises an acquisition unit, a data processing unit, a training unit and an anomaly prediction unit. The acquisition unit collects data from each computing node; the data processing unit applies threshold preprocessing and normalization to the data; the training unit trains a deep LSTM (long short-term memory) network on the preprocessed and normalized data; and the anomaly prediction unit feeds individual preprocessed and normalized samples into the trained LSTM network to predict anomalies in the high-performance computing cluster application.
In this embodiment, preferably, the mining-program cleaning module cleans up (cryptocurrency) mining programs through the following steps:
S1: determine whether a mining program exists in the computing node cluster;
S2: obtain the process ID (PID) of the mining program: determine whether the mining program hides its PID; if not, obtain the PID directly; if so, find the PID of the hidden mining program using an open-source tool;
S3: using the PID, query the interacting communication nodes that can connect to the Internet, inspect those nodes and shut down the mining program's data flow.
In this embodiment, preferably, the data information security defense module comprises an active defense unit, a protocol analysis unit, a firewall unit and a monitoring unit. The active defense unit implements active defense of information following the principles of defense in depth, layering and active security defense; the protocol analysis unit supports the monitoring unit and the firewall unit; the firewall unit secures communication between the internal and external networks; and the monitoring unit performs content filtering and intrusion protection.
In this embodiment, preferably, the active defense unit comprises a security early-warning unit, a security protection unit, a security monitoring unit, a security response unit, a system recovery unit and a security counterattack unit. The security early-warning unit comprises a vulnerability early-warning unit, a behavior early-warning unit and an attack-trend early-warning unit: the vulnerability early-warning unit gives users an opportunity to apply patches, and the behavior and attack-trend early-warning units predict attack behavior in the network by observing abnormal network traffic. The security protection unit provides network virus protection and Trojan removal and prevents network Trojans and viruses from spreading; the security monitoring unit performs data mining using software- or hardware-based association rule analysis; the security response unit blocks security threats to the defended system; the system recovery unit backs up resource information by online incremental backup; and the security counterattack unit strikes back at the attack source.
In this embodiment, preferably, the monitoring system calculates the HTTP access throughput from the real-time data of the decision factors, using the following formula:
(Formula image BDA0003830122590000081)
where P is the decision coefficient, e_i is the real-time value of decision factor i, e_i,max is the current upper limit of decision factor i, and W_i is the weight of decision factor i.
In this embodiment, preferably, the performance testing module determines the number of parallel threads by the following formula:
(Formula image BDA0003830122590000091)
where P is the number of parallel threads, X is the input data volume and S is the preset data-processing speed of a single thread.
In this embodiment, preferably, the data information security defense module further evaluates and scores network security from the collected information, using a Bayesian network machine-learning algorithm based on an existing data set and an established risk evaluation model, through the following steps:
Step 1, define the classification levels: there are five levels, A, B, C, D and E, where level A represents the highest degree of security protection and level E the lowest. By Bayes' theorem, the probability that the collected data belong to a given level is:
$$P(C=c \mid X=x) = \frac{P(C=c)\,P(X=x \mid C=c)}{\sum_{k} P(C=k)\,P(X=x \mid C=k)}$$
where the vector X is the collected event set and c and k each denote a specific risk level; P(C=c | X=x) is the conditional (posterior) probability of risk level c given the collected event set, P(C=c) is the prior probability of risk level c, P(X=x | C=c) is the probability of the collected events given level c, and the denominator is the overall (marginal) probability of the collected events, obtained by summing over all levels k;
Step 2, following the naive Bayes approach, assume that the features of each dimension of the feature vector X are mutually independent given the risk level, with no dependence between features, which gives:
$$P(X=x \mid C=c) = \prod_{j=1}^{n} P(X_j = x_j \mid C=c)$$
where the vector X is the set of all collected events, x_j is a specific event element (the j-th component of X) and n is the number of elements;
Step 3, substitute the formula of Step 2 into the formula of Step 1 to obtain the class probability of an unknown sample with feature vector X:
$$P(C=c \mid X=x) = \frac{P(C=c)\prod_{j=1}^{n} P(X_j = x_j \mid C=c)}{\sum_{k} P(C=k)\prod_{j=1}^{n} P(X_j = x_j \mid C=k)}$$
The level with the highest such probability is assigned to the unknown sample with feature vector X and is taken as the current risk level of the system's network security.
The principle and advantages of the invention are as follows. The monitoring module collects and aggregates data from each computing node and normalizes it to provide auxiliary monitoring of the high-performance computing cluster application currently running, which improves the accuracy with which the running state of high-performance computing cluster applications is judged and markedly improves their controllability and stability. The performance testing module tests the performance state of the system and quickly characterizes the application software by determining a test platform, deploying the system, testing system performance, deploying the application, testing the application and analyzing the data. The mining-program cleaning module improves the conventional process for cleaning up (cryptocurrency) mining programs by combining open-source tools with purpose-written monitoring scripts, so that hidden mining programs on the high-performance computing cluster system and the network forwarding methods they use can be found and cleaned up quickly; this safeguards the security of the high-performance computing cluster, reduces wasted resources and improves the stability of cluster operation. The data information security defense module achieves active defense of information security and protects the interior of the system from attack, with strong defensive performance and high equipment stability; while defending, it also monitors each node, preventing intrusion and the spread of viruses and improving information security.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A full-stack high-performance computing cluster monitoring system, characterized in that the system comprises a monitoring module, a performance testing module, a mining program cleaning module and a data information security defense module;
the monitoring module is used to collect and aggregate data from each computing node, normalize it, and assist in monitoring the applications currently running on the high-performance computing cluster, improving the accuracy with which the running state of those applications is judged;
the performance testing module is used to test the performance state of the system and quickly obtain the characteristics of application software by determining a test platform, deploying the system, testing system performance, deploying the application, testing the application and analyzing the data;
the mining program cleaning module is used to improve the conventional mining program cleaning process by means of open-source tools and purpose-written monitoring scripts, so that hidden mining programs, and the network forwarding paths they use, can be quickly found and cleaned up on a high-performance computing cluster system;
the data information security defense module is used to secure communication between the internal network and the external network through a firewall, to perform content filtering and intrusion protection, and to provide active defense of information according to the principles of defense in depth, layered defense and active security defense; while defending, it also monitors every node so as to detect intrusions and prevent viruses from spreading, improving the security of information.
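The claim does not say how the performance testing module "analyzes the data" to obtain application characteristics. One common way to characterize HPC application software, assumed here rather than taken from the filing, is strong-scaling analysis: run the same application test on increasing node counts and compute speedup and parallel efficiency. The names `ScalingPoint` and `strong_scaling` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ScalingPoint:
    nodes: int        # number of compute nodes used in the application test
    elapsed_s: float  # wall-clock time of that test run, in seconds


def strong_scaling(points: List[ScalingPoint]) -> List[Tuple[int, float, float]]:
    """Return (nodes, speedup, efficiency) relative to the smallest run.

    Speedup is T(base) / T(n); efficiency divides it by the node ratio,
    so 1.0 means perfectly linear scaling (an assumed analysis choice).
    """
    if not points:
        return []
    ordered = sorted(points, key=lambda p: p.nodes)
    base = ordered[0]
    results = []
    for p in ordered:
        speedup = base.elapsed_s / p.elapsed_s
        efficiency = speedup / (p.nodes / base.nodes)
        results.append((p.nodes, round(speedup, 3), round(efficiency, 3)))
    return results


if __name__ == "__main__":
    runs = [ScalingPoint(1, 800.0), ScalingPoint(2, 420.0), ScalingPoint(4, 230.0)]
    for nodes, s, e in strong_scaling(runs):
        print(f"{nodes} nodes: speedup {s}, efficiency {e}")
```

A falling efficiency column (here roughly 0.95 at two nodes and 0.87 at four) is the kind of application characteristic such a test would surface.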
2. The full-stack high-performance computing cluster monitoring system of claim 1, wherein the monitoring system further comprises a base layer, a middle layer and an application layer; the base layer comprises a monitoring host and underlying resources, the underlying resources including CPU, memory, network throughput, hard disk I/O and hard disk usage; the middle layer comprises Nginx, Redis, MQ, MySQL and Tomcat; and the application layer covers HTTP access throughput, response time, return codes, call-chain analysis, performance bottlenecks and client-side monitoring.
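To make the application-layer items of claim 2 concrete, the sketch below derives HTTP throughput, mean and 95th-percentile response time, and a return-code breakdown from a window of access records. The record layout, the nearest-rank percentile and all names are illustrative assumptions; the filing does not specify how these metrics are computed.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class AccessRecord:
    timestamp: float   # request completion time, seconds since epoch
    latency_ms: float  # response time in milliseconds
    status: int        # HTTP return code


def percentile(values, q):
    """Nearest-rank percentile (q in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records, window_s):
    """Application-layer summary for one monitoring window of window_s seconds."""
    latencies = [r.latency_ms for r in records]
    return {
        "throughput_rps": len(records) / window_s,
        "latency_mean_ms": sum(latencies) / len(latencies) if latencies else 0.0,
        "latency_p95_ms": percentile(latencies, 95) if latencies else 0.0,
        "status_counts": dict(Counter(r.status for r in records)),
    }
```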
3. The full-stack high-performance computing cluster monitoring system of claim 2, wherein the monitoring system further comprises a log system that stores the data of the base layer, the middle layer and the application layer; the log system formats log data, standardizes the format of monitoring data and performs unified log analysis.
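The log system of claim 3 standardizes monitoring data from the three layers into one format. A minimal sketch, assuming each source reports the same concepts under different field names: a per-source mapping renames fields into one hypothetical schema (`layer`, `source`, `host`, `metric`, `value`, `timestamp`) and emits JSON. The source names and field mappings are invented for illustration.

```python
import json

# Hypothetical per-source field mappings onto a single standard schema.
FIELD_MAP = {
    "host_agent": {"ts": "timestamp", "node": "host", "name": "metric", "val": "value"},
    "redis_exporter": {"time": "timestamp", "instance": "host", "key": "metric", "v": "value"},
    "http_gateway": {"@timestamp": "timestamp", "server": "host", "field": "metric", "number": "value"},
}
# Which monitoring layer (claim 2) each hypothetical source belongs to.
LAYER = {"host_agent": "base", "redis_exporter": "middle", "http_gateway": "application"}


def normalize(source, raw):
    """Rename raw fields to the standard schema and return one JSON line.

    Raises KeyError for an unknown source or a record without a value.
    """
    mapping = FIELD_MAP[source]
    record = {"layer": LAYER[source], "source": source}
    for src_key, std_key in mapping.items():
        if src_key in raw:
            record[std_key] = raw[src_key]
    record["value"] = float(record["value"])  # values may arrive as strings
    return json.dumps(record, sort_keys=True)
```

Once every record shares one schema, unified analysis (filtering by layer, host or metric) no longer depends on which component produced the log.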
4. The full-stack high-performance computing cluster monitoring system of claim 1, wherein the monitoring module comprises an acquisition unit, a data processing unit, a training unit and an anomaly prediction unit; the acquisition unit collects data from each computing node; the data processing unit applies threshold preprocessing and normalization to the data; the training unit trains a deep LSTM network on the preprocessed and normalized data; and the anomaly prediction unit feeds individual preprocessed and normalized data into the trained LSTM network to predict anomalies in high-performance computing cluster applications.
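The data processing unit of claim 4 prepares node metrics before LSTM training. A sketch of that preprocessing, under two assumptions not stated in the filing: "threshold preprocessing" is taken to mean clamping each sample into a valid range, and normalization is min-max scaling to [0, 1]. The series is then cut into fixed-width windows, the usual input shape for sequence models; the LSTM itself is omitted because it would require a deep-learning framework.

```python
def threshold_clip(series, low, high):
    """Assumed threshold preprocessing: clamp each sample into [low, high]."""
    return [min(max(x, low), high) for x in series]


def min_max_normalize(series):
    """Scale samples to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(x - lo) / (hi - lo) for x in series]


def sliding_windows(series, width):
    """Turn a series into (input window, next value) pairs for sequence training."""
    return [(series[i:i + width], series[i + width]) for i in range(len(series) - width)]


if __name__ == "__main__":
    cpu_percent = [10, 50, 250, 90, 30]  # 250 is an out-of-range reading
    cleaned = min_max_normalize(threshold_clip(cpu_percent, 0, 100))
    print(sliding_windows(cleaned, 3))
```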
5. The full-stack high-performance computing cluster monitoring system of claim 1, wherein the mining program cleaning module cleans up mining programs by the following steps:
S1: determine whether a mining program is present in the computing node cluster;
S2: obtain the process ID of the mining program by first determining whether the mining program hides its process ID; if it does not, the process ID is obtained directly; if it does, an open-source tool is used to find the process ID of the hidden mining program;
S3: using the process ID, query the communicating nodes through which the program connects to the Internet, inspect those nodes, and shut down the mining program's data flow.
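Steps S1 to S3 can be sketched as three pure functions over data a monitoring script would collect. All of the following are assumptions for illustration: the miner name patterns and pool ports are a hypothetical indicator list; S1 matches on process names rather than CPU usage, since HPC jobs legitimately saturate CPUs; and S2 uses a cross-view check (PIDs present under `/proc` but absent from `ps` output), the same idea that open-source tools such as `unhide` build on. Connection tuples would in practice come from something like `ss -tnp`, whose parsing is omitted.

```python
import re

# Hypothetical indicator lists; a real deployment would maintain its own.
MINER_NAME_PATTERN = re.compile(r"xmrig|minerd|cpuminer|ethminer", re.IGNORECASE)
MINING_POOL_PORTS = {3333, 4444, 5555, 14444}


def suspicious_processes(ps_rows):
    """S1: return PIDs whose command line matches a known miner name.

    ps_rows: iterable of (pid, command_line) pairs.
    """
    return [pid for pid, cmd in ps_rows if MINER_NAME_PATTERN.search(cmd)]


def hidden_pids(proc_pids, ps_pids):
    """S2: PIDs visible under /proc but missing from ps output (cross-view check)."""
    return sorted(set(proc_pids) - set(ps_pids))


def pool_connections(connections, pid):
    """S3: remote (ip, port) endpoints of a PID that use a known pool port.

    connections: iterable of (pid, remote_ip, remote_port) tuples.
    """
    return [(ip, port) for conn_pid, ip, port in connections
            if conn_pid == pid and port in MINING_POOL_PORTS]
```

The endpoints returned for S3 are the candidates to inspect and block before the process itself is terminated.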
6. The full-stack high-performance computing cluster monitoring system of claim 1, wherein the data information security defense module comprises an active defense unit, a protocol analysis unit, a firewall unit and a monitoring unit; the active defense unit applies the principles of defense in depth, layered defense and active security defense to provide active defense of information; the protocol analysis unit supports the monitoring unit and the firewall unit; the firewall unit secures communication between the internal network and the external network; and the monitoring unit performs content filtering and intrusion protection.
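The firewall unit's role of controlling traffic between the internal and external networks can be illustrated with a tiny rule table evaluated with first-match semantics, similar in spirit to how iptables walks a chain. The `Rule` fields, the default-deny policy and the example rules are assumptions; the filing does not describe the firewall's rule model.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str              # "allow" or "deny"
    src_net: str             # source network in CIDR form, e.g. "10.0.0.0/8"
    dst_port: Optional[int]  # destination port; None matches any port


def decide(rules, src_ip, dst_port, default="deny"):
    """Return the action of the first matching rule, else the default policy."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        if addr in ipaddress.ip_network(rule.src_net) and rule.dst_port in (None, dst_port):
            return rule.action
    return default


if __name__ == "__main__":
    rules = [
        Rule("allow", "10.0.0.0/8", 22),  # SSH only from the internal network
        Rule("deny", "0.0.0.0/0", 22),
        Rule("allow", "0.0.0.0/0", 443),  # HTTPS from anywhere
    ]
    print(decide(rules, "198.51.100.7", 22))
```

Because evaluation stops at the first match, rule order encodes priority: the internal SSH allowance must precede the blanket SSH deny.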
7. The full-stack high-performance computing cluster monitoring system of claim 6, wherein the active defense unit comprises a security early-warning unit, a security protection unit, a security monitoring unit, a security response unit, a system recovery unit and a security counterattack unit; the security early-warning unit comprises a vulnerability early-warning unit, a behavior early-warning unit and an attack-trend early-warning unit; the vulnerability early-warning unit gives users an opportunity to apply patches; the behavior early-warning unit and the attack-trend early-warning unit predict attacks present in the network by observing abnormal network traffic; the security protection unit provides network virus protection and Trojan detection and removal, preventing Trojans and viruses from spreading across the network; the security monitoring unit performs data mining using software- or hardware-based association rule analysis; the security response unit blocks security threats to the defended system; the system recovery unit backs up resource information by online incremental backup; and the security counterattack unit damages the source of an attack.
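Claim 7 says the behavior and attack-trend early-warning units predict attacks "by observing abnormal flow of the network" but gives no detection rule. One simple, swapped-in technique is a z-score test against a baseline of normal traffic; the threshold of 3 standard deviations and the function name are assumptions for illustration only.

```python
import statistics


def traffic_alerts(baseline, current, z_threshold=3.0):
    """Flag each current traffic sample that deviates strongly from the baseline.

    baseline: traffic volumes (e.g. packets/s) observed during normal operation.
    current:  new samples to check. Returns one bool per current sample.
    """
    mean = statistics.fmean(baseline)
    stdev = statistics.pstdev(baseline)
    if stdev == 0:
        # Perfectly flat baseline: any change at all counts as abnormal.
        return [x != mean for x in current]
    return [abs(x - mean) / stdev > z_threshold for x in current]


if __name__ == "__main__":
    normal = [100, 102, 98, 101, 99]
    print(traffic_alerts(normal, [101, 180, 97]))
```

A production early-warning unit would need a baseline that tracks daily and job-driven cycles; a fixed window is used here only to keep the sketch short.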
8. The full-stack high-performance computing cluster monitoring system of claim 1, wherein the monitoring system calculates HTTP access throughput from real-time data of decision factors according to the following formula:
[Formula shown as an image in the original filing; not reproduced in the text.]
where P is the decision coefficient, e_i is the real-time data of decision factor i, e_i,max is the current upper limit of decision factor i, and W_i is the weight of decision factor i.
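The claim-8 formula survives only as an image, so its exact form is unknown. Judging from the variable definitions alone, a plausible reading is a weighted sum of each factor's real-time value normalized by its upper limit, P = Σ W_i · e_i / e_i,max. The sketch below implements that assumed form; treat it as a guess at the structure, not the filed formula.

```python
def decision_coefficient(realtime, upper_limits, weights):
    """Assumed form: P = sum_i W_i * e_i / e_i,max (not confirmed by the filing).

    realtime:     e_i, current value of each decision factor
    upper_limits: e_i,max, current upper limit of each factor
    weights:      W_i, weight of each factor
    """
    if not (len(realtime) == len(upper_limits) == len(weights)):
        raise ValueError("each decision factor needs a value, an upper limit and a weight")
    return sum(w * e / e_max for e, e_max, w in zip(realtime, upper_limits, weights))


if __name__ == "__main__":
    # Two hypothetical factors: requests/s (limit 1000) and active connections (limit 100).
    print(decision_coefficient([400, 50], [1000, 100], [0.6, 0.4]))
```

With weights summing to 1 and every factor at or below its limit, this form keeps P in [0, 1], which is one reason it is a natural candidate.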
9. The full-stack high-performance computing cluster monitoring system of claim 1, wherein the performance testing module determines the number of parallel threads according to the following formula:
[Formula shown as an image in the original filing; not reproduced in the text.]
where P is the number of parallel threads, X is the input data volume, and S is the preset data processing speed of a single thread.
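The claim-9 formula is likewise only an image. Given its three variables, the simplest consistent reading is P = ⌈X / S⌉: enough threads that their combined preset speed covers the input volume. The sketch assumes that form; the optional `max_threads` cap and the minimum of one thread are added assumptions, not part of the claim.

```python
import math


def thread_count(input_volume, per_thread_speed, max_threads=None):
    """Assumed form: P = ceil(X / S), with at least one thread.

    input_volume:     X, amount of input data to process
    per_thread_speed: S, preset processing speed of one thread (same units per period)
    max_threads:      optional hypothetical cap, e.g. the cores available on a node
    """
    if input_volume < 0 or per_thread_speed <= 0:
        raise ValueError("need X >= 0 and S > 0")
    p = max(1, math.ceil(input_volume / per_thread_speed))
    if max_threads is not None:
        p = min(p, max_threads)
    return p
```

For example, 1000 units of input at 300 units per thread gives 4 threads, or 2 if the node only offers two.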
10. The full-stack high-performance computing cluster monitoring system of claim 1, wherein the data information security defense module is further used to evaluate and score network security from the collected information by means of a Bayesian network machine learning algorithm, based on an existing data set and an established risk evaluation model, specifically as follows:
Step one, define the classification levels: there are five levels, A, B, C, D and E, where level A represents the highest degree of security protection and level E the lowest. By Bayes' theorem, the probability that the collected data belongs to a given level is:
$$P(C=c_k \mid X=x)=\frac{P(X=x \mid C=c_k)\,P(C=c_k)}{\sum_{l} P(X=x \mid C=c_l)\,P(C=c_l)}$$
where the vector X is the collected event set and c_k is a specific risk level; P(C=c_k | X=x) is the conditional probability of the risk level given the collected event set, P(C=c_k) is the prior probability of the risk level, P(X=x | C=c_k) is the probability of the collected events under that level, and the denominator, summed over all levels c_l, is the probability P(X=x) of the collected events themselves;
Step two, following the naive Bayes approach, assume that the features of each dimension of the feature vector X are mutually independent given the risk level, i.e. there is no relation between features, which gives:
$$P(X=x \mid C=c_k)=\prod_{j=1}^{n} P\left(X^{(j)}=x^{(j)} \mid C=c_k\right)$$
where the vector X is the set of all collected events, x^{(j)} is a specific event element, and n is the number of elements;
Step three, substitute the formula from step two into the formula from step one to obtain the class probability of an unknown sample with feature vector X:
$$P(C=c_k \mid X=x)=\frac{P(C=c_k)\prod_{j=1}^{n} P\left(X^{(j)}=x^{(j)} \mid C=c_k\right)}{\sum_{l} P(C=c_l)\prod_{j=1}^{n} P\left(X^{(j)}=x^{(j)} \mid C=c_l\right)}$$
The level with the largest posterior probability for the unknown sample with feature vector X is taken as the current network security risk level of the system.
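The three steps of claim 10 amount to a categorical naive Bayes classifier over the levels A to E. The sketch below estimates the prior and per-feature likelihoods by counting, multiplies them as in step three, and normalizes by the sum over all levels. Laplace smoothing (`alpha`) is an addition not mentioned in the claim, included so that an event value never seen with a level does not force its probability to zero. The feature encoding and sample data are hypothetical.

```python
from collections import Counter, defaultdict

LEVELS = ["A", "B", "C", "D", "E"]  # A = highest protection, E = lowest


def train(samples):
    """samples: list of (feature_tuple, level). Returns the counts the formulas need."""
    class_counts = Counter(level for _, level in samples)
    feature_counts = defaultdict(Counter)  # (feature index j, level) -> value counts
    feature_values = defaultdict(set)      # feature index j -> distinct values seen
    for features, level in samples:
        for j, value in enumerate(features):
            feature_counts[(j, level)][value] += 1
            feature_values[j].add(value)
    return class_counts, feature_counts, feature_values


def posterior(model, x, alpha=1.0):
    """P(C=c_k | X=x) for every level, per step three, with Laplace smoothing."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for c in LEVELS:
        n_c = class_counts.get(c, 0)
        score = (n_c + alpha) / (total + alpha * len(LEVELS))  # prior P(C=c_k)
        for j, value in enumerate(x):
            k = len(feature_values[j])
            score *= (feature_counts[(j, c)][value] + alpha) / (n_c + alpha * k)
        scores[c] = score
    norm = sum(scores.values())  # denominator: sum over all levels c_l
    return {c: s / norm for c, s in scores.items()}


def classify(model, x):
    """Risk level with the largest posterior probability."""
    probs = posterior(model, x)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    data = [(("patched", "low"), "A"), (("patched", "low"), "A"),
            (("unpatched", "high"), "E"), (("unpatched", "high"), "E"),
            (("unpatched", "low"), "C")]
    print(classify(train(data), ("unpatched", "high")))
```

Each feature here is a discretized observation (patch state, attack-traffic level); a real deployment would derive such features from the collected security events.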
CN202211073239.2A 2022-09-02 2022-09-02 Full-stack high-performance computing cluster monitoring system Pending CN115442270A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211073239.2A CN115442270A (en) 2022-09-02 2022-09-02 Full-stack high-performance computing cluster monitoring system

Publications (1)

Publication Number Publication Date
CN115442270A true CN115442270A (en) 2022-12-06

Family

ID=84246405

Country Status (1)

Country Link
CN (1) CN115442270A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108449351A (en) * 2018-03-27 2018-08-24 许昌学院 A kind of information security Initiative Defense and monitoring system
CN109101395A (en) * 2018-07-27 2018-12-28 曙光信息产业(北京)有限公司 A kind of High Performance Computing Cluster application monitoring method and system based on LSTM
KR20190010956A (en) * 2017-07-24 2019-02-01 주식회사 시큐리티인사이드 intelligence type security log analysis method
CN110401649A (en) * 2019-07-17 2019-11-01 湖北央中巨石信息技术有限公司 Information Security Risk Assessment Methods and system based on Situation Awareness study
CN111143143A (en) * 2019-12-26 2020-05-12 北京神州绿盟信息安全科技股份有限公司 Performance test method and device
CN111447113A (en) * 2020-03-25 2020-07-24 中国建设银行股份有限公司 System monitoring method and device
CN112052053A (en) * 2020-10-10 2020-12-08 国科晋云技术有限公司 Method and system for cleaning mining program in high-performance computing cluster

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination