CN111352820A - Method, equipment and device for predicting and monitoring running state of high-performance application - Google Patents

Method, equipment and device for predicting and monitoring running state of high-performance application

Info

Publication number
CN111352820A
CN111352820A (application number CN202010154757.1A)
Authority
CN
China
Prior art keywords
key information
running state
data file
intermediate data
application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010154757.1A
Other languages
Chinese (zh)
Inventor
李龙翔
刘羽
杨振宇
于占乐
王倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010154757.1A
Publication of CN111352820A
Legal status: Withdrawn (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/30 - Monitoring
    • G06F 11/34 - Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F 11/3466 - Performance evaluation by tracing or monitoring
    • G06F 11/3476 - Data logging
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 - Task transfer initiation or dispatching
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a method, equipment and device for predicting and monitoring the running state of a high-performance application, wherein the method comprises the following steps: collecting the system logs and job logs generated while a target platform is running, sorting the messages in the system logs and job logs by time, matching entries that share the same timestamp, and storing them as an intermediate data file; extracting key information from the intermediate data file with a natural language processing tool from data mining, and representing the text in the extracted key information as corresponding numerical feature vectors; and analyzing the numbers, timestamps, and the text represented by the numerical feature vectors in the intermediate data file with a model trained by a machine learning algorithm, and judging the running state of the application from the analysis results. The invention can report the application running state in real time, improve the utilization of platform computing resources, and reduce the time users' computing tasks spend waiting in the queue.

Description

Method, equipment and device for predicting and monitoring running state of high-performance application
Technical Field
The present invention relates to the field of computers, and more particularly, to a method, device and apparatus for predicting and monitoring the running state of a high-performance application.
Background
A high-performance computing (HPC) or supercomputing cluster is a computer system of very large computational performance and scale; programs running on such a cluster typically use parallel algorithms that divide a complex computational task into many small problems. As the computing demands of different applications have grown, more and more computing applications are being solved on high-performance computers. Accurately judging the running state of an application and predicting its running time play an important role in maintaining a high-performance cluster: they can effectively improve platform operating efficiency, reduce the time users spend queuing, and improve the user experience. However, as the scale of the high-performance computers used in the daily operation of cloud computing or supercomputing platforms increases, keeping them running normally becomes more challenging. The difficulty lies not only in the sheer volume of data the system produces at every moment, but also in analyzing that data to obtain useful information about the operating condition of the system. In addition, because different applications may generate large amounts of information while running, such as various job logs and application logs, the traditional manual approach to determining the running state of an application requires staff with basic knowledge of both the computation and the application. A manual approach cannot analyze the massive data generated by the platform in time, so the operating condition of the applications on different nodes of the platform cannot be judged promptly.
A number of automated system operation and maintenance tools already exist; the more mature schemes are based on statistical methods, machine learning methods, and the like. In the statistics-based approach, test data are given an anomaly score, and a point whose score exceeds a threshold is treated as anomalous. With a suitable threshold and well-tuned parameters this approach can give fairly accurate predictions, but choosing the threshold and tuning the parameters is very difficult. In addition, each variable is assumed to follow a statistical distribution, and most training schemes likewise rely on such assumptions, which often do not hold in practical applications.
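As an illustration only (not part of the patent), the following sketch shows the kind of statistics-based detection described above: test points are scored by a simple z-score against a baseline assumed to be Gaussian, and anything above a hand-chosen threshold is flagged as anomalous. Both the distributional assumption and the threshold are exactly the parameters noted above as hard to tune.

```python
import numpy as np

def anomaly_scores(baseline_values, test_values):
    """Score test points by their z-score relative to a healthy baseline."""
    mu, sigma = np.mean(baseline_values), np.std(baseline_values)
    return np.abs(np.asarray(test_values) - mu) / sigma

# Toy example: per-minute log-message counts from a healthy node vs. a test window.
baseline = np.random.normal(loc=200, scale=10, size=1000)  # assumed Gaussian baseline
test = [195, 210, 310, 205]                                # 310 should stand out

threshold = 3.0  # the hard-to-tune knob criticized above
flags = anomaly_scores(baseline, test) > threshold
print(flags)  # roughly [False False  True False]
```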
The second category is machine learning-based methods, mainly classification algorithms and clustering algorithms. Classification is supervised machine learning and requires that the class of every sample in the training set be known. Clustering is unsupervised machine learning and is usually used to group sample data by distance in order to identify outliers, but it cannot give early warning of faults that never appeared in the training samples. Machine learning methods are currently used to assist system anomaly detection, but most of them analyze only a single log file. During the running of a high-performance application, normal execution depends on the platform operating system, the job scheduling system, and the application itself all working correctly; a single log file is therefore not enough to comprehensively judge the application's running state.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a method, a device, and an apparatus for predicting and monitoring the running state of a high-performance application, which combine data mining and machine learning to analyze log files at different levels in real time during the running of the high-performance application, thereby improving task scheduling and the utilization of the high-performance platform.
Based on the above object, an aspect of the embodiments of the present invention provides a method for predicting and monitoring the running state of a high-performance application, including the following steps:
collecting the system logs and job logs generated while the target platform is running, sorting the messages in the system logs and job logs by time, matching entries that share the same timestamp, and storing them as an intermediate data file;
extracting key information from the intermediate data file with a natural language processing tool from data mining, and representing the text in the extracted key information as corresponding numerical feature vectors; and
analyzing the numbers, timestamps, and the text represented by the numerical feature vectors in the intermediate data file with a model trained by a machine learning algorithm, and judging the running state of the application from the analysis results.
In some embodiments, the application running state includes: normal operation, user termination, node error, and run timeout.
In some embodiments, extracting key information from the intermediate data file with a natural language processing tool from data mining and representing the text in the extracted key information as corresponding numerical feature vectors includes:
extracting the key information in the intermediate data file with the topic model LDA (Latent Dirichlet Allocation) method from text modeling, and using the topic probability distribution of the extracted key information as its feature vector.
In some embodiments, analyzing the numbers, timestamps, and the text represented by the numerical feature vectors in the intermediate data file with a model trained by a machine learning algorithm and judging the running state of the application from the analysis results includes:
receiving, as training data, log files from previous application runs that have been processed by the preprocessing module and the data analysis module, together with their corresponding running states, and training the model with a machine learning algorithm.
Another aspect of the embodiments of the present invention provides a high-performance application running state prediction and monitoring device, including:
a preprocessing module configured to collect the system logs and job logs generated while the target platform is running, sort the messages in the system logs and job logs by time, match entries that share the same timestamp, and store them as an intermediate data file;
a data analysis module configured to extract key information from the intermediate data file with a natural language processing tool from data mining, and represent the text in the extracted key information as corresponding numerical feature vectors; and
an automatic monitoring module configured to analyze the numbers, timestamps, and the text represented by the numerical feature vectors in the intermediate data file with a model trained by a machine learning algorithm, and judge the running state of the application from the analysis results.
In some embodiments, the automatic monitoring module is configured to:
receive, as training data, log files from previous application runs that have been processed by the preprocessing module and the data analysis module, together with their corresponding running states, and train the model with a machine learning algorithm.
In some embodiments, the machine learning algorithm comprises: decision trees, random forests, artificial neural networks, and Bayesian learning.
In some embodiments, the running state includes: normal operation, user termination, node error, and run timeout.
In some embodiments, the data analysis module is further configured to:
extract the key information in the intermediate data file with the topic model LDA (Latent Dirichlet Allocation) method from text modeling, and use the topic probability distribution of the extracted key information as its feature vector.
Another aspect of the embodiments of the present invention provides a high performance application running state predicting and monitoring apparatus, including:
at least one processor; and
a memory storing program code executable by the processor, the program code implementing the method of any of the above when executed by the processor.
The invention has the following beneficial technical effects: with the method, device, and apparatus for predicting and monitoring the running state of a high-performance application provided by the embodiments of the present invention, a computer on a large-scale high-performance computing platform can, with the help of data mining and a machine learning model, automatically judge the running state of high-performance applications, reducing the burden on operations staff; by using the data mining method, the application running state can be reported in real time, the utilization of platform computing resources can be improved, and the time users' computing tasks spend waiting in the queue can be reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can derive other embodiments from them without creative effort.
FIG. 1 is a flow chart of the high-performance application running state prediction and monitoring method according to the present invention;
FIG. 2 is a flow chart of automated monitoring and prediction by the high-performance application running state prediction and monitoring device of the present invention;
FIG. 3 is a schematic diagram of the hardware configuration of the high-performance application running state prediction and monitoring apparatus according to the present invention.
Detailed Description
Embodiments of the present invention are described below. However, it is to be understood that the disclosed embodiments are merely examples and that other embodiments may take various and alternative forms. The figures are not necessarily to scale; certain features may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present invention. As one of ordinary skill in the art will appreciate, various features illustrated and described with reference to any one of the figures may be combined with features illustrated in one or more other figures to produce embodiments that are not explicitly illustrated or described. The combination of features shown provides a representative embodiment for a typical application. However, various combinations and modifications of the features consistent with the teachings of the present invention may be desired for certain specific applications or implementations.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
A high-performance application depends during its run not only on the normal operation of the target platform's operating system but also on the normal functioning of the job scheduling system (such as Slurm, Moab, and the like). During the operation and maintenance of a high-performance cluster, administrators must monitor the state of the cluster system at all times and also pay attention to the state of the job scheduling system while high-performance applications are running, so as to avoid errors during application execution. When maintaining large-scale clusters, analyzing the sheer volume of data the system produces at every moment becomes difficult, because the running state of the applications must be monitored around the clock to obtain useful information about the state of the system. By using big data and artificial intelligence techniques, however, the system and job logs can be analyzed automatically, the running states of the system and of the job scheduling system can be analyzed and predicted in real time, and the normal operation of the cluster system can be safeguarded.
In view of the above, an aspect of the embodiments of the present invention provides a method for predicting and monitoring the running state of a high-performance application, as shown in FIG. 1, including the following steps:
Step S101: collecting the system logs and job logs generated while the target platform is running, sorting the messages in the system logs and job logs by time, matching entries that share the same timestamp, and storing them as an intermediate data file;
Step S102: extracting key information from the intermediate data file with a natural language processing tool from data mining, and representing the text in the extracted key information as corresponding numerical feature vectors; and
Step S103: analyzing the numbers, timestamps, and the text represented by the numerical feature vectors in the intermediate data file with a model trained by a machine learning algorithm, and judging the running state of the application from the analysis results.
In some embodiments, the application running state comprises: normal operation, user termination, node error, and run timeout.
In some embodiments, extracting key information from the intermediate data file with a natural language processing tool from data mining and representing the text in the extracted key information as corresponding numerical feature vectors includes: extracting the key information in the intermediate data file with the topic model LDA (Latent Dirichlet Allocation) method from text modeling, and using the topic probability distribution of the extracted key information as its feature vector.
In some embodiments, analyzing the numbers, timestamps, and the text represented by the numerical feature vectors in the intermediate data file with a model trained by a machine learning algorithm and judging the running state of the application from the analysis results includes: receiving, as training data, log files from previous application runs that have been processed by the preprocessing module and the data analysis module, together with their corresponding running states, and training the model with a machine learning algorithm.
In some embodiments, during the training phase, the system logs and job logs generated while high-performance applications run on the target platform are collected, together with the running states of those applications, as a training set. The application logs in this set are analyzed by the preprocessing module and the data analysis module to obtain the corresponding numerical feature vectors. The processed data are then fed into a deep-learning-based model for training to obtain a trained model. In the system deployment stage, the trained model is deployed on the target platform to configure the monitoring system. While an application runs, the preprocessing module automatically reads the system and job logs and processes them into an intermediate data file. The data analysis module automatically analyzes the intermediate data file, generates the numerical feature vectors, and feeds them into the monitoring system, which outputs the running state of the high-performance application. When the monitoring system returns an error state, such as 'node error' or 'run timeout', the current log information and the corresponding error are saved so that users and operations staff can conveniently inspect the errors that occurred in the system or during job submission. A minimal sketch of this prediction-time flow is shown below.
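The sketch below makes several assumptions the patent does not fix: joblib/scikit-learn is assumed as the model format, the file names are placeholders, and the `preprocess` and `extract_features` callables stand in for the preprocessing and data analysis modules.

```python
import joblib  # assumed model serialization; the patent does not name one

STATES = ["normal operation", "user termination", "node error", "run timeout"]

def monitor_once(system_log, job_log, preprocess, extract_features,
                 model_path="rf_state_model.joblib"):
    """One monitoring pass: logs -> intermediate data -> feature vector -> predicted state."""
    model = joblib.load(model_path)                 # trained model deployed on the platform
    intermediate = preprocess(system_log, job_log)  # time-sorted, timestamp-matched entries
    features = extract_features(intermediate)       # e.g. LDA topic distribution + numbers/times
    state = STATES[int(model.predict([features])[0])]
    if state in ("node error", "run timeout"):
        # keep the offending logs so users and operators can inspect the error later
        with open("error_report.log", "a", encoding="utf-8") as f:
            f.write(f"{state}: system={system_log} job={job_log}\n")
    return state
```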
Where technically feasible, the technical features listed above for the different embodiments may be combined with each other or changed, added, omitted, etc. to form further embodiments within the scope of the invention.
It can be seen from the above embodiments that, with the high-performance application running state prediction and monitoring method provided by the embodiments of the present invention, a computer on a large-scale high-performance computing platform can, with the help of data mining and a machine learning model, automatically judge the running state of high-performance applications, reducing the burden on operations staff; by using the data mining method, the application running state can be reported in real time, the utilization of platform computing resources can be improved, and the time users' computing tasks spend waiting in the queue can be reduced.
In view of the above object, another aspect of the embodiments of the present invention provides a high-performance application running state prediction and monitoring device, as shown in FIG. 2, including:
a preprocessing module configured to collect the system logs and job logs generated while the target platform is running, sort the messages in the system logs and job logs by time, match entries that share the same timestamp, and store them as an intermediate data file;
a data analysis module configured to extract key information from the intermediate data file with a natural language processing tool from data mining, and represent the text in the extracted key information as corresponding numerical feature vectors; and
an automatic monitoring module configured to analyze the numbers, timestamps, and the text represented by the numerical feature vectors in the intermediate data file with a model trained by a machine learning algorithm, and judge the running state of the application from the analysis results.
In some embodiments, the automatic monitoring module is configured to: receive, as training data, log files from previous application runs that have been processed by the preprocessing module and the data analysis module, together with their corresponding running states, and train the model with a machine learning algorithm.
In some embodiments, the machine learning algorithm comprises: decision trees, random forests, artificial neural networks, and Bayesian learning. For example, in an embodiment of the invention that trains the automatic detection module with a random forest, the main advantage of the random forest is that it can be evaluated internally on its out-of-bag samples, yielding an unbiased estimate of the error without cross-validation or a separate test set.
In some embodiments, the running state includes: normal operation, user termination, node error, and run timeout. When the random forest method is used to train the model, existing application run logs and run results must be collected as training data, and the states are divided into the four categories 'normal operation', 'user termination', 'node error', and 'run timeout' according to the results of previous application runs. Of course, it should be understood that users may classify the running states as needed, or refine the classification further.
In some embodiments, the data analysis module is further configured to: extract the key information in the intermediate data file with the topic model LDA (Latent Dirichlet Allocation) method from text modeling, and use the topic probability distribution of the extracted key information as its feature vector.
In some embodiments, for the large number of log records generated by different components, a preprocessing tool maps the job log to the entries in the system log that share the same timestamps, generating an intermediate data file. The intermediate data file stores the time-varying content of the logs and contains the corresponding system and job log information at each time. Key information is then extracted from the intermediate data file with a natural language processing tool from data mining, the information it contains is expressed as a series of feature vectors, and these are finally converted into numerical vectors that serve as the input of the machine learning model. The method also divides application run results into four categories, namely normal operation, user termination, node error, and run timeout, according to previous job logs, so that the mined data and the corresponding run results can be used to train a model with the random forest method in machine learning. Once the trained model is obtained, the numerical vectors produced from a running application by the preprocessing tool and the data mining tool are fed into the model, so that the running state of the application on the high-performance platform can be predicted in real time and judged accordingly.
In some embodiments, as shown in FIG. 2, the invention comprises three parts: a preprocessing module, a data analysis module, and an automatic detection module. The preprocessing module creates the intermediate data file by parsing text files: after it detects that the application process has started, it extracts all system log and job log messages generated after the start of the job, sorts the messages in all the logs by time, and stores the system log and job log entries that share the same timestamp, matched to each other, as an intermediate data file. A minimal sketch of such a preprocessing step is shown below.
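The sketch below is one minimal way to implement this step; the per-line 'YYYY-MM-DD HH:MM:SS message' log format and the CSV intermediate file are assumptions made for illustration, since the patent does not specify concrete formats.

```python
import csv
from collections import defaultdict

def parse_log(path):
    """Group log messages by timestamp, assuming lines start with 'YYYY-MM-DD HH:MM:SS '."""
    entries = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if len(line) > 20:
                timestamp, message = line[:19], line[20:]
                entries[timestamp].append(message)
    return entries

def build_intermediate(system_log_path, job_log_path, out_path="intermediate.csv"):
    """Sort all messages by time and pair system/job entries that share a timestamp."""
    sys_entries = parse_log(system_log_path)
    job_entries = parse_log(job_log_path)
    timestamps = sorted(set(sys_entries) | set(job_entries))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["time", "system_messages", "job_messages"])
        for ts in timestamps:
            writer.writerow([ts,
                             " | ".join(sys_entries.get(ts, [])),
                             " | ".join(job_entries.get(ts, []))])
    return out_path
```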
The data mining tool processes the collected log files with data analysis techniques; the ultimate goal is to describe the text content of the intermediate data as numerical vectors. The intermediate data contain text, numbers, timestamps, and other kinds of data, so processing them directly gives poor results. The strategy adopted here is therefore to separate the textual, numerical, and temporal content and analyze each separately, with the final goal of describing the text content of a set of system logs as numerical feature vectors. During data processing, the topic model LDA (Latent Dirichlet Allocation) method from text modeling is used to extract the topics (key information) contained in the intermediate data text, and the probability distribution over the extracted topics is used as the feature vector. A minimal sketch of this step is shown below.
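One possible realization of this step, sketched below, uses scikit-learn's CountVectorizer and LatentDirichletAllocation; the number of topics and the toy log lines are illustrative assumptions, not values fixed by the patent.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Each document is the concatenated text of one intermediate-file entry (toy examples).
docs = [
    "job 42 started on node c03 slurm allocation granted",
    "mpi rank 7 segmentation fault node c05 kernel oops",
    "job 42 completed successfully walltime 02:13:55",
]

vectorizer = CountVectorizer()                # bag-of-words counts for the log text
counts = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=4, random_state=0)  # 4 topics, chosen arbitrarily
lda.fit(counts)

# The per-document topic probability distribution is used as the text feature vector.
topic_features = lda.transform(counts)        # shape: (n_documents, 4); rows sum to 1
print(topic_features.round(3))
```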
Finally, the automatic detection module is trained with the random forest method in machine learning. The main advantage of random forests is that they can be evaluated internally on their out-of-bag samples, yielding an unbiased estimate of the error without cross-validation or a separate test set. When the random forest method is used to train the model, existing application run logs and run results are collected as training data, and the states are divided into the four categories of normal operation, user termination, node error, and run timeout according to the results of previous application runs. The trained model can then be deployed on the monitoring platform; with the log analysis data provided by the preprocessing module and the data analysis module, the running state of the application can be judged in real time. A minimal sketch of this training step follows.
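Under the same assumptions, the sketch below trains the detection module with scikit-learn's RandomForestClassifier, enabling oob_score so the out-of-bag samples supply the internal error estimate mentioned above; the synthetic feature matrix and labels stand in for the real log-derived training data.

```python
import numpy as np
import joblib
from sklearn.ensemble import RandomForestClassifier

STATES = ["normal operation", "user termination", "node error", "run timeout"]

# Synthetic stand-in for the topic-distribution + numeric/time features of past runs.
rng = np.random.default_rng(0)
X_train = rng.random((400, 6))                  # 400 historical runs, 6 features each
y_train = rng.integers(0, len(STATES), 400)     # their labeled run results (0..3)

model = RandomForestClassifier(
    n_estimators=200,
    oob_score=True,      # out-of-bag estimate replaces cross-validation / a held-out test set
    random_state=0,
)
model.fit(X_train, y_train)
print(f"out-of-bag accuracy estimate: {model.oob_score_:.3f}")

joblib.dump(model, "rf_state_model.joblib")     # deploy this file on the monitoring platform
```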
The core of the invention is to provide an artificial-intelligence-based detection platform for the running of high-performance applications, so that an artificial intelligence system can automatically detect and predict the application running state. By monitoring the system and job logs generated while the high-performance application runs, extracting characteristic keywords for identification, and classifying them with a machine learning method, prediction of the running state of the high-performance application is finally realized.
It can be seen from the foregoing embodiments that the high-performance application running state prediction and monitoring device provided by the embodiments of the present invention introduces data mining and machine learning methods into the real-time monitoring of the running state of high-performance computing applications, establishes an automated operation and maintenance platform for high-performance clusters, maximizes the utilization of computing resources, and allows the computer to judge the running state of the high-performance application automatically, thereby reducing the burden on operations staff.
In view of the above, in another aspect, an embodiment of the present invention provides a high performance application running state predicting and monitoring apparatus, including:
at least one processor; and
a memory storing program code executable by the processor, the program code implementing the method of any of the above when executed by the processor.
FIG. 3 is a schematic diagram of the hardware structure of an embodiment of the high-performance application running state prediction and monitoring apparatus provided by the present invention.
Taking the computer apparatus shown in FIG. 3 as an example, the computer apparatus includes a processor 301 and a memory 302, and may further include an input device 303 and an output device 304.
The processor 301, the memory 302, the input device 303, and the output device 304 may be connected by a bus or in other ways; FIG. 3 takes connection by a bus as an example.
The memory 302 is a non-volatile computer-readable storage medium, and can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the high-performance application running state prediction and monitoring method in the embodiment of the present application. The processor 301 executes various functional applications of the server and data processing by running nonvolatile software programs, instructions and modules stored in the memory 302, that is, implements the high performance application running state prediction and monitoring method of the above-described method embodiment.
The memory 302 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the high-performance application operation state prediction and monitoring method, and the like. Further, the memory 302 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, memory 302 optionally includes memory located remotely from processor 301, which may be connected to a local module via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 303 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer apparatus for the high performance application operation state prediction and monitoring method. The output means 304 may comprise a display device such as a display screen.
Program instructions/modules corresponding to the one or more high-performance application running state prediction and monitoring methods are stored in the memory 302, and when executed by the processor 301, the high-performance application running state prediction and monitoring methods in any of the above-described method embodiments are executed.
Any embodiment of the computer device executing the method for predicting and monitoring the running state of the high-performance application can achieve the same or similar effects as any corresponding embodiment of the method.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like.
In addition, the apparatuses, devices and the like disclosed in the embodiments of the present invention may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television and the like, or may be a large terminal device, such as a server and the like, and therefore the scope of protection disclosed in the embodiments of the present invention should not be limited to a specific type of apparatus, device. The client disclosed in the embodiment of the present invention may be applied to any one of the above electronic terminal devices in the form of electronic hardware, computer software, or a combination of both.
Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) described herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions described herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
The above-described embodiments are possible examples of implementations and are presented merely for a clear understanding of the principles of the invention. Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only and is not intended to suggest that the scope of the disclosure, including the claims, of the embodiments of the invention is limited to these examples. Within the spirit of the embodiments of the invention, technical features of the above embodiment or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments of the invention exist as described above; they are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A high-performance application running state prediction and monitoring method is characterized by comprising the following steps:
collecting the system logs and job logs generated while the target platform is running, sorting the messages in the system logs and job logs by time, matching entries that share the same timestamp, and storing them as an intermediate data file;
extracting key information from the intermediate data file with a natural language processing tool from data mining, and representing the text in the extracted key information as corresponding numerical feature vectors; and
analyzing the numbers, timestamps, and the text represented by the numerical feature vectors in the intermediate data file with a model trained by a machine learning algorithm, and judging the running state of the application from the analysis results.
2. The method of claim 1, wherein the application run state comprises: normal operation, user termination, node error, and run timeout.
3. The method of claim 1, wherein extracting key information from the intermediate data file with a natural language processing tool from data mining and representing the text in the extracted key information as corresponding numerical feature vectors comprises:
extracting the key information in the intermediate data file with the topic model LDA (Latent Dirichlet Allocation) method from text modeling, and using the topic probability distribution of the extracted key information as its feature vector.
4. The method of claim 1, wherein analyzing the numbers, timestamps, and the text represented by the numerical feature vectors in the intermediate data file with a model trained by a machine learning algorithm and judging the running state of the application from the analysis results comprises:
receiving, as training data, log files from previous application runs that have been processed by the preprocessing module and the data analysis module, together with their corresponding running states, and training the model with a machine learning algorithm.
5. A high performance application run state prediction and monitoring device, comprising:
a preprocessing module configured to collect the system logs and job logs generated while the target platform is running, sort the messages in the system logs and job logs by time, match entries that share the same timestamp, and store them as an intermediate data file;
a data analysis module configured to extract key information from the intermediate data file with a natural language processing tool from data mining, and represent the text in the extracted key information as corresponding numerical feature vectors; and
an automatic monitoring module configured to analyze the numbers, timestamps, and the text represented by the numerical feature vectors in the intermediate data file with a model trained by a machine learning algorithm, and judge the running state of the application from the analysis results.
6. The device of claim 5, wherein the automatic monitoring module is configured to:
receive, as training data, log files from previous application runs that have been processed by the preprocessing module and the data analysis module, together with their corresponding running states, and train the model with a machine learning algorithm.
7. The device of claim 5, wherein the machine learning algorithm comprises: decision trees, random forests, artificial neural networks, and Bayesian learning.
8. The device of claim 5, wherein the running state comprises: normal operation, user termination, node error, and run timeout.
9. The device of claim 5, wherein the data analysis module is further configured to:
extract the key information in the intermediate data file with the topic model LDA (Latent Dirichlet Allocation) method from text modeling, and use the topic probability distribution of the extracted key information as its feature vector.
10. A high performance application run state prediction and monitoring apparatus, comprising:
at least one processor; and
a memory storing program code executable by the processor, the program code implementing the method of any one of claims 1-4 when executed by the processor.
CN202010154757.1A 2020-03-08 2020-03-08 Method, equipment and device for predicting and monitoring running state of high-performance application Withdrawn CN111352820A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010154757.1A CN111352820A (en) 2020-03-08 2020-03-08 Method, equipment and device for predicting and monitoring running state of high-performance application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010154757.1A CN111352820A (en) 2020-03-08 2020-03-08 Method, equipment and device for predicting and monitoring running state of high-performance application

Publications (1)

Publication Number Publication Date
CN111352820A (en) 2020-06-30

Family

ID=71192576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010154757.1A Withdrawn CN111352820A (en) 2020-03-08 2020-03-08 Method, equipment and device for predicting and monitoring running state of high-performance application

Country Status (1)

Country Link
CN (1) CN111352820A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112612664A (en) * 2020-12-24 2021-04-06 百度在线网络技术(北京)有限公司 Electronic equipment testing method and device, electronic equipment and storage medium
CN112612664B (en) * 2020-12-24 2024-04-02 百度在线网络技术(北京)有限公司 Electronic equipment testing method and device, electronic equipment and storage medium
CN113778790A (en) * 2021-08-19 2021-12-10 北京仿真中心 Method and system for monitoring state of computing system based on Zabbix


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200630