CN111209179A - Method, device and system for collecting and analyzing system operation and maintenance data - Google Patents

Method, device and system for collecting and analyzing system operation and maintenance data

Info

Publication number: CN111209179A
Application number: CN202010324455.4A
Authority: CN (China)
Prior art keywords: data, WMI, collecting, analyzing, decision tree
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Other languages: Chinese (zh)
Inventors: 查文宇, 张艳清, 王纯斌, 张永飞, 殷腾蛟
Current Assignee: Chengdu Sefon Software Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Chengdu Sefon Software Co Ltd
Priority date: 2020-04-23 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2020-04-23
Publication date: 2020-05-29

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The invention discloses a method, a device and a system for collecting and analyzing system operation and maintenance data. Based on Windows Management Instrumentation (WMI), the system collects data from Windows devices, including CPU performance, hardware device information, memory information, service information, process information and the like, builds a prediction model from the data collected on different devices, and judges the running state of the monitored system from the prediction result of the model. This solves the problem that existing operation and maintenance monitoring generally relies on manually set thresholds for each parameter, with threshold ranges usually chosen from experience, so that the thresholds cannot be adjusted for each system to suit its load and flexibility is poor.

Description

Method, device and system for collecting and analyzing system operation and maintenance data
Technical Field
The invention relates to the field of system operation and maintenance, in particular to a method, a device and a system for collecting and analyzing system operation and maintenance data.
Background
Operation and maintenance refers to operating and maintaining the network, servers and services at every stage of their life cycle so as to ensure the stability, availability and cost controllability of the system. Operation and maintenance duties run through the whole life cycle of a product, and an automated, intelligent platform is needed to help operation and maintenance engineers deliver services and guarantee service quality for users at the lowest cost and the highest speed. Such a platform is developed once its development engineers understand the service requirements, and mainly comprises: machine management, resource management, network management, architecture infrastructure, a deployment platform, a configuration management platform, a data management platform, a monitoring platform, capacity management, flow management, fault management, a service scheduling platform, a workflow engine, permission management, operation and maintenance metadata management, and a unified operation and maintenance portal.
WMI (Windows Management Instrumentation) is a core Windows management technology with which users can manage local and remote computers. WMI exposes many operating system components through a common interface. It serves as a specification and an infrastructure through which nearly all Windows resources can be accessed, configured, managed and monitored. For example, a user can start a process on a remote computer; schedule a process to run at a specific date and time; restart a computer remotely; obtain the list of programs installed on a local or remote computer; query the Windows event logs of a local or remote computer; and so on.
Existing operation and maintenance monitoring is generally implemented by manually setting a threshold for each parameter. The threshold ranges are usually chosen from experience and cannot be adjusted for each system to suit its load, so flexibility is poor.
Disclosure of Invention
The object of the invention is to provide a method, a device and a system for collecting and analyzing system operation and maintenance data, which solve the problem that existing operation and maintenance monitoring is generally implemented by manually setting a threshold for each parameter, with threshold ranges usually chosen from experience, so that the thresholds cannot be adjusted for each system to better suit its load and flexibility is poor.
The technical scheme adopted by the invention is as follows:
A method for collecting and analyzing system operation and maintenance data comprises establishing a decision tree model of system WMI data and analyzing the collected system WMI data with the established decision tree model. Based on Windows Management Instrumentation (WMI), data of Windows devices are collected, including CPU performance, hardware device information, memory information, service information, process information and the like; the data collected from different devices are modeled with a decision tree to form a prediction model; the running state of the monitored system is judged from the prediction result of the model; and the running state is fed back to operation and maintenance personnel in time.
Further, the WMI data includes at least one of CPU performance, hardware device information, memory information, service information, and process information.
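The following is a minimal sketch, not the patent's implementation, of how the listed WMI categories might be flattened into one numeric record for modeling. The WQL strings refer to standard WMI classes (Win32_Processor, Win32_OperatingSystem, Win32_Service, Win32_Process), but the choice of properties is an assumption, and `run_query`, `collect_sample` and the feature names are hypothetical; `run_query` stands in for whatever WMI client actually executes the queries.

```python
from typing import Callable, Dict, List

# WQL queries against standard WMI classes. Which properties the patent's
# collector reads is not stated, so this selection is an assumption.
WQL_QUERIES: Dict[str, str] = {
    "cpu": "SELECT LoadPercentage FROM Win32_Processor",
    "memory": "SELECT FreePhysicalMemory, TotalVisibleMemorySize "
              "FROM Win32_OperatingSystem",
    "services": "SELECT Name, State FROM Win32_Service",
    "processes": "SELECT ProcessId FROM Win32_Process",
}

Row = Dict[str, object]
# Hypothetical interface: takes a WQL string, returns the result rows.
QueryRunner = Callable[[str], List[Row]]


def collect_sample(run_query: QueryRunner) -> Dict[str, float]:
    """Flatten the query results into one numeric feature record."""
    cpu_rows = run_query(WQL_QUERIES["cpu"])
    mem = run_query(WQL_QUERIES["memory"])[0]
    services = run_query(WQL_QUERIES["services"])
    processes = run_query(WQL_QUERIES["processes"])

    loads = [float(r["LoadPercentage"]) for r in cpu_rows]
    total_kb = float(mem["TotalVisibleMemorySize"])
    free_kb = float(mem["FreePhysicalMemory"])
    stopped = sum(1 for s in services if s["State"] != "Running")
    return {
        "cpu_load_pct": sum(loads) / len(loads),            # mean over CPUs
        "mem_used_pct": 100.0 * (total_kb - free_kb) / total_kb,
        "stopped_services": float(stopped),
        "process_count": float(len(processes)),
    }
```

Passing the query runner in as a parameter keeps the sketch independent of any particular WMI library and lets it be exercised with canned rows on a non-Windows machine.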
Further, the output of the decision tree model of the WMI data comprises normal operation and abnormal error reporting.
Further, the output of the decision tree model of the WMI data also includes an optimal state, which indicates that the system is running in its best state. Operation and maintenance personnel can automatically adjust the running state of the monitored system according to this result, which improves operation and maintenance monitoring and service monitoring capabilities and enables intelligent operation, maintenance and monitoring.
Further, the method for establishing the decision tree model of the system WMI data comprises the following steps:
S1. determining the outputs of the decision tree model of the WMI data, and labeling and classifying the data of each category in the WMI data;
S2. taking all records in the WMI data as one node;
S3. traversing every possible split of each category in the node, finding the split with the largest information gain ratio, and splitting the node to obtain child nodes;
S4. repeating step S3 for each child node until the purity of every child node meets the preset requirement, thereby obtaining the decision tree model of the WMI data.
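Steps S1 to S4 can be sketched as follows. This is an illustrative reading under stated assumptions, not the patent's code: features are assumed numeric, each candidate split is a binary threshold halfway between two adjacent observed values, purity is measured by entropy, and the nested-dict tree layout and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple, Union

Sample = Dict[str, float]
Tree = Union[str, Dict[str, object]]


def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(labels: List[str], left: List[str], right: List[str]) -> float:
    """Information gain of a binary split divided by its split information."""
    n = len(labels)
    w_l, w_r = len(left) / n, len(right) / n
    gain = entropy(labels) - w_l * entropy(left) - w_r * entropy(right)
    split_info = -(w_l * math.log2(w_l) + w_r * math.log2(w_r))
    return gain / split_info


def best_split(rows: List[Sample], labels: List[str]) -> Optional[Tuple[str, float]]:
    """S3: try every feature and threshold, keep the largest gain ratio."""
    best, best_score = None, 0.0
    for feature in rows[0]:
        values = sorted({r[feature] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint, so both sides are non-empty
            left = [y for r, y in zip(rows, labels) if r[feature] <= t]
            right = [y for r, y in zip(rows, labels) if r[feature] > t]
            score = gain_ratio(labels, left, right)
            if score > best_score:
                best, best_score = (feature, t), score
    return best


def build_tree(rows: List[Sample], labels: List[str], max_entropy: float = 0.0) -> Tree:
    """S2 and S4: start from one node holding all records and split recursively."""
    pure_enough = entropy(labels) <= max_entropy
    split = None if pure_enough else best_split(rows, labels)
    if split is None:  # leaf: predict the majority label
        return Counter(labels).most_common(1)[0][0]
    feature, t = split
    left = [i for i, r in enumerate(rows) if r[feature] <= t]
    right = [i for i, r in enumerate(rows) if r[feature] > t]
    return {
        "feature": feature,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], max_entropy),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], max_entropy),
    }
```

The labels passed to `build_tree` correspond to the marking done in S1, and `max_entropy` plays the role of the preset purity requirement in S4. The thresholds stored in the tree are learned from the data rather than set by hand.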
Three indices may be used to partition the decision tree model:
entropy: entropy represents the uncertainty of a sample set; the lower the entropy, the more ordered the set, and the higher the entropy, the more disordered the set and the greater the uncertainty of the samples;
information gain and information gain ratio: the information gain is the difference between the entropy of the data set before and after it is divided by a given feature, so it measures how well the current feature divides the sample set; the information gain ratio is the information gain divided by the entropy of the split itself, which offsets the bias of information gain toward features with many values;
Gini coefficient: the Gini coefficient is the probability that two samples drawn at random from the data set have different class labels, so the smaller the Gini coefficient, the higher the purity of the data set.
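For reference, these indices have the following standard textbook definitions. The notation is an assumption introduced here and does not appear in the original: D is a data set with K classes, p_k is the proportion of samples in class k, and a feature A divides D into subsets D_v.

```latex
\begin{align*}
H(D) &= -\sum_{k=1}^{K} p_k \log_2 p_k \\
\mathrm{Gain}(D, A) &= H(D) - \sum_{v} \frac{|D_v|}{|D|}\, H(D_v) \\
\mathrm{GainRatio}(D, A) &= \frac{\mathrm{Gain}(D, A)}
  {-\sum_{v} \frac{|D_v|}{|D|} \log_2 \frac{|D_v|}{|D|}} \\
\mathrm{Gini}(D) &= 1 - \sum_{k=1}^{K} p_k^{2}
\end{align*}
```

For a set split evenly between two classes, H(D) = 1 and Gini(D) = 0.5; for a set containing a single class, both are 0.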
Decision tree generation: the decision tree is generated recursively starting from the root node, usually choosing the feature with the largest information gain or information gain ratio as the optimal feature. This amounts to repeatedly selecting the locally optimal feature, that is, splitting the training set into subsets that can each be classified largely correctly.
Further, the purity index is one of the Gini coefficient, entropy and error rate. The decision tree is constructed from sample probabilities and purity, and whether a data set is 'pure' can be judged with any of these three indices. The larger the value of an index, the less pure the data; the smaller the value, the purer the data.
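The three purity indices can be computed directly from the class labels of a node. The sketch below uses the usual definitions; the function names are hypothetical, and the error rate is taken to be the share of samples that do not belong to the majority class, which is an assumption since the text does not define it.

```python
import math
from collections import Counter
from typing import List


def class_probs(labels: List[str]) -> List[float]:
    """Proportion of samples in each class present in the node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels: List[str]) -> float:
    return 1.0 - sum(p * p for p in class_probs(labels))


def entropy(labels: List[str]) -> float:
    return -sum(p * math.log2(p) for p in class_probs(labels))


def error_rate(labels: List[str]) -> float:
    # Misclassification rate if the node predicts its majority class.
    return 1.0 - max(class_probs(labels))


if __name__ == "__main__":
    for node in (["normal"] * 4,
                 ["normal"] * 3 + ["abnormal"],
                 ["normal"] * 2 + ["abnormal"] * 2):
        print(node, gini(node), entropy(node), error_rate(node))
```

All three functions return 0 for a node containing a single class and grow as the classes become more evenly mixed, which matches the statement that larger values mean less pure data.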
A device for collecting and analyzing system operation and maintenance data comprises:
a memory for storing executable instructions, WMI data and a decision tree model; and
a processor for executing the executable instructions stored in the memory to implement the above method for collecting and analyzing system operation and maintenance data.
A system for collecting and analyzing system operation and maintenance data comprises a monitored system, a WMI acquisition system for collecting WMI data of the monitored system, and a monitoring system for monitoring and analyzing the WMI data, wherein the monitoring system comprises the above device for collecting and analyzing system operation and maintenance data.
In summary, by adopting the above technical scheme, the invention has the following beneficial effects:
1. the method, device and system for collecting and analyzing system operation and maintenance data remove manual intervention and generate the relevant thresholds automatically, giving better flexibility and better adaptation to system load;
2. being based on decision tree technology, the method, device and system place low demands on data preprocessing, can handle numerical and categorical attributes at the same time, and can produce feasible, effective results on large data sources in a relatively short time;
3. being based on WMI technology, the method, device and system are broadly applicable.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts, wherein:
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a schematic diagram of a conventional operation and maintenance system.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to fig. 1 to 2, the described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before the embodiments of the present invention are described in further detail, the terms and expressions used in them are explained; the following explanations apply to those terms and expressions.
WMI data: data obtained through Windows Management Instrumentation (WMI);
decision tree: in machine learning, a decision tree is a predictive model that represents a mapping between object attributes and object values.
Example 1
A method for collecting and analyzing system operation and maintenance data comprises establishing a decision tree model of system WMI data and analyzing the collected system WMI data with the established decision tree model. Based on Windows Management Instrumentation (WMI), the system collects data of Windows devices, including CPU performance, hardware device information, memory information, service information, process information and the like, and then models the data collected from different devices with a decision tree to form a prediction model and to identify the device information or service information that performs best.
Example 2
In this embodiment, based on embodiment 1, the WMI data includes at least one of CPU performance, hardware device information, memory information, service information, and process information.
Further, the output of the decision tree model of the WMI data comprises normal operation and abnormal error reporting.
Further, the output of the decision tree model of the WMI data further includes an optimal state.
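As an illustration of the three outputs named in this embodiment, the sketch below walks a small hand-written tree and returns one of "optimal", "normal" or "abnormal". The tree, its nested-dict layout, the feature names and the threshold values are all hypothetical; a real tree would be learned from labeled WMI data.

```python
from typing import Dict, Union

Sample = Dict[str, float]
Tree = Union[str, Dict[str, object]]

# Hypothetical tree for illustration only; the thresholds are made up.
EXAMPLE_TREE: Tree = {
    "feature": "cpu_load_pct",
    "threshold": 85.0,
    "left": {
        "feature": "mem_used_pct",
        "threshold": 40.0,
        "left": "optimal",
        "right": "normal",
    },
    "right": "abnormal",
}


def classify(tree: Tree, sample: Sample) -> str:
    """Follow the branches until a leaf (a plain label string) is reached."""
    node = tree
    while isinstance(node, dict):
        goes_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node


if __name__ == "__main__":
    print(classify(EXAMPLE_TREE, {"cpu_load_pct": 30.0, "mem_used_pct": 20.0}))
    print(classify(EXAMPLE_TREE, {"cpu_load_pct": 95.0, "mem_used_pct": 20.0}))
```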
Example 3
In this embodiment, based on embodiment 1, the method for establishing the decision tree model of the system WMI data includes the following steps:
S1. determining the outputs of the decision tree model of the WMI data, and labeling and classifying the data of each category in the WMI data;
S2. taking all records in the WMI data as one node;
S3. traversing every possible split of each category in the node, finding the split with the largest information gain ratio, and splitting the node to obtain child nodes;
S4. repeating step S3 for each child node until the purity of every child node meets the preset requirement, thereby obtaining the decision tree model of the WMI data.
Three indices may be used to partition the decision tree model:
entropy: entropy represents the uncertainty of a sample set; the lower the entropy, the more ordered the set, and the higher the entropy, the more disordered the set and the greater the uncertainty of the samples;
information gain and information gain ratio: the information gain is the difference between the entropy of the data set before and after it is divided by a given feature, so it measures how well the current feature divides the sample set; the information gain ratio is the information gain divided by the entropy of the split itself, which offsets the bias of information gain toward features with many values;
Gini coefficient: the Gini coefficient is the probability that two samples drawn at random from the data set have different class labels, so the smaller the Gini coefficient, the higher the purity of the data set.
Decision tree generation: the decision tree is generated recursively starting from the root node, usually choosing the feature with the largest information gain or information gain ratio as the optimal feature. This amounts to repeatedly selecting the locally optimal feature, that is, splitting the training set into subsets that can each be classified largely correctly.
Further, the purity index is one of the Gini coefficient, entropy and error rate. The decision tree is constructed from sample probabilities and purity, and whether a data set is 'pure' can be judged with any of these three indices. The larger the value of an index, the less pure the data; the smaller the value, the purer the data.
Example 4
A device for collecting and analyzing system operation and maintenance data comprises:
a memory for storing executable instructions, WMI data and a decision tree model; and
a processor for executing the executable instructions stored in the memory to implement the above method for collecting and analyzing system operation and maintenance data.
Example 5
As shown in FIG. 1, a system for collecting and analyzing system operation and maintenance data includes a monitored system, a WMI acquisition system for collecting WMI data of the monitored system, and a monitoring system for monitoring and analyzing the WMI data, where the monitoring system includes the device for collecting and analyzing system operation and maintenance data described above.
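One way to picture how the three parts of this system interact is the sketch below. It is an assumption-laden outline rather than the patented design: the class name, the three callables and the rule of notifying only on "abnormal" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Sample = Dict[str, float]


@dataclass
class MonitoringSystem:
    # WMI acquisition system: returns one record from the monitored system.
    collect: Callable[[], Sample]
    # Decision tree model held by the collection and analysis device.
    predict: Callable[[Sample], str]
    # Feedback channel to operation and maintenance personnel.
    notify: Callable[[str, Sample], None]
    history: List[str] = field(default_factory=list)

    def poll_once(self) -> str:
        """Collect one sample, classify it, and report abnormal states."""
        sample = self.collect()
        state = self.predict(sample)
        self.history.append(state)
        if state == "abnormal":
            self.notify(state, sample)
        return state


if __name__ == "__main__":
    monitor = MonitoringSystem(
        collect=lambda: {"cpu_load_pct": 97.0},
        predict=lambda s: "abnormal" if s["cpu_load_pct"] > 90.0 else "normal",
        notify=lambda state, s: print("alert:", state, s),
    )
    monitor.poll_once()
```

Keeping collection, prediction and notification behind separate callables mirrors the separation between the WMI acquisition system and the monitoring system described in this embodiment.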
As shown in FIG. 2, a conventional system for collecting and analyzing operation and maintenance data includes a monitored system, an operation and maintenance data collection system, a storage system, a data analysis system, a monitoring system and a management system. In its workflow, the collection system collects the operation and maintenance data of the monitored system and stores the data in the storage system; the data analysis system analyzes the data against the thresholds set in the monitoring system and feeds the analysis result back to the monitoring system; the monitoring system raises an alarm to the management system when the data exceed a threshold; and an operator can set the thresholds of the monitoring system through the management system.
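For contrast, the conventional threshold check of FIG. 2 can be sketched in a few lines. The parameter names and limit values are hypothetical and chosen only for illustration; the point is that every limit is fixed by an operator and applies unchanged to every monitored system.

```python
from typing import Dict, List

# Operator-set limits; the numbers are illustrative, not taken from the text.
STATIC_THRESHOLDS: Dict[str, float] = {
    "cpu_load_pct": 90.0,
    "mem_used_pct": 80.0,
}


def exceeded(sample: Dict[str, float],
             thresholds: Dict[str, float] = STATIC_THRESHOLDS) -> List[str]:
    """Return the names of all parameters whose value is above its limit."""
    return [name for name, limit in thresholds.items()
            if sample.get(name, 0.0) > limit]


if __name__ == "__main__":
    print(exceeded({"cpu_load_pct": 95.0, "mem_used_pct": 50.0}))
```

A host that routinely and harmlessly runs at 95% CPU would raise an alarm on every sample under this scheme, which is the inflexibility the invention sets out to address.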
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A method for collecting and analyzing system operation and maintenance data, characterized by comprising: establishing a decision tree model of system WMI data, and analyzing the collected system WMI data with the established decision tree model.
2. The method for collecting and analyzing system operation and maintenance data according to claim 1, characterized in that: the WMI data includes at least one of CPU performance, hardware device information, memory information, service information, and process information.
3. The method for collecting and analyzing system operation and maintenance data according to claim 1, characterized in that: the output of the decision tree model of the WMI data comprises normal operation and abnormal error reporting.
4. The method for collecting and analyzing system operation and maintenance data according to claim 3, characterized in that: the output of the decision tree model of the WMI data further comprises an optimal state.
5. The method for collecting and analyzing system operation and maintenance data according to claim 1, characterized in that: the method for establishing the decision tree model of the system WMI data comprises the following steps:
S1. determining the outputs of the decision tree model of the WMI data, and labeling and classifying the data of each category in the WMI data;
S2. taking all records in the WMI data as one node;
S3. traversing every possible split of each category in the node, finding the split with the largest information gain ratio, and splitting the node to obtain child nodes;
S4. repeating step S3 for each child node until the purity of every child node meets the preset requirement, thereby obtaining the decision tree model of the WMI data.
6. The method for collecting and analyzing system operation and maintenance data according to claim 5, characterized in that: the purity index is one of the Gini coefficient, entropy and error rate.
7. A device for collecting and analyzing system operation and maintenance data, characterized by comprising:
a memory for storing executable instructions, WMI data, and a decision tree model;
a processor for executing the executable instructions stored in the memory to implement the method for collecting and analyzing system operation and maintenance data as claimed in claim 1.
8. A system for collecting and analyzing system operation and maintenance data, characterized by comprising: a monitored system, a WMI acquisition system for collecting WMI data of the monitored system, and a monitoring system for monitoring and analyzing the WMI data, wherein the monitoring system comprises the device for collecting and analyzing system operation and maintenance data as claimed in claim 7.
CN202010324455.4A 2020-04-23 2020-04-23 Method, device and system for collecting and analyzing system operation and maintenance data Pending CN111209179A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010324455.4A CN111209179A (en) 2020-04-23 2020-04-23 Method, device and system for collecting and analyzing system operation and maintenance data

Publications (1)

Publication Number Publication Date
CN111209179A true CN111209179A (en) 2020-05-29

Family

ID=70782255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010324455.4A Pending CN111209179A (en) 2020-04-23 2020-04-23 Method, device and system for collecting and analyzing system operation and maintenance data

Country Status (1)

Country Link
CN (1) CN111209179A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951359A (en) * 2017-02-28 2017-07-14 深圳市华傲数据技术有限公司 A kind of system health degree determination method and device
CN108683530A (en) * 2018-04-28 2018-10-19 北京百度网讯科技有限公司 Data analysing method, device and the storage medium of multi-dimensional data
CN109522193A (en) * 2018-10-22 2019-03-26 网宿科技股份有限公司 A kind of processing method of operation/maintenance data, system and device
CN109784504A (en) * 2018-12-24 2019-05-21 贵州宇豪科技发展有限公司 Data center's long-distance intelligent operation management method and system
CN110428127A (en) * 2019-06-19 2019-11-08 深圳壹账通智能科技有限公司 Automated analysis method, user equipment, storage medium and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200529