CN108074022A - Hardware resource analysis and evaluation method based on centralized operation and maintenance - Google Patents

Publication number
CN108074022A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610989588.7A
Other languages
Chinese (zh)
Inventor
邢颖
郎燕生
张印
李强
韩锋
李森
张振
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Original Assignee
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
Application filed by State Grid Corp of China SGCC, China Electric Power Research Institute Co Ltd CEPRI filed Critical State Grid Corp of China SGCC
Priority to CN201610989588.7A priority Critical patent/CN108074022A/en
Publication of CN108074022A publication Critical patent/CN108074022A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0635Risk analysis of enterprise or organisation activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06Electricity, gas or water supply
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The present invention relates to a hardware resource analysis and evaluation method based on centralized operation and maintenance, comprising the following steps: 1) defining the types of system hardware resource monitoring indexes; 2) collecting the system hardware resource indexes in separate cycles; 3) establishing analysis and evaluation models for the system hardware resource monitoring indexes; 4) analyzing and evaluating the system hardware resources. For the automation equipment resource data of a multilevel system under the centralized operation and maintenance mode, the technical solution provides several effective analysis and evaluation methods to predict the development trend of system resources and to mine the regularities and potential safety hazards in system operation. It helps dispatching automation operation and maintenance personnel fully grasp the problems the system may encounter, raises the risk defense level of the multilevel intelligent power grid dispatching control system, and meets the requirements that rapid power grid development places on the operation and maintenance of the dispatching control system.

Description

Hardware resource analysis and evaluation method based on centralized operation and maintenance
Technical Field
The invention relates to an analysis and evaluation method of a power system, in particular to a hardware resource analysis and evaluation method based on centralized operation and maintenance.
Background
The intelligent power grid dispatching control system is the world's largest power grid dispatching control system, organized centrally by the national power grid company and jointly developed by multiple technical enterprises. It adopts a multistage hierarchical structure, realizes horizontal integration and vertical connection of power grid dispatching services, and supports real-time monitoring, cooperative accident handling and global economic dispatching of a super-large power grid. During the Thirteenth Five-Year Plan period, with the rapid development of the extra-high voltage AC/DC interconnected power grid, the requirements that safe and stable grid operation places on the intelligent power grid dispatching control system keep rising.
The management system of the national power grid company is shifting toward intensive and lean management, and the operation and maintenance mode of the intelligent power grid dispatching control system is changing from dispersed to centralized. To comprehensively enhance the safety and stability of the system and raise the availability of dispatching automation equipment resources, a centralized operation and maintenance center is urgently needed to monitor the equipment uniformly and to judge its running state and development trend accurately. This requires advanced analysis and evaluation technology as support, with state evaluation under the centralized operation and maintenance mode as the starting point, so as to realize safety early warning for multi-level dispatching control system equipment, improve full-process management in the dispatching automation specialty, advance its technical level, and consolidate the foundation for safe power grid production.
Resource monitoring, analysis and evaluation in existing dispatching control systems are confined to each individual regulation and control center, and data are not shared among the multi-stage systems. Analysis focuses mainly on real-time monitoring and short-term statistics; a unified representation of resource usage across the multi-stage systems is lacking, as is effective analysis and evaluation of large volumes of historical information and comprehensive indexes, so the massive data resources of the multi-stage systems cannot be fully used to find potential safety hazards in time. In response to recent safety accidents within the national grid company system, the company launched the 'three-check three-reinforcement' special safety action and further required in-depth troubleshooting and treatment of hidden dangers. The massive data of the multilevel system must therefore be fully used, with effective analysis and evaluation methods, to uncover deep-seated potential safety hazards in the dispatching control system.
Disclosure of Invention
In order to solve the defects in the prior art, the invention aims to provide a hardware resource analysis and evaluation method based on centralized operation and maintenance, and provides various effective analysis and evaluation methods aiming at the resource data of automation equipment of a multilevel system in a centralized operation and maintenance mode so as to predict the development trend of system resources, mine the rules and potential safety hazards in the operation process of the system, assist dispatching automation operation and maintenance personnel in comprehensively mastering the possible problems of the system, improve the risk defense level of the multilevel intelligent power grid dispatching control system, and meet the requirement of rapid development of a power grid on the operation and maintenance work of the dispatching control system.
The purpose of the invention is realized by adopting the following technical scheme:
The invention provides a hardware resource analysis and evaluation method based on centralized operation and maintenance, comprising the following steps:
1) Defining the types of system hardware resource monitoring indexes;
2) Collecting the system hardware resource indexes in separate cycles;
3) Establishing analysis and evaluation models for the system hardware resource monitoring indexes;
4) Analyzing and evaluating the system hardware resources.
Further, in step 1), the hardware resource monitoring indexes comprise three monitoring indexes, namely the disk space utilization rate, CPU occupancy rate and memory utilization rate, of all application servers and important workstations connected to the dispatching data network; the monitoring indexes are further divided by acquisition period into real-time indexes, which require an update period of 3-5 seconds, and periodic statistical indexes, which have an update period of 5 or 10 minutes.
Furthermore, among the three monitoring indexes:
CPU occupancy rate: the instantaneous CPU utilization of a node in the dispatching data network; one value is recorded in the data point table and sent as a telemetry value;
memory utilization rate: the instantaneous memory utilization of a node in each system; one value is recorded in the data point table and sent as a telemetry value;
disk space utilization rate: the disk space utilization of a node in each system, covering the root directory and the main user directory of the node; one value is recorded in the data point table and sent as a telemetry value.
Further, step 2) comprises the following steps:
step 2.1, collecting the real-time indexes;
step 2.2, collecting the periodic statistical indexes.
Further, collecting the real-time indexes in step 2.1 comprises: a front-end acquisition server connected to the dispatching data network exchanges the real-time indexes with the dispatching control system of each regulation and control center, using character string data blocks of the DL476 protocol for data transmission. Collection and transmission of the hardware resource monitoring indexes involve the sending end of each regional system's data, the receiving end of the system data, and the configuration of the communication transmission protocol, which together realize centralized collection of the hardware resource monitoring indexes of the intelligent power grid dispatching control system. According to a data communication index file agreed in advance between the sending end and the receiving end, the required hardware resource monitoring indexes are read from the real-time database of each local system and transmitted to the centralized operation and maintenance center. The data receiving program of the centralized operation and maintenance center establishes TCP connections with the acquisition systems of regulation and control centers at the provincial level and above, receives the data, and stores them in the real-time database of the operation and maintenance center system. The procedure is as follows:
(1) Create a TCP connection;
(2) Send a start request: the DL476 message is A_ASSOCIATE;
(3) Receive the start confirmation: the DL476 message is A_ASSOCIATE_ACK;
(4) The sending end continuously scans the data listed in the data communication index file and sends a data message when data have changed or when the full-data cycle time has elapsed;
(5) The receiving end acknowledges the received data message;
(6) If no data are transmitted within 15 seconds, the sending end or the receiving end sends a test message, which the opposite end acknowledges; the DL476 message is A_TEST;
(7) The sending end or the receiving end closes the connection, and the opposite end acknowledges; in DL476, A_ABORT indicates disconnection and A_ABORT_ACK indicates the disconnection acknowledgement.
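The sending-end decision in steps (4) and (6) can be sketched as a small function. This is a minimal illustration only: the `Msg` enum, the `DATA` and `DATA_ACK` labels, the function name `sender_action` and the `full_cycle_seconds` parameter are hypothetical, and no DL476 wire format is modeled. Only the A_* message names and the 15-second idle limit come from the text.

```python
from enum import Enum


class Msg(Enum):
    A_ASSOCIATE = "A_ASSOCIATE"          # start request, step (2)
    A_ASSOCIATE_ACK = "A_ASSOCIATE_ACK"  # start confirmation, step (3)
    DATA = "DATA"                        # hypothetical label for a data message
    DATA_ACK = "DATA_ACK"                # hypothetical label for its acknowledgement
    A_TEST = "A_TEST"                    # idle test message, step (6)
    A_ABORT = "A_ABORT"                  # disconnection, step (7)
    A_ABORT_ACK = "A_ABORT_ACK"          # disconnection acknowledgement


IDLE_TEST_SECONDS = 15.0  # idle limit stated in step (6)


def sender_action(now, last_traffic, last_full_send, data_changed, full_cycle_seconds):
    """Return the message the sending end should emit at time `now`, or None.

    All times are in seconds. `full_cycle_seconds` is an assumed parameter:
    the text mentions a full-data cycle but does not give its length.
    """
    # Step (4): send data when it changed or the full-data cycle has elapsed.
    if data_changed or now - last_full_send >= full_cycle_seconds:
        return Msg.DATA
    # Step (6): send a test message after 15 seconds without traffic.
    if now - last_traffic >= IDLE_TEST_SECONDS:
        return Msg.A_TEST
    return None
```

For example, with a 60-second full-data cycle, an unchanged data set and 16 seconds of silence, the function returns `Msg.A_TEST`.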
Further, in step 2.2: collection of the periodic statistical indexes involves a file service client, an index acquisition client, and an index summarizing and analyzing client; the periodic indexes are encrypted and transmitted to the centralized operation and maintenance center at the set period through the scp or ftp file service, and the centralized operation and maintenance center stores them directly in the historical database after running its decompression, decryption and parsing programs;
step 2.2 comprises the following steps:
1) The indexes on each node server or workstation are collected at set time points, with the acquisition period read from the configuration library, and written into the real-time database;
2) The historical sampling program writes the collected indexes into the historical database according to the sampling period and archives them;
3) The analysis and summarizing program reads the sampled data from the historical database, processes the basic data through the analysis and evaluation program, writes the results into the corresponding historical index database, and generates an analysis and evaluation log file;
4) The file service client of the regulation and control center reads the analysis and evaluation log files periodically and, at regular intervals, transmits the encrypted files through the ftp or scp service to the directory designated by the centralized operation and maintenance center;
5) The centralized operation and maintenance center decompresses and decrypts the files in the designated directory, writes the analysis and evaluation indexes into the real-time database, and presents them for monitoring through a human-machine interface, with the data refresh period equal to the acquisition period.
Further, in step 3), the analysis and evaluation models comprise a recursive model for analyzing and evaluating system hardware resource monitoring indexes and a cube model for analyzing and evaluating multi-level system hardware resource monitoring indexes.
Further, the recursive model derives high-level analysis and evaluation indexes from low-level basic data through recursive calculation, and comprises six parts: configuration data, basic data, monitoring indexes, single-device statistical indexes, system resource statistical indexes, and calculated evaluation indexes, as follows:
1) Configuration data: the number, type and model attributes of the various devices monitored by the system, and the limit settings for each device's monitoring indexes;
2) Basic data: the collected CPU utilization, memory occupancy and disk partition utilization; the system hardware resource monitoring indexes are generated by monitoring, analyzing and aggregating these basic data;
3) Monitoring indexes: quantities obtained by checking the collected basic data against the index limits in the configuration data, including out-of-limit start and stop times, full-load start and stop times, and the resource growth rate;
4) Single-device statistical indexes: daily, monthly and yearly statistics of each device's monitoring indexes, including the duration, count, detailed information, out-of-limit rate, full-load rate and growth rate of each monitoring index; system hardware resources are classified and aggregated at the local level by the device model, type and manufacturer attributes in the configuration data, producing targeted statistical data;
5) System resource statistical indexes: daily, monthly and yearly statistics of the monitoring indexes, including the duration, count, percentage, out-of-limit rate, full-load rate and growth rate of each monitoring index; these evaluate the hardware resource usage of a system as a whole and produce global statistical data;
6) Calculated evaluation indexes: daily, monthly and yearly numerical analysis of the statistical indexes, including the average, maximum, minimum, quartile percentage distribution and cumulative time distribution, computed by index result, device type and duration, producing basic data for predicting system hardware resource risk at both the local and the global level.
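The step from basic data to monitoring indexes in the recursive model can be illustrated with a short sketch that turns timestamped samples and a configured limit into out-of-limit start and stop times. The function names, the sample layout and the strict `>` comparison are assumptions made for illustration; the text does not specify them.

```python
def out_of_limit_intervals(samples, limit):
    """Derive out-of-limit (start, stop) times from basic data.

    samples: list of (timestamp_seconds, value) sorted by time.
    limit:   index limit taken from the configuration data.
    A sample counts as out of limit when value > limit (assumed rule).
    """
    intervals = []
    start = None
    for timestamp, value in samples:
        if value > limit and start is None:
            start = timestamp                     # out-of-limit period begins
        elif value <= limit and start is not None:
            intervals.append((start, timestamp))  # out-of-limit period ends
            start = None
    if start is not None:
        # Still out of limit at the last sample: close the interval there.
        intervals.append((start, samples[-1][0]))
    return intervals


def total_hours(intervals):
    """Sum interval lengths and convert seconds to hours."""
    return sum(stop - start for start, stop in intervals) / 3600.0
```

The resulting durations are the kind of input that the daily, monthly and yearly statistical indexes in parts 4) and 5) would aggregate.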
Further, the cube model for analyzing and evaluating multi-level system hardware resource monitoring indexes analyzes and evaluates system hardware resources along three dimensions, A, Q and Y, where A represents the systems belonging to the different areas of the three levels of control centers (national, regional and provincial), Q represents an evaluation index, and Y represents time; each unit cube represents the average value of an evaluation index for a given area over a given time period.
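A minimal sketch of the cube model follows, assuming the cube is held as a dictionary keyed by (area, index, period). The function names, the record layout and the example keys are hypothetical; only the three dimensions and the per-cell average come from the text.

```python
from collections import defaultdict


def build_cube(records):
    """Build the A x Q x Y cube from raw evaluation records.

    records: iterable of (area, index_name, period, value).
    Returns a dict mapping (area, index_name, period) to the mean value,
    so each key plays the role of one unit cube.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for area, index_name, period, value in records:
        key = (area, index_name, period)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def slice_cube(cube, area=None, index_name=None, period=None):
    """Select the unit cubes that match the given dimension values."""
    wanted = (area, index_name, period)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }
```

Slicing on one dimension, for example `slice_cube(cube, area="province_A")`, gives the per-index, per-period averages for a single area, which is the comparison the cube is meant to support.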
Further, step 4) comprises defining and calculating the basic indexes of system hardware resources, analyzing the risk trend of system hardware resources, analyzing the cluster centers of the system hardware resource monitoring indexes, and analyzing the associations of the hardware resource monitoring indexes.
Further, the definition and calculation of the basic indexes of system hardware resources comprise:
1) Daily out-of-limit rate of a single device = daily out-of-limit duration of the device / 24;
2) Daily out-of-limit rate of a class of devices in a single system = daily out-of-limit duration of that class in the system / (24 × number of devices of that class);
3) Daily out-of-limit rate of a class of devices in the whole system = daily out-of-limit duration of that class in the whole system / (24 × number of devices of that class in the whole system);
4) Daily out-of-limit rate of whole-system hardware resources = daily out-of-limit duration of whole-system hardware resources / (24 × number of monitored devices in the whole system);
5) Monthly out-of-limit rate of a single device = monthly out-of-limit duration of the device / (24 × days in the month);
6) Monthly out-of-limit rate of a class of devices in a single system = monthly out-of-limit duration of that class in the system / (24 × days in the month × number of devices of that class);
7) Monthly out-of-limit rate of a class of devices in the whole system = monthly out-of-limit duration of that class in the whole system / (24 × days in the month × number of devices of that class in the whole system);
8) Monthly out-of-limit rate of whole-system hardware resources = monthly out-of-limit duration of whole-system hardware resources / (24 × days in the month × number of monitored devices in the whole system);
9) Annual out-of-limit rate of a single device = sum of the monthly out-of-limit rates of the device;
10) Annual out-of-limit rate of a class of devices in a single system = sum of the monthly out-of-limit rates of that class in the system;
11) Annual out-of-limit rate of a class of devices in the whole system = sum of the monthly out-of-limit rates of that class in the whole system;
12) Annual out-of-limit rate of whole-system hardware resources = sum of the monthly out-of-limit rates of whole-system hardware resources;
13) Monthly average out-of-limit rate of a class of indexes = sum of the calculated values of that class of indexes for each month of the year / 12 (number of months);
14) Daily average out-of-limit rate of a class of indexes = sum of the calculated values of that class of indexes for each day of the year / 365 or 366 (number of days in the year);
15) Growth rate of a single device's resource usage = (resource usage of the device at time t - resource usage of the device at time m) / (t - m), where t > m;
16) Out-of-limit rate percentage of a single device = out-of-limit rate of the device / out-of-limit rate of that device class in the system × 100%;
17) Out-of-limit rate percentage of a class of devices in a system = out-of-limit rate of that class in the system / hardware resource out-of-limit rate of the system × 100%;
18) Out-of-limit rate percentage of a class of devices in the whole system = out-of-limit rate of that class in the whole system / hardware resource out-of-limit rate of the whole system × 100%;
19) Full-load rate percentage of system resources (by count) = number of times system resources are at full load / number of times system resources are out of limit × 100%;
20) Full-load rate percentage of system resources (by duration) = full-load duration of system resources / out-of-limit duration of system resources × 100%;
21) Quartile percentage analysis of system resource usage: divide the 0-100% range into four equal intervals and calculate the share and distribution of resource utilization in each interval;
All time values are converted to hours.
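The rate definitions above reduce to simple ratios. The sketch below works through a few of them (the daily and monthly out-of-limit rates, the growth rate, and the quartile distribution). Function names are hypothetical, durations are assumed to be in hours as stated, and the quartile bin boundaries ([0, 25), [25, 50), [50, 75), [75, 100]) are an assumption, since the text only says the range is divided into four equal intervals.

```python
def daily_out_of_limit_rate(out_of_limit_hours, device_count=1):
    """Out-of-limit hours in one day divided by (24 x number of devices)."""
    return out_of_limit_hours / (24 * device_count)


def monthly_out_of_limit_rate(out_of_limit_hours, days_in_month, device_count=1):
    """Out-of-limit hours in one month divided by (24 x days x devices)."""
    return out_of_limit_hours / (24 * days_in_month * device_count)


def growth_rate(usage_t, usage_m, t, m):
    """(usage at t - usage at m) / (t - m), defined only for t > m."""
    if t <= m:
        raise ValueError("t must be greater than m")
    return (usage_t - usage_m) / (t - m)


def quartile_distribution(usages):
    """Share of samples in each quarter of the 0-100% range.

    usages: utilization values in percent. The value 100 is placed in the
    last bin so that the four shares always sum to 1.
    """
    if not usages:
        raise ValueError("usages must not be empty")
    counts = [0, 0, 0, 0]
    for usage in usages:
        counts[min(int(usage // 25), 3)] += 1
    return [count / len(usages) for count in counts]
```

For instance, a single device that is out of limit for 6 hours in a day has a daily out-of-limit rate of 6 / 24 = 0.25, and four devices of one class with 12 out-of-limit hours between them have a class rate of 12 / (24 × 4) = 0.125.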
Further, the system hardware resource risk trend analysis comprises:
assuming that the influencing factors are $x_1, x_2, \ldots, x_p$, regression analysis gives:
$$Y_t = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + Z \tag{4-1}$$
wherein $Y$ is the observed value of the evaluation index, $Y_t$ is the $t$-th observed value, taken as the prediction object, and $Z$ is the error; $\beta_0, \beta_1, \beta_2, \ldots, \beta_p \in P$ are numbers that are not all zero, where $P$ is a number field; $Y_t, Y_{t-1}, \ldots, Y_{t-p}$ denote the $t$-th, $(t-1)$-th, ..., $(t-p)$-th observed values. When the prediction object $Y_t$ is influenced by its own past changes, its behavior is expressed by the following formula:
$$Y_t = \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} + Z_t \tag{4-2}$$
The error terms at different periods are mutually dependent, which is expressed by the following formula:
$$Z_t = \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \alpha_2 \varepsilon_{t-2} + \cdots + \alpha_q \varepsilon_{t-q} \tag{4-3}$$
wherein $\varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_{t-q}$ represent unit vectors, and $\alpha_1, \alpha_2, \ldots, \alpha_q \in P$ are numbers that are not all zero, where $P$ is a number field. This gives the ARMA model expression of the evaluation index:
$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} + \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \alpha_2 \varepsilon_{t-2} + \cdots + \alpha_q \varepsilon_{t-q} \tag{4-4}$$
The future trend of resource utilization is predicted by computing this data model, and the risk in hardware resource usage is evaluated accordingly;
the centralized operation and maintenance center uses an ARIMA model to perform time series analysis on the out-of-limit duration of disk partition utilization, with the following steps:
1) Check whether the time series of the index to be calculated has missing values; fill each missing value with the data of the previous period, or with the data of the next period if the previous period has no data;
2) Analyze the randomness, stationarity and seasonality of the time series using autocorrelation analysis and partial autocorrelation analysis, and select a time series analysis model for the calculation (basic mathematical definition);
3) Once the data model is determined, fit the calculated index (basic mathematical definition) and plot the fitted data against time to form a time series analysis chart;
4) Derive the trend of the index from the shape of the fitted time series curve and combine it with the numerical analysis results of the various indexes to show the risks in system resource usage (if the observed quantity is the failure rate, the fitted result is the trend result of the risk evaluation).
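Steps 1, 3 and 4 can be sketched in plain Python. To stay short and dependency-free, the example fits an AR(1) model with an intercept by ordinary least squares, which is the special case p = 1, q = 0 of the ARMA expression (4-4); it is a stand-in for illustration, not the full ARIMA procedure, and the model selection of step 2 is not shown. All function names are hypothetical.

```python
def fill_missing(series):
    """Step 1: replace None with the previous value, else the next one."""
    filled = list(series)
    for i, value in enumerate(filled):
        if value is None:
            if i > 0:
                # Earlier gaps are already filled, so the previous value exists.
                filled[i] = filled[i - 1]
            else:
                following = next((x for x in filled[1:] if x is not None), None)
                if following is None:
                    raise ValueError("series has no observed values")
                filled[i] = following
    return filled


def fit_ar1(series):
    """Least-squares fit of Y_t = b0 + b1 * Y_{t-1} + e_t; returns (b0, b1)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("series is constant; slope is undefined")
    b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return mean_y - b1 * mean_x, b1


def forecast(series, b0, b1, steps):
    """Project the fitted model forward to show the trend of the index."""
    predictions, last = [], series[-1]
    for _ in range(steps):
        last = b0 + b1 * last
        predictions.append(last)
    return predictions
```

A rising forecast of the out-of-limit duration would be read, in the terms of step 4, as a growing risk in resource usage.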
Furthermore, the cluster center analysis of the system hardware resource monitoring indexes uses a partitioning method: given a data set of N tuples or records, the partitioning method constructs K groups, each representing a cluster, with K < N, and the K groups satisfy the following conditions:
<1> each group contains at least one data record;
<2> each data record belongs to one and only one group;
for a given number of groups K, an initial grouping is first produced and then revised through repeated iterations, so that each revised grouping is better than the previous one; the steps are as follows:
1> Initialization: input a gene expression matrix as the object set X, input the specified number of clusters K, and randomly select K objects in X as the initial cluster centers; set the iteration termination conditions;
2> Iteration: assign each data object to the closest cluster center according to the similarity criterion, thereby forming classes; initialize the membership matrix (membership degree is a concept from fuzzy evaluation functions);
3> Update the cluster centers: take the mean vector of each class as its new cluster center and reassign the data objects;
4> Repeat steps 2> and 3> until a termination condition is met; the termination conditions include a maximum number of iterations or a convergence tolerance on the cluster centers;
5> Evaluation criterion:
suppose there are m data objects and c cluster centers, where $\mu_c$ is the c-th cluster center, $x^{(i)}$ is the i-th data object, i is the counting index running from 1 to m, and $\mu$ denotes a cluster center; the criterion sums the squared differences between the data in each class and the center of that class, and the partition is best when this sum J is minimal;
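The expression for J is not reproduced in the text. Assuming it is the standard k-means distortion function, which matches the description of summing squared differences between each data object and its cluster center, it could be written as follows, where $c^{(i)}$ (notation introduced here for illustration) denotes the index of the cluster to which $x^{(i)}$ is assigned:

```latex
J(c, \mu) = \sum_{i=1}^{m} \left\lVert x^{(i)} - \mu_{c^{(i)}} \right\rVert^{2}
```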
finally, cluster analysis of the system hardware resource evaluation indexes produces a distribution diagram and yields the numerical cluster centers of the evaluation indexes.
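The partitioning steps above correspond closely to k-means clustering. The following is a minimal pure-Python sketch under that assumption; the function names, the squared Euclidean distance as the similarity criterion, and the optional `init` parameter for supplying starting centers are illustrative choices not taken from the text.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Assign each point to the index of its closest cluster center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, max_iter=100, tol=1e-9, init=None, seed=0):
    """Return (centers, labels, cost) for k clusters of `points`."""
    if init is None:
        # Step 1>: pick k objects at random as the initial cluster centers.
        init = random.Random(seed).sample(points, k)
    centers = [list(c) for c in init]
    for _ in range(max_iter):                      # stop at max iterations
        labels = assign(points, centers)           # step 2>: assignment
        new_centers = []
        for j in range(k):                         # step 3>: mean vectors
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])     # keep an empty cluster's center
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:                           # stop on center convergence
            break
    labels = assign(points, centers)
    # Evaluation criterion: sum of squared distances to the assigned centers.
    cost = sum(sq_dist(p, centers[label]) for p, label in zip(points, labels))
    return centers, labels, cost
```

Because the result depends on the starting centers, a practical run would typically repeat the procedure with several random initializations and keep the partition with the smallest cost.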
Further, the hardware resource monitoring index association analysis comprises: the centralized operation and maintenance center performs association analysis between the system hardware resource evaluation indexes and the key operating processes of the system, defines association rules, and performs the analysis using the Apriori algorithm, with the following steps:
1) Classify the hardware resource evaluation indexes and the processes by the application type to which each process belongs (application types for data acquisition are described in an earlier patent, and the intelligent power grid dispatching control system has a dedicated standard classification of application types; the relevant books or standards can be consulted); sort the records by time, record the resource evaluation indexes that occur simultaneously with each process fault, and sum them;
2) Count how often the resource evaluation indexes change at high frequency or become abnormal when each process fails, and prune the processes accordingly to remove accidental factors;
3) From the remaining process and resource evaluation index correspondence table, remove the records in which the resource evaluation indexes change or become abnormal at high frequency when other processes fail, so that abnormalities of other processes do not affect the current association analysis result (high frequency is a relative quantity and can be defined according to the size of the sample space; in the invention, a frequency above 10% is considered to indicate high correlation);
4) Calculate the frequency of high-frequency change or abnormality in the remaining process and resource evaluation index correspondence table, and prune the processes whose frequency is below 10%, that is, remove uncertainty; the remaining entries of the correspondence table are considered strongly correlated, and their confidence is calculated.
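The pruning and confidence calculation in step 4) can be sketched as follows. This is not a full Apriori implementation: it only evaluates rules of the form "process fault implies index abnormality" for single pairs. The record layout, the function name and the choice to apply the 10% threshold to the conditional frequency (abnormalities per fault of the same process) are assumptions for illustration.

```python
def association_confidence(fault_records, min_frequency=0.10):
    """Compute confidence for (process, index) pairs and prune weak ones.

    fault_records: list of (failed_process, set of index names that were
    abnormal at the same time). Returns a dict mapping (process, index)
    to confidence, keeping only pairs with frequency >= min_frequency.
    """
    fault_counts = {}
    pair_counts = {}
    for process, abnormal_indexes in fault_records:
        fault_counts[process] = fault_counts.get(process, 0) + 1
        for index_name in abnormal_indexes:
            key = (process, index_name)
            pair_counts[key] = pair_counts.get(key, 0) + 1
    rules = {}
    for (process, index_name), count in pair_counts.items():
        # Confidence: share of this process's faults with the index abnormal.
        confidence = count / fault_counts[process]
        if confidence >= min_frequency:  # prune pairs below the threshold
            rules[(process, index_name)] = confidence
    return rules
```

With 20 recorded faults of one process, 12 of them accompanied by a CPU abnormality and 1 by a memory abnormality, the CPU pair is kept with confidence 0.6 and the memory pair (0.05) is pruned.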
Compared with the closest prior art, the technical solution provided by the invention has the following beneficial effects:
1) Under the centralized operation and maintenance mode, the monitoring range and index types of the hardware resources of the intelligent power grid dispatching control system are standardized, forming a unified and standardized evaluation standard for system hardware resources and facilitating comparison of standardized indexes across the industry.
2) A multi-mode, multi-period data acquisition method is adopted, which reduces the data transmission pressure on the intelligent power grid dispatching control systems at every level under the centralized operation and maintenance mode; the acquisition mode is lightweight and flexible and facilitates individualized configuration of the monitoring indexes of the multi-level system.
3) Under the centralized operation and maintenance mode, an analysis and evaluation model of the hardware resource monitoring indexes of the intelligent power grid dispatching control system is established, the composition of and relations among the various indexes are determined, the analysis, evaluation and comparison of single-system and multi-level system hardware resources are realized, and basic data are provided for in-depth analysis of the system hardware resource indexes.
4) Under the centralized operation and maintenance mode, numerical analysis methods for hardware resource analysis and evaluation of the intelligent power grid dispatching control system are provided: time series analysis evaluates the risk trend of system hardware resources, cluster analysis calculates the numerical distribution centers of the evaluation indexes, and association analysis mines the potential risks that abnormal key processes pose to system resource utilization, with the degree of association between the two expressed numerically through conditional probability.
Drawings
FIG. 1 is a flow chart of a real-time indicator transmission provided by the present invention;
FIG. 2 is a flow chart of the periodic indicator transmission provided by the present invention;
FIG. 3 is a diagram of a multi-level system hardware resource index analysis and evaluation cube model provided by the present invention;
FIG. 4 is a schematic diagram of cluster analysis of utilization rate of system hardware resources according to the present invention, in which FIGS. 4 (a) and 4 (b) are evaluation indexes, respectively, and points in FIGS. 4 (c) -4 (f) are internal members of various types after cluster analysis;
FIG. 5 is a flowchart of a hardware resource analysis and evaluation method based on centralized operation and maintenance provided by the present invention.
Detailed Description
The following provides a more detailed description of embodiments of the present invention, with reference to the accompanying drawings.
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, and other changes. The examples merely typify possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in or substituted for those of others. The scope of embodiments of the invention encompasses the full ambit of the claims, and all available equivalents of the claims. Embodiments of the invention may be referred to herein, individually or collectively, by the term "invention" merely for convenience and without intending to voluntarily limit the scope of this application to any single invention or inventive concept if more than one is in fact disclosed.
The method for analyzing and evaluating the hardware resources of the intelligent power grid dispatching control system based on centralized operation and maintenance is shown as a flow chart in FIG. 5 and comprises the following steps:
1) Define the types of system hardware resource monitoring indexes. Three monitoring indexes are targeted: disk space utilization rate, CPU occupancy rate and memory utilization rate of all application servers and important workstations connected to the scheduling data network. To reduce the transmission pressure of communication data, the monitoring indexes are further divided into real-time indexes and periodic statistical indexes according to the acquisition period: real-time indexes require an update period of 3-5 seconds, and periodic statistical indexes an update period of 5 or 10 minutes;
defining the index type of the system hardware resource:
the centralized operation and maintenance center monitors the use condition of hardware resources of the intelligent power grid dispatching control system, and mainly aims at three monitoring indexes of disk space utilization rate, CPU occupancy rate and memory utilization rate of all application servers and important workstations accessed to a dispatching data network. In order to reduce the transmission pressure of communication data, the monitoring indexes are further divided into real-time indexes and periodic statistical indexes according to the acquisition period, wherein the real-time indexes require an update period of 3-5 seconds, and the periodic statistical indexes have an update period of 5 or 10 minutes. The specific definitions of the three types of indicators are as follows:
CPU utilization: the instantaneous CPU utilization of a node in each system. One value is recorded in the data point table and transmitted as telemetry.
Memory utilization: the instantaneous memory utilization of a node in each system. One value is recorded in the data point table and transmitted as telemetry.
Disk space utilization rate: the disk space utilization rate of a node in each system, chiefly the space utilization of the node's root directory and main user directory. One value is recorded in the data point table and transmitted as telemetry.
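As a minimal sketch of how one sample of these three indexes might be represented, the Python fragment below defines a record type and reads the disk space utilization of a directory. The names `IndexType`, `IndexSample`, `disk_space_utilization` and `sample_root_disk` are illustrative assumptions and not part of the described system; only `shutil.disk_usage` is an actual standard-library call. CPU and memory sampling are omitted because they need platform-specific sources.

```python
import shutil
import time
from dataclasses import dataclass
from enum import Enum


class IndexType(Enum):
    CPU = "cpu_utilization"
    MEMORY = "memory_utilization"
    DISK = "disk_space_utilization"


@dataclass(frozen=True)
class IndexSample:
    """One value destined for the data point table (hypothetical layout)."""
    system: str        # dispatching control system the node belongs to
    node: str          # application server or workstation name
    index_type: IndexType
    value: float       # utilization in percent, 0-100
    timestamp: float   # seconds since the epoch


def disk_space_utilization(path: str) -> float:
    """Return the used share of the file system holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def sample_root_disk(system: str, node: str) -> IndexSample:
    # The text singles out the root directory and the main user directory;
    # only the root directory is sampled in this sketch.
    return IndexSample(system, node, IndexType.DISK,
                       disk_space_utilization("/"), time.time())
```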
2) Collect the system hardware resource indexes by period. Real-time indexes are exchanged with the dispatching control system of each regulation and control center by a front-end acquisition server connected to the scheduling data network, using character-string data blocks of the DL476 protocol. Periodic indexes are encrypted and transmitted to the centralized operation and maintenance center at a set period through an scp or ftp file service; the centralized operation and maintenance center decompresses, decrypts and parses them and stores them directly in a historical database.
2.1 Acquisition of real-time indexes:
Real-time indexes are exchanged with the dispatching control system of each regulation and control center through a front-end acquisition server connected to the dispatching data network, using character-string data blocks of the DL476 protocol. Collecting and transmitting the hardware resource monitoring indexes involves the acquisition end of each regional system's data, the receiving end of the system data, and the configuration of the communication transmission protocol, which together achieve centralized collection of the hardware resource monitoring indexes of the intelligent power grid dispatching control system. To this end, the centralized operation and maintenance center must be able to receive the hardware resource monitoring indexes of each system, and the regulation and control centers at provincial level and above must be able to forward them.
Firstly, according to a data communication index file agreed by both parties in advance, acquiring a required hardware resource monitoring index from each local system real-time library and transmitting the hardware resource monitoring index to a centralized operation and maintenance center. And the centralized operation and maintenance center data receiving program establishes TCP connection with the provincial and above regulation and control center acquisition systems, receives various data and stores the data in a real-time library of the operation and maintenance center system. The transmission flow of the real-time index is shown in fig. 1:
1) A TCP connection is first created.
2) Send a start request: the DL476 message is A_ASSOCIATE.
3) Receive the start confirmation: the DL476 message is A_ASSOCIATE_ACK.
4) The sending end scans the data listed in the data communication index file in real time and sends a data message if the data have changed or the full-data cycle time has been reached.
5) The receiving end acknowledges the received data message.
6) If no system data are transmitted within 15 seconds, either the sending end or the receiving end may send a test message, which the opposite end acknowledges. The DL476 message is A_TEST (the cause code distinguishes a test from a test confirmation).
7) Either the sending end or the receiving end may close the connection, and the opposite end acknowledges. In DL476 the disconnection is A_ABORT and the disconnection confirmation is A_ABORT_ACK.
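The message sequence above can be sketched as a small state machine. This is only an illustration of the ordering of messages under stated assumptions: it does not implement DL476 framing, cause codes or sockets, the class `Dl476Session` and the `"DATA"` label are hypothetical names, and only the message names A_ASSOCIATE, A_ASSOCIATE_ACK, A_TEST, A_ABORT and A_ABORT_ACK and the 15-second idle limit come from the text.

```python
from enum import Enum, auto


class State(Enum):
    CLOSED = auto()
    TCP_CONNECTED = auto()
    ASSOCIATING = auto()
    ASSOCIATED = auto()


IDLE_LIMIT_S = 15.0  # idle time after which a test message is sent


class Dl476Session:
    """Tracks one side's message sequence; no real framing or I/O."""

    def __init__(self):
        self.state = State.CLOSED
        self.last_traffic = 0.0
        self.sent = []  # names of messages this side has sent, in order

    def _send(self, kind, now):
        self.sent.append(kind)
        self.last_traffic = now

    def connect(self, now):
        # Step 1: the TCP connection is created first.
        self.state = State.TCP_CONNECTED
        self.last_traffic = now

    def associate(self, now):
        # Step 2: send the start request.
        if self.state is not State.TCP_CONNECTED:
            raise RuntimeError("TCP connection required before A_ASSOCIATE")
        self._send("A_ASSOCIATE", now)
        self.state = State.ASSOCIATING

    def on_message(self, kind, now):
        # Steps 3 and 7: react to messages from the opposite end.
        self.last_traffic = now
        if kind == "A_ASSOCIATE_ACK" and self.state is State.ASSOCIATING:
            self.state = State.ASSOCIATED
        elif kind == "A_ABORT":
            self._send("A_ABORT_ACK", now)
            self.state = State.CLOSED
        elif kind == "A_ABORT_ACK":
            self.state = State.CLOSED

    def send_data(self, now):
        # Step 4: data messages are only valid once associated.
        if self.state is not State.ASSOCIATED:
            raise RuntimeError("not associated")
        self._send("DATA", now)  # "DATA" is a placeholder label

    def tick(self, now):
        # Step 6: send a test message after 15 s without traffic.
        if (self.state is State.ASSOCIATED
                and now - self.last_traffic >= IDLE_LIMIT_S):
            self._send("A_TEST", now)

    def close(self, now):
        # Step 7: request disconnection; CLOSED is reached on the ACK.
        self._send("A_ABORT", now)
```

Time is passed in explicitly as `now` so that the idle rule can be exercised without waiting in real time.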
2.2 Periodic index collection
Periodic index collection involves a file service client, an index acquisition client, and an index summarizing and analyzing client. The index acquisition client is deployed on the monitored equipment and extracts the hardware resource monitoring indexes by reading the historical database, running test programs, and reading system logs, operating system information and the like. It sends them to the index summarizing and analyzing client, which classifies, counts, summarizes and analyzes the indexes locally within the regulation and control center. The processed indexes are compressed and passed to the file service client, which encrypts them and transmits them to the centralized operation and maintenance center at a set period through an scp or ftp file service. The centralized operation and maintenance center decompresses, decrypts and parses the periodic indexes and stores them directly in the historical database for later analysis and statistics. The transmission flow of the periodic indexes is shown in FIG. 2:
(1) Indexes are acquired on each node server or workstation at fixed time points, with the acquisition period read from the configuration library, and are written into the real-time database.
(2) The historical sampling program writes the acquired indexes into the historical library according to the sampling period for archiving.
(3) The analysis and summary program reads the sampled data from the historical library, processes the basic data through the analysis and evaluation program, writes the results into the corresponding historical index library, and forms an analysis and evaluation log file.
(4) The file service client of the regulation and control center reads the analysis and evaluation log file periodically and, at regular intervals, transmits the encrypted file through the ftp or scp service to the directory of the file service client designated by the centralized operation and maintenance center.
(5) The centralized operation and maintenance center decompresses and decrypts the files in the designated directory, writes the analyzed and evaluated indexes into the real-time library, and monitors them through a human-machine interface; the data refresh period is the same as the acquisition period.
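The packing and unpacking in steps (4) and (5) can be sketched as follows. The text names neither the file format nor the cipher, so this sketch assumes a JSON payload compressed with gzip and takes the encryption and decryption routines as caller-supplied functions; the identity defaults are placeholders and provide no protection. The function names `pack_report` and `unpack_report` are hypothetical, and the scp or ftp transfer itself is not shown.

```python
import gzip
import json


def pack_report(indexes: dict, encrypt=lambda data: data) -> bytes:
    """Control-center side: serialize, compress, then encrypt a report.

    `encrypt` is a placeholder hook; a real deployment would pass in an
    actual cipher, which the source text does not specify.
    """
    raw = json.dumps(indexes, sort_keys=True).encode("utf-8")
    return encrypt(gzip.compress(raw))


def unpack_report(blob: bytes, decrypt=lambda data: data) -> dict:
    """Operation-and-maintenance-center side: decrypt, decompress, parse."""
    raw = gzip.decompress(decrypt(blob))
    return json.loads(raw.decode("utf-8"))
```

Because the report is compressed before it is encrypted, the receiving side has to decrypt first and decompress second.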
3) Establish an analysis and evaluation model of the system hardware resource monitoring indexes, comprising a recursive model for analyzing and evaluating the hardware resource monitoring indexes of a system and a cube model for analyzing and evaluating the hardware resource monitoring indexes of the multi-level system.
Specifically, the method comprises the following steps:
3.1 System hardware resource monitoring index analysis and evaluation recursion model
The analysis and evaluation of system resource monitoring indexes adopts a recursive model in which low-level basic data form high-level analysis and evaluation indexes through recursive calculation. The model comprises six parts, namely configuration data, basic data, monitoring indexes, single equipment statistical indexes, system resource statistical indexes and calculated evaluation indexes, as shown in Table 1 below.
1) Configuration data: the number of devices of each kind monitored by the system, attribute information such as device type, and the limit values set for the monitoring indexes of each kind of device.
2) Basic data: the collected CPU utilization rate, memory occupancy rate and disk partition utilization rate. The system hardware resource monitoring indexes are generated by monitoring, analyzing and counting the basic data.
3) Monitoring indexes: quantities obtained by monitoring the collected basic data against the index limits in the configuration data, including out-of-limit start and stop times, full-load start and stop times, and resource growth rate.
4) Single equipment statistical indexes: statistics of each device's monitoring indexes by day, month and year, including the duration, count, detailed information, out-of-limit rate, full-load rate and growth rate of the various monitoring indexes. System hardware resources are classified and counted from the local perspective according to attributes in the configuration data such as equipment model, type and manufacturer, generating targeted statistical data.
5) System resource statistical indexes: statistics of the monitoring indexes by day, month and year, including the duration, count, percentage, out-of-limit rate, full-load rate and growth rate of the various monitoring indexes. The hardware resource usage of a system is evaluated and analyzed as a whole, generating global statistical data.
6) Calculated evaluation indexes: numerical analysis of the statistical indexes by day, month and year, including average, maximum, minimum, quartile percentage distribution, time cumulative distribution, and calculation by index result, equipment type and duration, generating basic data for predicting the risk of the system hardware resources from both the local and the global perspective.
TABLE 1 System hardware resource monitoring index analysis and evaluation recursion model table
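The lower layers of the recursive model can be illustrated with a short sketch: basic data and a configured limit yield monitoring indexes (out-of-limit start and stop times), which are then rolled up into single-device and system-level statistics. The function names, the strict `>` comparison against the limit, and the choice to close an open interval at the last sample are assumptions made for illustration.

```python
def out_of_limit_intervals(samples, limit):
    """Derive (start, stop) out-of-limit intervals from basic data.

    samples: time-ordered list of (timestamp_seconds, utilization_percent).
    limit:   limit value taken from the configuration data.
    """
    intervals = []
    start = None
    for timestamp, value in samples:
        if value > limit and start is None:
            start = timestamp                      # out-of-limit start time
        elif value <= limit and start is not None:
            intervals.append((start, timestamp))   # out-of-limit stop time
            start = None
    if start is not None:
        # Still out of limit at the end of the data: close at last sample.
        intervals.append((start, samples[-1][0]))
    return intervals


def device_statistics(samples, limit):
    """Single-device statistics: out-of-limit count and duration in hours."""
    intervals = out_of_limit_intervals(samples, limit)
    seconds = sum(stop - start for start, stop in intervals)
    return {"count": len(intervals), "duration_hours": seconds / 3600.0}


def system_statistics(per_device):
    """System-level statistics built from the single-device statistics."""
    return {
        "count": sum(d["count"] for d in per_device),
        "duration_hours": sum(d["duration_hours"] for d in per_device),
    }
```

Each layer consumes only the output of the layer below it, which is the sense in which the model is recursive.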
3.2 multilevel system hardware resource monitoring index analysis and evaluation cube model
The analysis and evaluation of the hardware resource monitoring indexes of the multi-level system adopts a cube model. As shown in FIG. 3, the system hardware resources are analyzed and evaluated along three dimensions A, Q and Y, where A represents the systems of the different areas under the three-level (national, regional and provincial) regulation and control centers, Q represents an evaluation index, and Y represents time. By moving along the time axis, the cube model allows different statistical indexes of systems in different regions to be analyzed and compared; each unit cube represents the average value of one index over one time period in one region.
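A minimal sketch of such a cube is a mapping keyed by (area, index, period) whose cells hold averages. The class `IndexCube` and its method names are hypothetical, and the period labels are arbitrary strings chosen so that they sort chronologically.

```python
from collections import defaultdict
from statistics import mean


class IndexCube:
    """Cells keyed by (A: area, Q: evaluation index, Y: time period)."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, area, index, period, value):
        self._values[(area, index, period)].append(value)

    def cell(self, area, index, period):
        """Average of one index over one period in one area, or None."""
        values = self._values.get((area, index, period))
        return mean(values) if values else None

    def compare_areas(self, index, period):
        """Fix Q and Y, vary A: compare regions for one index and period."""
        return {a: mean(v) for (a, q, y), v in self._values.items()
                if q == index and y == period}

    def trend(self, area, index):
        """Fix A and Q, vary Y: follow one index along the time axis."""
        return sorted((y, mean(v)) for (a, q, y), v in self._values.items()
                      if a == area and q == index)
```

Slicing along one dimension while fixing the other two gives the regional comparisons and time trends described above.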
4) The system hardware resource analysis and evaluation method comprises a system hardware resource basic index definition and calculation method, a system hardware resource risk trend analysis method, a system hardware resource monitoring index clustering center analysis method and a hardware resource monitoring index correlation analysis method.
4.1 calculation of basic index
All durations are converted to hours.
1) Daily out-of-limit rate of a single device = daily out-of-limit duration of the single device / 24
2) Daily out-of-limit rate of a class of equipment in a single system = daily out-of-limit duration of that class of equipment in the single system / (24 × number of devices of that class)
3) Daily out-of-limit rate of a class of equipment in the whole system = daily out-of-limit duration of that class of equipment in the whole system / (24 × number of devices of that class in the whole system)
4) Daily out-of-limit rate of hardware resources of the whole system = daily out-of-limit duration of hardware resources of the whole system / (24 × number of monitored devices in the whole system)
5) Monthly out-of-limit rate of a single device = monthly out-of-limit duration of the single device / (24 × days in the month)
6) Monthly out-of-limit rate of a class of equipment in a single system = monthly out-of-limit duration of that class of equipment in the single system / (24 × days in the month × number of devices of that class)
7) Monthly out-of-limit rate of a class of equipment in the whole system = monthly out-of-limit duration of that class of equipment in the whole system / (24 × days in the month × number of devices of that class in the whole system)
8) Monthly out-of-limit rate of hardware resources of the whole system = monthly out-of-limit duration of hardware resources of the whole system / (24 × days in the month × number of monitored devices in the whole system)
9) Annual out-of-limit rate of a single device = sum of the monthly out-of-limit rates of the single device
10) Annual out-of-limit rate of a class of equipment in a single system = sum of the monthly out-of-limit rates of that class of equipment in the single system
11) Annual out-of-limit rate of a class of equipment in the whole system = sum of the monthly out-of-limit rates of that class of equipment in the whole system
12) Annual out-of-limit rate of hardware resources of the whole system = sum of the monthly out-of-limit rates of hardware resources of the whole system
13) Average monthly out-of-limit rate of an index = sum of the calculated values of the index for each month of the year / 12 (number of months)
14) Average daily out-of-limit rate of an index = sum of the calculated values of the index for each day of the year / 365 or 366 (days in the year)
15) Growth rate of resource usage of a single device = (resource usage of the device at time t - resource usage of the device at time m) / (t - m), where t > m
16) Out-of-limit rate percentage of a single device = out-of-limit rate of the single device / out-of-limit rate of that class of equipment in the system × 100%
17) Out-of-limit rate percentage of a class of equipment in a system = out-of-limit rate of that class of equipment in the system / out-of-limit rate of hardware resources of the system × 100%
18) Out-of-limit rate percentage of a class of equipment in the whole system = out-of-limit rate of that class of equipment in the whole system / out-of-limit rate of hardware resources of the whole system × 100%
19) System resource full-load percentage by count = number of full-load occurrences of the system resource / number of out-of-limit occurrences of the system resource × 100%
20) System resource full-load percentage by duration = full-load duration of the system resource / out-of-limit duration of the system resource × 100%
21) Quartile percentage analysis of system resource usage: the range 0-100% is divided into four equal intervals, and the share and distribution of the resource utilization rate in each interval are calculated.
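A few of these formulas are sketched below in Python, assuming that durations are given in hours and utilization in percent; the function names are hypothetical. Formulas 1) to 4) share one function by passing the number of devices, as do formulas 5) to 8).

```python
def daily_out_of_limit_rate(duration_hours, device_count=1):
    """Formulas 1) to 4): duration / (24 x number of devices)."""
    return duration_hours / (24.0 * device_count)


def monthly_out_of_limit_rate(duration_hours, days_in_month, device_count=1):
    """Formulas 5) to 8): duration / (24 x days x number of devices)."""
    return duration_hours / (24.0 * days_in_month * device_count)


def growth_rate(usage_t, usage_m, t, m):
    """Formula 15): change in resource usage per unit time, with t > m."""
    if t <= m:
        raise ValueError("t must be later than m")
    return (usage_t - usage_m) / (t - m)


def quartile_distribution(utilizations):
    """Formula 21): share of samples in [0,25), [25,50), [50,75), [75,100]."""
    counts = [0, 0, 0, 0]
    for value in utilizations:
        # 100% falls into the last interval rather than a fifth one.
        counts[min(int(value // 25), 3)] += 1
    total = len(utilizations)
    return [c / total for c in counts]
```

For example, a device that was out of limit for 6 hours in a day has a daily out-of-limit rate of 6 / 24 = 0.25, and four devices of one class with 12 hours in total have a rate of 12 / (24 × 4) = 0.125.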
4.2 Risk Trend analysis
Risk trend analysis of the hardware resources of the intelligent power grid dispatching control system uses the ARIMA model from time series analysis to model the historical data of an evaluation index. The data sequence formed by the evaluation index over time is regarded as a random sequence, and the dependency among this group of random variables reflects the continuity of the original data in time. The index is influenced by external factors on the one hand and follows its own law of change on the other. Assuming the influencing factors are $x_1, x_2, \dots, x_p$, regression analysis gives

$$Y_t = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + Z \qquad (4\text{-}1)$$

where $Y$ is the observed value of the evaluation index, $Y_t$ is its $t$-th observed value ($t$ being the subscript), that is, the observed value of the evaluation index taken as the prediction object, and $Z$ is the error. $\beta_0, \beta_1, \beta_2, \dots, \beta_p \in P$ are a set of numbers that are not all zero, $P$ being a number field. $Y_t, Y_{t-1}, \dots, Y_{t-p}$ denote the $t$-th, $(t-1)$-th, ..., $(t-p)$-th observed values. As the prediction object, $Y_t$ is also influenced by its own past changes, a law expressed by

$$Y_t = \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + Z_t \qquad (4\text{-}2)$$

The error terms at different periods depend on one another, as expressed by

$$Z_t = \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \alpha_2 \varepsilon_{t-2} + \dots + \alpha_q \varepsilon_{t-q} \qquad (4\text{-}3)$$

where $\varepsilon_t, \varepsilon_{t-1}, \dots, \varepsilon_{t-q}$ represent unit vectors and $\alpha_1, \alpha_2, \dots, \alpha_q \in P$ are a set of numbers that are not all zero, $P$ being a number field. Substituting (4-3) into (4-2) and including the constant term $\beta_0$ yields the ARMA model expression of the evaluation index:

$$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \alpha_2 \varepsilon_{t-2} + \dots + \alpha_q \varepsilon_{t-q} \qquad (4\text{-}4)$$

The future trend of the resource utilization rate is predicted by calculating with this model, and the risk of the hardware resource usage is evaluated. The centralized operation and maintenance center performs time series analysis on the out-of-limit duration of the disk utilization rate with the ARIMA model in the following steps:
1) Check whether the time series of the index to be calculated has missing values; if so, fill each gap with the data of the previous time interval (or, if the previous interval has no data, with the data of the next interval).
2) Analyze the randomness, stationarity and seasonality of the time series using methods such as autocorrelation analysis and partial correlation analysis, and select a suitable time series analysis model for the calculation.
3) After the model is determined, fit the calculated index and form a time series analysis chart from the relation between the fitted data and time.
4) Calculate the index trend by analyzing the shape of the fitted time series curve, and explain the risk of the system resource usage in combination with the numerical analysis results of the various indexes.
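Two pieces of this procedure are simple enough to sketch directly: the missing-value rule of step 1 and a one-step forecast from the ARMA expression (4-4). The sketch assumes the coefficients have already been estimated; model identification and fitting (steps 2 and 3) are outside its scope, and the function names are hypothetical. The forecast sets the not-yet-observed error term to zero, its expected value.

```python
def fill_missing(series):
    """Step 1: fill None gaps with the previous value, else the next one."""
    filled = list(series)
    # Forward pass: copy the previous interval's value into each gap.
    for i in range(1, len(filled)):
        if filled[i] is None and filled[i - 1] is not None:
            filled[i] = filled[i - 1]
    # Backward pass: leading gaps have no previous value, so use the next.
    for i in range(len(filled) - 2, -1, -1):
        if filled[i] is None:
            filled[i] = filled[i + 1]
    return filled


def arma_one_step(history, residuals, beta0, betas, alphas):
    """Forecast the next value from expression (4-4).

    history:   past observations, most recent last (at least p values)
    residuals: past error terms, most recent last (at least q values)
    betas:     [beta_1, ..., beta_p], autoregressive coefficients
    alphas:    [alpha_1, ..., alpha_q], moving-average coefficients
    """
    ar_part = sum(b * history[-i] for i, b in enumerate(betas, start=1))
    ma_part = sum(a * residuals[-j] for j, a in enumerate(alphas, start=1))
    return beta0 + ar_part + ma_part
```

With history [1.0, 2.0], one residual 0.5, beta_0 = 1.0, betas [0.5, 0.25] and alphas [0.5], the forecast is 1.0 + 0.5 × 2.0 + 0.25 × 1.0 + 0.5 × 0.5 = 2.5.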
4.3 Cluster center analysis
Cluster analysis, also known as group analysis, is a statistical analysis method for studying the classification of samples or indexes, and is also an important algorithm in data mining. A cluster is composed of several patterns, where a pattern is typically a vector of measurements or a point in a multidimensional space. Cluster analysis is based on similarity: patterns within one cluster are more similar to each other than to patterns in other clusters. The cluster analysis of system hardware resource utilization mainly uses the partitioning method: given a data set with N tuples or records, the partitioning method constructs K groups, each group representing one cluster, with K < N. The K groups satisfy the following conditions:
(1) Each group at least comprises a data record;
(2) Each data record belongs to and only belongs to one group;
For a given K, the algorithm first produces an initial grouping and then changes the grouping iteratively so that each improved grouping is better than the previous one. The criterion for "better" is: the closer the records within the same group, and the farther apart the records in different groups, the better. The general steps of the clustering algorithm are:
1) Initialization. Input the data matrix (gene expression matrix) as the object set X, input the specified number of clusters N, and randomly select N objects in X as the initial cluster centers. Set the iteration stop condition, such as the maximum number of iterations or the convergence tolerance of the cluster centers.
2) Iteration. Assign each data object to the closest cluster center according to the similarity criterion, thereby forming the classes, and initialize the membership matrix.
3) Update the cluster centers. Take the mean vector of each class as its new cluster center and reassign the data objects.
4) Repeat steps 2 and 3 until the stop condition is met.
5) Evaluation criteria:
Suppose there are M data sources and C cluster centers, with $\mu_c$ the center of cluster c. The evaluation criterion J is the sum of the squares of the differences between the data in each class and that class's cluster center; the partition for which J is smallest is the best.
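The steps and the criterion J can be sketched with a plain k-means loop. This is a minimal illustration and not the method's definitive implementation: it assumes squared Euclidean distance as the similarity criterion, uses the standard k-means objective $J = \sum_{c=1}^{C} \sum_{x_i \in c} \lVert x_i - \mu_c \rVert^2$ as the assumed form of the criterion described above, and fixes the random seed so that runs are repeatable. The function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Step 2: index of the closest center for every point."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]


def k_means(points, k, max_iter=100, tol=1e-9, seed=0):
    # Step 1: pick k distinct objects as the initial cluster centers.
    centers = random.Random(seed).sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        # Step 3: the mean vector of each class becomes its new center.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty class's center
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        # Step 4: stop once the centers have converged.
        if shift <= tol:
            break
    labels = assign(points, centers)
    # Step 5: J, the within-class sum of squared distances.
    j = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, j
```

For the four points (0, 0), (0, 1), (10, 10) and (10, 11) with k = 2, the centers converge to (0, 0.5) and (10, 10.5), and J = 4 × 0.25 = 1.0.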
The cluster analysis of the system hardware resource evaluation indexes finally forms a distribution graph and yields the evaluation indexes and their numerical cluster centers, as shown in FIG. 4 (where panels (a) and (b) are the evaluation indexes, the X marks in panels (b) to (f) are the aggregation points, and the points in panels (c) to (f) are the internal members of each class after cluster analysis).
4.4 correlation analysis:
The Apriori algorithm is used for the association analysis of the system hardware resource evaluation indexes; it is the most influential algorithm for mining frequent itemsets for Boolean association rules. The algorithm uses prior knowledge of the properties of frequent itemsets and an iterative method known as level-wise search. First, the set of frequent 1-itemsets is found and denoted $L_1$. $L_1$ is used to find $L_2$, the set of frequent 2-itemsets, $L_2$ is used to find $L_3$, and so on, until no frequent k-itemset can be found. Finding each $L_k$ requires one scan of the database. The procedure consists of a join step and a prune step, carried out iteratively.
Join step: to find $L_k$, a set of candidate k-itemsets is generated by joining $L_{k-1}$ with itself. This set of candidates is denoted $C_k$. Let $l_1$ and $l_2$ be itemsets in $L_{k-1}$. The notation $l_i[j]$ denotes the j-th item of $l_i$ (for example, $l_1[k-2]$ denotes the second-to-last item of $l_1$, which contains k-1 items). For convenience, the items within a transaction or itemset are assumed to be sorted in lexicographic order. The join $L_{k-1} \bowtie L_{k-1}$ is performed, where members of $L_{k-1}$ are joinable if their first (k-2) items are the same; that is, members $l_1$ and $l_2$ of $L_{k-1}$ are joinable if $(l_1[1]=l_2[1]) \wedge (l_1[2]=l_2[2]) \wedge \dots \wedge (l_1[k-2]=l_2[k-2]) \wedge (l_1[k-1]<l_2[k-1])$. The condition $l_1[k-1]<l_2[k-1]$ simply ensures that no duplicates are generated. The itemset resulting from joining $l_1$ and $l_2$ is $l_1[1], l_1[2], \dots, l_1[k-1], l_2[k-1]$.
Prune step: $C_k$ is a superset of $L_k$; that is, its members may or may not be frequent, but all frequent k-itemsets are contained in $C_k$. The database is scanned to determine the count of each candidate in $C_k$ and thus to determine $L_k$ (by definition, all candidates whose count is not less than the minimum support count are frequent and therefore belong to $L_k$). However, $C_k$ may be large, so the computation involved may be heavy. To reduce $C_k$, the Apriori property is used as follows: any infrequent (k-1)-itemset cannot be a subset of a frequent k-itemset. Therefore, if any (k-1)-subset of a candidate k-itemset is not in $L_{k-1}$, the candidate cannot be frequent and can be removed from $C_k$. This subset test can be done quickly using a hash tree of all the frequent itemsets.
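The join and prune steps can be sketched compactly in Python. Itemsets are represented as sorted tuples, and the subset test uses ordinary set lookup rather than the hash tree mentioned above, a simplification chosen for brevity. The function names are hypothetical.

```python
from itertools import combinations


def join_step(prev_frequent):
    """Join L_(k-1) with itself to produce candidate k-itemsets.

    Two itemsets are joinable if their first k-2 items agree and the last
    item of the first is smaller than the last item of the second.
    """
    items = sorted(prev_frequent)
    candidates = set()
    for a in items:
        for b in items:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidates.add(a + (b[-1],))
    return candidates


def prune_step(candidates, prev_frequent):
    """Drop candidates that have an infrequent (k-1)-subset."""
    return {c for c in candidates
            if all(sub in prev_frequent
                   for sub in combinations(c, len(c) - 1))}


def apriori(transactions, min_support_count):
    """Return {itemset: support count} for all frequent itemsets."""
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[(item,)] = counts.get((item,), 0) + 1
    level = {i for i, n in counts.items() if n >= min_support_count}
    frequent = {i: counts[i] for i in level}
    while level:
        candidates = prune_step(join_step(level), level)
        # One scan of the data per level to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if set(c) <= set(t))
                  for c in candidates}
        level = {c for c, n in counts.items() if n >= min_support_count}
        frequent.update({c: counts[c] for c in level})
    return frequent
```

In the five-transaction example {A,B,C}, {A,B}, {A,C}, {B,C}, {A,B,C} with a minimum support count of 3, every single item and every pair is frequent, while the candidate {A,B,C} survives pruning but is rejected by its count of 2.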
The centralized operation and maintenance center performs association analysis between the system hardware resource evaluation indexes and the key running processes of the system. It defines association rules and applies the Apriori algorithm in the following steps:
1) Classify the hardware resource evaluation indexes and the processes; sort each by time, record the resource evaluation indexes that occur at the same time as each process fault, and sum them.
2) Prune the processes for which the resource evaluation indexes showed high-frequency change or abnormality fewer than 2 times during the process's faults, to remove accidental factors.
3) From the remaining correspondence table of processes and resource evaluation indexes, remove the records in which the resource evaluation index showed high-frequency change or abnormality while other processes were also faulty, so that abnormalities of other processes do not affect the current association analysis result.
4) From the remaining correspondence table of processes and resource evaluation indexes, calculate the frequency of high-frequency change or abnormality and prune the processes whose frequency is below 10%, to remove uncertainty. The remaining processes and resource evaluation indexes are considered strongly associated, and their confidence is calculated.
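The four steps can be sketched as a filter over fault records followed by a conditional-probability calculation. Several details are assumptions made for illustration, because the text does not fix them: each record lists the processes that were faulty and the indexes that were abnormal at one time; records with more than one faulty process are excluded first (a reordering of steps 2 and 3); the 10% frequency is taken as the pair's share of all remaining records; and confidence is the conditional probability that the index is abnormal given that the process is faulty. The function name and record layout are hypothetical.

```python
from collections import Counter


def associate(records, min_count=2, min_frequency=0.10):
    """Return {(process, index): confidence} for strongly associated pairs.

    records: list of dicts with keys "faulty" (set of process names) and
             "abnormal" (set of index names observed at the same time).
    """
    # Step 3: ignore records in which other processes were also faulty.
    isolated = [r for r in records if len(r["faulty"]) == 1]

    # Step 1: count faults per process and co-occurring abnormal indexes.
    fault_count = Counter()
    pair_count = Counter()
    for r in isolated:
        (process,) = r["faulty"]
        fault_count[process] += 1
        for index in r["abnormal"]:
            pair_count[(process, index)] += 1

    rules = {}
    for (process, index), n in pair_count.items():
        if n < min_count:                      # step 2: accidental factors
            continue
        if n / len(isolated) < min_frequency:  # step 4: low frequency
            continue
        # Confidence: P(index abnormal | process faulty).
        rules[(process, index)] = n / fault_count[process]
    return rules
```

For instance, if process p1 fails alone four times and the CPU index is abnormal in three of them, the pair (p1, cpu) gets a confidence of 3 / 4 = 0.75.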
Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art can make modifications and equivalents to the embodiments of the present invention without departing from the spirit and scope of the present invention, which is set forth in the claims of the present application.

Claims (14)

1. A hardware resource analysis and evaluation method based on centralized operation and maintenance is characterized by comprising the following steps:
1) Defining the type of a system hardware resource monitoring index;
2) Collecting hardware resource indexes of the system in cycles;
3) Establishing an analysis and evaluation model of system hardware resource monitoring indexes;
4) Analyzing and evaluating system hardware resources.
2. The hardware resource analysis and evaluation method according to claim 1, wherein in the step 1), the hardware resource monitoring indicators include three monitoring indicators of disk space utilization, CPU occupancy, and memory utilization of all application servers and important workstations accessing the scheduling data network; or the monitoring index is divided into a real-time index and a periodic statistical index according to the acquisition period, wherein the real-time index requires an updating period of 3-5 seconds, and the periodic statistical index has an updating period of 5 or 10 minutes.
3. The hardware resource analysis and evaluation method of claim 2, wherein, of the three monitoring indicators,
CPU occupancy rate: the instantaneous CPU utilization of a node in the scheduling data network; one value is recorded in the data point table and transmitted as telemetry;
memory utilization: the instantaneous memory utilization of a node in each system; one value is recorded in the data point table and transmitted as telemetry;
disk space utilization rate: the disk space utilization rate of a node in each system, covering the space utilization of the node's root directory and main user directory; one value is recorded in the data point table and transmitted as telemetry.
4. The hardware resource analysis and evaluation method of claim 1, wherein said step 2) comprises the steps of:
step 2.1, acquiring real-time indexes;
step 2.2, collecting periodic statistical indexes.
5. The hardware resource analysis and evaluation method of claim 4, wherein the step 2.1 of collecting the real-time indexes comprises: the real-time indexes are exchanged with the dispatching control system of each regulation and control center through a front-end acquisition server connected to the scheduling data network, using character-string data blocks of the DL476 protocol; the collection and transmission of the hardware resource monitoring indexes involve the sending end of each regional system's data, the receiving end of the system data, and the configuration of the communication transmission protocol, which together achieve centralized collection of the hardware resource monitoring indexes of the intelligent power grid dispatching control system; the required hardware resource monitoring indexes are acquired from the real-time database of each local system according to a data communication index file agreed in advance by the system data sending end and the system data receiving end, and are transmitted to the centralized operation and maintenance center; the data receiving program of the centralized operation and maintenance center establishes a TCP connection with the acquisition systems of the regulation and control centers at provincial level and above, receives the data and stores them in the real-time database of the operation and maintenance center system; the method comprises the following steps:
(1) Firstly, creating a TCP connection;
(2) Sending a start request: the DL476 message is A_ASSOCIATE;
(3) Receiving a start confirmation: the DL476 message is A_ASSOCIATE_ACK;
(4) The sending end scans the data in the data communication index file instantly, and sends a data message if the data is changed or the time reaches the requirement of the full data cycle;
(5) The receiving end confirms the received data message;
(6) If no data are transmitted within 15 seconds, the sending end or the receiving end sends a test message, and the opposite end acknowledges it; the DL476 message is A_TEST;
(7) The sending end or the receiving end closes the connection, and the opposite end acknowledges; in DL476 the disconnection is A_ABORT and the disconnection confirmation is A_ABORT_ACK.
6. The hardware resource analysis and evaluation method of claim 4, wherein in step 2.2, the collection of the periodic statistical indexes involves a file service client, an index acquisition client, and an index summarizing and analyzing client; the periodic indexes are encrypted and transmitted to a centralized operation and maintenance center at a set period through an scp or ftp file service, and the centralized operation and maintenance center decompresses, decrypts and parses the periodic indexes and stores them directly in a historical database; the step 2.2 comprises the following steps:
1) Acquiring indexes on each node server or workstation at a certain time point, reading the acquisition period from the configuration library, and writing the indexes into the real-time database;
2) The historical sampling program writes the acquisition indexes into a historical library according to a sampling period and archives the acquisition indexes;
3) The analysis and summary program reads out the sampled data in the historical library, processes the basic data through the analysis and evaluation program, writes the processed basic data into the corresponding historical index library, and forms an analysis and evaluation log file;
4) The file service client side of the regulation and control center reads the analysis and evaluation log file according to the period, and transmits the encrypted file to the appointed directory of the file service client side appointed by the centralized operation and maintenance center at regular time through ftp or scp service;
5) And the centralized operation and maintenance center decompresses and decrypts the files in the designated directory, writes the analyzed and evaluated indexes into a real-time library, and monitors the files through a human-computer interface, wherein the refreshing period of the data is the same as the acquisition period.
7. The hardware resource analysis and evaluation method of claim 1, wherein in step 3) the analysis and evaluation model comprises a system hardware resource monitoring index analysis and evaluation recursive model and a multi-level system hardware resource monitoring index analysis and evaluation cube model.
8. The hardware resource analysis and evaluation method of claim 7, wherein the system hardware resource monitoring index analysis and evaluation recursive model derives high-level analysis and evaluation indexes from low-level basic data through recursive computation, and comprises six parts: configuration data, basic data, monitoring indexes, single-device statistical indexes, system resource statistical indexes, and calculated evaluation indexes, as follows:
1) Configuration data: the number, type and model attribute information of the various devices monitored by the system, and the limit values set for the monitoring indexes of each device;
2) Basic data: the collected CPU utilization, memory occupancy and disk partition utilization; the system hardware resource monitoring indexes are generated by monitoring, analyzing and compiling statistics on the basic data;
3) Monitoring indexes: quantities obtained by checking the collected basic data against the index limit values in the configuration data, including out-of-limit start and end times, full-load start and end times, and resource growth rate;
4) Single-device statistical indexes: statistics of each device's monitoring indexes by day, month and year, including duration, number of occurrences, detailed information, out-of-limit rate, full-load rate and growth rate of the various monitoring indexes; the system hardware resources are classified and counted locally according to the device model, type and manufacturer attributes in the configuration data, generating targeted statistical data;
5) System resource statistical indexes: statistics of the monitoring indexes by day, month and year, including duration, number of occurrences, percentage, out-of-limit rate, full-load rate and growth rate of the various monitoring indexes; the use of the system's hardware resources is evaluated and analyzed as a whole, generating global statistical data;
6) Calculated evaluation indexes: numerical analysis of the statistical indexes by day, month and year, including average value, maximum value, minimum value, quartile percentage distribution and cumulative time distribution, computed according to index results, device types and durations, generating basic data for predicting the risk of the system hardware resources at both the local and the global level.
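The first recursion levels of this model (basic data, then monitoring indexes, then single-device statistics) can be sketched as follows. The function names, the sample layout of `(hour, value)` pairs and the convention that a span ends at the first sample back within the limit are assumptions made for illustration, not details given in the claim.

```python
def out_of_limit_intervals(samples, limit):
    """Monitoring index: (start, end) spans during which value > limit.

    `samples` is a time-ordered list of (hour, value) pairs.
    """
    intervals = []
    start = None
    for hour, value in samples:
        if value > limit and start is None:
            start = hour                      # out-of-limit span begins
        elif value <= limit and start is not None:
            intervals.append((start, hour))   # span ends at first normal sample
            start = None
    if start is not None:
        # Still out of limit at the last sample: close the span there.
        intervals.append((start, samples[-1][0]))
    return intervals


def device_daily_stats(samples, limit):
    """Single-device statistical index for one day, built from the spans."""
    spans = out_of_limit_intervals(samples, limit)
    duration = sum(end - start for start, end in spans)
    return {"count": len(spans), "duration_h": duration, "rate": duration / 24}
```

For example, CPU utilization samples of 50, 95, 97, 60, 92 and 40 percent at hours 0 to 5 with a limit of 90 give two out-of-limit spans totalling 3 hours.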
9. The hardware resource analysis and evaluation method of claim 7, wherein the multi-level system hardware resource monitoring index analysis and evaluation cube model analyzes and evaluates the system hardware resources along the three dimensions A, Q and Y, where A represents the systems of the different regions belonging to the three levels of regulation and control centers (national, regional and provincial), Q represents an evaluation index, and Y represents time; each unit cube represents the average value of an evaluation index in a given region over a given time period.
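One way to picture the cube is as a mapping from an (A, Q, Y) triple to an average value. The sketch below assumes hypothetical record tuples of `(area, index_name, period, value)`; neither the function name nor the record layout comes from the claim.

```python
from collections import defaultdict


def build_cube(records):
    """Aggregate raw values into {(area, index_name, period): average}."""
    sums = defaultdict(lambda: [0.0, 0])      # key -> [running total, count]
    for area, index_name, period, value in records:
        cell = sums[(area, index_name, period)]
        cell[0] += value
        cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Each key of the returned dictionary corresponds to one unit cube; fixing one coordinate and iterating over the others gives a slice along the remaining two dimensions.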
10. The hardware resource analysis and evaluation method of claim 1, wherein step 4) comprises definition and calculation of the system hardware resource basic indexes, system hardware resource risk trend analysis, cluster center analysis of the system hardware resource monitoring indexes, and correlation analysis of the hardware resource monitoring indexes.
11. The hardware resource analysis and evaluation method of claim 10, wherein the definition and calculation of the system hardware resource basic indexes comprises:
1) Single-device daily out-of-limit rate = single-device daily out-of-limit duration / 24;
2) Daily out-of-limit rate of a device class in a single system = daily out-of-limit duration of that device class in the single system / (24 × number of devices of that class);
3) Daily out-of-limit rate of a device class in the whole system = daily out-of-limit duration of that device class in the whole system / (24 × number of devices of that class in the whole system);
4) Daily out-of-limit rate of the whole-system hardware resources = daily out-of-limit duration of the whole-system hardware resources / (24 × number of monitored devices in the whole system);
5) Single-device monthly out-of-limit rate = single-device monthly out-of-limit duration / (24 × days in the month);
6) Monthly out-of-limit rate of a device class in a single system = monthly out-of-limit duration of that device class in the single system / (24 × days in the month × number of devices of that class);
7) Monthly out-of-limit rate of a device class in the whole system = monthly out-of-limit duration of that device class in the whole system / (24 × days in the month × number of devices of that class in the whole system);
8) Monthly out-of-limit rate of the whole-system hardware resources = monthly out-of-limit duration of the whole-system hardware resources / (24 × days in the month × number of monitored devices in the whole system);
9) Single-device annual out-of-limit rate = sum of the single-device monthly out-of-limit rates;
10) Annual out-of-limit rate of a device class in a single system = sum of the monthly out-of-limit rates of that device class in the single system;
11) Annual out-of-limit rate of a device class in the whole system = sum of the monthly out-of-limit rates of that device class in the whole system;
12) Annual out-of-limit rate of the whole-system hardware resources = sum of the monthly out-of-limit rates of the whole-system hardware resources;
13) Monthly average out-of-limit rate of an index class = sum of the values of that index class calculated for each month of the year / 12 (number of months);
14) Daily average out-of-limit rate of an index class = sum of the values of that index class calculated for each day of the year / 365 or 366 (number of days in the year);
15) Growth rate of single-device resource usage = (single-device resource usage at time t − single-device resource usage at time m) / (t − m), where t > m;
16) Single-device out-of-limit rate percentage = single-device out-of-limit rate / out-of-limit rate of that device class in the system × 100%;
17) Out-of-limit rate percentage of a device class in a system = out-of-limit rate of that device class in the system / hardware resource out-of-limit rate of the system × 100%;
18) Out-of-limit rate percentage of a device class in the whole system = out-of-limit rate of that device class in the whole system / out-of-limit rate of the whole-system hardware resources × 100%;
19) System resource full-load rate percentage (by count) = number of system resource full-load occurrences / number of system resource out-of-limit occurrences × 100%;
20) System resource full-load rate percentage (by duration) = system resource full-load duration / system resource out-of-limit duration × 100%;
21) System resource usage quartile percentage analysis: the 0-100% range is divided into four equal intervals, and the share and distribution of the resource usage in each interval are calculated;
all durations are converted to hours.
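A few of these definitions are written out below as a minimal sketch, with durations in hours as the claim requires. The function names and argument layout are illustrative assumptions, and `calendar.monthrange` supplies the number of days in the month.

```python
import calendar


def daily_rate(duration_h, n_devices=1):
    """Items 1)-4): out-of-limit hours / (24 x number of devices)."""
    return duration_h / (24 * n_devices)


def monthly_rate(duration_h, year, month, n_devices=1):
    """Items 5)-8): out-of-limit hours / (24 x days in month x devices)."""
    days = calendar.monthrange(year, month)[1]
    return duration_h / (24 * days * n_devices)


def growth_rate(usage_t, usage_m, t, m):
    """Item 15): change in resource usage per unit time between m and t."""
    if t <= m:
        raise ValueError("t must be later than m")
    return (usage_t - usage_m) / (t - m)


def quartile_distribution(usages):
    """Item 21): share of samples in [0,25), [25,50), [50,75), [75,100]."""
    counts = [0, 0, 0, 0]
    for usage in usages:
        counts[min(int(usage // 25), 3)] += 1   # 100% falls in the top band
    return [count / len(usages) for count in counts]
```

For instance, a single device that is out of limit for 6 hours in a day has a daily rate of 6 / 24 = 0.25, and four devices of one class with 12 out-of-limit hours between them have a class rate of 12 / (24 × 4) = 0.125.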
12. The hardware resource analysis and evaluation method of claim 10, wherein the system hardware resource risk trend analysis comprises:
assuming the influencing factors are x1, x2, …, xk, it is known from regression analysis:
$Y_t = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + Z$ (4-1)
where $Y$ is the observed value of the evaluation index; $Y_t$, the $t$-th observed value, is the prediction object; $Z$ is the error; $\beta_1, \beta_2, \ldots, \beta_p \in P$ is a set of numbers that are not all zero, $P$ being a number field; and $Y_t, Y_{t-1}, \ldots, Y_{t-p}$ denote the $t$-th, $(t-1)$-th, ..., $(t-p)$-th observed values, respectively. Since the prediction object $Y_t$ is influenced by its own past changes, this regularity is expressed by the following formula:
$Y_t = \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} + Z_t$ (4-2)
the error terms of different periods depend on one another, which is expressed by the following formula:
$Z_t = \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \alpha_2 \varepsilon_{t-2} + \cdots + \alpha_q \varepsilon_{t-q}$ (4-3)
where $\varepsilon_t, \varepsilon_{t-1}, \ldots, \varepsilon_{t-q}$ denote unit vectors (the error terms of the respective periods), and $\alpha_1, \alpha_2, \ldots, \alpha_q \in P$ is a set of numbers that are not all zero, $P$ being a number field; this yields the ARMA model expression of the evaluation index:
$Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_p Y_{t-p} + \varepsilon_t + \alpha_1 \varepsilon_{t-1} + \alpha_2 \varepsilon_{t-2} + \cdots + \alpha_q \varepsilon_{t-q}$ (4-4)
predicting the future trend of the resource utilization rate through calculation of a data model, and evaluating the risk of the use condition of the hardware resource;
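As an illustration of how expression (4-4) yields a prediction, the sketch below computes a one-step forecast from given coefficients. It assumes the coefficients have already been estimated and that the unknown current disturbance has mean zero, so it is dropped from the forecast; the function names and the zero-initialized residual recursion are illustrative choices, not part of the claim.

```python
def arma_one_step(y_hist, eps_hist, beta0, betas, alphas):
    """Forecast Y_t from (4-4), with the current disturbance set to 0.

    y_hist[-1] is Y_{t-1}, y_hist[-2] is Y_{t-2}, and so on;
    eps_hist[-1] is the disturbance of period t-1.
    """
    ar_part = sum(b * y_hist[-i] for i, b in enumerate(betas, start=1))
    ma_part = sum(a * eps_hist[-j] for j, a in enumerate(alphas, start=1))
    return beta0 + ar_part + ma_part


def arma_residuals(series, beta0, betas, alphas):
    """Recover disturbances recursively, assuming zeros before the sample."""
    p, q = len(betas), len(alphas)
    eps = [0.0] * len(series)
    for t in range(p, len(series)):
        pred = beta0
        pred += sum(betas[i] * series[t - 1 - i] for i in range(p))
        pred += sum(alphas[j] * eps[t - 1 - j]
                    for j in range(q) if t - 1 - j >= 0)
        eps[t] = series[t] - pred
    return eps
```

With $\beta_0 = 1$, $\beta = (0.5, 0.25)$, $\alpha = (2)$, past observations $Y_{t-2} = 1$, $Y_{t-1} = 2$ and last disturbance 0.5, the forecast is $1 + 0.5 \cdot 2 + 0.25 \cdot 1 + 2 \cdot 0.5 = 3.25$.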
the centralized operation and maintenance center uses an ARIMA model to perform time series analysis on the out-of-limit duration of the disk partition utilization, with the following steps:
1) check whether the time series of the index to be calculated contains missing values; a missing value is filled with the data of the previous period or, if no previous-period data exist, with the data of the next period;
2) analyze the randomness, stationarity and seasonality of the time series using autocorrelation analysis and partial autocorrelation analysis, and select a time series analysis model for the calculation;
3) once the data model is determined, fit the calculated index, and form a time series analysis chart from the relation between the fitted data and time;
4) derive the index trend from the shape of the fitted time series curve, and explain the risk of the system resource usage in combination with the numerical analysis results of the various indexes.
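The first two preparation steps can be sketched in plain Python. The filling rule below reads the missing-value step as "use the previous period's value, or the next period's value when there is no previous one", which is an interpretation of the translated claim; the function names and the use of `None` to mark a missing value are assumptions.

```python
def fill_missing(series):
    """Fill None entries from the previous period, else from the next one."""
    filled = list(series)
    for i in range(1, len(filled)):           # forward pass: previous period
        if filled[i] is None and filled[i - 1] is not None:
            filled[i] = filled[i - 1]
    for i in range(len(filled) - 2, -1, -1):  # backward pass: leading gaps
        if filled[i] is None:
            filled[i] = filled[i + 1]
    return filled


def autocorrelation(series, lag):
    """Sample autocorrelation at `lag`, used to judge randomness and stationarity."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance = sum((series[t] - mean) * (series[t - lag] - mean)
                     for t in range(lag, n))
    return covariance / variance
```

Autocorrelations that decay slowly across many lags usually suggest a non-stationary series that needs differencing before an ARMA model is fitted, which is the role of the "I" in ARIMA.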
13. The hardware resource analysis and evaluation method of claim 10, wherein the cluster center analysis of the system hardware resource monitoring indexes is performed using a partitioning method: given a data set of N tuples or records, the partitioning method constructs K groups, each group representing a cluster, with K < N; the K groups satisfy the following conditions:
<1> each group contains at least one data record;
<2> each data record belongs to one and only one group;
for the given number of groups K, an initial grouping is given first, and the grouping is then changed by repeated iteration so that each improved grouping scheme is better than the previous one, comprising the steps of:
1> initialization: input a gene expression matrix as the object set X, input the specified number of clusters K, and randomly select K objects in X as the initial cluster centers; set the iteration termination condition;
2> iteration: assign each data object to the closest cluster center according to the similarity criterion, thereby forming the classes; initialize the membership matrix;
3> cluster center update: take the mean vector of each class as the new cluster center, and reassign the data objects;
4> repeat steps 2> and 3> until the termination condition is met, the termination condition being a maximum number of iterations or a convergence error tolerance for the cluster centers;
5> evaluation criterion:
suppose there are m data objects and c cluster centers, where μ_c is the c-th cluster center, x^(i) denotes the i-th data object, i is a counter running from 1 to m, and μ denotes a cluster center; the criterion J is the sum of the squared differences between the data in each class and the center of that class, and the partition that makes J minimal is the best;
finally, the cluster analysis of the system hardware resource evaluation indexes produces a distribution diagram, and the evaluation indexes and their numerical cluster centers are obtained.
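The iteration described above matches the usual K-means procedure, whose sum-of-squares objective is commonly written $J = \sum_{i=1}^{m} \lVert x^{(i)} - \mu_{c^{(i)}} \rVert^2$, with $c^{(i)}$ the cluster assigned to object $i$; that this is the exact formula intended by the claim is an assumption. A minimal pure-Python sketch, with hypothetical function and parameter names:

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its closest cluster center."""
    return [min(range(len(centers)),
                key=lambda c: squared_distance(p, centers[c]))
            for p in points]


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Return (centers, labels, J) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # random initial centers
    for _ in range(max_iter):                  # termination: iteration cap
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                        # mean vector of the class
                new_centers.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
            else:                              # keep an empty cluster's center
                new_centers.append(centers[c])
        shift = max(squared_distance(a, b)
                    for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:                       # termination: center tolerance
            break
    labels = assign(points, centers)
    cost = sum(squared_distance(p, centers[lab])
               for p, lab in zip(points, labels))
    return centers, labels, cost
```

Because the initial centers are random, different seeds can converge to different local minima of J; running several seeds and keeping the result with the smallest J is a common precaution.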
14. The hardware resource analysis and evaluation method of claim 10, wherein the hardware resource monitoring index correlation analysis comprises: the centralized operation and maintenance center performs association analysis between the system hardware resource evaluation indexes and the key running processes of the system, defines association rules, and performs the association analysis using the Apriori algorithm, with the following steps:
1) The hardware resource evaluation indexes and the processes are classified by application type; the records are sorted by time, and for each process fault the resource evaluation indexes at the same time are recorded and summed;
2) For each process, the frequency with which the resource evaluation indexes show high-frequency change or abnormality when that process fails is calculated, and pruning is applied to remove accidental factors;
3) From the remaining process and resource evaluation index correspondence table, the records of high-frequency change or abnormality of the resource evaluation indexes that coincide with faults of other processes are removed, so that abnormalities of other processes do not affect the current association analysis result;
4) The frequency of high-frequency change or abnormality is calculated for the remaining process and resource evaluation index correspondence table, entries with a frequency below 10% are pruned to remove uncertainty, and the confidence is calculated; the processes and resource evaluation indexes remaining in the table are considered to be strongly correlated.
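The pruning and confidence calculation in the last step can be sketched as below. This is a simplification rather than a full Apriori implementation: it counts only single (process, index) pairs, which is the first level of itemsets Apriori would enumerate. The record layout, the function name and the reading of "frequency" as the pair's share of all fault records are assumptions.

```python
from collections import Counter

MIN_FREQUENCY = 0.10  # pruning threshold stated in the claim


def strong_associations(events, min_frequency=MIN_FREQUENCY):
    """Return {(process, index): confidence} for pairs that survive pruning.

    `events` is a list of (failed_process, set_of_abnormal_indexes),
    one entry per recorded process fault.
    """
    failures = Counter(process for process, _ in events)
    pairs = Counter((process, index)
                    for process, abnormal in events
                    for index in abnormal)
    total = len(events)
    rules = {}
    for (process, index), count in pairs.items():
        if count / total < min_frequency:
            continue                            # prune rare pairs
        # Confidence: how often the index is abnormal when the process fails.
        rules[(process, index)] = count / failures[process]
    return rules
```

A pair that occurs once in 20 fault records (5%) is pruned, while a pair that occurs in 12 of 13 faults of the same process is kept with a confidence of 12/13.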
CN201610989588.7A 2016-11-10 2016-11-10 A kind of hardware resource analysis and appraisal procedure based on concentration O&M Pending CN108074022A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610989588.7A CN108074022A (en) 2016-11-10 2016-11-10 A kind of hardware resource analysis and appraisal procedure based on concentration O&M

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610989588.7A CN108074022A (en) 2016-11-10 2016-11-10 A kind of hardware resource analysis and appraisal procedure based on concentration O&M

Publications (1)

Publication Number Publication Date
CN108074022A true CN108074022A (en) 2018-05-25

Family

ID=62154559

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610989588.7A Pending CN108074022A (en) 2016-11-10 2016-11-10 A kind of hardware resource analysis and appraisal procedure based on concentration O&M

Country Status (1)

Country Link
CN (1) CN108074022A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013201874A (en) * 2012-03-26 2013-10-03 Toshiba Corp Demand control method for electric power system, and its system
CN103401699A (en) * 2013-07-18 2013-11-20 深圳先进技术研究院 Cloud data center security monitoring early warning system and method
CN105184886A (en) * 2015-09-01 2015-12-23 浪潮集团有限公司 Cloud data center intelligence inspection system and cloud data center intelligence inspection method
CN105515820A (en) * 2015-09-25 2016-04-20 上海北塔软件股份有限公司 Health analysis method for operation and maintenance management
CN105681298A (en) * 2016-01-13 2016-06-15 成都安信共创检测技术有限公司 Data security abnormity monitoring method and system in public information platform
CN105868876A (en) * 2015-01-21 2016-08-17 国家电网公司 Centralized operation and maintenance fault closed-loop processing method based on process monitoring
CN106022477A (en) * 2016-05-18 2016-10-12 国网信通亿力科技有限责任公司 Intelligent analysis decision system and method

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108696530A (en) * 2018-06-01 2018-10-23 北京中海闻达信息技术有限公司 A kind of online encryption data safety evaluation method and device
CN109460344B (en) * 2018-09-26 2023-04-28 国家计算机网络与信息安全管理中心 Operation and maintenance analysis method and system of server
CN109460344A (en) * 2018-09-26 2019-03-12 国家计算机网络与信息安全管理中心 A kind of the O&M analysis method and system of server
CN109408347A (en) * 2018-09-28 2019-03-01 北京九章云极科技有限公司 A kind of index real-time analyzer and index real-time computing technique
CN109656790A (en) * 2018-10-29 2019-04-19 平安科技(深圳)有限公司 System prompt control method, device, computer and computer readable storage medium
CN110275773A (en) * 2018-10-30 2019-09-24 湖北省农村信用社联合社网络信息中心 Paas resource circulation utilization index system based on truthful data models fitting
CN110275773B (en) * 2018-10-30 2020-08-28 湖北省农村信用社联合社网络信息中心 Paas resource recycling index system based on real data model fitting
CN114070707A (en) * 2020-11-10 2022-02-18 北京市天元网络技术股份有限公司 Internet performance monitoring method and system
CN115147008A (en) * 2022-08-02 2022-10-04 中国神华能源股份有限公司 Power plant unit storage resource real-time assessment method and system based on data lake technology
CN115373507A (en) * 2022-10-26 2022-11-22 北京品立科技有限责任公司 Whole machine resource balance management method and system based on electric energy loss
CN115373507B (en) * 2022-10-26 2023-01-06 北京品立科技有限责任公司 Whole machine resource balance management method and system based on electric energy loss
CN116744321A (en) * 2023-08-11 2023-09-12 中维建技术有限公司 Data regulation and control method for intelligent operation and maintenance integrated platform for 5G communication
CN116744321B (en) * 2023-08-11 2023-11-14 中维建技术有限公司 Data regulation and control method for intelligent operation and maintenance integrated platform for 5G communication
CN117688464A (en) * 2024-02-04 2024-03-12 国网上海市电力公司 Hidden danger analysis method and system based on multi-source sensor data
CN117688464B (en) * 2024-02-04 2024-04-19 国网上海市电力公司 Hidden danger analysis method and system based on multi-source sensor data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100192 Beijing city Haidian District Qinghe small Camp Road No. 15

Applicant after: CHINA ELECTRIC POWER RESEARCH INSTITUTE Co.,Ltd.

Applicant after: STATE GRID CORPORATION OF CHINA

Address before: 100192 Beijing city Haidian District Qinghe small Camp Road No. 15

Applicant before: China Electric Power Research Institute

Applicant before: State Grid Corporation of China

RJ01 Rejection of invention patent application after publication

Application publication date: 20180525