CN111475377A - Method and system for detecting health degree of data center and storage medium - Google Patents


Info

Publication number
CN111475377A
Authority
CN
China
Prior art keywords
data
maintenance
health degree
stability
data center
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010228287.9A
Other languages
Chinese (zh)
Inventor
李晓文
李季龙
李世英
童荪
林兵
郭家溢
买吾浪江·艾依提
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unicom Guangdong Industrial Internet Co Ltd
Original Assignee
China Unicom Guangdong Industrial Internet Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unicom Guangdong Industrial Internet Co Ltd
Priority to CN202010228287.9A
Publication of CN111475377A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/20 Administration of product repair or maintenance

Abstract

The invention discloses a method for detecting the health degree of a data center, comprising the following steps: acquiring operation and maintenance data of the data center; determining a weight value of the operation and maintenance data; generating a health degree score of the data center from the weight value and the operation and maintenance data; and generating an operation and maintenance scheme of the data center according to the health degree score of the data center. The operation and maintenance data comprises alarm data, operation and maintenance personnel data, data center parameters, work order data and emergency data. The proportion that each factor contributes to the health degree score is determined through a weight distribution algorithm, and the resulting health degree of the data center serves as a basis for improving and optimizing the operation and maintenance scheme of the data center. Compared with the prior-art methods for evaluating the operation and maintenance condition of a data center, the all-round, multi-angle health degree detection method provided by the technical scheme of the invention is more scientific and reliable, has better adaptability, and can be widely applied in the technical field of data center operation and maintenance.

Description

Method and system for detecting health degree of data center and storage medium
Technical Field
The invention relates to the technical field of operation and maintenance of data centers, in particular to a method and a system for detecting health degree of a data center and a storage medium.
Background
Definitions of terms:
A Data Center (abbreviated as IDC, Internet Data Center) can be summarized as a base for providing resource services.
Data Center Health (DCH) is an important reference index of a data center.
Currently, global IT is moving into the cloud era, and cloud computing has become the new generation of IT infrastructure. As a result, data center construction has entered a period of explosive growth in recent years. The traffic carried by a data center is generally large, so safe and stable operation is the top priority for both data center operators and users. This raises the question of how to evaluate the reliability of a data center. On the construction side, current data centers are generally built to the international T3 and the international A level, so the concept of data center health degree is proposed from the operations perspective instead, to reflect the reliability of each data center.
However, the prior-art methods for detecting the health degree of a data center rarely consider the level of a fault or whether the fault was handled within the specified time. They only register that a fault occurred, sum the times, and use that sum as the numerator. For example, if a level-1 (highest-level) fault is not handled in time and a level-4 (lowest-level) fault is handled in time, and the sum of the times happens to be exactly 0, the processing timeliness rate becomes 100% under such a formula, which is obviously unreasonable. The prior-art schemes also ignore the influence of factors such as equipment stability, the capability level of the operation and maintenance team, and the stability of the operation and maintenance team on the health degree of the data center.
Disclosure of Invention
To solve at least one of the above problems, the present invention aims to provide an all-round, scientific method for detecting the health degree of a data center, together with a system and a storage medium for implementing the method.
In a first aspect, the invention provides a method for detecting the health degree of a data center, comprising the following steps:
acquiring operation and maintenance data of a data center;
determining a weight value of the operation and maintenance data;
generating a health degree score of the data center through the weight value and the operation and maintenance data;
generating an operation and maintenance scheme of the data center according to the health degree score of the data center;
wherein, the operation and maintenance data comprises: alarm data, operation and maintenance personnel data, data center parameters, work order data and emergency data.
In some embodiments of the present invention, the step of determining the weight value of the operation and maintenance data specifically includes:
generating a judgment matrix of the operation and maintenance data;
normalizing the judgment matrix to obtain a sum vector;
normalizing the sum vector to obtain a weight vector;
and determining the weight value according to the weight vector.
In some embodiments of the present invention, the step of generating the health score of the data center by using the weight value and the operation and maintenance data specifically includes:
generating the health degree of equipment stability according to the operation and maintenance data;
generating the health degree of the capability level of the operation and maintenance team according to the operation and maintenance data;
generating the health degree of the stability of the operation and maintenance team according to the operation and maintenance data;
and generating a health degree score of the data center according to the health degree of the equipment stability, the health degree of the capability level of the operation and maintenance team and the health degree of the stability of the operation and maintenance team, in combination with the weight value.
In some embodiments of the present invention, the step of generating the health degree of the stability of the device according to the operation and maintenance data specifically includes:
acquiring alarm data in the operation and maintenance data;
determining the stability of the power and environment ("moving ring") equipment according to the alarm data;
determining the stability of the heating and ventilation equipment according to the alarm data;
determining the stability of the video equipment according to the alarm data;
determining the stability of the access control equipment according to the alarm data;
and generating the health degree of the equipment stability from the stability of the moving ring equipment, the stability of the heating and ventilation equipment, the stability of the video equipment and the stability of the access control equipment.
In some embodiments of the present invention, the step of generating the health degree of the capability level of the operation and maintenance team according to the operation and maintenance data specifically includes:
acquiring operation and maintenance personnel data, work order data and emergency data in the operation and maintenance data;
determining the density of operation and maintenance personnel according to the operation and maintenance personnel data;
determining the certificate-holding ratio of the operation and maintenance personnel according to the operation and maintenance personnel data;
determining the timeliness rate of maintenance and inspection work order processing according to the work order data;
determining the timeliness rate of alarm work order processing according to the work order data;
determining an emergency capability score according to the emergency data;
and generating the health degree of the capability level of the operation and maintenance team according to the density of the operation and maintenance personnel, the certificate-holding ratio of the operation and maintenance personnel, the timeliness rate of maintenance and inspection work order processing, the timeliness rate of alarm work order processing and the emergency capability score.
In some embodiments of the present invention, the step of generating the health degree of the stability of the operation and maintenance team according to the operation and maintenance data specifically includes:
acquiring operation and maintenance personnel data in the operation and maintenance data;
determining the stability rate of key personnel according to the data of the operation and maintenance personnel;
determining the personnel stability rate according to the operation and maintenance personnel data;
and generating the health degree of the stability of the operation and maintenance team according to the key personnel stability rate and the personnel stability rate.
In some embodiments of the invention, the alarm data includes active alarm data and historical alarm data.
In a second aspect, the technical solution of the present invention further provides a system for detecting the health degree of a data center, which implements the method for detecting the health degree of a data center in the foregoing embodiments and includes:
a data acquisition module, used for acquiring operation and maintenance data of the data center;
a data preprocessing module, used for determining the weight value of the operation and maintenance data;
a DCH calculation module, used for generating a health degree score of the data center according to the weight value and the operation and maintenance data, and for generating an operation and maintenance scheme of the data center according to the health degree score of the data center;
wherein the data preprocessing module includes:
an equipment stability module, used for generating the health degree of equipment stability;
an operation and maintenance team capability level module, used for generating the health degree of the operation and maintenance team capability level;
and an operation and maintenance team stability module, used for generating the health degree of the operation and maintenance team stability.
In a third aspect, the present invention further provides another system for detecting the health degree of a data center, including at least one processor and at least one memory for storing at least one program; when the at least one program is executed by the at least one processor, it causes the at least one processor to implement the method for detecting the health degree of a data center.
In a fourth aspect, the present invention further provides a storage medium storing a processor-executable program which, when executed by a processor, implements the method for detecting the health degree of a data center.
Advantages and benefits of the present invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention:
according to the method for detecting the health degree of the data center, provided by the technical scheme of the invention, indexes of the health degree of the data center are generated from data such as alarm data, operation and maintenance personnel data, data center parameters, work order data, emergency data and the like, the health degree value proportion of various factors in the data center is determined through a weight distribution algorithm, and the health degree of the data center is finally generated and used as the basis for improving and optimizing the operation and maintenance scheme of the data center; compared with the data center operation and maintenance condition evaluation method in the prior art, the omnibearing and multi-angle data health degree detection method provided by the technical scheme of the invention is more scientific and reliable, and has better adaptability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description is made on the drawings of the embodiments of the present invention or the related technical solutions in the prior art, and it should be understood that the drawings in the following description are only for convenience and clarity of describing some embodiments in the technical solutions of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart illustrating steps of a method for detecting health of a data center according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps for generating health of device stability based on operation and maintenance data according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating steps for generating a health level of an operation and maintenance team based on operation and maintenance data according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating steps for generating a health degree of the stability of the operation and maintenance team according to the operation and maintenance data in the embodiment of the present invention;
FIG. 5 is a block diagram of a data center health detection system according to an embodiment of the present invention;
FIG. 6 is a block diagram of another data center health monitoring system according to an embodiment of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
Referring to FIG. 1, the method for detecting the health degree of a data center in the technical solution of the present invention includes steps S01-S04:
S01, acquiring operation and maintenance data of the data center; in this embodiment, the operation and maintenance data includes alarm data, operation and maintenance personnel data, data center parameters, work order data and emergency data, and step S01 may be further subdivided into steps S011 to S015:
S011, acquiring alarm data; since the data center provides resources for external services, it must offer a stable environment in terms of power, heating and ventilation, access control, video and the like. Whether the power and environment equipment (the "moving ring" or "dynamic loop": power, heating and ventilation), the heating and ventilation equipment, the access control equipment and the video equipment are stable is therefore crucial to the data center. To obtain and monitor the indexes of this equipment, a collector can be installed on each device during implementation, and the collector transmits the collected equipment indexes to the data center for storage. An alarm strategy is then configured and alarm levels are assigned along with it: the collected equipment indexes are matched against the corresponding alarm thresholds, different thresholds map to different alarm levels, and different alarm levels have different impacts.
For example, in some embodiments a UPS is selected as the collection source of alarm data (a UPS is a device type within the moving ring category). The UPS contains 3 groups of storage batteries. In one group, uneven voltage distribution does not yet affect the external voltage but calls for early warning, so this condition is assigned the alarm level "general alarm". In another group, the output voltage seriously exceeds the limit; continued operation could leave the whole group unable to work and emergency handling is needed, so this condition is assigned the alarm level "emergency alarm".
If the equipment indexes collected in real time match a corresponding alarm strategy, alarm information is generated. In some embodiments the alarm data includes active alarm data and historical alarm data: generated alarm information is first recorded in an active alarm table, and when the alarm is recovered, the information is moved from the active alarm table to the historical alarm table. The alarm data and the alarm levels therefore need to be obtained from the active alarm table and the historical alarm table at the same time.
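The alarm flow described above (index collected, matched against a threshold, recorded as active, moved to history on recovery) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the patent: the class and function names, the use of a relative voltage deviation as the collected index, and the two threshold values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

GENERAL, EMERGENCY = "general", "emergency"  # the two levels named in the UPS example


@dataclass
class Alarm:
    device_type: str  # e.g. "moving_ring"
    device_id: str
    level: str


def match_level(voltage_deviation: float) -> Optional[str]:
    """Map a collected index to an alarm level; thresholds are assumed values."""
    if voltage_deviation >= 0.20:  # assumed emergency threshold
        return EMERGENCY
    if voltage_deviation >= 0.05:  # assumed early-warning threshold
        return GENERAL
    return None


class AlarmStore:
    """Hypothetical stand-in for the active and historical alarm tables."""

    def __init__(self) -> None:
        self.active: Dict[str, Alarm] = {}
        self.history: List[Alarm] = []

    def report(self, device_type: str, device_id: str, deviation: float) -> None:
        level = match_level(deviation)
        if level is not None:
            self.active[device_id] = Alarm(device_type, device_id, level)

    def recover(self, device_id: str) -> None:
        # On recovery the record moves from the active table to the history table.
        alarm = self.active.pop(device_id, None)
        if alarm is not None:
            self.history.append(alarm)

    def count_by_level(self, device_type: str) -> Dict[str, int]:
        # Counts must cover both tables, as the text requires.
        counts: Dict[str, int] = {}
        for alarm in list(self.active.values()) + self.history:
            if alarm.device_type == device_type:
                counts[alarm.level] = counts.get(alarm.level, 0) + 1
        return counts


store = AlarmStore()
store.report("moving_ring", "ups-battery-1", 0.08)  # early warning
store.report("moving_ring", "ups-battery-2", 0.30)  # emergency
store.recover("ups-battery-1")
counts = store.count_by_level("moving_ring")
```

Counting over only the active table would miss the recovered battery alarm, which is why the sketch reads both tables when producing the per-level counts.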
S012, acquiring operation and maintenance personnel data; specifically, a personnel information table is obtained, which records detailed information about the personnel of the data center (such as name, whether the person holds relevant professional certificates, whether the person is a manager, date of joining, date of leaving and the like). When this information changes, it can be updated in near real time to a storage unit such as a database, so that the corresponding authority control can be applied to the personnel.
S013, acquiring data center parameters; specifically, a data center parameter table is obtained, which records the general profile of the data center in detail (such as building area, number of designed racks, total design power and commissioning date). A fixed period can also be set so that special fields, such as the number of racks actually in use, are computed on a schedule.
S014, acquiring work order data; specifically, a work order table of the data center is obtained. The work order table records in detail the creation time of each work order, its type (such as maintenance, inspection or alarm), how it was transferred, its content, its state and the like.
S015, acquiring emergency data; specifically, the records of emergency events, which also include records of emergency drills, are obtained, and details are extracted from them, including the time and level of the emergency event (drill), its content, the completion time, the rectification measures and other data.
S02, determining the weight value of the operation and maintenance data; specifically, in this embodiment the Analytic Hierarchy Process (AHP) is used, and in some embodiments step S02 can be further detailed as steps S021 to S024:
S021, generating a judgment matrix of the operation and maintenance data; for example, if $X_1$ and $X_2$ are considered equally important and $X_3$ slightly less important than the other two, then the judgment matrix $A$ of $X_1$, $X_2$ and $X_3$ may be:
[Equation (1), the judgment matrix $A$: an image in the original, not reproduced in this text]
S022, normalizing the judgment matrix and obtaining a sum vector; specifically, the matrix of equation (1) is normalized by columns, i.e.
$$\bar{a}_{ij} = \frac{a_{ij}}{\sum_{k=1}^{m} a_{kj}}$$
where $i$ is the row index, $j$ is the column index and $m$ is the number of elements in each row or column of the matrix; this gives the normalized matrix
[Equation (2), the normalized matrix: an image in the original, not reproduced in this text]
Formula (2) is then summed by rows to obtain the sum vector $w_i$:
$$w_i = \sum_{j=1}^{m} \bar{a}_{ij} \quad\cdots(3)$$
which gives $w_1 = 1.35$, $w_2 = 1.35$, $w_3 = 0.3$.
S023, normalizing the sum vector to obtain a weight vector; specifically, the sum vector is normalized to the weight vector:
$$w = (0.45 \;\; 0.45 \;\; 0.1)^{T} \quad\cdots(4)$$
S024, determining the weight value according to the weight vector; finally, a consistency check is performed, and if it is satisfied, the weight values of $X_1$, $X_2$ and $X_3$ are 0.45, 0.45 and 0.1, respectively.
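Steps S021 to S024 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation. The judgment matrix below is hypothetical: its entry 4.5 is chosen only because it reproduces the sum vector (1.35, 1.35, 0.3) and weight vector (0.45, 0.45, 0.1) stated in the text, and the actual matrix of equation (1) is not recoverable here. The consistency check uses the conventional AHP consistency ratio with a commonly tabulated random index of 0.58 for a 3×3 matrix; the patent does not spell out how its check is performed.

```python
from typing import List, Tuple


def ahp_weights(matrix: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Column-normalize, sum by rows, then normalize the sum vector."""
    m = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(m)) for j in range(m)]
    normalized = [[matrix[i][j] / col_sums[j] for j in range(m)] for i in range(m)]
    sum_vector = [sum(row) for row in normalized]
    total = sum(sum_vector)
    weights = [s / total for s in sum_vector]
    return sum_vector, weights


def consistency_ratio(matrix: List[List[float]], weights: List[float],
                      random_index: float = 0.58) -> float:
    """Conventional AHP consistency ratio CR = CI / RI (assumed form of the check)."""
    m = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(m)) for i in range(m)]
    lambda_max = sum(aw[i] / weights[i] for i in range(m)) / m
    consistency_index = (lambda_max - m) / (m - 1)
    return consistency_index / random_index


# Hypothetical judgment matrix: X1 and X2 equal, X3 less important.
A = [
    [1.0, 1.0, 4.5],
    [1.0, 1.0, 4.5],
    [1 / 4.5, 1 / 4.5, 1.0],
]
sum_vector, weights = ahp_weights(A)
cr = consistency_ratio(A, weights)
accepted = cr < 0.1  # conventional acceptance threshold
```

Because this assumed matrix is perfectly consistent, its consistency ratio is essentially zero; a matrix built from independent pairwise judgments would normally give a small positive ratio.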
S03, generating a health degree score of the data center through the weight value and the operation and maintenance data; in some embodiments, step S03 can be further subdivided into steps S031-S034, where:
S031, generating the health degree of equipment stability according to the operation and maintenance data; referring to FIG. 2, the level and number of alarms for each device type within a time period are counted from the active alarm table and the historical alarm table; a score for each device type is then generated from the alarm level weights and the alarm counts; and the health degree score of equipment stability is obtained from the weight and the score of each device type. Specifically, the health degree of equipment stability consists of three parts: the stability of the moving ring equipment, the stability of the BA heating and ventilation equipment, and the stability of the video and access control equipment. In this embodiment the full score of each part is set to 10 points, and step S031 may be further subdivided into steps S0311-S0314:
S0311, determining the stability of the moving ring equipment; first, the number of alarms at each alarm level for the moving ring equipment within the time period is counted from the active alarm table (active alarm data) and the historical alarm table (historical alarm data); the moving ring equipment stability $DBI_1$ is then:
[Equation (5): an image in the original, not reproduced in this text]
In formula (5), $s_i$ is the number of moving ring equipment alarms at the alarm's level within the time period; $\mathrm{total}_e$ is the total number of moving ring devices; $T$ is the time period; and $\lambda_i$ is the weight corresponding to the alarm level. In this embodiment the alarm levels are divided into 4 or 7 levels, and the lower the level number, the more serious the alarm condition; $\lambda_i$ is therefore largest for the lowest level number and decreases as the level number increases. Formula (5) is further arranged to obtain:
[Equation (6): an image in the original, not reproduced in this text]
In formula (6), $n$ is the number of alarm levels.
S0312, determining the stability of the BA (heating and ventilation) equipment; first, the number of alarms at each alarm level for the BA equipment within the time period is counted from the active alarm table (active alarm data) and the historical alarm table (historical alarm data); the BA heating and ventilation equipment stability $DBI_2$ is then:
[Equation (7): an image in the original, not reproduced in this text]
In formula (7), $s_{BAi}$ is the number of BA equipment alarms at the alarm's level within the time period; $\mathrm{total}_{BA}$ is the total number of BA devices; $T$ is the time period; and $\lambda_{BAi}$ is the weight of the corresponding alarm level. Formula (7) is further arranged to obtain:
[Equation (8): an image in the original, not reproduced in this text]
In formula (8), $n$ is the number of alarm levels.
S0313, determining the stability of the video and access control equipment; first, the number of alarms at each alarm level for the video and access control equipment within the time period is counted from the active alarm table (active alarm data) and the historical alarm table (historical alarm data); the video and access control equipment stability $DBI_3$ is then:
[Equation (9): an image in the original, not reproduced in this text]
In formula (9), $s_{DVi}$ is the number of access control and video equipment alarms at the alarm's level within the time period; $\mathrm{total}_{DV}$ is the total number of access control and video devices; $T$ is the time period; and $\lambda_{DVi}$ is the weight of the corresponding alarm level. Formula (9) is arranged to obtain:
[Equation (10): an image in the original, not reproduced in this text]
In formula (10), $n$ is the number of alarm levels.
S0314, the health degree of equipment stability $DQI_1$ is then:
$$DQI_1 = \gamma_1 \times DBI_1 + \gamma_2 \times DBI_2 + \gamma_3 \times DBI_3 \quad\cdots(11)$$
In formula (11), $\gamma_1$, $\gamma_2$ and $\gamma_3$ are the weight values corresponding to the moving ring equipment, the BA heating and ventilation equipment, and the video and access control equipment, with $\gamma_1 + \gamma_2 + \gamma_3 = 1$. The values of $\gamma_1$, $\gamma_2$ and $\gamma_3$ can likewise be determined by the AHP analytic hierarchy process. For example, in a certain embodiment, the stability of the moving ring equipment is more important than the stability of the heating and ventilation equipment, which in turn is more important than the stability of the access control and video equipment, and the judgment matrix may be set as:
[Equation (12), the judgment matrix: an image in the original, not reproduced in this text]
This finally gives values of 0.63, 0.26 and 0.11 for $\gamma_1$, $\gamma_2$ and $\gamma_3$, respectively.
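The weighted sum of formula (11) can be sketched in Python as follows. The judgment matrix used here is an assumption: the matrix of equation (12) is not recoverable from the text, but a matrix with pairwise values 3 and 5 reproduces the stated weights 0.63, 0.26 and 0.11 under the column-normalization and row-sum procedure of step S02. The three $DBI$ scores are hypothetical inputs on the 10-point scale, and the function names are illustrative only.

```python
from typing import List, Sequence


def sum_method_weights(matrix: List[List[float]]) -> List[float]:
    """Column-normalize the matrix, sum each row, and normalize the row sums."""
    m = len(matrix)
    col_sums = [sum(row[j] for row in matrix) for j in range(m)]
    row_sums = [sum(row[j] / col_sums[j] for j in range(m)) for row in matrix]
    total = sum(row_sums)
    return [s / total for s in row_sums]


def weighted_health(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Formula (11) style aggregation: sum of weight times part score."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


# Assumed judgment matrix: moving ring > heating and ventilation > video/access control.
assumed_matrix = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
gammas = [round(g, 2) for g in sum_method_weights(assumed_matrix)]

# Hypothetical part scores DBI_1, DBI_2, DBI_3, each out of 10.
dbi_scores = [9.0, 8.0, 10.0]
dqi_1 = weighted_health(dbi_scores, gammas)
```

Since every part score is capped at 10 and the weights sum to 1, the aggregated $DQI_1$ also stays within the 10-point scale.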
S032, referring to FIG. 3, generating the health degree of the operation and maintenance team capability level according to the operation and maintenance data; specifically, the health degree of the capability level of the operation and maintenance team consists of five parts: the operation and maintenance personnel density score, the operation and maintenance personnel certificate-holding ratio score, the maintenance and inspection work order processing timeliness score, the alarm (fault) processing timeliness score and the emergency capability score. In this embodiment the full score of each part is 10 points, and step S032 can be further subdivided into steps S0321-S0326:
S0321, determining the density of operation and maintenance personnel; the operation and maintenance personnel density represents how reasonable the headcount of the data center is: for a large data center, a reasonable number of people maintaining a reasonable amount of equipment allows the data center to be managed better. The number of operation and maintenance personnel $s_W$ of the data center in the time period $T$ is obtained from the operation and maintenance personnel data, together with the total number of racks $\mathrm{total}_F$ in the data center in the time period $T$, and the operation and maintenance personnel density $DBI_4$ is calculated:
[Equation (13): an image in the original, not reproduced in this text]
In formula (13), $Q$ is the optimal ratio of the number of operation and maintenance personnel to the number of racks, and can be adjusted.
S0322, determining the certificate-holding ratio of the operation and maintenance personnel; the certificate-holding ratio represents the weight of professionals among the data center personnel. The number of certificate holders $s_L$ among the operation and maintenance personnel of the data center in the time period $T$ and the number of operation and maintenance personnel $s_W$ of the data center in the time period $T$ are extracted from the operation and maintenance personnel data, and the certificate-holding ratio score $DBI_5$ is calculated:
[Equation (14): an image in the original, not reproduced in this text]
In formula (14), $M$ is the optimal ratio of the number of certificate holders to the number of operation and maintenance personnel, and can be adjusted.
S0323, determining the timeliness rate of maintenance and inspection work order processing; data center equipment needs regular inspection and maintenance so that faults are found as early as possible, before they have an impact, and so that equipment service life can be extended. Specifically, the number of work orders $s_U$ not completed within the specified time in the time period and the total number of work orders $\mathrm{total}_T$ in the time period $T$ are obtained from the work order data, and the maintenance and inspection work order processing timeliness score $DBI_6$ is calculated:
[Equation (15): an image in the original, not reproduced in this text]
S0324, determining the alarm work order processing timeliness rate. Alarms generated by equipment need to be handled in time; if they are not, they may create serious hidden dangers for users of the data center, affect production activities, and cause losses. From the work order data, the number of alarm work orders $s_{Ai}$ not recovered within the specified time in time period T and the total number of alarm work orders $s_A$ in time period T are obtained. Different alarms correspond to different alarm levels, so the time required to handle them, and the time limits set for them, also differ. Combining the two factors of alarm level and specified processing time, the alarm processing timeliness rate score $DBI_7$ is given by:
Figure BDA0002428396650000091
In formula (16), $\lambda_{Ai}$ is the weight of the corresponding alarm level. Formula (16) can be further written as:
Figure BDA0002428396650000092
In formula (17), n is the number of alarm levels.
S0325, determining the emergency capability score. Whether an emergency event is handled within the set time when it occurs determines whether its impact on users can be minimized, so emergency capability is a key score in the data. From the emergency data, the total number of emergency events $S_{EM}$ in time period T and the number of emergency events $S_{EMi}$ not completed within the specified time in time period T are extracted, and the emergency capability score $DBI_8$ is calculated:
Figure BDA0002428396650000093
In formula (18), $\lambda_{EMi}$ is the weight of the emergency level. Formula (18) can be further written as:
Figure BDA0002428396650000094
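In formula (19), n is the number of emergency levels.
S0326, finally generating the health degree $DQI_2$ of the operation and maintenance team capability level:
$$DQI_2 = \gamma_4 \times DBI_4 + \gamma_5 \times DBI_5 + \gamma_6 \times DBI_6 + \gamma_7 \times DBI_7 + \gamma_8 \times DBI_8 \tag{20}$$
In formula (20), $\gamma_4$, $\gamma_5$, $\gamma_6$, $\gamma_7$, $\gamma_8$ are the weights of the respective part scores, with $\gamma_4 + \gamma_5 + \gamma_6 + \gamma_7 + \gamma_8 = 1$. The values of $\gamma_4$, $\gamma_5$, $\gamma_6$, $\gamma_7$, $\gamma_8$ can likewise be determined by the Analytic Hierarchy Process (AHP).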
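As a minimal sketch of the weighted combination in formula (20), the following Python function sums part scores multiplied by their weights and checks that the weights add up to 1. The function name `weighted_score`, the 10-point full score default, and all numeric values are illustrative assumptions, not values taken from this embodiment.

```python
from math import isclose


def weighted_score(scores, weights, full_score=10.0):
    """Weighted sum of part scores DBI_k with weights gamma_k, as in formula (20)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if not isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    if any(not 0.0 <= s <= full_score for s in scores):
        raise ValueError("each score must lie between 0 and full_score")
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical DBI_4 .. DBI_8 scores and gamma_4 .. gamma_8 weights.
dbi = [8.0, 6.0, 9.0, 7.0, 10.0]
gamma = [0.2, 0.1, 0.3, 0.2, 0.2]
dqi2 = weighted_score(dbi, gamma)
```

The same helper applies unchanged to any other weighted sum of part scores whose weights sum to 1.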
S033, referring to fig. 4, generating the health degree of operation and maintenance team stability according to the operation and maintenance data. Specifically, the health degree of operation and maintenance team stability consists of two parts: the key personnel stability rate score and the personnel stability rate score. In this embodiment, as before, the full score of each part is 10 points, and step S033 can be further subdivided into steps S0331 to S0333:
S0331, determining the key personnel stability rate. The operation and maintenance personnel data in the operation and maintenance data is acquired, and from it the number of key personnel lost in time period T, $s_{centralloss}$, and the total number of key personnel in time period T, $s_{center}$, are extracted; the key personnel stability score $DBI_9$ is then:
Figure BDA0002428396650000095
S0332, determining the personnel stability rate. The operation and maintenance personnel data in the operation and maintenance data is acquired, and from it the number of operation and maintenance personnel lost in time period T, $s_{loss}$, and the total number of operation and maintenance personnel in time period T, $s_W$, are extracted; the personnel stability score $DBI_{10}$ is then:
Figure BDA0002428396650000101
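S0333, generating the health degree $DQI_3$ of operation and maintenance team stability by the following formula:
$$DQI_3 = \gamma_9 \times DBI_9 + \gamma_{10} \times DBI_{10} \tag{23}$$
In formula (23), $\gamma_9$ and $\gamma_{10}$ are weight values with $\gamma_9 + \gamma_{10} = 1$; the values of $\gamma_9$ and $\gamma_{10}$ can likewise be determined by the Analytic Hierarchy Process (AHP).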
S034, generating the data center health degree score according to the health degree of equipment stability, the health degree of the operation and maintenance team capability level, and the health degree of operation and maintenance team stability, in combination with the weight values. Specifically, from the three generated values $DQI_1$, $DQI_2$, $DQI_3$, the Data Center Health (DCH) is calculated by the formula:
$$DCH = \eta_1 DQI_1 + \eta_2 DQI_2 + \eta_3 DQI_3 \tag{24}$$
where $\eta_1 + \eta_2 + \eta_3 = 1$; $DQI_1$ is the health degree of the stability of the equipment and its systems, $DQI_2$ is the health degree of the operation and maintenance team capability level, and $DQI_3$ is the health degree of operation and maintenance team stability. The values of $\eta_1$, $\eta_2$, $\eta_3$ are solved using the Analytic Hierarchy Process (AHP).
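The AHP weight determination mentioned above can be sketched as follows, assuming the common sum method: normalize each column of the pairwise judgment matrix, sum each row to obtain a sum vector, then normalize that vector to obtain the weights. The function name `ahp_weights` and the example judgment matrix are hypothetical, and the consistency check that AHP usually includes is omitted for brevity.

```python
def ahp_weights(judgment):
    """Derive a weight vector from a pairwise judgment matrix (AHP sum method)."""
    n = len(judgment)
    if n == 0 or any(len(row) != n for row in judgment):
        raise ValueError("judgment matrix must be square and non-empty")
    if any(value <= 0 for row in judgment for value in row):
        raise ValueError("pairwise comparison values must be positive")
    # Normalize every column so that it sums to 1.
    column_sums = [sum(row[j] for row in judgment) for j in range(n)]
    normalized = [[row[j] / column_sums[j] for j in range(n)] for row in judgment]
    # Row sums of the normalized matrix form the sum vector.
    sum_vector = [sum(row) for row in normalized]
    # Normalizing the sum vector yields the weight vector.
    total = sum(sum_vector)
    return [value / total for value in sum_vector]


# Hypothetical judgment matrix: the first criterion is rated twice as
# important as the second and four times as important as the third.
judgment = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
eta = ahp_weights(judgment)
```

For a perfectly consistent matrix such as this one, the resulting weights are proportional to 4 : 2 : 1 and sum to 1.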
S04, generating a data center operation and maintenance scheme according to the data center health degree score. Specifically, the currently generated health degree is compared with a preset data center health degree threshold. When the health degree is higher than the threshold, the current operation and maintenance scheme of the data center is maintained. When the health degree is below the threshold, the part with the lower score is located first. For example, if the equipment stability score is low, the cause is further located and analyzed by working backward through the weight analysis process, and corresponding measures are generated and implemented: increasing the maintenance and inspection frequency of the equipment, removing faults in advance, replacing equipment when necessary, and so on.
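The threshold comparison in step S04 can be sketched as below. The function name `evaluate_data_center`, the dictionary keys, the scores, the weights, and the threshold are all hypothetical; treating a score exactly equal to the threshold as healthy is also an assumption, since the description only covers the cases above and below the threshold.

```python
def evaluate_data_center(dqi, eta, threshold):
    """Return (DCH, weakest_part); weakest_part is None when no action is needed."""
    if set(dqi) != set(eta):
        raise ValueError("dqi and eta must use the same keys")
    # Composite score: weighted sum of the three health degrees.
    dch = sum(eta[name] * dqi[name] for name in dqi)
    if dch >= threshold:  # assumption: equality counts as healthy
        return dch, None
    # Below the threshold: locate the part with the lowest health degree.
    weakest = min(dqi, key=dqi.get)
    return dch, weakest


# Hypothetical health degrees on a 10-point scale and hypothetical weights.
dqi = {"equipment_stability": 5.0, "team_capability": 8.0, "team_stability": 9.0}
eta = {"equipment_stability": 0.5, "team_capability": 0.3, "team_stability": 0.2}

dch, weakest = evaluate_data_center(dqi, eta, threshold=7.0)
```

With these illustrative numbers the composite score falls below the threshold and the equipment stability part is flagged, which corresponds to the example in the text where maintenance and inspection frequency would then be increased.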
Next, system embodiments for implementing method embodiments proposed according to embodiments of the present invention are described with reference to the accompanying drawings.
Referring to fig. 5, an embodiment of the system of the present invention comprises:
the data acquisition module is used for acquiring operation and maintenance data of the data center;
the data preprocessing module is used for determining the weight value of the operation and maintenance data;
the DCH calculation module is used for generating the data center health degree score according to the weight value and the operation and maintenance data, and generating the data center operation and maintenance scheme according to the data center health degree score;
wherein, the data preprocessing module includes:
the equipment stability module is used for generating the health degree of equipment stability;
the operation and maintenance team competence level module is used for generating the health degree of the operation and maintenance team competence level;
and the operation and maintenance team stability module is used for generating the health degree of the operation and maintenance team stability.
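One possible way to organize the three sub-modules of the data preprocessing module is sketched below. The class names, field names, and placeholder scoring functions are all assumptions made for illustration; the weight determination performed by the preprocessing module is left out, and the five data categories follow the operation and maintenance data listed in claim 1.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class OperationData:
    """Container for the five categories of operation and maintenance data."""
    alarm_data: List[dict] = field(default_factory=list)
    personnel_data: List[dict] = field(default_factory=list)
    data_center_parameters: Dict[str, float] = field(default_factory=dict)
    work_order_data: List[dict] = field(default_factory=list)
    emergency_data: List[dict] = field(default_factory=list)


# A scorer maps the collected data to one health degree.
Scorer = Callable[[OperationData], float]


@dataclass
class DataPreprocessingModule:
    """Holds the three sub-modules, each modeled here as a scoring callable."""
    equipment_stability: Scorer
    team_capability: Scorer
    team_stability: Scorer

    def health_degrees(self, data: OperationData) -> Dict[str, float]:
        return {
            "equipment_stability": self.equipment_stability(data),
            "team_capability": self.team_capability(data),
            "team_stability": self.team_stability(data),
        }


# Placeholder scorers; real ones would apply the scoring formulas of the method.
preprocessing = DataPreprocessingModule(
    equipment_stability=lambda data: 10.0 if not data.alarm_data else 5.0,
    team_capability=lambda data: 8.0,
    team_stability=lambda data: 9.0,
)
degrees = preprocessing.health_degrees(OperationData())
```

Modeling each sub-module as an injected callable keeps the scoring rules replaceable without changing the module wiring.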
Referring to fig. 6, an embodiment of the present invention provides a system for detecting health of a data center, including:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method for data center health detection.
The embodiment of the invention also provides a storage medium, wherein a program executable by a processor is stored in the storage medium, and the program executable by the processor is used for realizing the method for detecting the health degree of the data center when being executed by the processor.
The functions of the above-described embodiments, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
As can be summarized from the above specific implementation process, the technical solution provided by the present invention has the following advantages over the prior art:
1. The technical solution of the invention innovatively proposes the concept of data center health degree, which is used to reflect the reliability of each data center and provides a standardized, scientific industry index.
2. The technical solution takes equipment stability, operation and maintenance team capability level, and operation and maintenance team stability as the indexes that influence the health degree of a data center, so that detection of the data center health degree is more comprehensive and scientific.
3. In implementing the technical solution, the number of alarms (faults) is also taken as an index, and the level of each alarm (fault) is taken into account within it, because alarms (faults) of different levels affect the stability of the data center differently; these more specific data indexes make the health degree more detailed and accurate.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method for detecting the health degree of a data center is characterized by comprising the following steps:
acquiring operation and maintenance data of a data center;
determining a weight value of the operation and maintenance data;
generating a data center health degree score through the weight value and the operation and maintenance data;
generating a data center operation and maintenance scheme according to the data center health degree score;
the operation and maintenance data comprises: alarm data, operation and maintenance personnel data, data center parameters, work order data and emergency data.
2. The method according to claim 1, wherein the step of determining the weight value of the operation and maintenance data specifically includes:
generating a judgment matrix of the operation and maintenance data;
normalizing the judgment matrix to obtain a sum vector;
normalizing the sum vector to obtain a weight vector;
and determining a weight value according to the weight vector.
3. The method for detecting the health degree of the data center according to claim 1, wherein: the step of generating the health degree score of the data center according to the weight value and the operation and maintenance data specifically includes:
generating the health degree of equipment stability according to the operation and maintenance data;
generating the health degree of the ability level of the operation and maintenance team according to the operation and maintenance data;
generating the stability health degree of the operation and maintenance team according to the operation and maintenance data;
and generating a data center health degree score according to the health degree of the equipment stability, the health degree of the operation and maintenance team capacity level and the health degree of the operation and maintenance team stability by combining the weight values.
4. The method for detecting the health degree of the data center according to claim 3, wherein the step of generating the health degree of the stability of the equipment according to the operation and maintenance data specifically includes:
acquiring alarm data in the operation and maintenance data;
determining the stability of the power and environment equipment according to the alarm data;
determining the stability of the heating and ventilation equipment according to the alarm data;
determining the stability of the video equipment according to the alarm data;
determining the stability of the access control equipment according to the alarm data;
and generating the health degree of the equipment stability through the stability of the power and environment equipment, the stability of the heating and ventilation equipment, the stability of the video equipment and the stability of the access control equipment.
5. The method for detecting health degree of data center according to claim 3, wherein the step of generating the health degree of the ability level of the operation and maintenance team according to the operation and maintenance data specifically comprises:
acquiring operation and maintenance personnel data, work order data and emergency data in the operation and maintenance data;
determining the density of operation and maintenance personnel according to the operation and maintenance personnel data;
determining the certificate-holding ratio of the operation and maintenance personnel according to the operation and maintenance personnel data;
determining the maintenance inspection work order processing timeliness rate according to the work order data;
determining the processing timeliness rate of the alarm work order according to the work order data;
determining an emergency capacity score according to the emergency data;
and generating the health degree of the capability level of the operation and maintenance team according to the density of the operation and maintenance personnel, the certificate-holding ratio of the operation and maintenance personnel, the processing timeliness rate of the maintenance inspection work order, the processing timeliness rate of the alarm work order and the emergency capability score.
6. The method for detecting health of a data center according to claim 3, wherein the step of generating the health of the stability of the operation and maintenance team according to the operation and maintenance data specifically comprises:
acquiring operation and maintenance personnel data in the operation and maintenance data;
determining the stability rate of key personnel according to the operation and maintenance personnel data;
determining the personnel stability rate according to the operation and maintenance personnel data;
and generating the health degree of the stability of the operation and maintenance team according to the key personnel stability rate and the personnel stability rate.
7. The method for detecting the health degree of the data center according to claim 1, wherein the alarm data comprises active alarm data and historical alarm data.
8. A data center health detection system, comprising:
the data acquisition module is used for acquiring operation and maintenance data of the data center;
the data preprocessing module is used for determining a weight value of the operation and maintenance data;
the DCH calculation module is used for generating the data center health degree score according to the weight value and the operation and maintenance data, and generating the data center operation and maintenance scheme according to the data center health degree score;
the data preprocessing module comprises:
the equipment stability module is used for generating the health degree of equipment stability;
the operation and maintenance team competence level module is used for generating the health degree of the operation and maintenance team competence level;
and the operation and maintenance team stability module is used for generating the health degree of the operation and maintenance team stability.
9. A data center health detection system, comprising:
at least one processor;
at least one memory for storing at least one program;
when executed by the at least one processor, the at least one program causes the at least one processor to implement a method for data center health detection as claimed in any one of claims 1-7.
10. A storage medium having stored therein a program executable by a processor, characterized in that: the processor-executable program when executed by a processor is for implementing a data center health detection method as claimed in any one of claims 1-7.
CN202010228287.9A 2020-03-27 2020-03-27 Method and system for detecting health degree of data center and storage medium Pending CN111475377A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010228287.9A CN111475377A (en) 2020-03-27 2020-03-27 Method and system for detecting health degree of data center and storage medium

Publications (1)

Publication Number Publication Date
CN111475377A true CN111475377A (en) 2020-07-31

Family

ID=71749291

Country Status (1)

Country Link
CN (1) CN111475377A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200731
