CN109784504A - Data center's long-distance intelligent operation management method and system - Google Patents

Data center's long-distance intelligent operation management method and system

Info

Publication number
CN109784504A
Authority
CN
China
Prior art keywords
data
utilization
CPU utilization
data center
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811582893.XA
Other languages
Chinese (zh)
Inventor
陈金明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Yuhao Science And Technology Development Co Ltd
Original Assignee
Guizhou Yuhao Science And Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-12-24
Filing date
2018-12-24
Publication date
2019-05-21
Application filed by Guizhou Yuhao Science And Technology Development Co Ltd
Priority to CN201811582893.XA
Publication of CN109784504A

Abstract

The present invention relates to the field of data center operation and maintenance (O&M) and discloses a remote intelligent O&M management method and system for a data center. The method obtains operation data and environmental data of the data center; makes predictions on the operation data and the environmental data according to decision rules and a prediction model; and performs scheduling management and fault early warning for data center equipment according to the prediction result. The invention can effectively reduce the energy consumption of the data center, provides a fault early-warning function, reduces the failure rate, improves resource utilization, and maintains system stability.

Description

Remote intelligent operation and maintenance management method and system for a data center
Technical field
The present invention relates to the field of data center operation and maintenance (O&M), and in particular to a remote intelligent O&M management method and system for a data center.
Background art
In recent years, with the rapid development of information technology and network technology, and especially under the push of cloud computing, big data, and the Internet of Things, data centers have trended toward high density and their structure has grown increasingly complex. Enterprises place ever higher demands on data centers, servers, links, and service response, and on the operational reliability and user experience of business systems. At the same time, the requirements on data center monitoring have risen while qualified professionals are in short supply, so data center O&M management faces enormous challenges.
A typical data center contains many kinds of elements. At the software and service level, it includes networks, applications, virtualization, servers, storage, and so on; at the infrastructure level, it includes facilities such as power, environment, HVAC, and security; at the O&M level, it includes daily maintenance, inspection, and protection against natural disasters. A data center can therefore be described as a complex, composite system.
Data center equipment is extremely numerous. On the one hand, the uneven quality of equipment manufacturers makes equipment monitoring and fault location difficult; on the other hand, untimely after-sales service from manufacturers makes service response slow, so faults cannot be resolved quickly. In the prior art, most data centers build their own infrastructure monitoring system (a DCIM system) to address this problem. However, existing DCIM systems stop at simple monitoring and the display of data statistics; they do not analyze the monitoring data in depth, which makes it difficult to find the causes and common characteristics of faults.
Summary of the invention
The present invention provides a remote intelligent O&M management method and system for a data center, which solve the technical problem that data center O&M monitoring in the prior art does not analyze the monitoring data in depth, making it difficult to find the causes and common characteristics of faults.
The object of the present invention is achieved through the following technical solutions:
A remote intelligent O&M management method for a data center, comprising:
obtaining operation data and environmental data of the data center, wherein the operation data includes air-conditioning operation data, IT equipment operation data, network management data, and UPS data, and the environmental data includes machine-room environment data of the data center and climate environment data;
making predictions on the operation data and the environmental data according to decision rules and a prediction model;
performing scheduling management and fault early warning for data center equipment according to the prediction result.
A remote intelligent O&M management system for a data center, comprising:
an acquisition module, configured to obtain operation data and environmental data of the data center, wherein the operation data includes air-conditioning operation data, IT equipment operation data, network management data, and UPS data, and the environmental data includes machine-room environment data of the data center and climate environment data;
a prediction module, configured to make predictions on the operation data and the environmental data according to decision rules and a prediction model;
an operation module, configured to perform scheduling management and fault early warning for data center equipment according to the prediction result.
The present invention provides a remote intelligent O&M management method and system for a data center, which obtain operation data and environmental data of the data center; make predictions on the operation data and the environmental data according to decision rules and a prediction model; and perform scheduling management and fault early warning for data center equipment according to the prediction result. The invention can effectively reduce the energy consumption of the data center, provides a fault early-warning function, reduces the failure rate, improves resource utilization, and maintains system stability.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below obviously show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the remote intelligent O&M management method for a data center according to an embodiment of the present invention;
Fig. 2 is a structural schematic diagram of the remote intelligent O&M management system for a data center according to an embodiment of the present invention.
Detailed description of the embodiments
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
An embodiment of the present invention provides a remote intelligent O&M management method for a data center which, as shown in Fig. 1, comprises:
Step 101: obtain operation data and environmental data of the data center;
wherein the operation data includes air-conditioning operation data, IT equipment operation data, network management data, and UPS data, and the environmental data includes machine-room environment data of the data center and climate environment data;
Step 102: make predictions on the operation data and the environmental data according to decision rules and a prediction model;
Step 103: perform scheduling management and fault early warning for data center equipment according to the prediction result.
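The three steps can be read as a simple acquire, predict, operate loop. The following Python sketch is only an illustration of that control flow under stated assumptions: the names `Snapshot`, `Prediction`, `run_cycle`, the metric keys, and the 0.8 threshold in the toy functions are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Snapshot:
    """One acquisition cycle (step 101). Field names are illustrative only."""
    operation: Dict[str, float]    # air-conditioning, IT, network, UPS readings
    environment: Dict[str, float]  # machine-room and outdoor climate readings


@dataclass
class Prediction:
    """Output of step 102: forecast values keyed by metric name."""
    values: Dict[str, float] = field(default_factory=dict)


def run_cycle(
    acquire: Callable[[], Snapshot],
    predict: Callable[[Snapshot], Prediction],
    operate: Callable[[Prediction], List[str]],
) -> List[str]:
    """Run steps 101 -> 102 -> 103 once and return the actions taken."""
    snapshot = acquire()            # step 101: obtain data
    prediction = predict(snapshot)  # step 102: apply rules / prediction model
    return operate(prediction)      # step 103: scheduling and early warning


# Toy stand-ins (assumptions) so the skeleton can be exercised end to end.
def demo_acquire() -> Snapshot:
    return Snapshot(operation={"cpu_util": 0.92},
                    environment={"room_temp_c": 24.0})


def demo_predict(snapshot: Snapshot) -> Prediction:
    # Placeholder "model": assume the next value equals the current one.
    return Prediction(values={"cpu_util": snapshot.operation["cpu_util"]})


def demo_operate(prediction: Prediction) -> List[str]:
    # Hypothetical rule: act when predicted utilization exceeds 0.8.
    return ["migrate_vm"] if prediction.values["cpu_util"] > 0.8 else []


actions = run_cycle(demo_acquire, demo_predict, demo_operate)
```

Passing the three stages in as callables keeps the sketch neutral about which prediction model or scheduling policy is actually used.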
Step 102 may specifically include:
Step 102-1: perform a virtual machine migration operation, a server sleep operation, or a server shutdown operation on data center equipment according to the prediction result; or
Step 102-2: issue an early-warning indication for faults of data center equipment according to the prediction result.
Step 102-1 may specifically include:
A. Judge whether the CPU utilization of a server is greater than the CPU utilization upper threshold.
B. When the CPU utilization is not greater than the upper threshold, judge whether it is less than the CPU utilization lower threshold. When the CPU utilization is less than the lower threshold, use a time-series autoregressive (AR) model to predict the CPU utilization at the next m moments; when the predicted CPU utilization at all m moments is less than the lower threshold, perform a virtual machine migration operation, a server sleep operation, or a server shutdown operation. The time-series AR model may be replaced by a support vector machine or a neural network model.
An idle server consumes about 60% of the energy it consumes at full load. With the method of this embodiment, the virtual machines running on idle servers are migrated and the servers are then put to sleep or shut down, which effectively reduces server energy consumption.
C. When the CPU utilization is greater than the upper threshold, use the time-series AR model to predict the CPU utilization at the next m moments; when the predicted CPU utilization at all m moments is greater than the upper threshold, perform a virtual machine migration operation, a server sleep operation, or a server shutdown operation.
To keep services running smoothly, the virtual machines on heavily loaded servers are migrated, which achieves load sharing.
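Steps A to C can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a first-order AR model fitted by ordinary least squares (the text does not fix the model order), utilization expressed in percent, and hypothetical action names (`consolidate_then_sleep_or_shutdown`, `migrate_vms_away`, `no_action`) standing in for the operations listed above.

```python
from typing import List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    if var == 0.0:
        # Constant history: the best prediction is the same constant.
        return mean_next, 0.0
    cov = sum((a - mean_prev) * (b - mean_next)
              for a, b in zip(x_prev, x_next))
    phi = cov / var
    return mean_next - phi * mean_prev, phi


def forecast(series: List[float], m: int) -> List[float]:
    """Predict the next m values by iterating the fitted AR(1) recursion."""
    c, phi = fit_ar1(series)
    predictions, last = [], series[-1]
    for _ in range(m):
        last = c + phi * last
        predictions.append(last)
    return predictions


def decide(history: List[float], m: int, lower: float, upper: float) -> str:
    """Apply steps A-C to one server's CPU utilization history."""
    current = history[-1]
    if current > upper:                                   # step A, then C
        if all(u > upper for u in forecast(history, m)):
            return "migrate_vms_away"
    elif current < lower:                                 # step B
        if all(u < lower for u in forecast(history, m)):
            return "consolidate_then_sleep_or_shutdown"
    return "no_action"
```

Requiring all m predicted values to cross the threshold, rather than only the current reading, avoids reacting to a short spike or dip. A higher-order AR model, a support vector machine, or a neural network could replace `forecast` without changing `decide`.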
In addition, this embodiment may also monitor the climate of the data center's external environment. The external environment monitoring data and the environmental data inside the machine room are analyzed together and combined with equipment operating information, temperature information, fault information, and the like to build a simulation model that establishes the relationship between external environmental parameters and the operating parameters of the machine room. This enables automatic, intelligent adjustment of the air-conditioning cooling temperature and humidity setpoints. While ensuring that all equipment runs normally and efficiently, the natural (free) cooling time is extended and the electric refrigeration running time is shortened, which saves energy in the HVAC refrigeration system, lowers the PUE, and achieves energy-saving operation.
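As a rough illustration of the quantities involved, the sketch below computes PUE (power usage effectiveness, the ratio of total facility energy to IT equipment energy) and applies a deliberately simple rule for preferring natural cooling. The rule, the function names, and the 3 °C margin are assumptions made for illustration; the text describes a simulation model relating outdoor and machine-room parameters and does not specify this rule.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def cooling_mode(outdoor_temp_c: float,
                 supply_setpoint_c: float,
                 approach_margin_c: float = 3.0) -> str:
    """Hypothetical rule: use free cooling when outdoor air is cold enough.

    approach_margin_c is an assumed safety margin between outdoor
    temperature and the cooling setpoint, not a value from the patent.
    """
    if outdoor_temp_c <= supply_setpoint_c - approach_margin_c:
        return "free_cooling"
    return "mechanical"
```

Under this toy rule, raising the cooling setpoint within safe limits widens the range of outdoor temperatures at which free cooling applies, which is one way longer natural cooling time translates into a lower PUE.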
Step 102-2 may specifically include:
Analyze the operation data according to a decision tree model or an Apriori model, determine the fault risk level of data center equipment by means of decision rules, and send an early-warning indication to operators according to the risk level. For example, historical operating parameters of key equipment such as batteries, together with their basic attribute information, historical fault information, and machine-room environment parameters, are obtained and mined to build a prediction model. Using the prediction model of this embodiment in combination with the corresponding risk early-warning rules, battery packs with a high likelihood of failure can be predicted and identified in advance. The warning information is integrated with the front-end O&M system and the risk labels are updated regularly in real time, so that O&M personnel are reminded to maintain or replace the battery pack ahead of time, which avoids faults and reduces the possibility of downtime.
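The battery example can be pictured as a small hand-written decision tree that maps equipment attributes to a risk level. In practice the splits would be learned from historical data; the feature names and every threshold below are placeholders chosen for illustration, not values from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatteryRecord:
    """Illustrative attributes; the patent names only the data categories."""
    battery_id: str
    age_years: float
    resistance_ratio: float  # measured / nominal internal resistance
    past_faults: int
    room_temp_c: float


def risk_grade(b: BatteryRecord) -> str:
    """Hand-coded decision rules standing in for a trained decision tree."""
    if b.resistance_ratio >= 1.5 or b.past_faults >= 2:
        return "high"
    if b.age_years >= 4.0 and b.room_temp_c > 30.0:
        return "high"
    if b.resistance_ratio >= 1.2 or b.age_years >= 4.0:
        return "medium"
    return "low"


def early_warnings(records: List[BatteryRecord]) -> List[str]:
    """Build warning messages for every battery that is not low risk."""
    return [f"{b.battery_id}: {risk_grade(b)} failure risk"
            for b in records if risk_grade(b) != "low"]


fleet = [
    BatteryRecord("bat-01", 1.0, 1.05, 0, 24.0),
    BatteryRecord("bat-02", 5.0, 1.10, 0, 32.0),
    BatteryRecord("bat-03", 2.0, 1.25, 1, 25.0),
]
messages = early_warnings(fleet)
```

Keeping the grading function separate from the message builder makes it straightforward to swap the hand-written rules for a model trained on historical fault records.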
The present invention provides a remote intelligent O&M management method for a data center, which obtains operation data and environmental data of the data center; makes predictions on the operation data and the environmental data according to decision rules and a prediction model; and performs scheduling management and fault early warning for data center equipment according to the prediction result. The invention can effectively reduce the energy consumption of the data center, provides a fault early-warning function, reduces the failure rate, improves resource utilization, and maintains system stability.
An embodiment of the present invention also provides a remote intelligent O&M management system for a data center which, as shown in Fig. 2, comprises:
an acquisition module 210, configured to obtain operation data and environmental data of the data center, wherein the operation data includes air-conditioning operation data, IT equipment operation data, network management data, and UPS data, and the environmental data includes machine-room environment data of the data center and climate environment data;
a prediction module 220, configured to make predictions on the operation data and the environmental data according to decision rules and a prediction model;
an operation module 230, configured to perform scheduling management and fault early warning for data center equipment according to the prediction result.
The operation module 230 comprises:
a scheduling unit 231, configured to perform a virtual machine migration operation, a server sleep operation, or a server shutdown operation on data center equipment according to the prediction result;
an early-warning unit 232, configured to issue an early-warning indication for faults of data center equipment according to the prediction result.
The scheduling unit 231 comprises:
a judgment subunit 2311, configured to judge whether the CPU utilization of a server is greater than the CPU utilization upper threshold;
a first execution subunit 2312, configured to judge, when the CPU utilization is not greater than the upper threshold, whether the CPU utilization is less than the CPU utilization lower threshold; when the CPU utilization is less than the lower threshold, to use a time-series autoregressive (AR) model to predict the CPU utilization at the next m moments; and, when the predicted CPU utilization at all m moments is less than the lower threshold, to perform a virtual machine migration operation, a server sleep operation, or a server shutdown operation;
a second execution subunit 2313, configured to use the time-series AR model, when the CPU utilization is greater than the upper threshold, to predict the CPU utilization at the next m moments; and, when the predicted CPU utilization at all m moments is greater than the upper threshold, to perform a virtual machine migration operation, a server sleep operation, or a server shutdown operation.
The early-warning unit 232 is specifically configured to analyze the operation data according to a decision tree model or an Apriori model, determine the fault risk level of data center equipment by means of decision rules, and send an early-warning indication to operators according to the risk level.
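To make the Apriori option concrete, the following pure-Python sketch mines frequent itemsets from fault and alarm events that occurred together. It is a generic textbook Apriori, offered as an assumption about how such an analysis might look; the event names and the 0.5 minimum support are invented for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]],
            min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least min_support."""
    if not transactions:
        return {}
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1 candidates: every single item seen in the data.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {c for c in joined
                   if all(frozenset(sub) in level
                          for sub in combinations(c, k - 1))}
    return frequent


# Hypothetical co-occurring events, one set per incident window.
events = [
    {"high_temp", "fan_fault", "disk_fault"},
    {"high_temp", "fan_fault"},
    {"high_temp", "disk_fault"},
    {"ups_alarm"},
]
frequent_sets = apriori(events, min_support=0.5)
```

Itemsets such as high temperature together with a fan fault that recur often are candidates for decision rules that raise the risk level of the affected equipment.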
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus the necessary hardware platform, or entirely in hardware, although in many cases the former is the better implementation. Based on this understanding, all or part of the contribution of the technical solution of the present invention over the background art can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The present invention has been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. For a person of ordinary skill in the art, the specific implementation and the scope of application may change according to the idea of the invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (9)

1. A remote intelligent O&M management method for a data center, characterized by comprising:
obtaining operation data and environmental data of the data center, wherein the operation data includes air-conditioning operation data, IT equipment operation data, network management data, and UPS data, and the environmental data includes machine-room environment data of the data center and climate environment data;
making predictions on the operation data and the environmental data according to decision rules and a prediction model;
performing scheduling management and fault early warning for data center equipment according to the prediction result.
2. The remote intelligent O&M management method for a data center according to claim 1, characterized in that the step of performing scheduling management and fault early warning for data center equipment according to the prediction result comprises:
performing a virtual machine migration operation, a server sleep operation, or a server shutdown operation on data center equipment according to the prediction result; or
issuing an early-warning indication for faults of data center equipment according to the prediction result.
3. The remote intelligent O&M management method for a data center according to claim 2, characterized in that the step of performing a virtual machine migration operation, a server sleep operation, or a server shutdown operation on data center equipment according to the prediction result comprises:
judging whether the CPU utilization of a server is greater than the CPU utilization upper threshold;
when the CPU utilization is not greater than the upper threshold, judging whether the CPU utilization is less than the CPU utilization lower threshold; when the CPU utilization is less than the lower threshold, using a time-series autoregressive (AR) model to predict the CPU utilization at the next m moments; and, when the predicted CPU utilization at all m moments is less than the lower threshold, performing a virtual machine migration operation, a server sleep operation, or a server shutdown operation;
when the CPU utilization is greater than the upper threshold, using the time-series AR model to predict the CPU utilization at the next m moments; and, when the predicted CPU utilization at all m moments is greater than the upper threshold, performing a virtual machine migration operation, a server sleep operation, or a server shutdown operation.
4. The remote intelligent O&M management method for a data center according to claim 3, characterized in that the step of issuing an early-warning indication for faults of data center equipment according to the prediction result comprises:
analyzing the operation data according to a decision tree model or an Apriori model, determining the fault risk level of data center equipment by means of decision rules, and sending an early-warning indication to operators according to the risk level.
5. The remote intelligent O&M management method for a data center according to claim 3, characterized in that the time-series autoregressive (AR) model may be replaced by a support vector machine or a neural network model.
6. A remote intelligent O&M management system for a data center, characterized by comprising:
an acquisition module, configured to obtain operation data and environmental data of the data center, wherein the operation data includes air-conditioning operation data, IT equipment operation data, network management data, and UPS data, and the environmental data includes machine-room environment data of the data center and climate environment data;
a prediction module, configured to make predictions on the operation data and the environmental data according to decision rules and a prediction model;
an operation module, configured to perform scheduling management and fault early warning for data center equipment according to the prediction result.
7. The remote intelligent O&M management system for a data center according to claim 6, characterized in that the operation module comprises:
a scheduling unit, configured to perform a virtual machine migration operation, a server sleep operation, or a server shutdown operation on data center equipment according to the prediction result;
an early-warning unit, configured to issue an early-warning indication for faults of data center equipment according to the prediction result.
8. The remote intelligent O&M management system for a data center according to claim 7, characterized in that the scheduling unit comprises:
a judgment subunit, configured to judge whether the CPU utilization of a server is greater than the CPU utilization upper threshold;
a first execution subunit, configured to judge, when the CPU utilization is not greater than the upper threshold, whether the CPU utilization is less than the CPU utilization lower threshold; when the CPU utilization is less than the lower threshold, to use a time-series autoregressive (AR) model to predict the CPU utilization at the next m moments; and, when the predicted CPU utilization at all m moments is less than the lower threshold, to perform a virtual machine migration operation, a server sleep operation, or a server shutdown operation;
a second execution subunit, configured to use the time-series AR model, when the CPU utilization is greater than the upper threshold, to predict the CPU utilization at the next m moments; and, when the predicted CPU utilization at all m moments is greater than the upper threshold, to perform a virtual machine migration operation, a server sleep operation, or a server shutdown operation.
9. The remote intelligent O&M management system for a data center according to claim 8, characterized in that the early-warning unit is specifically configured to analyze the operation data according to a decision tree model or an Apriori model, determine the fault risk level of data center equipment by means of decision rules, and send an early-warning indication to operators according to the risk level.
CN201811582893.XA 2018-12-24 2018-12-24 Data center's long-distance intelligent operation management method and system Pending CN109784504A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811582893.XA CN109784504A (en) 2018-12-24 2018-12-24 Data center's long-distance intelligent operation management method and system

Publications (1)

Publication Number Publication Date
CN109784504A true CN109784504A (en) 2019-05-21

Family

ID=66498339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811582893.XA Pending CN109784504A (en) 2018-12-24 2018-12-24 Data center's long-distance intelligent operation management method and system

Country Status (1)

Country Link
CN (1) CN109784504A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190521