WO2020135806A1 - Operation and maintenance method and operation and maintenance device applied to a data center - Google Patents

Operation and maintenance method and operation and maintenance device applied to a data center

Info

Publication number
WO2020135806A1
Authority
WO
WIPO (PCT)
Prior art keywords
service quality
cloud node
private cloud
quality evaluation
dimensions
Prior art date
Application number
PCT/CN2019/129603
Other languages
English (en)
French (fr)
Inventor
包塔林
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/50 Testing arrangements
    • H04L43/55 Testing of service level quality, e.g. simulating service usage
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Definitions

  • the invention relates to the field of information technology, in particular to an operation and maintenance method and operation and maintenance equipment applied to a data center.
  • Operation and maintenance refers to the process of managing and maintaining data center and/or data center services through a series of steps and methods.
  • the services provided by the data center include IT, software, and Internet-related services, as well as other services.
  • Data centers are usually equipped with O&M equipment.
  • O&M equipment is used to provide O&M services to users.
  • Operation and maintenance services include operation and maintenance of data centers, for example, real-time monitoring, fault handling, capacity management, and application deployment of data centers.
  • Quality of service (QoS) characterizes users' satisfaction with the services provided by the data center and is used to measure the quality of those services.
  • The level of service quality can be measured by the service quality evaluation value.
  • The operation and maintenance equipment monitors the service quality evaluation value of the data center by monitoring the data center resources.
  • The service quality evaluation value can also be predicted from its trend, and the predicted value can then be used for operation and maintenance of the data center, for example to predict data center failures.
  • Predicting the service quality evaluation value of a data center requires a large amount of computing and storage resources.
  • When the data center is a hybrid cloud data center and the service quality indicators of its private cloud nodes are to be predicted, the computing and storage resources of the private cloud nodes are often insufficient to support the prediction, so the operation and maintenance equipment on the private cloud nodes cannot predict the service quality indicators of those nodes.
  • an embodiment of the present invention provides an operation and maintenance method applied to a data center, where the data center includes a private cloud node and a public cloud node.
  • The method includes: the public cloud node receives multiple pieces of historical data sent by the private cloud node, where each piece of historical data is a comprehensive evaluation value obtained from the service quality evaluation values of the private cloud node in N dimensions, the service quality evaluation values of the N dimensions respectively represent the service quality of the private cloud node in the N dimensions, and N is an integer not less than 2; the public cloud node predicts the comprehensive evaluation value of the private cloud node based on the multiple pieces of historical data to obtain a predicted value; the public cloud node determines that the predicted value meets an alarm rule; and in response to this determination, the public cloud node sends an alarm message to the private cloud node.
  • By sending the historical data of the comprehensive evaluation value to the public cloud node, the private cloud node uses the computing power of the public cloud node to predict the comprehensive evaluation value, so that the private cloud node can be given early warning of faults and maintained before they occur.
  • Because public cloud nodes have more powerful computing and storage capabilities than private cloud nodes, the embodiment of the present invention uses the public cloud node to predict the comprehensive evaluation value of the private cloud node. Compared with prediction performed on the private cloud node, more historical data of the comprehensive evaluation value can be used in larger-scale calculations. As a result, prediction accuracy improves and calculation is faster, providing a more efficient and accurate operation and maintenance method for the data center.
  • The private cloud node includes a physical device for providing cloud services, and the service quality of the N dimensions includes the service quality of the cloud service and the service quality of the physical device.
  • The method further includes: the private cloud node obtains first historical data among the multiple pieces of historical data according to the service quality evaluation values of the N dimensions of the private cloud node in a first time period.
  • A comprehensive evaluation value of service quality is introduced to give a single, intuitive parameter for the service quality of the private cloud node based on the service quality evaluation values in multiple dimensions. Monitoring service quality in this more comprehensive, macro-level, and intuitive way reduces complexity and improves user experience.
  • The private cloud node obtaining the first historical data according to the service quality evaluation values of the N dimensions in the first time period includes: the private cloud node normalizes the service quality evaluation values of the N dimensions in the first time period; and the private cloud node obtains the first historical data according to the normalized service quality evaluation values of the N dimensions and the weight of the service quality evaluation value of each dimension.
  • The method further includes: the private cloud node obtains N*(N-1)/2 importance degree parameters, each of which represents the comparison value of the service quality evaluation values of any two of the N dimensions; and the private cloud node obtains the weight of the service quality evaluation value of each dimension according to the N*(N-1)/2 importance degree parameters.
  • An embodiment of the present invention provides an O&M device for O&M of a data center, where the data center includes a private cloud node and a public cloud node, and the O&M device includes a monitoring module deployed on the private cloud node, used to: monitor the service quality of the private cloud node in N dimensions; obtain a comprehensive evaluation value according to the service quality evaluation values of the N dimensions, where the service quality evaluation values of the N dimensions respectively represent the service quality of the private cloud node in the N dimensions and N is an integer not less than 2; and send multiple pieces of historical data to the prediction module deployed on the public cloud node, where each piece of historical data is a comprehensive evaluation value obtained from the service quality evaluation values of the private cloud node in the N dimensions.
  • The O&M device also includes a prediction module deployed on the public cloud node, used to: receive the multiple pieces of historical data sent by the monitoring module; predict the comprehensive evaluation value of the private cloud node based on the multiple pieces of historical data to obtain a predicted value; determine that the predicted value meets an alarm rule; and in response to this determination, send an alarm message to the private cloud node.
  • The monitoring module on the private cloud node sends the historical data of the comprehensive evaluation value to the prediction module on the public cloud node and uses the computing power of the public cloud node to predict the comprehensive evaluation value, thereby enabling early warning and pre-failure operation and maintenance for the private cloud node.
  • Public cloud nodes have more powerful computing and storage capabilities.
  • The private cloud node includes a physical device for providing cloud services, and the service quality of the N dimensions includes the service quality of the cloud service and the service quality of the physical device.
  • A comprehensive evaluation value of service quality is introduced to give a single, intuitive parameter for the service quality of the private cloud node based on the service quality evaluation values in multiple dimensions. Monitoring service quality in this more comprehensive, macro-level, and intuitive way reduces complexity and improves user experience.
  • The monitoring module is configured to obtain the first historical data among the multiple pieces of historical data according to the service quality evaluation values of the N dimensions of the private cloud node in the first time period, including: normalizing the service quality evaluation values of the N dimensions in the first time period; and obtaining the first historical data according to the normalized service quality evaluation values of the N dimensions and the weight of the service quality evaluation value of each dimension.
  • The monitoring module is further used to: obtain N*(N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, where each importance degree parameter represents the comparison value of the service quality evaluation values of any two of the N dimensions; and obtain the weight of the service quality evaluation value of each dimension according to the N*(N-1)/2 importance degree parameters.
  • An embodiment of the present invention provides a data center, where the data center includes at least one computing device, the at least one computing device includes a processor and a memory, and the processor executes program instructions in the memory to implement the methods performed by the public cloud node and the private cloud node in the first aspect.
  • Embodiments of the present invention provide a computer program product and a non-volatile computer-readable storage medium, both of which contain computer instructions; a computing device executes the computer instructions to implement the methods in the first aspect of the embodiments of the present invention.
  • FIG. 1 is a schematic diagram of a data center architecture provided by an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a hybrid cloud data center provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of an operation and maintenance device deployment provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a data center operation and maintenance method in an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a method for obtaining a comprehensive evaluation value in an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of an operation and maintenance device according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a computing device in a data center according to an embodiment of the present invention.
  • the data center in the embodiment of the present invention is shown as the data center 100 in FIG. 1.
  • the data center 100 includes resources 110, and based on the resources 110, the data center 100 provides services 120.
  • the services 120 are all deployed on resources 110.
  • Services 120 include operation and maintenance services 121, computing services, storage services, network services, management services, data services, security services, and so on.
  • the operation and maintenance service 121 is used to operate and maintain the data center 100.
  • the resources 110 include physical resources and/or virtual resources. Specifically, the resources 110 include computing resources 111, storage resources 112, network resources 113, operation and maintenance equipment 140, and the like.
  • the computing resources 111 include computing devices used to provide computing power to the data center 100, including physical computing devices and/or virtual computing devices, for example, physical servers, or virtual machines or containers running on the physical servers.
  • The storage resources 112 include storage devices used to provide storage capabilities for the data center 100, including physical storage devices and/or virtual storage devices, such as storage arrays or virtual storage devices.
  • The network resources 113 include network devices used to provide network capabilities for the data center 100, including physical network devices and/or virtual network devices, such as switches, routers, virtual switches, virtual routers, and the like. In practical applications, computing resources 111, storage resources 112, and network resources 113 may be deployed in the data center 100.
  • the computing devices, storage devices, and network devices in the computing resources 111, storage resources 112, and network resources 113 may be used to directly provide services to users, and may also be used to support or manage services provided to users.
  • the data center 100 in which virtual machines, virtual storage devices, or virtual network devices are deployed is a cloud data center. Based on the resources 110, the cloud data center provides cloud services to users as needed.
  • the resources 110 of the cloud data center include physical resources and virtual resources.
  • Cloud data centers include public cloud data centers, private cloud data centers, and hybrid cloud data centers.
  • a public cloud data center is a cloud environment shared by several organizations and/or users.
  • the services required by users are provided by an independent, third-party provider, and all users share all resources on this public cloud data center.
  • a private cloud data center is a data center exclusively owned by an organization or user.
  • Public cloud data centers provided by third-party providers usually have powerful computing and storage capabilities.
  • If a private cloud data center is exclusive to an organization, members of the organization share all the resources of the private cloud data center, and users who do not belong to the organization cannot access the services it provides; if the data center is exclusive to a single user, other users cannot access the services it provides.
  • Private cloud data centers have weaker computing power and storage capacity than public cloud data centers. However, because a private cloud data center is exclusive to an organization or user, it offers higher security.
  • Hybrid cloud data centers combine the advantages of both public and private cloud data centers.
  • the hybrid cloud data center 200 includes a public cloud node 212 and a private cloud node 211. Both the public cloud node 212 and the private cloud node 211 have computing resources, storage resources, and network resources.
  • the service 120 of the hybrid cloud data center 200 is deployed based on the public cloud node 212 and the private cloud node 211, and the service 120 includes an operation and maintenance service 121.
  • The public cloud node 212 has powerful computing and storage capabilities, and its resources are shared by several organizations and/or users; the resources of the private cloud node 211 are exclusive to an organization or user, thereby providing higher security for that organization or user.
  • The services deployed on the public cloud node 212 of the hybrid cloud data center 200 often require strong computing power or storage capacity but have relatively low security requirements, while the services deployed on the private cloud node 211 have lower requirements for computing power or storage capacity but higher security requirements.
  • the computing power of the public cloud node is used to predict the service quality index of the private cloud node.
  • the embodiment of the invention provides a data center operation and maintenance method.
  • This method can be applied to the hybrid cloud data center 200 to provide the operation and maintenance service 121 for the hybrid cloud data center 200.
  • This method may be performed by the operation and maintenance device 300 shown in FIG. 3.
  • the operation and maintenance equipment 300 is deployed in the hybrid cloud data center 200.
  • The operation and maintenance device 300 includes a first operation and maintenance unit 310 and a second operation and maintenance unit 320. The first operation and maintenance unit 310 is deployed in the private cloud node 211 and is implemented by the computing resources, storage resources, and network resources in the private cloud node 211; the second operation and maintenance unit 320 is deployed in the public cloud node 212 and is implemented by the computing resources, storage resources, and network resources in the public cloud node 212.
  • the operation and maintenance method described in the embodiments of the present invention will be described below with reference to FIGS. 3 and 4. As shown in FIG. 4, the method includes the following steps.
  • The first operation and maintenance unit 310 of the private cloud node 211 obtains multiple sets of historical data of the service quality evaluation values of the private cloud node 211 in N dimensions, where the service quality evaluation values of the N dimensions respectively represent the service quality of the private cloud node 211 in the N dimensions, N is an integer not less than 2, and each set of historical data includes the service quality evaluation values of the N dimensions within one time period.
  • Table 1 shows some service quality evaluations, grouped into private cloud service quality, server service quality, storage service quality, and network service quality, and covering different types of evaluation such as performance, availability, and reliability.
  • Each service quality evaluation represents the service quality of the data center in the corresponding dimension.
  • For example, the private cloud service response time is a performance indicator: it represents the service quality of the private cloud node in the dimension of how quickly it responds to business requests in the private cloud service.
  • the N service quality evaluations in the embodiments of the present invention are not limited to the service quality indicators shown in Table 1.
  • The first operation and maintenance unit 310 of the private cloud node 211 obtains multiple pieces of historical data of the comprehensive evaluation value based on the multiple sets of historical data, where each piece is a comprehensive evaluation value synthesized from the service quality evaluation values of the private cloud node 211 in the N dimensions.
  • The first operation and maintenance unit 310 of the private cloud node 211 obtains the first historical data among the multiple pieces of historical data of the comprehensive evaluation value according to the service quality evaluation values of the N dimensions of the private cloud node 211 in the first time period, where the first historical data is one of the multiple pieces of historical data of the comprehensive evaluation value, and the first time period is one of the multiple time periods corresponding to the multiple sets of historical data of the service quality evaluation values of the N dimensions.
  • That is, for each set of historical data, the historical data of the comprehensive evaluation value for the corresponding time period is calculated from that set.
  • An embodiment of the present invention provides a method for obtaining historical data of the comprehensive evaluation value according to the service quality evaluation values within a time period, as shown in FIG. 5.
  • The service quality evaluation values of the N dimensions may have different units. For example, the unit of storage mean time between failures and physical server mean time between failures is seconds, while the unit of storage device availability and physical server availability is times. Before the comprehensive evaluation value is obtained, the service quality evaluation values of the N dimensions need to be normalized to eliminate their units.
  • the embodiments of the present invention provide a formula for normalizing the service quality evaluation values of N dimensions.
  • The service quality evaluation value x_i is normalized according to the following formula to obtain the normalized service quality evaluation value y_i:
  • where i is any integer from 1 to N, x_i is any one of the service quality evaluation values of the N dimensions, y_i is the normalized service quality evaluation value, min is the smallest service quality evaluation value among the service quality evaluation values of the N dimensions, and max is the largest service quality evaluation value among the service quality evaluation values of the N dimensions.
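  • The normalization formula itself is not reproduced in this text. As a minimal sketch, assuming it is ordinary min-max scaling (which is consistent with the definitions of x_i, y_i, min, and max given here), the step could look like the following. The function name `normalize` and the handling of the case max = min are assumptions for illustration, not taken from the patent.

```python
from typing import List


def normalize(values: List[float]) -> List[float]:
    """Min-max normalize one group of N service quality evaluation values.

    Assumed form: y_i = (x_i - min) / (max - min), where min and max are the
    smallest and largest values in the group, as the surrounding text defines.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: with no spread there is nothing to rescale.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


print(normalize([2.0, 4.0, 6.0]))
```

  • Every normalized value falls in the range 0 to 1 and carries no unit, which is what allows values from different dimensions to be combined afterwards.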
  • the comprehensive evaluation value P is obtained according to the following formula:
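  • The formula for P is not reproduced in this text. Since the description states that the comprehensive evaluation value is obtained from the normalized evaluation values and the weight of each dimension, one plausible reading is a weighted sum. The sketch below assumes that form; the name `comprehensive_value` and the symbol w_i for the weights are illustrative assumptions.

```python
from typing import Sequence


def comprehensive_value(normalized: Sequence[float], weights: Sequence[float]) -> float:
    """Combine N normalized evaluation values y_i into one value P.

    Assumed form: P = sum(w_i * y_i). The patent text only says that P is
    obtained from the normalized values and the per-dimension weights.
    """
    if len(normalized) != len(weights):
        raise ValueError("one weight is required per dimension")
    return sum(w * y for w, y in zip(weights, normalized))


print(comprehensive_value([0.0, 0.5, 1.0], [0.2, 0.3, 0.5]))
```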
  • The embodiments of the present invention provide a method for obtaining the weight of each service quality evaluation value according to importance degree parameters.
  • An importance degree parameter represents the comparison value of any two of the service quality evaluation values of the N dimensions.
  • The service quality evaluation values of the N dimensions correspond to N*(N-1)/2 importance degree parameters, which are used as elements of a matrix to construct the judgment matrix A. The eigenvector W corresponding to the largest eigenvalue of the judgment matrix A represents the weights of the service quality evaluation values of the N dimensions.
  • a_ij is an importance degree parameter, where i and j are integers from 1 to N, and a_ij represents the comparison value of the service quality evaluation value x_i and the service quality evaluation value x_j.
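  • The weight derivation can be sketched as follows. The text states only that N*(N-1)/2 importance degree parameters populate the judgment matrix and that the weights come from the eigenvector of its largest eigenvalue. This sketch additionally assumes the usual pairwise-comparison convention (a_ii = 1 and a_ji = 1/a_ij) and uses power iteration as a stand-in numerical method; the function names and 0-based indices are illustrative.

```python
from typing import Dict, List, Tuple


def judgment_matrix(n: int, importance: Dict[Tuple[int, int], float]) -> List[List[float]]:
    """Build an n x n judgment matrix from n*(n-1)/2 parameters a_ij with i < j.

    Assumptions (not stated in the text): diagonal entries are 1 and the
    lower triangle holds reciprocals, a_ji = 1 / a_ij. Indices are 0-based.
    """
    if len(importance) != n * (n - 1) // 2:
        raise ValueError("expected n*(n-1)/2 importance degree parameters")
    a = [[1.0] * n for _ in range(n)]
    for (i, j), value in importance.items():
        a[i][j] = value
        a[j][i] = 1.0 / value
    return a


def principal_eigenvector(a: List[List[float]], iterations: int = 100) -> List[float]:
    """Approximate the eigenvector of the largest eigenvalue by power iteration.

    The result is scaled so that the weights sum to 1.
    """
    n = len(a)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [v / total for v in nxt]
    return w


# Three dimensions whose relative importance is 4 : 2 : 1.
A = judgment_matrix(3, {(0, 1): 2.0, (0, 2): 4.0, (1, 2): 2.0})
print(principal_eigenvector(A))
```

  • Power iteration converges for matrices with strictly positive entries, which holds here as long as every importance degree parameter is positive.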
  • The first operation and maintenance unit of the private cloud node monitors the service quality evaluation values of the N dimensions in real time and calculates the comprehensive evaluation value from them in real time. Because storage and computing resources on the private cloud are limited, after obtaining the historical data of the comprehensive evaluation value, the first operation and maintenance unit uploads it to the public cloud node, where the historical data of the comprehensive evaluation value is stored.
  • the second operation and maintenance unit 320 of the public cloud node 212 acquires multiple historical data of the comprehensive evaluation value sent by the private cloud node.
  • the public cloud node 212 predicts the running status of the service of the private cloud node based on the multiple historical data of the comprehensive evaluation value.
  • the second operation and maintenance unit uses multiple historical data of the comprehensive evaluation value as a training set to obtain a prediction model of the comprehensive evaluation value.
  • the prediction model of the comprehensive evaluation value can be obtained from the training set of multiple historical data containing the comprehensive evaluation value.
  • the training method includes a Recurrent Neural Network (RNN) training method, especially a Long Short-Term Memory (LSTM) training method.
  • any method for obtaining a prediction model based on the training set can be used in the embodiments of the present invention.
  • the second operation and maintenance unit predicts the comprehensive evaluation value based on the prediction model to obtain the prediction value.
  • the predicted value reflects the trend of service status of private cloud node services.
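  • The text names RNN and LSTM training as example methods and allows any method that derives a prediction model from the training set. As a deliberately simple stand-in (not an RNN or LSTM), the sketch below fits a least-squares straight line to the historical comprehensive evaluation values and extrapolates it. The name `forecast_next` and the `steps` parameter are illustrative assumptions.

```python
from typing import Sequence


def forecast_next(history: Sequence[float], steps: int = 1) -> float:
    """Extrapolate a least-squares linear trend fitted to the history.

    Stand-in for the prediction model: the patent text suggests RNN/LSTM
    training, which would replace this function in a real deployment.
    """
    n = len(history)
    if n < 2:
        raise ValueError("at least two historical values are needed")
    mean_t = (n - 1) / 2
    mean_p = sum(history) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in enumerate(history))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_p - slope * mean_t
    # The last observed time index is n - 1; predict `steps` periods later.
    return intercept + slope * (n - 1 + steps)


print(forecast_next([0.9, 0.8, 0.7, 0.6]))
```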
  • the second operation and maintenance unit 320 of the public cloud node 212 determines that the predicted value meets the alarm rule.
  • The second operation and maintenance unit 320 of the public cloud node 212 sends an alarm message to the first operation and maintenance unit 310 of the private cloud node 211, so that the first operation and maintenance unit 310 performs operation and maintenance on the private cloud node 211 according to the alarm message.
  • the first operation and maintenance unit 310 of the private cloud node 211 performs operation and maintenance on the private cloud node 211 according to the received alarm message, such as fault query, fault removal, and capacity expansion.
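  • The text does not say what the alarm rule is. Purely as an illustration, the sketch below assumes a threshold rule in which a predicted comprehensive evaluation value below a configured threshold triggers an alarm. The `AlarmMessage` fields, the node identifier, and the direction of the comparison are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmMessage:
    """Hypothetical alarm payload sent from the public to the private cloud node."""
    node_id: str
    predicted_value: float
    threshold: float


def check_alarm(node_id: str, predicted_value: float, threshold: float) -> Optional[AlarmMessage]:
    """Return an alarm message if the assumed rule (value below threshold) is met."""
    if predicted_value < threshold:
        return AlarmMessage(node_id, predicted_value, threshold)
    return None


print(check_alarm("private-cloud-node-211", 0.5, 0.6))
```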
  • the operation and maintenance device 300 in the embodiment of the present invention includes a first operation and maintenance unit 310 and a second operation and maintenance unit 320.
  • the data center 200 includes private cloud nodes and public cloud nodes.
  • The first operation and maintenance unit 310 includes a monitoring module 311 and a processing module 312; the second operation and maintenance unit 320 includes a prediction module 313.
  • The modules of the first operation and maintenance unit 310 are deployed on the private cloud node 211, and the modules of the second operation and maintenance unit 320 are deployed on the public cloud node 212.
  • The monitoring module 311 is used to: monitor the service quality of the private cloud node 211 in N dimensions; obtain a comprehensive evaluation value according to the service quality evaluation values of the N dimensions, where the service quality evaluation values of the N dimensions respectively represent the service quality of the private cloud node 211 in the N dimensions; and send multiple pieces of historical data to the prediction module 313 deployed on the public cloud node 212, where each piece of historical data is a comprehensive evaluation value obtained from the service quality evaluation values of the private cloud node 211 in the N dimensions.
  • The prediction module 313 is used to: receive the multiple pieces of historical data sent by the monitoring module 311; predict the comprehensive evaluation value of the private cloud node 211 according to the multiple pieces of historical data to obtain a predicted value; determine that the predicted value meets the alarm rule; and in response to this determination, send an alarm message to the processing module 312 of the private cloud node 211.
  • The processing module 312 performs operation and maintenance on the private cloud node 211 according to the alarm message.
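  • The division of work among the three modules can be sketched as plain objects calling one another. In-process method calls stand in for the network link between the private and public cloud nodes, and the predictor and alarm rule are passed in as callables because the text leaves both open. Class and method names are illustrative, not taken from the patent.

```python
from typing import Callable, List, Sequence


class ProcessingModule:
    """Private cloud side: receives alarm messages and acts on them."""

    def __init__(self) -> None:
        self.received: List[str] = []

    def handle_alarm(self, message: str) -> None:
        # A real module would start fault query, fault removal, or expansion.
        self.received.append(message)


class PredictionModule:
    """Public cloud side: stores history, predicts, and raises alarms."""

    def __init__(
        self,
        predictor: Callable[[Sequence[float]], float],
        alarm_rule: Callable[[float], bool],
        processing: ProcessingModule,
    ) -> None:
        self.predictor = predictor
        self.alarm_rule = alarm_rule
        self.processing = processing
        self.history: List[float] = []

    def receive(self, values: Sequence[float]) -> None:
        self.history.extend(values)
        predicted = self.predictor(self.history)
        if self.alarm_rule(predicted):
            self.processing.handle_alarm(
                f"predicted comprehensive evaluation value {predicted:.2f} meets the alarm rule"
            )


class MonitoringModule:
    """Private cloud side: uploads comprehensive evaluation values."""

    def __init__(self, prediction: PredictionModule) -> None:
        self.prediction = prediction

    def upload(self, values: Sequence[float]) -> None:
        self.prediction.receive(values)


processing = ProcessingModule()
prediction = PredictionModule(lambda h: h[-1], lambda p: p < 0.6, processing)
MonitoringModule(prediction).upload([0.9, 0.7, 0.5])
print(processing.received)
```

  • Here the placeholder predictor simply repeats the last value and the placeholder rule is a fixed threshold; both would be replaced by the trained model and the configured alarm rule.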
  • the private cloud node 211 includes a physical device for providing the service 120 as shown in FIG. 2, and the N-dimensional service quality includes the service quality of the service 120 and the service quality of the physical device.
  • The monitoring module 311 is configured to obtain the first historical data among the multiple pieces of historical data according to the service quality evaluation values of the N dimensions of the private cloud node 211 in the first time period, including: normalizing the service quality evaluation values of the N dimensions within the first time period; and obtaining the first historical data according to the normalized service quality evaluation values of the N dimensions and the weight of the service quality evaluation value of each dimension.
  • the monitoring module 311 is further configured to: obtain N*(N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, and each importance degree parameter represents the service quality of the N dimensions The comparison value of the service quality evaluation value of any two dimensions in the evaluation value; according to the N*(N-1)/2 importance degree parameters, the weight of the service quality evaluation value of each dimension is obtained.
  • the embodiment of the present application further provides a data center 700 as shown in FIG. 7.
  • the data center 700 includes at least one computing device 710 and at least one computing device 720.
  • the data center 700 may be used to implement the hybrid cloud data center 200 as shown in FIG. 3.
  • The public cloud node 212, the private cloud node 211, and the operation and maintenance equipment 300 in the hybrid cloud data center 200 are all deployed on at least one computing device 710 and/or at least one computing device 720.
  • the private cloud node 211 is deployed on at least one computing device 710
  • the public cloud node 212 is deployed on at least one computing device 720.
  • the first operation and maintenance unit 310 on the private cloud node 211 is deployed on at least one computing device 710
  • the second operation and maintenance unit 320 on the public cloud node 212 is deployed on at least one computing device 720.
  • the computing device 710 may include a processing unit 711 and a communication interface 712.
  • the processing unit 711 is used to execute the functions defined by the operating system and various software programs running on the computing device, including the functions of the foregoing modules in the first operation and maintenance unit 310.
  • the computing device 720 may include a processing unit 721 and a communication interface 722.
  • the processing unit 721 is used to execute the functions defined by the operating system running on the computing device and various software programs, including the functions of the foregoing modules in the second operation and maintenance unit 320.
  • the communication interface 712 and the communication interface 722 are used for communication and interaction with other devices.
  • the other devices may be other computing devices.
  • the communication interface 712 and the communication interface 722 may be network adapter cards.
  • the computing device 710 may further include an input/output interface 713, and the input/output interface 713 is connected with an input/output device for receiving input information and outputting operation results.
  • the input/output interface 713 may be a mouse, a keyboard, a display, or an optical drive.
  • the computing device 710 may further include an auxiliary storage 714, which is also generally called external storage.
  • the storage medium of the auxiliary storage 714 may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, an optical disc), or a semiconductor medium (for example, a solid state drive).
  • the processing unit 711 may have various specific implementation forms.
  • the processing unit 711 may include a processor 7111 and a memory 7112.
  • the processor 7111 performs related operations according to program instructions stored in the memory 7112.
  • the processor 7111 may be a central processing unit (CPU) or a graphics processing unit (GPU), and may be a single-core processor or a multi-core processor.
  • the processing unit 711 can also be implemented by using a logic device with built-in processing logic, such as a field programmable gate array (FPGA) or a digital signal processor (DSP).
  • the computing device 710 in FIG. 7 is only an example of a computing device.
  • the computing device 710 may include more or fewer components than those shown in FIG. 7, or have different component configurations.
  • the computing device 720 may also include an input/output interface 713.
  • the processing unit 721 of the computing device 720 may also have various specific implementation forms; for example, the processing unit 721 may include a processor 7211 and a memory 7212, where the processor 7211 performs relevant operations according to the program instructions stored in the memory 7212, or the processing unit 721 may be implemented separately using a logic device with built-in processing logic.
  • the computing device 720 may include more or fewer components than the computing device 710, or have a different configuration of components.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual couplings or direct couplings or communication connections may be indirect couplings or communication connections through some interfaces, devices, or units, and may also be electrical, mechanical, or other forms of connection.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

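Embodiments of the present invention provide an operation and maintenance method applied to a hybrid cloud data center. After obtaining a comprehensive evaluation value from its service quality evaluation values in N dimensions, a private cloud node of the hybrid cloud data center sends a plurality of historical data to a prediction module deployed on a public cloud node of the hybrid cloud data center, where each historical data is a comprehensive evaluation value obtained according to the service quality evaluation values of the N dimensions of the private cloud node. The public cloud node receives the plurality of historical data and predicts the comprehensive evaluation value of the private cloud node according to them. Because the method uses the public cloud node to predict the comprehensive evaluation value of the private cloud node, it can, compared with prediction performed on the private cloud node, draw on more historical data of the comprehensive evaluation value and perform larger-scale computation, achieving more efficient operation and maintenance with higher accuracy and lower latency.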

Description

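Operation and maintenance method and operation and maintenance device applied to a data center
Technical Field
The present invention relates to the field of information technology, and in particular to an operation and maintenance method and an operation and maintenance device applied to a data center.
Background
Operation and maintenance refers to the process of managing and maintaining a data center and/or the services of a data center through a series of steps and methods. The services provided by a data center include IT, software, and Internet-related services, as well as other services. A data center is usually deployed with an operation and maintenance device, which provides operation and maintenance services to users. Operation and maintenance services include operation and maintenance of the data center itself, for example real-time monitoring, fault handling, capacity management, and application deployment.
One of the important functions of the operation and maintenance services provided by the operation and maintenance device is monitoring the service quality of the data center; in addition, many other functions of the device depend on this monitoring. Quality of service (QoS) characterizes how satisfied users are with the services provided by the data center and is used to measure the data center's service quality. The level of service quality can be measured by a service quality evaluation value. The operation and maintenance device monitors the resources of the data center and thereby monitors the data center's service quality evaluation values. Furthermore, from the dynamic curve of the monitored service quality evaluation values, future values can be predicted according to their trend, and the predicted values can then be used for operation and maintenance of the data center, for example to predict data center faults.
Predicting the service quality evaluation values of a data center usually requires a large amount of computing and storage resources. When the data center is a hybrid cloud data center and the service quality indicators of its private cloud node are to be predicted, the computing and storage resources of the private cloud node are often insufficient to support the computation and storage space that the prediction requires. As a result, the operation and maintenance device on the private cloud node cannot predict the service quality indicators of the private cloud node.
Summary
In a first aspect, an embodiment of the present invention provides an operation and maintenance method applied to a data center, where the data center includes a private cloud node and a public cloud node. The method includes: the public cloud node receives a plurality of historical data sent by the private cloud node, where each historical data is a comprehensive evaluation value obtained according to service quality evaluation values of N dimensions of the private cloud node, the service quality evaluation values of the N dimensions respectively characterize the service quality of the private cloud node in the N dimensions, and N is an integer not less than 2; the public cloud node predicts the comprehensive evaluation value of the private cloud node according to the plurality of historical data to obtain a predicted value; the public cloud node determines that the predicted value satisfies an alarm rule; and in response to the determination, the public cloud node sends an alarm message to the private cloud node.
In the operation and maintenance method provided by this embodiment, the private cloud node sends historical data of the comprehensive evaluation value to the public cloud node and uses the computing capability of the public cloud node to predict the comprehensive evaluation value, so that early warning and operation and maintenance can be carried out on the private cloud node before a fault occurs. Because the public cloud node has greater computing and storage capability than the private cloud node, using the public cloud node to predict the comprehensive evaluation value of the private cloud node makes it possible, compared with prediction performed on the private cloud node, to draw on more historical data of the comprehensive evaluation value and to perform larger-scale computation. This improves prediction accuracy and computation speed, and provides the data center with a more efficient and accurate way of performing operation and maintenance.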
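With reference to the first aspect, in a first possible implementation of the first aspect, the private cloud node includes a physical device used to provide a cloud service, and the service quality of the N dimensions includes the service quality of the cloud service and the service quality of the physical device.
Introducing multiple service quality evaluation values of different dimensions allows the service quality of the private cloud node to be examined or monitored from several perspectives, such as the services provided by resources and the working state of the resources providing those services. This makes operation and maintenance of the private cloud node more precise and reflects its service quality more comprehensively.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the method further includes: the private cloud node obtains first historical data among the plurality of historical data according to the service quality evaluation values of the N dimensions of the private cloud node in a first time period.
A comprehensive evaluation value of service quality is introduced. By combining the service quality evaluation values of multiple dimensions, it gives a complete, intuitive, and aggregate parameter for the level of service quality of the private cloud node, which supports more comprehensive, high-level, and intuitive monitoring of the data center's service quality, reduces complexity, and improves user experience.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the private cloud node obtaining the first historical data among the plurality of historical data according to the service quality evaluation values of the N dimensions of the private cloud node in the first time period includes: the private cloud node normalizes the service quality evaluation values of the N dimensions in the first time period; and the private cloud node obtains the first historical data according to the normalized service quality evaluation values of the N dimensions and the weight of the service quality evaluation value of each dimension.
With reference to the third implementation of the first aspect, in a fourth possible implementation of the first aspect, the method further includes: the private cloud node obtains N*(N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, where each importance degree parameter characterizes a comparison value between the service quality evaluation values of any two of the N dimensions; and the private cloud node obtains the weight of the service quality evaluation value of each dimension according to the N*(N-1)/2 importance degree parameters.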
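In a second aspect, an embodiment of the present invention provides an operation and maintenance device for operating and maintaining a data center, characterized in that the data center includes a private cloud node and a public cloud node, and the operation and maintenance device includes a monitoring module deployed on the private cloud node, configured to: monitor the service quality of N dimensions of the private cloud node; obtain a comprehensive evaluation value according to the service quality evaluation values of the N dimensions, where the service quality evaluation values of the N dimensions respectively characterize the service quality of the private cloud node in the N dimensions, and N is an integer not less than 2; and send a plurality of historical data to a prediction module deployed on the public cloud node, where each historical data is a comprehensive evaluation value obtained according to the service quality evaluation values of the N dimensions of the private cloud node. The operation and maintenance device further includes the prediction module deployed on the public cloud node, configured to: receive the plurality of historical data sent by the monitoring module; predict the comprehensive evaluation value of the private cloud node according to the plurality of historical data to obtain a predicted value; determine that the predicted value satisfies an alarm rule; and in response to the determination, send an alarm message to the private cloud node.
The monitoring module on the private cloud node sends historical data of the comprehensive evaluation value to the prediction module on the public cloud node and uses the computing capability of the public cloud node to predict the comprehensive evaluation value, so that early warning and operation and maintenance can be carried out on the private cloud node before a fault occurs. Because the public cloud node has greater computing and storage capability than the private cloud node, using the public cloud node to predict the comprehensive evaluation value of the private cloud node makes it possible, compared with prediction performed on the private cloud node, to draw on more historical data of the comprehensive evaluation value and to perform larger-scale computation, achieving more efficient operation and maintenance with higher accuracy and lower latency.
With reference to the second aspect, in a first possible implementation of the second aspect, the private cloud node includes a physical device used to provide a cloud service, and the service quality of the N dimensions includes the service quality of the cloud service and the service quality of the physical device.
Introducing multiple service quality evaluation values of different dimensions allows the service quality of the private cloud node to be examined or monitored from several perspectives, such as the services provided by resources and the working state of the resources providing those services. This makes operation and maintenance of the private cloud node more precise and reflects its service quality more comprehensively.
A comprehensive evaluation value of service quality is introduced. By combining the service quality evaluation values of multiple dimensions, it gives a complete, intuitive, and aggregate parameter for the level of service quality of the private cloud node, which supports more comprehensive, high-level, and intuitive monitoring of the data center's service quality, reduces complexity, and improves user experience.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the monitoring module being configured to obtain first historical data among the plurality of historical data according to the service quality evaluation values of the N dimensions of the private cloud node in a first time period includes: normalizing the service quality evaluation values of the N dimensions in the first time period; and obtaining the first historical data according to the normalized service quality evaluation values of the N dimensions and the weight of the service quality evaluation value of each dimension.
With reference to the second implementation of the second aspect, in a third possible implementation of the second aspect, the monitoring module is further configured to: obtain N*(N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, where each importance degree parameter characterizes a comparison value between the service quality evaluation values of any two of the N dimensions; and obtain the weight of the service quality evaluation value of each dimension according to the N*(N-1)/2 importance degree parameters.
In a third aspect, an embodiment of the present invention provides a data center, characterized in that the data center includes at least one computing device, the at least one computing device includes a processor and a memory, and the processor executes program instructions in the memory to implement the methods performed by the public cloud node and the private cloud node in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer program product and a non-volatile computer-readable storage medium, where the computer program product and the non-volatile computer-readable storage medium contain computer instructions, and a computing device executes the computer instructions to implement the methods in the first aspect of the embodiments of the present invention.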
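Brief Description of the Drawings
FIG. 1 is a schematic diagram of a data center architecture according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a hybrid cloud data center according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the deployment of an operation and maintenance device according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a data center operation and maintenance method according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a method for obtaining a comprehensive evaluation value according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an operation and maintenance device according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of computing devices in a data center according to an embodiment of the present invention.
Detailed Description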
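The data center in the embodiments of the present invention is shown as data center 100 in FIG. 1. The data center 100 includes resources 110, based on which the data center 100 provides services 120. The services 120 are all deployed on the resources 110. The services 120 include an operation and maintenance service 121, computing services, storage services, network services, management services, data services, security services, and so on. The operation and maintenance service 121 is used to operate and maintain the data center 100. The resources 110 include physical resources and/or virtual resources; specifically, the resources 110 include computing resources 111, storage resources 112, network resources 113, an operation and maintenance device 140, and so on. The computing resources 111 include computing devices that provide computing capability for the data center 100, including physical and/or virtual computing devices, for example physical servers, or virtual machines or containers running on physical servers. The storage resources 112 include storage devices that provide storage capability for the data center 100, including physical and/or virtual storage devices, for example storage arrays or virtual storage devices. The network resources 113 include network devices that provide network capability for the data center 100, including physical and/or virtual network devices, for example switches, routers, virtual switches, and virtual routers. In practice, the computing resources 111, storage resources 112, and network resources 113 may be deployed in the data center 100. The computing devices, storage devices, and network devices in the computing resources 111, storage resources 112, and network resources 113 may be used to provide services directly to users, or to support or manage the services provided to users.
A data center 100 deployed with virtual machines, virtual storage devices, or virtual network devices is a cloud data center. Based on the resources 110, a cloud data center provides cloud services to users on demand; the resources 110 of a cloud data center include physical resources and virtual resources.
Cloud data centers include public cloud data centers, private cloud data centers, and hybrid cloud data centers.
A public cloud data center is a cloud environment shared by a number of organizations and/or users. In a public cloud data center, the services users need are provided by an independent third-party provider, and all users share all resources of the public cloud data center.
A private cloud data center is a data center used exclusively by one organization or user. A public cloud data center provided by a third-party provider usually has powerful computing and storage capabilities. In a private cloud data center, if the data center is exclusive to an organization, the members of the organization share all of its resources, and users outside the organization cannot access the services it provides; if the data center is exclusive to a single user, other users cannot access the services it provides. In general, the computing and storage capabilities of a private cloud data center are weaker than those of a public cloud data center, but because a private cloud data center is used exclusively by an organization or user, its security is higher.
A hybrid cloud data center combines the advantages of public and private cloud data centers. As shown in FIG. 2, the hybrid cloud data center 200 includes a public cloud node 212 and a private cloud node 211. Both the public cloud node 212 and the private cloud node 211 have computing resources, storage resources, and network resources. The services 120 of the hybrid cloud data center 200 are deployed on the public cloud node 212 and the private cloud node 211, and the services 120 include the operation and maintenance service 121. The public cloud node 212 has powerful computing and storage capabilities, and its resources are shared by a number of organizations and/or users; the resources of the private cloud node 211 are used exclusively by one organization or user, which gives that organization or user a higher level of security. Services deployed on the public cloud node 212 of the hybrid cloud data center 200 typically need powerful computing or storage capability but have relatively low security requirements, whereas services deployed on the private cloud node 211 have lower requirements for computing or storage capability but higher security requirements.
In the embodiments of the present invention, in the course of operating and maintaining the private cloud node of a hybrid cloud data center, the computing capability of the public cloud node is used to predict the service quality indicators of the private cloud node.
An embodiment of the present invention provides an operation and maintenance method for a data center. The method can be applied to the hybrid cloud data center 200 to provide the operation and maintenance service 121 for the hybrid cloud data center 200. The method can be performed by the operation and maintenance device 300 shown in FIG. 3. As shown in FIG. 3, the operation and maintenance device 300 is deployed in the hybrid cloud data center 200. Specifically, the operation and maintenance device 300 includes a first operation and maintenance unit 310 and a second operation and maintenance unit 320. The first operation and maintenance unit 310 is deployed in the private cloud node 211 and is implemented by the computing resources, storage resources, and network resources of the private cloud node 211; the second operation and maintenance unit 320 is deployed in the public cloud node 212 and is implemented by the computing resources, storage resources, and network resources of the public cloud node 212. The operation and maintenance method of the embodiments of the present invention is described below with reference to FIG. 3 and FIG. 4. As shown in FIG. 4, the method includes the following steps.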
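s401: The first operation and maintenance unit 310 of the private cloud node 211 obtains multiple groups of historical data of the service quality evaluation values of N dimensions of the private cloud node 211. The service quality evaluation values of the N dimensions respectively characterize the level of service quality of the private cloud node 211 in the N dimensions, N is an integer not less than 2, and each group of historical data includes the service quality evaluation values of the N dimensions within one time period.
For example, Table 1 shows some service quality evaluations. They belong to private cloud service quality, server service quality, storage service quality, and network service quality, and are of different types such as performance, availability, and reliability. Each service quality evaluation characterizes the service quality of the data center in the corresponding dimension. For example, the private cloud service response time is a performance indicator; it characterizes the service quality of the private cloud node in the dimension of how quickly the private cloud service responds to business requests. The N service quality evaluations in the embodiments of the present invention are not limited to the service quality indicators shown in Table 1.
Table 1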
Figure PCTCN2019129603-appb-000001
Figure PCTCN2019129603-appb-000002
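s402: The first operation and maintenance unit 310 of the private cloud node 211 obtains a plurality of historical data of the comprehensive evaluation value according to the multiple groups of historical data, where each historical data is a comprehensive evaluation value obtained according to the service quality evaluation values of the N dimensions of the private cloud node 211.
Specifically, the first operation and maintenance unit 310 of the private cloud node 211 obtains first historical data among the plurality of historical data of the comprehensive evaluation value according to the service quality evaluation values of the N dimensions of the private cloud node 211 in a first time period, where the first historical data is one of the plurality of historical data of the comprehensive evaluation value, and the first time period is one of the multiple time periods corresponding to the multiple groups of historical data of the service quality evaluation values of the N dimensions.
Usually, once a group of historical data of the service quality evaluation values of the N dimensions for a time period has been obtained, the historical data of the comprehensive evaluation value for that time period is computed from that group. A method provided by an embodiment of the present invention for obtaining one historical data of the comprehensive evaluation value from the service quality evaluation values within one time period is shown in FIG. 5.
s4021: Normalize the service quality evaluation values of the N dimensions.
The service quality evaluation values of the N dimensions may have different units. For example, the mean time between failures of storage and of physical servers is measured in seconds, whereas storage device availability and physical server availability are measured as a number of times. Before the comprehensive evaluation value is obtained, the service quality evaluation values of the N dimensions need to be normalized to eliminate their units.
Specifically, an embodiment of the present invention provides a formula for normalizing the service quality evaluation values of the N dimensions. The service quality evaluation value x_i is normalized according to the following formula to obtain the normalized service quality evaluation value y_i: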
Figure PCTCN2019129603-appb-000003
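where i is any integer from 1 to N, x_i is any one of the service quality evaluation values of the N dimensions, y_i is the normalized service quality evaluation value, min is the smallest of the service quality evaluation values of the N dimensions, and max is the largest of the service quality evaluation values of the N dimensions.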
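The normalization formula itself appears only as an image in the source, so the following is a minimal Python sketch that assumes the common min-max form y_i = (x_i - min) / (max - min), which is consistent with the definitions of min and max given above. The function name `normalize` and the handling of the case max = min are illustrative assumptions, not part of the patent text.

```python
def normalize(values):
    """Min-max normalize the N evaluation values of one time period into [0, 1].

    Assumed form: y_i = (x_i - min) / (max - min), where min and max are taken
    over the N values passed in, as the surrounding text defines them.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: when all values are equal the range is zero, so every
        # value is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Example: three evaluation values observed in one time period.
normalized = normalize([2.0, 4.0, 6.0])
```

After normalization every value is dimensionless and lies between 0 and 1, which is what allows values with different units to be combined in the next step.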
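s4022: Following the idea of multiple attribute decision making (MADM), the normalized service quality evaluation values of the N dimensions are processed according to the weight of each service quality indicator to obtain the comprehensive evaluation value. The weight w_i of the service quality evaluation value x_i characterizes how important that evaluation value is when the level of service quality of the data center is evaluated according to the service quality evaluation values of the N dimensions. Specifically, the comprehensive evaluation value P is obtained according to the following formula: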
Figure PCTCN2019129603-appb-000004
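The formula for P is likewise only available as an image. A simple reading that fits the description of weights w_i and normalized values y_i is a weighted sum, P = w_1*y_1 + ... + w_N*y_N; the sketch below assumes that form. The function name `composite_value` is hypothetical.

```python
def composite_value(normalized, weights):
    """Combine N normalized evaluation values into one comprehensive value.

    Assumed form: P = sum(w_i * y_i). The source gives the formula only as
    an image, so the weighted sum is an assumption.
    """
    if len(normalized) != len(weights):
        raise ValueError("one weight is required per dimension")
    return sum(w * y for w, y in zip(weights, normalized))


# Example: three dimensions whose weights sum to 1.
p = composite_value([0.0, 0.5, 1.0], [0.2, 0.3, 0.5])
```

With weights that sum to 1 and inputs in [0, 1], P also stays in [0, 1], which keeps successive historical values comparable.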
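For cases in which the weight of each service quality evaluation is not easy to obtain, an embodiment of the present invention provides a method for obtaining the weight of each service quality evaluation value according to importance degree parameters.
An importance degree parameter characterizes a comparison value between any two of the service quality evaluation values of the N dimensions. The service quality evaluation values of the N dimensions correspond to N*(N-1)/2 importance degree parameters. A judgment matrix A is constructed with the N*(N-1)/2 importance degree parameters as its elements; the eigenvector W corresponding to the largest eigenvalue of the judgment matrix A then characterizes the weights of the service quality evaluation values of the N dimensions.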
Figure PCTCN2019129603-appb-000005
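where a_ij is an importance degree parameter, i and j are integers from 1 to N, and a_ij characterizes the comparison value between the service quality evaluation value corresponding to x_i and the service quality evaluation value corresponding to x_j. The eigenvector W is a 1*N matrix whose elements are the weights of the service quality evaluation values of the N dimensions, that is, W = (w_1, w_2, ..., w_N), where w_i is the weight of the service quality indicator x_i.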
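The weight derivation can be sketched in plain Python as follows. Because the matrix A is shown only as an image, the sketch makes three assumptions that the text does not state explicitly: the diagonal entries are 1, the lower triangle is the reciprocal of the upper triangle (a_ji = 1 / a_ij, which is how N*(N-1)/2 parameters can fill an N-by-N matrix), and the eigenvector is scaled so that the weights sum to 1. Power iteration is used here as one simple way to approximate the eigenvector of the largest eigenvalue; the function names are hypothetical.

```python
def build_judgment_matrix(n, comparisons):
    """Build an n-by-n judgment matrix from n*(n-1)/2 pairwise comparisons.

    `comparisons` maps (i, j) with i < j (0-based) to a_ij. Assumptions:
    a_ii = 1 and a_ji = 1 / a_ij.
    """
    if len(comparisons) != n * (n - 1) // 2:
        raise ValueError("expected n*(n-1)/2 importance degree parameters")
    matrix = [[1.0] * n for _ in range(n)]
    for (i, j), value in comparisons.items():
        matrix[i][j] = float(value)
        matrix[j][i] = 1.0 / float(value)
    return matrix


def principal_weights(matrix, iterations=100):
    """Approximate the eigenvector of the largest eigenvalue by power iteration.

    The result is scaled to sum to 1 so it can be used directly as weights.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [v / total for v in nxt]
    return w


# Example with N = 3: dimension 0 is judged twice as important as dimension 1
# and four times as important as dimension 2.
a = build_judgment_matrix(3, {(0, 1): 2, (0, 2): 4, (1, 2): 2})
weights = principal_weights(a)
```

For this perfectly consistent example the weights come out in the ratio 4 : 2 : 1; with inconsistent human judgments the eigenvector gives a compromise between the pairwise comparisons.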
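Usually, the first operation and maintenance unit of the private cloud node monitors the service quality evaluation values of the N dimensions in real time and computes the comprehensive evaluation value from them in real time. Because the storage and computing resources of the private cloud are limited, after obtaining historical data of the comprehensive evaluation value, the first operation and maintenance unit uploads that historical data to the public cloud node, where it is stored.
s403: The second operation and maintenance unit 320 of the public cloud node 212 obtains the plurality of historical data of the comprehensive evaluation value sent by the private cloud node.
s404: The public cloud node 212 predicts the running condition of the services of the private cloud node based on the plurality of historical data of the comprehensive evaluation value.
Specifically, the second operation and maintenance unit uses the plurality of historical data of the comprehensive evaluation value as a training set to obtain a prediction model for the comprehensive evaluation value. Using neural network and deep learning methods, the prediction model can be obtained from a training set containing the plurality of historical data of the comprehensive evaluation value. Preferably, the training method includes recurrent neural network (RNN) training, in particular long short-term memory (LSTM) training. In addition, any method that obtains a prediction model from a training set can be used in the embodiments of the present invention.
The second operation and maintenance unit predicts the comprehensive evaluation value based on the prediction model to obtain a predicted value. The predicted value reflects the trend in the running condition of the services of the private cloud node.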
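The source prefers an RNN/LSTM model but also allows any method that derives a prediction model from a training set. The sketch below does not implement an LSTM; it shows, under that allowance, how the history could be turned into sliding-window training pairs and uses a least-squares linear trend as a plainly labeled stand-in predictor. Both function names and the window size are hypothetical.

```python
def make_training_pairs(history, window):
    """Turn a series of comprehensive evaluation values into (inputs, target)
    pairs: each run of `window` consecutive values predicts the next one."""
    return [
        (history[i:i + window], history[i + window])
        for i in range(len(history) - window)
    ]


def predict_next_linear(history):
    """Stand-in predictor (not an LSTM): fit a least-squares line through the
    points (0, h_0), (1, h_1), ... and evaluate it at the next time index."""
    n = len(history)
    if n < 2:
        raise ValueError("at least two historical values are required")
    mean_t = (n - 1) / 2
    mean_y = sum(history) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(history))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    return mean_y + slope * (n - mean_t)


# Example: a steadily declining comprehensive evaluation value.
history = [0.9, 0.8, 0.7, 0.6]
pairs = make_training_pairs(history, window=2)
predicted = predict_next_linear(history)
```

A sequence model such as an LSTM would be trained on pairs like those returned by `make_training_pairs`; the linear fit only illustrates where the predicted value comes from in the overall flow.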
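s405: The second operation and maintenance unit 320 of the public cloud node 212 determines that the predicted value satisfies an alarm rule.
s405: In response to the determination, the second operation and maintenance unit 320 of the public cloud node 212 sends an alarm message to the first operation and maintenance unit 310 of the private cloud node 211, so that the first operation and maintenance unit 310 performs operation and maintenance on the private cloud node 211 according to the alarm message.
s406: According to the received alarm message, the first operation and maintenance unit 310 of the private cloud node 211 performs operation and maintenance on the private cloud node 211, such as fault lookup, troubleshooting, and capacity expansion.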
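The source does not say what the alarm rule is. As an illustration only, the sketch below assumes a simple threshold rule in which a lower comprehensive evaluation value means worse service quality, so an alarm is raised when the predicted value falls below a threshold. The `AlarmMessage` fields, the node identifier, and the threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmMessage:
    """Hypothetical alarm payload sent from the public to the private cloud node."""
    node_id: str
    predicted_value: float
    threshold: float


def check_alarm(node_id: str, predicted_value: float,
                threshold: float) -> Optional[AlarmMessage]:
    """Assumed alarm rule: raise an alarm when the predicted comprehensive
    evaluation value drops below the threshold; otherwise return None."""
    if predicted_value < threshold:
        return AlarmMessage(node_id, predicted_value, threshold)
    return None


# Example: a predicted value of 0.5 against a threshold of 0.6 raises an alarm.
message = check_alarm("private-cloud-211", 0.5, 0.6)
```

A real deployment might use richer rules, for example a sustained downward trend over several predictions, but the control flow of determining that the rule is satisfied and then sending a message stays the same.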
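The above method yields a comprehensive indicator for the data center, which enables an intuitive, comprehensive, and quantitative evaluation of the data center's service quality, improves operation and maintenance efficiency, and facilitates subsequent operation and maintenance actions such as early warning and fault identification.
The operation and maintenance device 300 in the embodiments of the present invention includes a first operation and maintenance unit 310 and a second operation and maintenance unit 320. As shown in FIG. 6, the data center 200 includes a private cloud node and a public cloud node. The first operation and maintenance unit 310 includes a monitoring module 311 and a processing module 312; the second operation and maintenance unit 320 includes a prediction module 313. The modules of the first operation and maintenance unit 310 are deployed on the private cloud node 211, and the modules of the second operation and maintenance unit 320 are deployed on the public cloud node 212.
The monitoring module 311 is configured to: monitor the service quality of N dimensions of the private cloud node 211; obtain a comprehensive evaluation value according to the service quality evaluation values of the N dimensions, where the service quality evaluation values of the N dimensions respectively characterize the service quality of the private cloud node 211 in the N dimensions, and N is an integer not less than 2; and send a plurality of historical data to the prediction module 313 deployed on the public cloud node 212, where each historical data is a comprehensive evaluation value obtained according to the service quality evaluation values of the N dimensions of the private cloud node 211.
The prediction module 313 is configured to: receive the plurality of historical data sent by the monitoring module 311; predict the comprehensive evaluation value of the private cloud node 211 according to the plurality of historical data to obtain a predicted value; determine that the predicted value satisfies an alarm rule; and in response to the determination, send an alarm message to the processing module 312 of the private cloud node 211.
The processing module 312 performs operation and maintenance on the private cloud node 211 according to the alarm message.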
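The division of work among the three modules can be sketched as follows. This is an in-process illustration only: in the described system the monitoring and processing modules run on the private cloud node and the prediction module on the public cloud node, with network transport between them that is omitted here. The class and method names, and the injected `predict` and `alarm_rule` callables, are assumptions.

```python
from typing import Callable, List


class ProcessingModule:
    """Runs on the private cloud node; reacts to alarm messages (sketch)."""

    def __init__(self) -> None:
        self.alarms: List[float] = []

    def on_alarm(self, predicted_value: float) -> None:
        # Placeholder for fault lookup, troubleshooting, or capacity expansion.
        self.alarms.append(predicted_value)


class PredictionModule:
    """Runs on the public cloud node; predicts and applies the alarm rule."""

    def __init__(self, predict: Callable[[List[float]], float],
                 alarm_rule: Callable[[float], bool],
                 send_alarm: Callable[[float], None]) -> None:
        self.predict = predict
        self.alarm_rule = alarm_rule
        self.send_alarm = send_alarm

    def receive(self, history: List[float]) -> float:
        predicted = self.predict(history)
        if self.alarm_rule(predicted):
            self.send_alarm(predicted)
        return predicted


class MonitoringModule:
    """Runs on the private cloud node; collects and uploads the history."""

    def __init__(self, prediction_module: PredictionModule) -> None:
        self.history: List[float] = []
        self.prediction_module = prediction_module

    def record(self, comprehensive_value: float) -> None:
        self.history.append(comprehensive_value)

    def upload(self) -> float:
        # Send a copy of the history to the prediction module.
        return self.prediction_module.receive(list(self.history))


# Example wiring with trivial stand-ins for the model and the alarm rule.
processing = ProcessingModule()
prediction = PredictionModule(
    predict=lambda h: h[-1],          # stand-in: repeat the latest value
    alarm_rule=lambda p: p < 0.6,     # stand-in threshold rule
    send_alarm=processing.on_alarm,
)
monitoring = MonitoringModule(prediction)
monitoring.record(0.8)
monitoring.record(0.5)
result = monitoring.upload()
```

Injecting the predictor and the alarm rule as callables keeps the module boundaries visible without committing to any particular model or rule.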
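Optionally, the private cloud node 211 includes a physical device used to provide the services 120 shown in FIG. 2, and the service quality of the N dimensions includes the service quality of the services 120 and the service quality of the physical device.
Optionally, the monitoring module 311 being configured to obtain first historical data among the plurality of historical data according to the service quality evaluation values of the N dimensions of the private cloud node 211 in a first time period includes: normalizing the service quality evaluation values of the N dimensions in the first time period; and obtaining the first historical data according to the normalized service quality evaluation values of the N dimensions and the weight of the service quality evaluation value of each dimension.
Optionally, the monitoring module 311 is further configured to: obtain N*(N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, where each importance degree parameter characterizes a comparison value between the service quality evaluation values of any two of the N dimensions; and obtain the weight of the service quality evaluation value of each dimension according to the N*(N-1)/2 importance degree parameters.
An embodiment of the present application further provides a data center 700, as shown in FIG. 7. The data center 700 includes at least one computing device 710 and at least one computing device 720. The data center 700 can be used to implement the hybrid cloud data center 200 shown in FIG. 3; the public cloud node 212, the private cloud node 211, and the operation and maintenance device 300 in the hybrid cloud data center 200 are all deployed on the at least one computing device 710 and/or the at least one computing device 720. Specifically, the private cloud node 211 is deployed on the at least one computing device 710, and the public cloud node 212 is deployed on the at least one computing device 720. Correspondingly, the first operation and maintenance unit 310 on the private cloud node 211 is deployed on the at least one computing device 710, and the second operation and maintenance unit 320 on the public cloud node 212 is deployed on the at least one computing device 720. The computing device 710 may include a processing unit 711 and a communication interface 712; the processing unit 711 executes the functions defined by the operating system and the various software programs running on the computing device, including the functions of the modules of the first operation and maintenance unit 310. The computing device 720 may include a processing unit 721 and a communication interface 722; the processing unit 721 executes the functions defined by the operating system and the various software programs running on the computing device, including the functions of the modules of the second operation and maintenance unit 320. The communication interface 712 and the communication interface 722 are used to communicate and interact with other devices, which may be other computing devices; specifically, the communication interface 712 and the communication interface 722 may be network adapter cards.
Optionally, the computing device 710 may further include an input/output interface 713, to which an input/output device is connected, for receiving input information and outputting operation results. The input/output interface 713 may be a mouse, a keyboard, a display, an optical drive, or the like. Optionally, the computing device 710 may further include an auxiliary storage 714, generally also called external storage. The storage medium of the auxiliary storage 714 may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, an optical disc), or a semiconductor medium (for example, a solid state drive). The processing unit 711 may take various specific forms. For example, the processing unit 711 may include a processor 7111 and a memory 7112, where the processor 7111 performs related operations according to the program instructions stored in the memory 7112; the processor 7111 may be a central processing unit (CPU) or a graphics processing unit (GPU), and may be a single-core or multi-core processor. The processing unit 711 may also be implemented solely by a logic device with built-in processing logic, such as a field programmable gate array (FPGA) or a digital signal processor (DSP). In addition, the computing device 710 in FIG. 7 is only one example of a computing device; the computing device 710 may include more or fewer components than shown in FIG. 7, or have a different component configuration.
Likewise, the computing device 720 may also include an input/output interface 713. The processing unit 721 of the computing device 720 may also take various specific forms. For example, the processing unit 721 may include a processor 7211 and a memory 7212, where the processor 7211 performs related operations according to the program instructions stored in the memory 7212; alternatively, the processing unit 721 may be implemented solely by a logic device with built-in processing logic. The computing device 720 may include more or fewer components than the computing device 710, or have a different component configuration.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.
The above are only specific implementations of the present invention, but the protection scope of the present invention is not limited to them. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and such modifications or replacements shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.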

Claims (10)

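  1. An operation and maintenance method applied to a data center, characterized in that the data center includes a private cloud node and a public cloud node, and the method includes:
    the public cloud node receiving a plurality of historical data sent by the private cloud node, where each historical data is a comprehensive evaluation value obtained according to service quality evaluation values of N dimensions of the private cloud node, the service quality evaluation values of the N dimensions respectively characterize the service quality of the private cloud node in the N dimensions, and N is an integer not less than 2;
    the public cloud node predicting the comprehensive evaluation value of the private cloud node according to the plurality of historical data to obtain a predicted value;
    the public cloud node determining that the predicted value satisfies an alarm rule; and
    in response to the determination, the public cloud node sending an alarm message to the private cloud node.
  2. The method according to claim 1, characterized in that the private cloud node includes a physical device used to provide a cloud service, and the service quality of the N dimensions includes the service quality of the cloud service and the service quality of the physical device.
  3. The method according to claim 1 or 2, characterized in that the method further includes:
    the private cloud node obtaining first historical data among the plurality of historical data according to the service quality evaluation values of the N dimensions of the private cloud node in a first time period.
  4. The method according to claim 3, characterized in that the private cloud node obtaining the first historical data among the plurality of historical data according to the service quality evaluation values of the N dimensions of the private cloud node in the first time period includes:
    the private cloud node normalizing the service quality evaluation values of the N dimensions in the first time period; and
    the private cloud node obtaining the first historical data according to the normalized service quality evaluation values of the N dimensions and the weight of the service quality evaluation value of each dimension.
  5. The method according to claim 3, characterized in that the method further includes:
    the private cloud node obtaining N*(N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, where each importance degree parameter characterizes a comparison value between the service quality evaluation values of any two of the N dimensions; and
    the private cloud node obtaining the weight of the service quality evaluation value of each dimension according to the N*(N-1)/2 importance degree parameters.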
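  6. An operation and maintenance device for operating and maintaining a data center, characterized in that the data center includes a private cloud node and a public cloud node, and the operation and maintenance device includes:
    a monitoring module deployed on the private cloud node, configured to:
    monitor the service quality of N dimensions of the private cloud node;
    obtain a comprehensive evaluation value according to the service quality evaluation values of the N dimensions, where the service quality evaluation values of the N dimensions respectively characterize the service quality of the private cloud node in the N dimensions, and N is an integer not less than 2; and
    send a plurality of historical data to a prediction module deployed on the public cloud node, where each historical data is a comprehensive evaluation value obtained according to the service quality evaluation values of the N dimensions of the private cloud node; and
    the prediction module, configured to:
    receive the plurality of historical data sent by the monitoring module;
    predict the comprehensive evaluation value of the private cloud node according to the plurality of historical data to obtain a predicted value;
    determine that the predicted value satisfies an alarm rule; and
    in response to the determination, send an alarm message to the private cloud node.
  7. The operation and maintenance device according to claim 1, characterized in that the private cloud node includes a physical device used to provide a cloud service, and the service quality of the N dimensions includes the service quality of the cloud service and the service quality of the physical device.
  8. The operation and maintenance device according to claim 6 or 7, characterized in that the monitoring module is configured to:
    normalize the service quality evaluation values of the N dimensions in a first time period; and
    obtain first historical data among the plurality of historical data according to the normalized service quality evaluation values of the N dimensions and the weight of the service quality evaluation value of each dimension.
  9. The operation and maintenance device according to claim 8, characterized in that the monitoring module is further configured to:
    obtain N*(N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, where each importance degree parameter characterizes a comparison value between the service quality evaluation values of any two of the N dimensions; and
    obtain the weight of the service quality evaluation value of each dimension according to the N*(N-1)/2 importance degree parameters.
  10. A data center, characterized in that the data center includes a first computing device and a second computing device, the first computing device includes a first processor and a first memory, the second computing device includes a second processor and a second memory, the first processor executes program instructions in the first memory to implement the method performed by the private cloud node according to claims 1-5, and the second processor executes program instructions in the second memory to implement the method performed by the public cloud node according to claims 1-5.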
PCT/CN2019/129603 2018-12-28 2019-12-28 Operation and maintenance method and operation and maintenance device applied to a data center WO2020135806A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811622320.5 2018-12-28
CN201811622320.5A CN109981333B (zh) 2018-12-28 2018-12-28 Operation and maintenance method and operation and maintenance device applied to a data center

Publications (1)

Publication Number Publication Date
WO2020135806A1 true WO2020135806A1 (zh) 2020-07-02

Family

ID=67076482

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/129603 WO2020135806A1 (zh) 2018-12-28 2019-12-28 Operation and maintenance method and operation and maintenance device applied to a data center

Country Status (2)

Country Link
CN (1) CN109981333B (zh)
WO (1) WO2020135806A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109981333B (zh) * 2018-12-28 2022-03-25 华为云计算技术有限公司 一种应用于数据中心的运维方法和运维设备
CN111416735B (zh) * 2020-03-02 2021-05-11 河海大学 基于联邦学习的移动边缘环境下安全QoS预测方法
CN113590571B (zh) * 2021-09-29 2022-01-18 睿至科技集团有限公司 一种私有云资源和公有云资源的共享方法及系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140075021A1 (en) * 2012-09-07 2014-03-13 Oracle International Corporation System and method for providing a cloud computing environment
CN106293872A (zh) * 2016-07-27 2017-01-04 云南电网有限责任公司信息中心 一种基于资源池化的sla资源均衡管控方法
CN107992951A (zh) * 2017-12-11 2018-05-04 上海市信息网络有限公司 云管理平台的容量告警方法、系统、存储器及电子设备
CN109981333A (zh) * 2018-12-28 2019-07-05 华为技术有限公司 一种应用于数据中心的运维方法和运维设备

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140351648A1 (en) * 2013-05-24 2014-11-27 Connectloud, Inc. Method and Apparatus for Dynamic Correlation of Large Cloud Firewall Fault Event Stream
CN106886469A (zh) * 2017-04-10 2017-06-23 深圳第线通信有限公司 一种云计算容灾管理方法
CN107895176B (zh) * 2017-11-13 2021-08-24 国网湖南省电力有限公司 一种面向水电机群广域监测诊断的雾计算系统及方法

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112118131A (zh) * 2020-09-01 2020-12-22 紫光云(南京)数字技术有限公司 一种高可靠快捷扩容的云资源管理方法
CN112561318A (zh) * 2020-12-14 2021-03-26 清华大学 一种数据中心能源系统综合评价分析工具
CN112667594A (zh) * 2021-01-14 2021-04-16 北京智源人工智能研究院 一种基于混合云资源的异构计算平台及模型训练方法
CN116614431A (zh) * 2023-07-19 2023-08-18 中国电信股份有限公司 数据处理方法、装置、电子设备和计算机可读存储介质
CN116614431B (zh) * 2023-07-19 2023-10-03 中国电信股份有限公司 数据处理方法、装置、电子设备和计算机可读存储介质
CN117033880A (zh) * 2023-10-10 2023-11-10 北京金信润天信息技术股份有限公司 数据中心自动化运维方法、装置、设备及存储介质
CN117033880B (zh) * 2023-10-10 2024-01-05 北京金信润天信息技术股份有限公司 数据中心自动化运维方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN109981333B (zh) 2022-03-25
CN109981333A (zh) 2019-07-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19904888

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19904888

Country of ref document: EP

Kind code of ref document: A1