CN109981333B - Operation and maintenance method and operation and maintenance equipment applied to data center - Google Patents

Info

Publication number
CN109981333B
CN109981333B CN201811622320.5A CN201811622320A CN109981333B CN 109981333 B CN109981333 B CN 109981333B CN 201811622320 A CN201811622320 A CN 201811622320A CN 109981333 B CN109981333 B CN 109981333B
Authority
CN
China
Prior art keywords
cloud node
service quality
private cloud
dimensions
quality evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811622320.5A
Other languages
Chinese (zh)
Other versions
CN109981333A (en)
Inventor
包塔林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd filed Critical Huawei Cloud Computing Technologies Co Ltd
Priority to CN201811622320.5A priority Critical patent/CN109981333B/en
Publication of CN109981333A publication Critical patent/CN109981333A/en
Priority to PCT/CN2019/129603 priority patent/WO2020135806A1/en
Application granted granted Critical
Publication of CN109981333B publication Critical patent/CN109981333B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/50 Testing arrangements
    • H04L43/55 Testing of service level quality, e.g. simulating service usage
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention provides an operation and maintenance method applied to a hybrid cloud data center. In the method, a private cloud node of the hybrid cloud data center obtains comprehensive evaluation values from its service quality evaluation values in N dimensions, and then sends a plurality of historical data to a prediction module deployed on a public cloud node of the hybrid cloud data center, wherein each historical data is a comprehensive evaluation value obtained according to the service quality evaluation values of the N dimensions of the private cloud node. The public cloud node receives the plurality of historical data and predicts the comprehensive evaluation value of the private cloud node according to the plurality of historical data. Compared with prediction completed on the private cloud node, more historical data of the comprehensive evaluation value can be introduced for larger-scale calculation, so that operation and maintenance with higher accuracy and lower latency is achieved.

Description

Operation and maintenance method and operation and maintenance equipment applied to data center
Technical Field
The invention relates to the technical field of information, in particular to an operation and maintenance method and operation and maintenance equipment applied to a data center.
Background
Operation and maintenance refers to the process of managing and maintaining data centers and/or the services of data centers through a series of steps and methods. Services provided by the data center include IT, software, and internet related services, as well as other services. Data centers are typically deployed with operation and maintenance equipment. The operation and maintenance equipment is used for providing operation and maintenance services for the user. The operation and maintenance service comprises operation and maintenance of the data center, for example, real-time monitoring, fault handling, capacity management, application deployment and the like of the data center.
One important function of the operation and maintenance service provided by the operation and maintenance equipment is to monitor the service quality of the data center; many other functions of the operation and maintenance equipment also depend on this monitoring. Quality of Service (QoS) characterizes how satisfied a user is with a service provided by a data center and can be measured by a service quality evaluation value. The operation and maintenance equipment monitors the resources of the data center so as to monitor the service quality evaluation value of the data center. In addition, the future value of the service quality evaluation value can be predicted from its monitored dynamic curve and change trend, and the predicted value can be used for operation and maintenance of the data center, for example, to predict a data center fault.
In general, predicting a service quality evaluation value of a data center requires a large amount of computing resources and storage resources. When the data center is a hybrid cloud data center and the service quality evaluation value of a private cloud node of the hybrid cloud data center is to be predicted, the computing resources and storage resources of the private cloud node are often insufficient for the computation and storage space that prediction requires, so the operation and maintenance equipment on the private cloud node cannot predict the service quality evaluation value of the private cloud node.
Disclosure of Invention
In a first aspect, an embodiment of the present invention provides an operation and maintenance method applied to a data center, where the data center includes a private cloud node and a public cloud node. The method comprises the following steps: the public cloud node receives a plurality of historical data sent by the private cloud node, wherein each historical data is a comprehensive evaluation value obtained according to service quality evaluation values of N dimensions of the private cloud node, the service quality evaluation values of the N dimensions represent the service quality of the private cloud node in the N dimensions respectively, and N is an integer not less than 2; the public cloud node predicts the comprehensive evaluation value of the private cloud node according to the plurality of historical data to obtain a predicted value; the public cloud node determines that the predicted value meets an alarm rule; and in response to the determination, the public cloud node sends an alarm message to the private cloud node.
In the operation and maintenance method provided by the embodiment of the invention, the private cloud node sends the historical data of the comprehensive evaluation value to the public cloud node, and the comprehensive evaluation value is predicted by using the computing capacity of the public cloud node, so that early warning and operation and maintenance can be carried out on the private cloud node before a fault occurs. Because the public cloud node has stronger computing capacity and storage capacity than the private cloud node, predicting the comprehensive evaluation value of the private cloud node on the public cloud node can introduce more historical data of the comprehensive evaluation value and perform larger-scale computation than prediction completed on the private cloud node. Therefore, the prediction accuracy is improved, the calculation is faster, and a more efficient and accurate operation and maintenance mode is provided for the data center.
With reference to the first aspect, in a first possible implementation manner of the first aspect, the private cloud node includes a physical device for providing cloud services, and the quality of service of the N dimensions includes a quality of service of the cloud services and a quality of service of the physical device.
A plurality of service quality evaluation values with different dimensionalities are introduced, and the service quality of the private cloud node is inspected or monitored from the dimensionalities of the service provided by the resource, the working state of the resource providing the service and the like, so that the operation and the maintenance of the private cloud node are more accurate, and the service quality of the private cloud node can be more comprehensively reflected.
With reference to the first aspect or the first possible implementation manner of the first aspect, in a second possible implementation manner of the first aspect, the method further includes: the private cloud node obtains first historical data in the plurality of historical data according to the service quality evaluation values of the N dimensions of the private cloud node in a first time period.
A comprehensive evaluation value of the service quality is introduced. Based on the multi-dimensional service quality evaluation values, it gives a single, intuitive parameter that summarizes the service quality of the private cloud node, so that the service quality of the data center can be monitored more comprehensively, macroscopically and intuitively, which reduces complexity and improves user experience.
With reference to the second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, the obtaining, by the private cloud node, the first history data in the plurality of history data according to the N-dimensional service quality evaluation values of the private cloud node in the first time period includes: the private cloud node normalizes the service quality evaluation values of the N dimensions in the first time period; the private cloud node obtains the first history data according to the service quality evaluation values of the N dimensions after normalization and the weight of the service quality evaluation value of each dimension.
With reference to the third implementation manner of the first aspect, in a fourth possible implementation manner of the first aspect, the method further includes: the private cloud node acquires N x (N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, wherein each importance degree parameter represents a comparison value of the service quality evaluation values of any two dimensions in the service quality evaluation values of the N dimensions; and the private cloud node acquires the weight of the service quality evaluation value of each dimension according to the N x (N-1)/2 importance degree parameters.
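As an illustration of why N x (N-1)/2 importance degree parameters suffice, the following minimal Python sketch builds an N x N comparison matrix from one parameter per unordered pair of dimensions. It assumes the usual pairwise-comparison convention (diagonal entries equal to 1 and a_ji = 1 / a_ij), which the text does not state explicitly; the function name `build_judgment_matrix` and the example values are hypothetical.

```python
from itertools import combinations


def build_judgment_matrix(n, pairwise):
    """Build an n x n comparison matrix from n*(n-1)/2 importance parameters.

    `pairwise` maps (i, j) with i < j to the comparison value a_ij.
    Assumption (not stated in the text): a_ii = 1 and a_ji = 1 / a_ij.
    """
    expected = n * (n - 1) // 2
    if len(pairwise) != expected:
        raise ValueError(f"expected {expected} parameters, got {len(pairwise)}")
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        a_ij = float(pairwise[(i, j)])
        matrix[i][j] = a_ij
        matrix[j][i] = 1.0 / a_ij  # reciprocal entry for the reversed pair
    return matrix


# Hypothetical comparison values for N = 3 dimensions (3 * 2 / 2 = 3 parameters).
example_matrix = build_judgment_matrix(3, {(0, 1): 2.0, (0, 2): 4.0, (1, 2): 2.0})
```

With this convention, each unordered pair of dimensions contributes exactly one free parameter, which matches the count N x (N-1)/2.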
In a second aspect, an embodiment of the present invention provides an operation and maintenance device for operation and maintenance of a data center, where the data center includes a private cloud node and a public cloud node, and the operation and maintenance device includes: a monitoring module deployed on the private cloud node to: monitoring the service quality of N dimensionalities of the private cloud node; obtaining a comprehensive evaluation value according to the service quality evaluation values of the N dimensions, wherein the service quality evaluation values of the N dimensions represent the service quality of the private cloud node in the N dimensions respectively, and N is an integer not less than 2; and sending a plurality of historical data to a prediction module deployed on the public cloud node, wherein each historical data is a comprehensive evaluation value obtained according to the service quality evaluation values of the N dimensions of the private cloud node. The operation and maintenance equipment further comprises: a prediction module deployed on the public cloud node to: receiving a plurality of historical data sent by the monitoring module; predicting the comprehensive evaluation value of the private cloud node according to the plurality of historical data to obtain a predicted value; determining that the predicted value meets an alarm rule; and responding to the determination, and sending an alarm message to the private cloud node.
The monitoring module on the private cloud node sends the historical data of the comprehensive evaluation value to the prediction module on the public cloud node, and the comprehensive evaluation value is predicted by using the computing capacity of the public cloud node, so that early warning and operation and maintenance can be carried out on the private cloud node before a fault occurs. Because the public cloud node has stronger computing capacity and storage capacity than the private cloud node, predicting the comprehensive evaluation value of the private cloud node on the public cloud node can introduce more historical data of the comprehensive evaluation value and perform larger-scale calculation than prediction completed on the private cloud node, so that more efficient operation and maintenance with higher accuracy and lower latency is achieved.
With reference to the second aspect, in a first possible implementation manner of the second aspect, the private cloud node includes a physical device for providing cloud services, and the quality of service of the N dimensions includes a quality of service of the cloud services and a quality of service of the physical device.
A plurality of service quality evaluation values with different dimensionalities are introduced, and the service quality of the private cloud node is inspected or monitored from the dimensionalities of the service provided by the resource, the working state of the resource providing the service and the like, so that the operation and the maintenance of the private cloud node are more accurate, and the service quality of the private cloud node can be more comprehensively reflected.
A comprehensive evaluation value of the service quality is introduced. Based on the multi-dimensional service quality evaluation values, it gives a single, intuitive parameter that summarizes the service quality of the private cloud node, so that the service quality of the data center can be monitored more comprehensively, macroscopically and intuitively, which reduces complexity and improves user experience.
With reference to the first possible implementation manner of the second aspect, in a second possible implementation manner of the second aspect, the obtaining, by the monitoring module, a first historical data of the plurality of historical data according to the N-dimensional service quality evaluation values of the private cloud node in a first time period includes: normalizing the service quality evaluation values of the N dimensions in the first time period; and obtaining the first history data according to the service quality evaluation values of the N normalized dimensions and the weight of the service quality evaluation value of each dimension.
With reference to the second implementation manner of the second aspect, in a third possible implementation manner of the second aspect, the monitoring module is further configured to: acquiring N x (N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, wherein each importance degree parameter represents a comparison value of the service quality evaluation values of any two dimensions in the service quality evaluation values of the N dimensions; and acquiring the weight of the service quality evaluation value of each dimension according to the N x (N-1)/2 importance degree parameters.
In a third aspect, an embodiment of the present invention provides a data center, where the data center includes at least one computing device, and the at least one computing device includes a processor and a memory, where the processor executes program instructions in the memory to implement the various methods performed by the public cloud node and the private cloud node in the first aspect.
In a fourth aspect, embodiments of the present invention provide a computer program product and a non-volatile computer-readable storage medium, where the computer program product and the non-volatile computer-readable storage medium contain computer instructions, and a computing device executes the computer instructions to implement various methods in the first aspect of the embodiments of the present invention.
Drawings
Fig. 1 is a schematic diagram of a data center architecture according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a hybrid cloud data center according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of operation and maintenance equipment deployment provided in an embodiment of the present invention;
Fig. 4 is a diagram illustrating a method for data center operation and maintenance according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a method for obtaining a comprehensive evaluation value according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of an operation and maintenance device according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of a computing device in a data center according to an embodiment of the present invention.
Detailed Description
A data center in an embodiment of the invention is shown as data center 100 in fig. 1. The data center 100 includes resources 110, and based on the resources 110, the data center 100 provides services 120. The services 120 are all deployed on the resources 110. The services 120 include operation and maintenance services 121, computing services, storage services, network services, management services, data services, security services, and the like. The operation and maintenance service 121 is used to operate and maintain the data center 100. The resources 110 include physical resources and/or virtual resources, and the resources 110 include computing resources 111, storage resources 112, network resources 113, operation and maintenance equipment, and the like. The computing resources 111 include computing devices used to provide computing power to the data center 100, including physical computing devices and/or virtual computing devices, e.g., physical servers, or virtual machines or containers running on physical servers. The storage resources 112 include storage devices used to provide storage capability for the data center 100, including physical storage devices and/or virtual storage devices, such as storage arrays or virtual storage devices. The network resources 113 include network devices used to provide network capabilities for the data center 100, including physical network devices and/or virtual network devices, such as switches, routers, virtual switches, virtual routers, and the like. In practice, computing resources 111, storage resources 112, and network resources 113 may be deployed in data center 100. The computing devices, storage devices, and network devices in the computing resources 111, storage resources 112, and network resources 113 may be used to directly provide services to users, may also be used to support or manage services provided to users, and the like.
The data center 100 in which the virtual machine, the virtual storage device, or the virtual network device is deployed is a cloud data center. The cloud data center provides cloud services to users on demand based on the resources 110, and the resources 110 of the cloud data center include physical resources and virtual resources.
The cloud data centers comprise a public cloud data center, a private cloud data center and a hybrid cloud data center.
A public cloud data center is a cloud environment shared by several organizations and/or users. In a public cloud data center, the services required by users are provided by an independent third-party provider, and all users share all resources of the public cloud data center. Public cloud data centers provided by third-party providers typically have significant computing and storage capabilities.
A private cloud data center is a data center used exclusively by one organization or user. If the data center is used exclusively by an organization, all resources of the private cloud data center are shared by members of the organization, and users not belonging to the organization cannot access services provided by the data center; if the data center is used exclusively by one user, other users cannot access the services provided by the data center. In general, the computing power and the storage capacity of a private cloud data center are weaker than those of a public cloud data center, but the security of the private cloud data center is higher because it is used exclusively by one organization or user.
The hybrid cloud data center integrates the advantages of both public cloud data centers and private cloud data centers. As shown in fig. 2, hybrid cloud data center 200 includes public cloud node 212 and private cloud node 211. Public cloud node 212 and private cloud node 211 each have computing, storage, and network resources. The service 120 of the hybrid cloud data center 200 is deployed based on the public cloud node 212 and the private cloud node 211, and the service 120 includes an operation and maintenance service 121. Public cloud node 212 has powerful computing and storage capabilities, and its resources are shared by several organizations and/or users; the resources of the private cloud node 211 are used exclusively by one organization or user, thereby providing higher security for that organization or user. Services deployed on public cloud node 212 of hybrid cloud data center 200 often require strong computing or storage capabilities but have relatively low security requirements; services deployed on the private cloud node 211 have low requirements on computing capacity or storage capacity but high security requirements.
In the embodiment of the invention, during operation and maintenance of the private cloud node in the hybrid cloud data center, the service quality evaluation value of the private cloud node is predicted by using the computing capacity of the public cloud node.
The embodiment of the invention provides an operation and maintenance method of a data center. The method can be applied to the hybrid cloud data center 200 for providing the operation and maintenance service 121 to the hybrid cloud data center 200. The method may be performed by the operation and maintenance device 300 shown in fig. 3. As shown in fig. 3, the operation and maintenance device 300 is deployed in the hybrid cloud data center 200. Specifically, the operation and maintenance device 300 includes a first operation and maintenance unit 310 and a second operation and maintenance unit 320; the first operation and maintenance unit 310 is deployed in the private cloud node 211 and implemented by computing resources, storage resources and network resources in the private cloud node 211; the second operation and maintenance unit 320 is deployed in the public cloud node 212 and implemented by computing resources, storage resources and network resources in the public cloud node 212. The operation and maintenance method described in the embodiment of the present invention will be described with reference to fig. 3 and 4. As shown in fig. 4, the method includes the following steps.
401, the first operation and maintenance unit 310 of the private cloud node 211 acquires multiple sets of historical data of N-dimensional service quality evaluation values of the private cloud node 211, where the N-dimensional service quality evaluation values respectively represent the service quality of the private cloud node 211 in the N dimensions, N is an integer not less than 2, and each set of historical data includes the N-dimensional service quality evaluation values in a time period.
Exemplarily, some of the service quality evaluation values are shown in Table 1. They belong to private cloud service quality, server service quality, storage service quality, and network service quality, respectively, and cover different types of service quality evaluation, such as performance, availability, and reliability. Each service quality evaluation value represents the service quality of the data center in a corresponding dimension. For example, the response time of the private cloud service is a performance index and represents the service quality of the private cloud node in the dimension of how quickly the private cloud service responds to a service request. The N service quality evaluation values in the embodiment of the present invention are not limited to those shown in Table 1.
TABLE 1
[Table 1 appears only as image GDA0003322887340000051 in the source; its contents are not reproduced in the text.]
402, the first operation and maintenance unit 310 of the private cloud node 211 obtains a plurality of historical data of the comprehensive evaluation value according to the multiple sets of historical data of the N-dimensional service quality evaluation values, where each historical data is a comprehensive evaluation value obtained according to the N-dimensional service quality evaluation values of the private cloud node 211.
Specifically, the first operation and maintenance unit 310 of the private cloud node 211 obtains, according to the N-dimensional service quality evaluation values of the private cloud node 211 in a first time period, first historical data in the plurality of historical data of the comprehensive evaluation value, where the first historical data is one of the plurality of historical data of the comprehensive evaluation value, and the first time period is one of the multiple time periods corresponding to the multiple sets of historical data of the N-dimensional service quality evaluation values.
In general, after a set of historical data of the N-dimensional service quality evaluation values for one time period is obtained, the historical data of the comprehensive evaluation value for that time period is calculated based on the set of historical data. An embodiment of the present invention provides a method for obtaining historical data of a comprehensive evaluation value according to the service quality evaluation values in a time period, as shown in fig. 5.
4021, normalizing the service quality evaluation values of N dimensions.
The service quality evaluation values of the N dimensions may have different units. For example, the storage mean time between failures and the physical server mean time between failures are measured in seconds, whereas storage device availability and physical server availability are ratios, typically expressed as percentages. Before the comprehensive evaluation value is obtained, the service quality evaluation values of the N dimensions need to be normalized to eliminate the units.
Specifically, the embodiment of the present invention provides a formula for normalizing the N-dimensional service quality evaluation values. The service quality evaluation value $x_i$ is normalized according to the following formula to obtain the normalized service quality evaluation value $y_i$:
[Formula image GDA0003322887340000061; not reproduced in the text]
where $i$ is an arbitrary integer from 1 to N, $x_i$ is any one of the N-dimensional service quality evaluation values, $y_i$ is the corresponding normalized service quality evaluation value, min is the smallest service quality evaluation value among the service quality evaluation values of the N dimensions, and max is the largest service quality evaluation value among the service quality evaluation values of the N dimensions.
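The normalization formula itself survives only as an image reference in the source. A minimal sketch of step 4021, assuming the common min-max form $y_i = (x_i - \min) / (\max - \min)$ that is consistent with the definitions of min and max given in the text, could look as follows. The function name `normalize` and the sample values are hypothetical, and returning zeros when all values are equal is a choice made only for this sketch.

```python
def normalize(values):
    """Min-max normalize a list of service quality evaluation values.

    Assumption: the patent's formula (only available as an image) is the
    standard min-max form y_i = (x_i - min) / (max - min).
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values equal: the ratio is undefined, so this sketch returns zeros.
        return [0.0 for _ in values]
    return [(x - lowest) / (highest - lowest) for x in values]


# Hypothetical raw evaluation values for N = 3 dimensions.
normalized = normalize([120.0, 300.0, 210.0])
```

Each normalized value lies in the range from 0 to 1 and carries no unit, which is what allows values from different dimensions to be combined in the next step.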
4022, processing the normalized service quality evaluation values of the N dimensions according to the weight of each service quality evaluation value, following the idea of Multiple Attribute Decision Making (MADM), to obtain a comprehensive evaluation value. The weight $w_i$ of the service quality evaluation value $x_i$ represents the importance of that service quality evaluation value when the service quality of the data center is evaluated according to the service quality evaluation values of the N dimensions. Specifically, the comprehensive evaluation value P is obtained according to the following formula:
[Formula image GDA0003322887340000062; not reproduced in the text]
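Because the formula for P is only available as an image in the source, the following sketch assumes the simplest reading of step 4022: a weighted sum $P = \sum_{i=1}^{N} w_i y_i$ of the normalized values. The function name `composite_value` and the sample weights are hypothetical.

```python
def composite_value(normalized_values, weights):
    """Combine normalized evaluation values into one comprehensive value.

    Assumption: P is a plain weighted sum of w_i * y_i; the patent's own
    formula is not reproduced in the text.
    """
    if len(normalized_values) != len(weights):
        raise ValueError("one weight is required per dimension")
    return sum(w * y for w, y in zip(weights, normalized_values))


# Hypothetical normalized values and weights for N = 3 dimensions.
P = composite_value([0.0, 1.0, 0.5], [0.5, 0.25, 0.25])
```

If the weights sum to 1 and every normalized value lies between 0 and 1, the resulting value also lies between 0 and 1, which makes values from different time periods directly comparable.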
In the case where the weight of each service quality evaluation value is not easily obtained, the embodiment of the present invention provides a method for obtaining the weight of each service quality evaluation value according to the importance degree parameters.
The importance degree parameter represents a comparison value of any two service quality evaluation values among the service quality evaluation values of the N dimensions. The service quality evaluation values of the N dimensions correspond to N x (N-1)/2 importance degree parameters. These N x (N-1)/2 importance degree parameters are used as elements to construct a judgment matrix A, and the eigenvector W corresponding to the largest eigenvalue of the judgment matrix A is then computed; W gives the weights of the service quality evaluation values of the N dimensions.
[Formula image GDA0003322887340000063; not reproduced in the text]
where $a_{ij}$ is an importance degree parameter, $i$ and $j$ are integers from 1 to N, and $a_{ij}$ characterizes the comparison value of the service quality evaluation value corresponding to $x_i$ and the service quality evaluation value corresponding to $x_j$. The eigenvector W is a 1 × N matrix whose elements are the weights of the N-dimensional service quality evaluation values, i.e., $W = (w_1, w_2, \ldots, w_N)$, where $w_i$ is the weight of the service quality evaluation value $x_i$.
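One way to obtain the eigenvector for the largest eigenvalue of a positive comparison matrix is power iteration. The sketch below is an illustration under assumptions, not the patent's stated procedure: the function name `principal_eigenvector`, the example matrix, and the choice to scale the weights so that they sum to 1 are all hypothetical.

```python
def principal_eigenvector(matrix, iterations=100, tolerance=1e-12):
    """Approximate the eigenvector of the largest eigenvalue by power iteration.

    Assumes a square matrix with strictly positive entries. The result is
    scaled so that its elements sum to 1 and can be read as weights.
    """
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        updated = [value / total for value in product]
        if max(abs(a - b) for a, b in zip(updated, weights)) < tolerance:
            return updated
        weights = updated
    return weights


# Hypothetical judgment matrix for N = 3 dimensions (a_ji = 1 / a_ij assumed).
A = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
W = principal_eigenvector(A)
```

For this perfectly consistent example the weights come out proportional to 4 : 2 : 1, so the first dimension carries four sevenths of the total weight.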
In general, the first operation and maintenance unit of the private cloud node monitors the N-dimensional service quality evaluation values in real time, and calculates a comprehensive evaluation value according to the N-dimensional service quality evaluation values in real time. Since storage resources and computing resources on the private cloud are limited, after the historical data of the comprehensive evaluation value is obtained, the first operation and maintenance unit uploads the historical data of the comprehensive evaluation value to the public cloud node, and the historical data of the comprehensive evaluation value is stored in the public cloud node.
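The text does not specify how the historical data is encoded when it is uploaded. As a purely illustrative sketch, each historical data item could be represented as a record holding a time period and its comprehensive evaluation value and serialized to JSON before upload; the class `HistoryRecord`, its field names, and the helper functions are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HistoryRecord:
    """One comprehensive evaluation value for one time period (hypothetical layout)."""
    period_start: str
    period_end: str
    composite_value: float


def encode_history(records):
    """Serialize records on the private cloud side before upload."""
    return json.dumps([asdict(record) for record in records])


def decode_history(payload):
    """Rebuild records on the public cloud side after receipt."""
    return [HistoryRecord(**item) for item in json.loads(payload)]


records = [
    HistoryRecord("2018-12-01T00:00", "2018-12-01T01:00", 0.82),
    HistoryRecord("2018-12-01T01:00", "2018-12-01T02:00", 0.79),
]
payload = encode_history(records)
restored = decode_history(payload)
```

Uploading only the comprehensive value per time period, rather than all N raw values, keeps the payload small, which fits the stated storage limits of the private cloud node.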
403, the second operation and maintenance unit 320 of the public cloud node 212 acquires a plurality of pieces of historical data of the comprehensive evaluation value sent by the private cloud node.
404, the public cloud node 212 predicts the operating condition of the service of the private cloud node based on the plurality of historical data of the comprehensive evaluation value.
Specifically, the second operation and maintenance unit takes a plurality of historical data of the comprehensive evaluation value as a training set to obtain a prediction model of the comprehensive evaluation value. By means of the neural network and the deep learning method, a prediction model of the comprehensive evaluation value can be obtained according to a training set of a plurality of historical data containing the comprehensive evaluation value. Preferably, the training method comprises a Recurrent Neural Network (RNN) training method, in particular a Long Short-Term Memory (LSTM) training method. In addition, any method of deriving a predictive model from a training set may be used in embodiments of the invention.
The second operation and maintenance unit then predicts the comprehensive evaluation value based on the prediction model to obtain a predicted value. The predicted value reflects the trend of the operating condition of the service of the private cloud node.
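The train-then-predict flow can be illustrated with a deliberately simple stand-in model. The sketch below is not the LSTM method the embodiment prefers: it fits a first-order autoregressive model (next value = slope x current value + intercept) by ordinary least squares, relying on the statement that any method of deriving a prediction model from a training set may be used. The function names and the sample history are hypothetical.

```python
def fit_ar1(history):
    """Fit next = slope * current + intercept by least squares.

    The training set is the list of consecutive pairs
    (history[t], history[t + 1]) taken from the historical data.
    """
    if len(history) < 3:
        raise ValueError("need at least three historical values")
    xs, ys = history[:-1], history[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # Constant history: predict the mean of the targets.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(history, steps=1):
    """Roll the fitted model forward `steps` time periods."""
    slope, intercept = fit_ar1(history)
    value = history[-1]
    predicted = []
    for _ in range(steps):
        value = slope * value + intercept
        predicted.append(value)
    return predicted


# Hypothetical composite evaluation values for five past time periods.
history = [1.0, 2.0, 3.0, 4.0, 5.0]
forecast = predict(history, steps=2)
```

A recurrent model such as an LSTM would replace `fit_ar1` and `predict` while keeping the same interface: historical comprehensive evaluation values in, predicted values out.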
405, the second operation and maintenance unit 320 of the public cloud node 212 determines that the predicted value satisfies the alarm rule.
405, in response to the determination, the second operation and maintenance unit 320 of the public cloud node 212 sends the alarm message to the first operation and maintenance unit 310 of the private cloud node 211, so that the first operation and maintenance unit 310 operates and maintains the private cloud node 211 according to the alarm message.
406, the first operation and maintenance unit 310 of the private cloud node 211 performs operation and maintenance, such as fault query, troubleshooting, capacity expansion, and the like, on the private cloud node 211 according to the received alarm message.
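Steps 405 and 406 can be sketched as follows. The passage does not define the alarm rule, so this example assumes a simple threshold rule in which a lower comprehensive evaluation value means worse service quality; the message fields and the function name `check_alarm` are likewise hypothetical.

```python
def check_alarm(predicted_values, threshold):
    """Return an alarm message if any predicted value falls below the threshold.

    Assumption: a lower comprehensive evaluation value means worse service
    quality. Returns None when the (assumed) alarm rule is not satisfied.
    """
    for step, value in enumerate(predicted_values, start=1):
        if value < threshold:
            return {
                "type": "composite_value_low",
                "steps_ahead": step,
                "predicted_value": value,
                "threshold": threshold,
            }
    return None


# Public cloud side: evaluate the rule on hypothetical predicted values.
alarm = check_alarm([0.8, 0.55, 0.4], threshold=0.6)

# Private cloud side: act on the alarm message if one was sent.
if alarm is not None:
    action = "inspect node before period +%d" % alarm["steps_ahead"]
else:
    action = "no action"
```

The message carries the predicted value and its lead time so that the private cloud node can decide which operation and maintenance action (fault query, troubleshooting, capacity expansion) is appropriate.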
By the method, the comprehensive index of the data center can be obtained, so that the visual, comprehensive and quantitative evaluation on the service quality of the data center is realized, the operation and maintenance efficiency of the data center is improved, and the follow-up operation and maintenance operations such as early warning, fault identification and the like are facilitated.
The operation and maintenance device 300 in the embodiment of the present invention includes a first operation and maintenance unit 310 and a second operation and maintenance unit 320. As shown in fig. 6, the data center 200 includes a private cloud node and a public cloud node. The first operation and maintenance unit 310 comprises a monitoring module 311 and a processing module 312; the second operation and maintenance unit 320 includes a prediction module 313. The modules of the first operation and maintenance unit 310 are deployed at the private cloud node 211, and the modules of the second operation and maintenance unit 320 are deployed at the public cloud node 212.
A monitoring module 311 configured to: monitoring the quality of service of the N dimensions of the private cloud node 211; obtaining a comprehensive evaluation value according to the N-dimensional service quality evaluation values, where the N-dimensional service quality evaluation values respectively represent the service quality of the private cloud node 211 in the N dimensions, and N is an integer not less than 2; a plurality of historical data, each of which is a composite evaluation value obtained from the N-dimensional service quality evaluation values of the private cloud node 211, is sent to a prediction module 313 deployed on the public cloud node 212.
A prediction module 313 to: receiving a plurality of historical data sent by the monitoring module 311; predicting the comprehensive evaluation value of the private cloud node 211 according to the plurality of historical data to obtain a predicted value; determining that the predicted value meets an alarm rule; in response to the determination, an alert message is sent to the processing module 312 of the private cloud node 211.
The processing module 312 is configured to perform operation and maintenance on the private cloud node 211 according to the alarm message.
Optionally, the private cloud node 211 includes a physical device for providing the service 120 as shown in fig. 2, and the quality of service of the N dimensions includes the quality of service of the service 120 and the quality of service of the physical device.
Optionally, the monitoring module 311 is configured to obtain, according to the quality of service evaluation values of the N dimensions of the private cloud node 211 in the first time period, a first historical data in the plurality of historical data, where the obtaining includes: normalizing the service quality evaluation values of the N dimensions in the first time period; and obtaining the first history data according to the service quality evaluation values of the N normalized dimensions and the weight of the service quality evaluation value of each dimension.
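A minimal sketch of this normalize-then-weight computation is shown below. The passage does not specify the normalization method or how the normalized values and weights are combined, so min-max scaling against per-dimension bounds and a weighted sum are assumptions here, as are the function names and the sample metrics.

```python
def min_max_normalize(value, lower, upper):
    """Scale a raw evaluation value into [0, 1] (assumed min-max scaling)."""
    if upper == lower:
        return 0.0
    clipped = min(max(value, lower), upper)
    return (clipped - lower) / (upper - lower)


def composite_value(values, bounds, weights):
    """Weighted sum of the normalized N-dimensional evaluation values."""
    if not (len(values) == len(bounds) == len(weights)):
        raise ValueError("values, bounds and weights must have the same length")
    normalized = [
        min_max_normalize(v, lo, hi) for v, (lo, hi) in zip(values, bounds)
    ]
    return sum(w * x for w, x in zip(weights, normalized))


# Hypothetical two-dimensional example for one time period.
values = [50.0, 200.0]                 # raw service quality evaluation values
bounds = [(0.0, 100.0), (0.0, 400.0)]  # assumed per-dimension value ranges
weights = [0.75, 0.25]                 # weights of the two dimensions
first_history_data = composite_value(values, bounds, weights)
```

Normalizing first keeps dimensions with large raw ranges from dominating the result, so the weights alone express the relative importance of each dimension.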
Optionally, the monitoring module 311 is further configured to: acquiring N x (N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, wherein each importance degree parameter represents a comparison value of the service quality evaluation values of any two dimensions in the service quality evaluation values of the N dimensions; and acquiring the weight of the service quality evaluation value of each dimension according to the N x (N-1)/2 importance degree parameters.
An embodiment of the present application further provides a data center 700 as shown in fig. 7. Data center 700 includes at least one computing device 710 and at least one computing device 720. Data center 700 may be used to implement hybrid cloud data center 200 as shown in fig. 3, where public cloud nodes 212, private cloud nodes 211, and operation and maintenance device 300 in hybrid cloud data center 200 are all deployed on at least one computing device 710 and/or at least one computing device 720. Specifically, the private cloud node 211 is deployed on at least one computing device 710 and the public cloud node 212 is deployed on at least one computing device 720. Correspondingly, the first operation and maintenance unit 310 on the private cloud node 211 is deployed on at least one computing device 710, and the second operation and maintenance unit 320 on the public cloud node 212 is deployed on at least one computing device 720. The computing device 710 may include a processing unit 711 and a communication interface 712, where the processing unit 711 is configured to execute functions defined by an operating system and various software programs running on the computing device, including the functions of the modules in the first operation and maintenance unit 310. The computing device 720 may include a processing unit 721 and a communication interface 722, where the processing unit 721 is configured to execute the functions defined by the operating system and various software programs running on the computing device, including the functions of the modules in the second operation and maintenance unit 320. Communication interface 712 and communication interface 722 are for communicative interaction with other devices, which may be other computing devices, and in particular communication interface 712 and communication interface 722 may be network adapter cards.
Optionally, the computing device 710 may further include an input/output interface 713, which is connected to an input/output device for receiving input information and outputting operation results. The input/output device may be a mouse, a keyboard, a display, an optical drive, or the like. Optionally, the computing device 710 may also include a secondary storage 714, also commonly referred to as external memory. The storage medium of the secondary storage 714 may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., compact disc), a semiconductor medium (e.g., solid state drive), or the like. The processing unit 711 may have various specific implementations. For example, the processing unit 711 may include a processor 7112 and a memory 7111, and the processor 7112 executes related operations according to program instructions stored in the memory 7111. The processor 7112 may be a Central Processing Unit (CPU), for example including a CPU0 and a CPU1, or a Graphics Processing Unit (GPU), and may be a single-core or multi-core processor. The processing unit 711 may also be implemented by a logic device with built-in processing logic, such as a Field Programmable Gate Array (FPGA) or a Digital Signal Processor (DSP). Moreover, the computing device 710 in fig. 7 is merely one example of a computing device, and the computing device 710 may contain more or fewer components than shown in fig. 7, or have a different arrangement of components.
Likewise, the computing device 720 may also include an input/output interface 723 and a secondary storage 724. The processing unit 721 of the computing device 720 may also have various implementations. For example, the processing unit 721 may include a processor 7212 and a memory 7211, and the processor 7212 executes related operations according to program instructions stored in the memory 7211; alternatively, the processing unit 721 may be implemented solely by a logic device with built-in processing logic. The computing device 720 may contain more or fewer components than the computing device 710, or have a different arrangement of components.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An operation and maintenance method applied to a data center, wherein the data center comprises a private cloud node and a public cloud node, and the method comprises the following steps:
the public cloud node receives a plurality of historical data sent by the private cloud node, wherein each historical data represents a comprehensive evaluation value obtained according to N-dimensional service quality evaluation values of the private cloud node in one time period of a plurality of time periods corresponding to the plurality of historical data, the N-dimensional service quality evaluation values respectively represent the service quality of the private cloud node in the N dimensions, and N is an integer not less than 2;
the public cloud node predicts the comprehensive evaluation value of the private cloud node according to the plurality of historical data and a prediction model pre-trained by a training set to obtain a prediction value;
the public cloud node determines that the predicted value meets an alarm rule;
and in response to the determination, the public cloud node sends an alarm message to the private cloud node.
2. The method of claim 1, wherein the private cloud node comprises a physical device for providing cloud services, and wherein the quality of service for the N dimensions comprises a quality of service for cloud services and a quality of service for the physical device.
3. The method of claim 1 or 2, further comprising:
and the private cloud node obtains first historical data in the plurality of historical data according to the service quality evaluation values of the N dimensions of the private cloud node in a first time period.
4. The method of claim 3, wherein the obtaining, by the private cloud node, a first historical data of the plurality of historical data according to the N-dimensional service quality assessment values of the private cloud node over a first time period comprises:
the private cloud node normalizes the service quality evaluation values of the N dimensions in the first time period;
and the private cloud node obtains the first historical data according to the service quality evaluation values of the N dimensions after normalization and the weight of the service quality evaluation value of each dimension.
5. The method of claim 3, further comprising:
the private cloud node acquires N x (N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, wherein each importance degree parameter represents a comparison value of the service quality evaluation values of any two dimensions in the service quality evaluation values of the N dimensions;
and the private cloud node acquires the weight of the service quality evaluation value of each dimension according to the N x (N-1)/2 importance degree parameters.
6. An operation and maintenance device for operation and maintenance of a data center, wherein the data center comprises a private cloud node and a public cloud node, and the operation and maintenance device comprises:
a monitoring module deployed on the private cloud node to:
monitoring the service quality of N dimensions of the private cloud node;
obtaining a comprehensive evaluation value according to the service quality evaluation values of the N dimensions, wherein the service quality evaluation values of the N dimensions represent the service quality of the private cloud node in the N dimensions respectively, and N is an integer not less than 2;
sending a plurality of historical data to a prediction module deployed on the public cloud node, wherein each historical data represents a comprehensive evaluation value obtained according to the N-dimensional service quality evaluation values of the private cloud node in one of a plurality of time periods corresponding to the plurality of historical data;
the prediction module is configured to:
receiving the plurality of historical data sent by the monitoring module;
predicting the comprehensive evaluation value of the private cloud node according to the plurality of historical data and a prediction model pre-trained by a training set to obtain a predicted value;
determining that the predicted value meets an alarm rule;
and in response to the determination, sending an alarm message to the private cloud node.
7. The operation and maintenance device according to claim 6, wherein the private cloud node comprises a physical device for providing cloud services, and the quality of service of the N dimensions comprises a quality of service of cloud services and a quality of service of the physical device.
8. The operation and maintenance device according to claim 6 or 7, wherein the monitoring module is configured to:
normalizing the service quality evaluation values of the N dimensions in a first time period;
and obtaining first history data in the plurality of history data according to the service quality evaluation values of the N normalized dimensions and the weight of the service quality evaluation value of each dimension.
9. The operation and maintenance device of claim 8, wherein the monitoring module is further configured to:
acquiring N x (N-1)/2 importance degree parameters of the service quality evaluation values of the N dimensions, wherein each importance degree parameter represents a comparison value of the service quality evaluation values of any two dimensions in the service quality evaluation values of the N dimensions;
and acquiring the weight of the service quality evaluation value of each dimension according to the N x (N-1)/2 importance degree parameters.
10. A datacenter comprising a first computing device and a second computing device, the first computing device comprising a first processor and a first memory, the second computing device comprising a second processor and a second memory, the first processor executing program instructions in the first memory to implement the method performed by the private cloud node of any of claims 1-5, the second processor executing program instructions in the second memory to implement the method performed by the public cloud node of any of claims 1-5.
CN201811622320.5A 2018-12-28 2018-12-28 Operation and maintenance method and operation and maintenance equipment applied to data center Active CN109981333B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811622320.5A CN109981333B (en) 2018-12-28 2018-12-28 Operation and maintenance method and operation and maintenance equipment applied to data center
PCT/CN2019/129603 WO2020135806A1 (en) 2018-12-28 2019-12-28 Operation maintenance method and equipment applied to data center

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811622320.5A CN109981333B (en) 2018-12-28 2018-12-28 Operation and maintenance method and operation and maintenance equipment applied to data center

Publications (2)

Publication Number Publication Date
CN109981333A CN109981333A (en) 2019-07-05
CN109981333B true CN109981333B (en) 2022-03-25

Family

ID=67076482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811622320.5A Active CN109981333B (en) 2018-12-28 2018-12-28 Operation and maintenance method and operation and maintenance equipment applied to data center

Country Status (2)

Country Link
CN (1) CN109981333B (en)
WO (1) WO2020135806A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109981333B (en) * 2018-12-28 2022-03-25 华为云计算技术有限公司 Operation and maintenance method and operation and maintenance equipment applied to data center
CN111416735B (en) * 2020-03-02 2021-05-11 河海大学 Federal learning-based safety QoS prediction method under mobile edge environment
CN112118131B (en) * 2020-09-01 2023-07-25 紫光云(南京)数字技术有限公司 Cloud resource management method with high reliability and high speed capacity expansion
CN112561318A (en) * 2020-12-14 2021-03-26 清华大学 Comprehensive evaluation and analysis tool for energy system of data center
CN112667594A (en) * 2021-01-14 2021-04-16 北京智源人工智能研究院 Heterogeneous computing platform based on hybrid cloud resources and model training method
CN112817827A (en) * 2021-01-22 2021-05-18 中国银联股份有限公司 Operation and maintenance method, device, server, equipment, system and medium
CN113590571B (en) * 2021-09-29 2022-01-18 睿至科技集团有限公司 Method and system for sharing private cloud resources and public cloud resources
CN116614431B (en) * 2023-07-19 2023-10-03 中国电信股份有限公司 Data processing method, device, electronic equipment and computer readable storage medium
CN117033880B (en) * 2023-10-10 2024-01-05 北京金信润天信息技术股份有限公司 Automatic operation and maintenance method, device, equipment and storage medium for data center

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107992951A (en) * 2017-12-11 2018-05-04 上海市信息网络有限公司 Capacity alarm method, system, memory and the electronic equipment of cloud management platform

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10225164B2 (en) * 2012-09-07 2019-03-05 Oracle International Corporation System and method for providing a cloud computing environment
US20140351648A1 (en) * 2013-05-24 2014-11-27 Connectloud, Inc. Method and Apparatus for Dynamic Correlation of Large Cloud Firewall Fault Event Stream
CN106293872A (en) * 2016-07-27 2017-01-04 云南电网有限责任公司信息中心 A kind of SLA resources balance management-control method based on resource pool
CN106886469A (en) * 2017-04-10 2017-06-23 深圳第线通信有限公司 A kind of cloud computing disaster tolerance management method
CN107895176B (en) * 2017-11-13 2021-08-24 国网湖南省电力有限公司 Fog calculation system and method for wide-area monitoring and diagnosis of hydroelectric machine group
CN109981333B (en) * 2018-12-28 2022-03-25 华为云计算技术有限公司 Operation and maintenance method and operation and maintenance equipment applied to data center

Also Published As

Publication number Publication date
WO2020135806A1 (en) 2020-07-02
CN109981333A (en) 2019-07-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220225

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Applicant after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd.

GR01 Patent grant