CN104333459A - Method and device for fault management of cloud data center - Google Patents

Publication number
CN104333459A
Authority
CN
China
Prior art keywords
fault
organization
administration
message
fault type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410363945.XA
Other languages
Chinese (zh)
Inventor
陈光新
朱波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201410363945.XA
Publication of CN104333459A
Legal status: Pending

Abstract

The invention provides a method and a device for fault management of a cloud data center. In the method, an organization management server receives fault information submitted by a user terminal and judges whether an organization management fault type corresponding to the fault information can be determined. If it can, the organization management server repairs the fault according to that organization management fault type. If it cannot, the organization management server forwards the fault information to a system management server, which determines a system management fault type from the fault information and repairs the fault according to that type. Faults can thus be handled promptly, so that system resources are used effectively.

Description

Cloud data center fault management method and device
Technical field
The present invention relates to the field of cloud computing technology, and in particular to a cloud data center fault management method and device.
Background art
As cloud computing technology matures, cloud computing is gradually becoming a development hotspot in the industry. A cloud data center operating system is the only solution by which a data center completes the transition from hardware to resource pools: it fuses a large number of heterogeneous devices into a standardized, unified logical resource pool, dynamically schedules that pool to cloud applications, and delivers services to terminals. At the same time, the cloud data center operating system interfaces with applications above it, provides functions in the middle, and schedules and manages hardware below it. It is the sole link between hardware and applications, occupies a core position, and has a decisive influence on the technical indicators of application systems and hardware.
The Cloud Sea operating system is a complete cloud data center solution that covers all the requirements of a cloud data center in the form of a suite. The system has a three-layer architecture consisting of an interaction layer, a platform management layer, and a resource virtualization layer. The interaction layer is implemented by iPortal and is divided into an administrator interface and a user interface, so that different roles use one unified platform to obtain resource services. The platform management layer consists of functional modules such as resource pool scheduling (iCloudManager), resource management (iResourceManager), statistics and billing (iCharge), and self-service (iService). The resource virtualization layer is provided by conventional server virtualization software and virtualizes the physical resources.
In the Cloud Sea operating system, virtual resources such as virtual machines and networks may develop faults during use, for example a network becoming unavailable or a virtual machine failing to power on. Because of resource access restrictions, however, the user cannot resolve such faults, and system resources therefore cannot be used effectively.
Summary of the invention
To solve the above technical problem, the invention provides a cloud data center fault management method and device that can handle faults promptly, so that system resources are used effectively.
To achieve this object, the invention provides a cloud data center fault management method, comprising: an organization management server receives fault information submitted by a user terminal and judges whether an organization management fault type corresponding to the fault information can be determined; if the organization management fault type corresponding to the fault information can be determined, the organization management server repairs the fault according to the organization management fault type; if it cannot be determined, the organization management server sends the fault information to a system management server, and the system management server determines a system management fault type according to the fault information and repairs the fault according to said system management fault type.
Further, the method also comprises: the organization management server presets an organization management fault type set and modifies that set as required; the system management server presets a system management fault type set and modifies that set as required.
Further, the step in which the organization management server receives the fault information submitted by the user terminal and judges whether the organization management fault type corresponding to said fault information can be determined comprises: the organization management server receives the fault information submitted by the user terminal, compares the fault information with the preset organization management fault type set, and judges whether the corresponding organization management fault type can be determined; if the organization management fault type set contains an organization management fault type corresponding to the fault information, it is judged that the type can be determined; if the set contains no organization management fault type corresponding to the fault information, it is judged that the type cannot be determined.
Further, the method also comprises: recording the state of the organization management fault repair, where the states comprise fault information submitted, fault information transferred, fault being processed, fault resolved, or fault closed; and/or recording the state of the system management fault repair, where the states comprise fault information submitted, fault being processed, fault resolved, or fault closed.
The invention also provides a cloud data center fault management system, comprising: a user terminal, configured to send fault information to an organization management server; the organization management server, configured to receive the fault information submitted by the user terminal, judge whether an organization management fault type corresponding to the fault information can be determined, repair the fault according to the organization management fault type if it can be determined, and send the fault information to a system management server if it cannot; and the system management server, configured to receive the fault information from the organization management server, determine a system management fault type according to the fault information, and repair the fault according to the system management fault type.
Further, the organization management server is also configured to preset an organization management fault type set and modify it as required; the system management server is also configured to preset a system management fault type set and modify it as required.
Further, the organization management server is specifically configured to: receive the fault information submitted by the user terminal, compare the fault information with the preset organization management fault type set, and judge whether the organization management fault type corresponding to the fault information can be determined; if the set contains an organization management fault type corresponding to the fault information, judge that the type can be determined; if the set contains no such type, judge that the type cannot be determined.
Further, the organization management server is also configured to record the state of the organization management fault repair, where the states comprise fault information submitted, fault information transferred, fault being processed, fault resolved, or fault closed; the system management server is also configured to record the state of the system management fault repair, where the states comprise fault information submitted, fault being processed, fault resolved, or fault closed.
Compared with the prior art, the present invention comprises: an organization management server receives fault information submitted by a user terminal and judges whether an organization management fault type corresponding to the fault information can be determined; if it can, the organization management server repairs the fault according to the organization management fault type; if it cannot, the organization management server sends the fault information to a system management server, and the system management server determines a system management fault type according to the fault information and repairs the fault according to said system management fault type. In the present invention, once a user finds a fault, the fault information is sent promptly to the organization management server through the user terminal. Either the organization management server determines the fault type from the fault information, or it forwards the fault information to the system management server, which determines the fault type. The organization management server or the system management server then repairs the fault according to the fault type. Faults can thus be handled promptly, so that system resources are used effectively.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the cloud data center fault management method of the present invention.
Fig. 2 is a schematic structural diagram of the cloud data center fault management device of the present invention.
Detailed description of the embodiments
The present invention is described below with reference to the embodiments shown in the drawings.
In the Cloud Sea operating system, the roles in a cloud data center are system administrator, organization administrator, and user. The system administrator manages the entire infrastructure and divides the unified data center resources into multiple cloud data centers, each managed by an organization administrator. The organization administrator delivers the divided cloud data center to different users according to their requests. The system administrator, the organization administrator, and the user communicate through the system management server, the organization management server, and the user terminal, respectively.
Fig. 1 is a schematic flowchart of the cloud data center fault management method of the present invention. As shown in Fig. 1, the method comprises:
Step 11: the organization management server presets an organization management fault type set, and the system management server presets a system management fault type set.
In this step, because the organization management server and the system management server use different virtual resources, each sets its own fault type set. A fault type set may include fault types such as network unavailable or virtual machine unable to power on.
The organization management server and the system management server can each modify their fault type set as required, for example by adding or deleting a fault type.
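Step 11 can be illustrated with a minimal Python sketch. The class name `FaultTypeSet`, its methods, and the fault type identifiers below are hypothetical and chosen only for illustration; the source says only that each server presets its own fault type set and may add or delete fault types.

```python
from dataclasses import dataclass, field


@dataclass
class FaultTypeSet:
    """A modifiable collection of fault types held by one management server."""

    types: set[str] = field(default_factory=set)

    def add(self, fault_type: str) -> None:
        # Add a fault type as requirements change.
        self.types.add(fault_type)

    def remove(self, fault_type: str) -> None:
        # Delete a fault type; deleting an unknown type is a no-op.
        self.types.discard(fault_type)

    def __contains__(self, fault_type: str) -> bool:
        return fault_type in self.types


# Each server presets its own set (the identifiers are assumptions).
org_fault_types = FaultTypeSet({"network_unavailable", "vm_cannot_power_on"})
sys_fault_types = FaultTypeSet({"vm_cannot_power_on", "storage_pool_offline"})

# Later modification on demand.
org_fault_types.add("vm_console_unreachable")
org_fault_types.remove("network_unavailable")
```

Keeping the two sets as separate objects mirrors the statement that the two servers manage different virtual resources and therefore maintain their own sets.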
Step 12: the organization management server receives fault information submitted by the user terminal and judges whether the organization management fault type corresponding to this fault information can be determined. If so, go to step 13; if not, go to step 14.
In this step, when a user finds a fault, the user submits fault information to the organization management server through the user terminal. The organization management server compares the fault information with the preset organization management fault type set and judges whether the corresponding organization management fault type can be determined.
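The source does not say how the fault information is compared with the fault type set. The following Python sketch assumes one simple possibility, keyword matching on a free-text description; the function `match_fault_type`, the keyword table, and the sample descriptions are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical organization management fault type set:
# fault type name -> phrases that indicate that type.
ORG_FAULT_TYPE_SET: Dict[str, Tuple[str, ...]] = {
    "network_unavailable": ("network unavailable", "cannot reach network"),
    "vm_cannot_power_on": ("cannot power on", "fails to start"),
}


def match_fault_type(
    fault_info: str, fault_type_set: Dict[str, Tuple[str, ...]]
) -> Optional[str]:
    """Return the matching fault type, or None if no type can be determined."""
    text = fault_info.lower()
    for fault_type, phrases in fault_type_set.items():
        if any(phrase in text for phrase in phrases):
            return fault_type
    return None
```

A return value of `None` corresponds to the "cannot be determined" branch, in which the fault information is passed on in step 14.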
Step 13: if the organization management fault type corresponding to this fault information can be determined, the organization management server repairs the fault according to this organization management fault type.
In this step, if the organization management fault type set contains an organization management fault type corresponding to this fault information, it is judged that the type can be determined.
The organization management server repairs the fault according to this organization management fault type and records the state of the repair, for example fault information submitted, fault information transferred, fault being processed, fault resolved, or fault closed, so that the user can check the fault handling state on the organization management server.
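The repair states named above could be recorded roughly as in the following Python sketch. The state names come from the source; the transition order in `ALLOWED`, the class `RepairRecord`, and its methods are assumptions, since the source lists the states without specifying how they follow one another.

```python
from enum import Enum
from typing import Dict, List, Set


class RepairState(Enum):
    SUBMITTED = "fault information submitted"
    TRANSFERRED = "fault information transferred"
    PROCESSING = "fault being processed"
    RESOLVED = "fault resolved"
    CLOSED = "fault closed"


# Assumed transition order; not specified in the source.
ALLOWED: Dict[RepairState, Set[RepairState]] = {
    RepairState.SUBMITTED: {RepairState.PROCESSING, RepairState.TRANSFERRED},
    RepairState.TRANSFERRED: set(),  # handed off to the system management server
    RepairState.PROCESSING: {RepairState.RESOLVED},
    RepairState.RESOLVED: {RepairState.CLOSED},
    RepairState.CLOSED: set(),
}


class RepairRecord:
    """Repair state history for one fault, kept so the user can query it."""

    def __init__(self, fault_id: str) -> None:
        self.fault_id = fault_id
        self.history: List[RepairState] = [RepairState.SUBMITTED]

    @property
    def state(self) -> RepairState:
        return self.history[-1]

    def advance(self, new_state: RepairState) -> None:
        # Reject transitions that the assumed ordering does not allow.
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot go from {self.state.name} to {new_state.name}"
            )
        self.history.append(new_state)
```

Keeping the full history rather than only the latest state is a design choice of this sketch; it lets a user see when a fault was handed over or reopened, which the source does not require.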
Step 14: if the organization management fault type corresponding to this fault information cannot be determined, the organization management server sends the fault information to the system management server.
In this step, if the organization management fault type set contains no organization management fault type corresponding to this fault information, it is judged that the type cannot be determined.
Step 15: the system management server compares the fault information with the preset system management fault type set, determines the system management fault type corresponding to this fault information, and repairs the fault according to this system management fault type.
In this step, the system management server repairs the fault according to this system management fault type and records the state of the repair, for example fault information submitted, fault being processed, fault resolved, or fault closed, so that the user can check the fault handling state on the system management server.
In the present invention, once a user finds a fault, the fault information is sent promptly to the organization management server through the user terminal. Either the organization management server determines the fault type from the fault information, or it forwards the fault information to the system management server, which determines the fault type. The organization management server or the system management server then repairs the fault according to the fault type. Faults can thus be handled promptly, so that system resources are used effectively.
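Steps 12 to 15 amount to a two-level escalation, which the following Python sketch illustrates under simplifying assumptions: the fault information is reduced to a single symptom string, the fault type sets are plain sets, and `repair` is a placeholder. All names are hypothetical. The final branch, where the system management server also finds no matching type, is not covered by the source and is included only so that the function is total.

```python
from typing import Optional, Set, Tuple

# Hypothetical preset fault type sets (step 11).
ORG_FAULT_TYPES: Set[str] = {"vm_cannot_power_on"}
SYS_FAULT_TYPES: Set[str] = {"vm_cannot_power_on", "network_unavailable"}


def repair(server: str, fault_type: str) -> str:
    """Placeholder repair action; a real system would act on the platform."""
    return f"{server} repaired {fault_type}"


def handle_fault(symptom: str) -> Tuple[str, Optional[str]]:
    """Return (handling server, repair result) for a submitted symptom."""
    # Step 12: the organization management server tries to determine the type.
    if symptom in ORG_FAULT_TYPES:
        # Step 13: repair at the organization level.
        return "organization", repair("organization", symptom)
    # Step 14: forward the fault information to the system management server.
    # Step 15: the system management server determines the type and repairs.
    if symptom in SYS_FAULT_TYPES:
        return "system", repair("system", symptom)
    # Assumption: an unmatched fault stays with the system management server.
    return "system", None
```

For example, `handle_fault("network_unavailable")` is escalated because the assumed organization set does not contain that type.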
Fig. 2 is a schematic structural diagram of the cloud data center fault management system of the present invention. As shown in Fig. 2, the system comprises:
A user terminal, configured to send fault information to the organization management server;
An organization management server, configured to preset an organization management fault type set; receive the fault information submitted by the user terminal and judge whether the organization management fault type corresponding to this fault information can be determined; if it can be determined, repair the fault according to this organization management fault type; if it cannot be determined, send the fault information to the system management server;
A system management server, configured to preset a system management fault type set; receive the fault information from the organization management server, compare the fault information with the preset system management fault type set, determine the system management fault type corresponding to this fault information, and repair the fault according to this system management fault type.
The organization management server and the system management server can each modify their fault type set as required, for example by adding or deleting a fault type.
The organization management server compares the fault information with the preset organization management fault type set and judges whether the organization management fault type corresponding to this fault information can be determined. If the set contains an organization management fault type corresponding to this fault information, it is judged that the type can be determined; the organization management server then repairs the fault according to this type and records the state of the repair, for example fault information submitted, fault information transferred, fault being processed, fault resolved, or fault closed, so that the user can check the fault handling state on the organization management server. If the set contains no organization management fault type corresponding to this fault information, it is judged that the type cannot be determined.
The system management server repairs the fault according to the system management fault type and records the state of the repair, for example fault information submitted, fault being processed, fault resolved, or fault closed, so that the user can check the fault handling state on the system management server.
In the present invention, once a user finds a fault, the fault information is sent promptly to the organization management server through the user terminal. Either the organization management server determines the fault type from the fault information, or it forwards the fault information to the system management server, which determines the fault type. The organization management server or the system management server then repairs the fault according to the fault type. Faults can thus be handled promptly, so that system resources are used effectively.
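The three components of Fig. 2 and the status check by the user might be wired together as in the following Python sketch. The class names, method names, status strings, and fault identifiers are hypothetical, the repair itself is omitted, and the way the terminal reaches the system management server is an assumption.

```python
from typing import Dict, Optional, Set


class SystemManagementServer:
    def __init__(self, fault_types: Set[str]) -> None:
        self.fault_types = fault_types
        self.status: Dict[str, str] = {}

    def receive(self, fault_id: str, symptom: str) -> None:
        self.status[fault_id] = "submitted"
        if symptom in self.fault_types:
            # The actual repair would run here.
            self.status[fault_id] = "resolved"


class OrganizationManagementServer:
    def __init__(self, fault_types: Set[str], upstream: SystemManagementServer) -> None:
        self.fault_types = fault_types
        self.upstream = upstream
        self.status: Dict[str, str] = {}

    def receive(self, fault_id: str, symptom: str) -> None:
        self.status[fault_id] = "submitted"
        if symptom in self.fault_types:
            self.status[fault_id] = "resolved"
        else:
            # Type cannot be determined here: record the transfer and forward.
            self.status[fault_id] = "transferred"
            self.upstream.receive(fault_id, symptom)


class UserTerminal:
    def __init__(self, org_server: OrganizationManagementServer) -> None:
        self.org_server = org_server

    def submit(self, fault_id: str, symptom: str) -> None:
        self.org_server.receive(fault_id, symptom)

    def check(self, fault_id: str) -> Dict[str, Optional[str]]:
        # Query the recorded state on both servers.
        return {
            "organization": self.org_server.status.get(fault_id),
            "system": self.org_server.upstream.status.get(fault_id),
        }
```

With an organization set of `{"vm_cannot_power_on"}` and a system set of `{"network_unavailable"}`, a submitted network fault would show as transferred on the organization management server and resolved on the system management server.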
It should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way only for clarity. Those skilled in the art should treat the specification as a whole; the technical solutions in the embodiments may also be combined as appropriate to form other embodiments that those skilled in the art can understand.
The detailed descriptions listed above are only specific illustrations of feasible embodiments of the present invention and are not intended to limit its scope of protection. All equivalent embodiments or modifications that do not depart from the technical spirit of the present invention shall fall within its scope of protection.

Claims (8)

1. A cloud data center fault management method, characterized by comprising:
An organization management server receives fault information submitted by a user terminal and judges whether an organization management fault type corresponding to said fault information can be determined;
If the organization management fault type corresponding to said fault information can be determined, the organization management server repairs the fault according to said organization management fault type;
If the organization management fault type corresponding to said fault information cannot be determined, the organization management server sends the fault information to a system management server, and the system management server determines a system management fault type according to said fault information and repairs the fault according to said system management fault type.
2. The cloud data center fault management method according to claim 1, characterized in that the method further comprises:
The organization management server presets an organization management fault type set, and said organization management server modifies the organization management fault type set as required;
The system management server presets a system management fault type set, and said system management server modifies the system management fault type set as required.
3. The cloud data center fault management method according to claim 2, characterized in that said organization management server receiving the fault information submitted by the user terminal and judging whether the organization management fault type corresponding to said fault information can be determined comprises:
The organization management server receives the fault information submitted by the user terminal, compares the fault information with the preset organization management fault type set, and judges whether the organization management fault type corresponding to said fault information can be determined;
If the organization management fault type set contains an organization management fault type corresponding to said fault information, it is judged that the organization management fault type corresponding to said fault information can be determined;
If the organization management fault type set contains no organization management fault type corresponding to said fault information, it is judged that the organization management fault type corresponding to said fault information cannot be determined.
4. The cloud data center fault management method according to claim 1, characterized in that the method further comprises:
Recording the state of the organization management fault repair, where the states of said organization management fault repair comprise fault information submitted, fault information transferred, fault being processed, fault resolved, or fault closed; and/or,
Recording the state of the system management fault repair, where the states of said system management fault repair comprise fault information submitted, fault being processed, fault resolved, or fault closed.
5. A cloud data center fault management system, characterized by comprising:
A user terminal, configured to send fault information to an organization management server;
The organization management server, configured to receive the fault information submitted by the user terminal and judge whether an organization management fault type corresponding to said fault information can be determined; if the organization management fault type corresponding to said fault information can be determined, the organization management server repairs the fault according to said organization management fault type; if it cannot be determined, the organization management server sends the fault information to a system management server;
The system management server, configured to receive the fault information from the organization management server, determine a system management fault type according to said fault information, and repair the fault according to said system management fault type.
6. The cloud data center fault management system according to claim 5, characterized in that said organization management server is further configured to: preset an organization management fault type set and modify the organization management fault type set as required;
Said system management server is further configured to: preset a system management fault type set and modify the system management fault type set as required.
7. The cloud data center fault management system according to claim 6, characterized in that said organization management server is specifically configured to: receive the fault information submitted by the user terminal, compare the fault information with the preset organization management fault type set, and judge whether the organization management fault type corresponding to said fault information can be determined;
If the organization management fault type set contains an organization management fault type corresponding to said fault information, judge that the organization management fault type corresponding to said fault information can be determined;
If the organization management fault type set contains no organization management fault type corresponding to said fault information, judge that the organization management fault type corresponding to said fault information cannot be determined.
8. The cloud data center fault management system according to claim 6, characterized in that said organization management server is further configured to: record the state of the organization management fault repair, where the states of said organization management fault repair comprise fault information submitted, fault information transferred, fault being processed, fault resolved, or fault closed;
Said system management server is further configured to: record the state of the system management fault repair, where the states of said system management fault repair comprise fault information submitted, fault being processed, fault resolved, or fault closed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410363945.XA CN104333459A (en) 2014-07-28 2014-07-28 Method and device for fault management of cloud data center

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410363945.XA CN104333459A (en) 2014-07-28 2014-07-28 Method and device for fault management of cloud data center

Publications (1)

Publication Number Publication Date
CN104333459A true CN104333459A (en) 2015-02-04

Family

ID=52408118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410363945.XA Pending CN104333459A (en) 2014-07-28 2014-07-28 Method and device for fault management of cloud data center

Country Status (1)

Country Link
CN (1) CN104333459A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101621404A (en) * 2008-07-05 2010-01-06 中兴通讯股份有限公司 Method and system for layering processing of failure
CN102053873A (en) * 2011-01-13 2011-05-11 浙江大学 Method for ensuring fault isolation of virtual machines of cache-aware multi-core processor
CN103167004A (en) * 2011-12-15 2013-06-19 中国移动通信集团上海有限公司 Cloud platform host system fault correcting method and cloud platform front control server
CN103685463A (en) * 2013-11-08 2014-03-26 浪潮(北京)电子信息产业有限公司 Access control method and system in cloud computing system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10361919B2 (en) 2015-11-09 2019-07-23 At&T Intellectual Property I, L.P. Self-healing and dynamic optimization of VM server cluster management in multi-cloud platform
US10616070B2 (en) 2015-11-09 2020-04-07 At&T Intellectual Property I, L.P. Self-healing and dynamic optimization of VM server cluster management in multi-cloud platform
US11044166B2 (en) 2015-11-09 2021-06-22 At&T Intellectual Property I, L.P. Self-healing and dynamic optimization of VM server cluster management in multi-cloud platform
US11616697B2 (en) 2015-11-09 2023-03-28 At&T Intellectual Property I, L.P. Self-healing and dynamic optimization of VM server cluster management in multi-cloud platform

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150204
