CN102724057B - A distributed hierarchical autonomous management method for cloud computing platforms


Info

Publication number
CN102724057B
Authority
CN
China
Prior art keywords
node
management
level
module
strategy
Prior art date
Legal status
Expired - Fee Related
Application number
CN201210042033.3A
Other languages
Chinese (zh)
Other versions
CN102724057A (en)
Inventor
曾宇 (Zeng Yu)
Current Assignee
BEJING COMPUTING CENTER
Original Assignee
BEJING COMPUTING CENTER
Priority date
Filing date
Publication date
Application filed by BEJING COMPUTING CENTER
Priority to CN201210042033.3A
Publication of CN102724057A
Application granted
Publication of CN102724057B
Legal status: Expired - Fee Related
Anticipated expiration


Abstract

The invention provides a distributed hierarchical autonomous management method for cloud computing platforms. A large-scale cloud computing management system is divided into logical partitions; inside each partition, autonomous management is achieved by building multi-level autonomic elements, and above the partitions a higher-level autonomic element is built to perform system-level management. During autonomous management, the rules corresponding to an autonomous plan are read from the knowledge base, the analysis module checks whether the rules are satisfied and triggers response events, and the events are then submitted to the event management module, which caches them and schedules their execution. The invention applies a divide-and-conquer approach to the design of an autonomous management system for high-performance computers. Using dynamic management over multiple logical partitions, a large-scale system is divided into logical partitions according to a given policy, and autonomous management is performed inside each partition so that the system adapts as it grows. Inside each partition, multi-level autonomic elements perform management; above the partitions, a higher-level autonomic element performs system-level management. Autonomic elements at every level are extensible: when devices are added or characteristic parameters are modified, the system keeps running, which achieves self-configuration.

Description

A distributed hierarchical autonomous management method for cloud computing platforms
Technical field
The present invention relates to the field of cloud computing platform management, and specifically provides a distributed hierarchical autonomous management method for cloud computing platforms.
Background art
Research on autonomic computing architectures studies how multiple autonomic elements can be coordinated to jointly achieve a system-level goal, including problem detection, repair, load management, and automatic installation and configuration.
Autonomic computing architectures focus on the relationships among multiple autonomic elements. Existing research mainly covers hierarchical structures, peer-to-peer structures, and hybrid structures that combine the two. In a hierarchical structure, an upper-layer autonomic manager (AM) can send control information (CI) to the AMs below it, and a lower-layer AM sends state information (SI) to the AM above it. Upper-layer AMs control the macroscopic autonomic behaviour of the system, while AMs whose CI out-degree is zero are the bottom-level autonomic managers and perform micro-management; an example is the two-layer autonomic computing system optimized using cybernetics and utility functions. In a peer-to-peer structure, the cooperating AMs have no hierarchical relationship, control and state information flow in both directions, and the overall autonomic behaviour of the system typically "emerges" from local interactions, as in architectures based on self-organizing emergence theory. In such an architecture the AMs are peers and no AM manages global autonomic behaviour; that is, the macroscopic autonomic behaviour of the system arises from the local interactions of the AMs. In a hybrid structure, an upper-layer AM can send control information (CI) to the AMs below it, and lower-layer AMs send state information (SI) to the AM above; the upper-layer AM controls the macroscopic autonomic behaviour of the system, while the lower-layer AMs realize the macroscopic behaviour of their own layer through interaction, within the constraints set by the upper-layer AM. For example, an autonomic system may be divided into two layers. The upper layer is a resource arbiter, responsible for global resource allocation and for maximizing overall utility; the lower layer consists of application managers, each of which, for a given resource allocation, maximizes local utility by adjusting local parameters. Each application manager converts its local service-level utility function into a resource-level utility function; the resource arbiter computes system-level utility to obtain a global resource allocation scheme and uses it to adjust the behaviour of the lower-layer application managers.
High-performance computer systems must be scalable. Scalability covers scale (resources), time (upgrades), performance, and software. The first three relate to the high-performance computer itself, whereas software scalability applies not only to the business software running on the high-performance computer system but also to the management system software of the high-performance computer.
Summary of the invention
To overcome the above shortcomings and make cloud computing platform management scalable, the invention provides a distributed hierarchical autonomous management method for cloud computing platforms.
A distributed hierarchical autonomous management method for cloud computing platforms, in which:
A large-scale cloud computing management system is divided into logical partitions; inside each partition, autonomous management is achieved by building multi-level autonomic elements; above the partitions, a higher-level autonomic element is built to perform system-level management. During autonomous management, the rules corresponding to an autonomous plan are read from the knowledge base, the analysis module checks whether the rules are satisfied and triggers response events, and the events are submitted to the event management module, which caches them and schedules their execution.
Preferably, the autonomous plans include a logical partitioning plan, an election plan, and an alarm correlation plan.
Preferably, each autonomic element comprises a knowledge base module, a resource monitoring module, an analysis module, an event management module, a response module, a parallel execution module, and an autonomous plan module;
the knowledge base module interacts with the user through a knowledge base interface and provides customizable autonomous rules;
the resource monitoring module maintains a storage standard for resources, receives resource information from the autonomic elements of managed nodes, and stores that information in a database in a standard directory format for use by other autonomic elements;
the analysis module is used by each autonomous plan to judge whether the information stored in the database satisfies the rule conditions in the knowledge base; when a condition is satisfied, it generates an event to be executed, sets the event priority, and sends the event description to the event management module;
the event management module caches the event descriptions submitted by the analysis module and decides, according to a scheduling policy, whether each cached event can be executed; if it can, the module creates a scheduled event executed by a concurrent thread, and the thread carries out the specific response;
the response module provides a method for registering response actions as predefined responses and manages the mapping table between the two;
the parallel execution module executes the scripts or rules produced by the response module on multiple nodes simultaneously;
the autonomous plan module logically controls the other six modules, giving the management system its autonomy.
More preferably, the autonomous rules include condition rules and formula rules;
a condition rule consists of a predefined condition and a predefined response;
a predefined condition is an expression over resource attributes, or a logical combination of such expressions; the resource attributes are stored in a resource attribute directory managed by the resource monitoring module;
a predefined response is an action registered by the response module, which is triggered when the condition is satisfied.
More preferably, the knowledge base module can be extended online to achieve passive learning; rules can also be dynamically deployed, updated, and deleted, so that system behaviour can be changed and refined without modifying the software code or stopping the system.
Preferably, the autonomic elements are divided into the parameter level, component level, node level, partition level, and system level. The smallest autonomic element is defined at the parameter level; the component level is built on the parameter level, the node level on the component level, the partition level on the node level, and the system level on the partition level.
More preferably, node-level elements are deployed on server nodes and manage all components within their node; a partition-level autonomic element deployed on the management node of each logical partition is responsible for partition management; and a system-level autonomic element is deployed at the top level. The autonomic elements communicate with one another through an extension of the CIM (Common Information Model) description standard. The autonomic element on a managed node collects status information about each resource on that node, sends it to the higher-level autonomic element, and executes the commands that element issues. Bottom-level autonomic elements have neither a global view nor a knowledge base of their own; the upper-level autonomic element alone judges whether predefined conditions are met and executes the corresponding responses.
More preferably, when the logical partitions are divided, the number of nodes in each partition is calculated according to the following formula:
In the formula, $R_1, R_2, R_3, R_4, R_5$ are the weights of the different resource load values, with $\sum_i R_i = 1$. During adaptive adjustment, a new coefficient $R_{new,i}$ is computed from $L_i$, the current load value of resource $i$; if $R_{new,i}$ differs from the old coefficient $R_i$ by more than a preset threshold, the newly calculated coefficient replaces the old one.
More preferably, the election plan proceeds as follows:
A fully connected topology is established among all nodes;
A priority is assigned to each node;
The node with the highest priority is elected as the leader node, and the result is broadcast to the other nodes;
When another node detects that the leader node has failed, it triggers an election;
A new leader node is elected according to priority, and the result is broadcast again.
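The first election round above can be illustrated with a minimal Python sketch. It assumes each node's priority is a weighted score of its processing, communication and external-storage I/O capability (the factors this document names for the CCBE algorithm); the `NodeInfo` type, the weights and the tie-breaking rule are hypothetical, and broadcasting is not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeInfo:
    node_id: str
    cpu: float      # processing capability, normalized to [0, 1] (hypothetical)
    network: float  # communication capability
    io: float       # external-storage I/O capability

def priority_list(nodes, weights=(0.5, 0.3, 0.2)):
    """Order nodes by a weighted capability score, highest first.

    The weights are illustrative assumptions, not values from the patent.
    Ties are broken by node_id so every node derives the same list.
    """
    w_cpu, w_net, w_io = weights

    def score(n):
        return w_cpu * n.cpu + w_net * n.network + w_io * n.io

    return sorted(nodes, key=lambda n: (-score(n), n.node_id))

def elect(priorities, failed=frozenset()):
    """Return the highest-priority node not known to have failed."""
    for node in priorities:
        if node.node_id not in failed:
            return node.node_id
    raise RuntimeError("no live node left in the partition")
```

Because every node computes the same ordered list, a failure of the leader can be handled by calling `elect` again with the failed node excluded, which matches the re-election step described above.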
More preferably, the alarm correlation plan uses time and space compression to exclude invalid alarms.
Preferably, the election uses the cloud-computing-oriented election algorithm, which adopts the agent-node-group-based distributed network management mechanism commonly used in large-scale distributed network management. Suppose a logical partition contains n managed nodes. Each node has a node agent with a globally unique identifier, which is known a priori to the agents of the other nodes in the partition. Any two agents in the partition can exchange messages, forming a fully connected topology, and the set of all agents in the partition can be written as $\{ID_0, ID_1, ID_2, \ldots, ID_{n-1}\}$. In each logical partition, a leader agent manages the agent nodes in the partition. The leader node and the agent nodes cooperate in a centralized management mode: the leader node instructs agent nodes to perform specific operations or to provide specific information, and the agent nodes return the operation results or the requested information. The leader nodes, in turn, complete management tasks according to a distributed cooperation mode.
Preferably, the cloud computing management system adopts a unified monitoring and management policy, whose content is as follows:
Policy classification: global monitoring and management policies are divided into several categories, including switches, disk arrays, operating systems, tape libraries, databases, and hardware information;
Policy abstraction: from the monitoring and management policies of products of the same type made by different vendors, the autonomic elements at each level abstract a unified monitoring and management policy format for that product type;
Policy description: on the basis of the above policy classification, the autonomic elements at each level describe all kinds of monitoring and management policies in a unified way;
Policy composition: monitoring and management policies are divided into direct policies and indirect policies; a direct policy can be applied directly to a specific device or application through policy conversion, while an indirect policy is composed of a group of direct or indirect policies;
Policy configuration: a policy processing module converts unified policies into device-specific policies, and the device monitoring drivers and agent modules that apply the device-specific policies are distributed to the devices or applications.
The present invention applies a divide-and-conquer approach to the design of an autonomous management system for high-performance computers. Using dynamic management over multiple logical partitions, a large-scale system is divided into logical partitions according to a given policy, and autonomous management is performed inside each partition so that the system adapts as it grows. Inside each partition, multi-level autonomic elements perform management; above the partitions, a higher-level autonomic element performs system-level management. Autonomic elements at every level are extensible: when devices are added or characteristic parameters are modified, the system keeps running, which achieves self-configuration.
Brief description of the drawings
Fig. 1 shows the framework of the autonomous management system of the present invention;
Fig. 2 shows logical partitioning based on hierarchical autonomic elements;
Fig. 3 shows the quantitative relationship between faulty agents and message transmission;
Fig. 4 shows the global unified monitoring and management policy.
Detailed description of the embodiments
The framework of the distributed hierarchical autonomous management system is shown in Fig. 1.
The components function as follows:
(1) Knowledge base: designed to provide customizable autonomous rules through interaction with the user. Through the knowledge base interface, users can query, modify, delete, and add rules. There are two kinds of rules: condition rules and formula rules. The description of each rule must state which autonomous plan it belongs to.
A condition rule has two parts: a predefined condition and a predefined response. A predefined condition is a simple expression over resource attributes, or a logical combination of expressions, such as cpu[temperature] > 80 °C. The resource attribute directory is managed by the resource monitoring module. A predefined response is the action triggered when the condition is satisfied, such as stopping the forwarding of requests to the node. The response module registers actions as predefined responses and maintains the response mapping table. By user selection or in a predefined way, a predefined condition is associated with a predefined response to form a condition rule.
(2) Resource monitoring: this module maintains a storage standard for resources, namely a resource directory service. It receives resource information from the autonomic elements of managed nodes and stores it in the database in a standard directory format (such as the CIM standard) for use by other modules.
(3) Analysis: each autonomous plan uses its own analysis module to judge whether the information stored in the database satisfies the rule conditions in the knowledge base. When a condition is satisfied, it generates an event to be executed, sets the event priority, and sends the event to the event management module. Because different autonomous plans use different rule formats, their analysis procedures also differ; for example, checking whether CPU utilization has reached a threshold and deciding whether a node's workload is too heavy cannot be handled by one common procedure.
(4) Event management: caches the event descriptions submitted by the analysis module and decides, according to a scheduling policy (such as priority), whether a cached event can be executed. This module creates scheduled events executed by concurrent threads; the responses run in these threads and carry out the specific actions.
(5) Response: this module provides a method for registering response actions as predefined responses and manages the mapping table between them. It also provides methods for adding, deleting, and modifying predefined responses. A thread started by event management calls the response module and passes in a predefined response as a parameter; the response module looks up the corresponding action in the mapping table, which may be a script or another group of rules, and executes it.
(6) Parallel execution: executes the scripts or rules produced by the response module on multiple nodes simultaneously.
(7) Autonomous plan: this module logically controls the above six components and gives the management system its autonomy.
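The condition rules of component (1) and the mapping table of component (5) can be sketched together in Python. This is a minimal illustration under stated assumptions: a condition is a single comparison such as `cpu[temperature] > 80`, conditions in a rule are combined with AND, a "script" is modelled as a Python callable, and all class and method names (`Condition`, `ConditionRule`, `ResponseModule`) are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Comparison operators a predefined condition may use (illustrative subset).
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}

@dataclass(frozen=True)
class Condition:
    """Predefined condition such as cpu[temperature] > 80."""
    resource: str
    attribute: str
    op: str
    threshold: float

    def holds(self, catalog: Dict[str, Dict[str, float]]) -> bool:
        value = catalog.get(self.resource, {}).get(self.attribute)
        return value is not None and OPS[self.op](value, self.threshold)

@dataclass(frozen=True)
class ConditionRule:
    plan: str         # autonomous plan the rule belongs to
    conditions: tuple # combined with logical AND in this sketch
    response: str     # name of a predefined response

    def matches(self, catalog) -> bool:
        return all(c.holds(catalog) for c in self.conditions)

# An action is either a script (modelled as a callable) or another group of rules.
Action = Union[Callable[[dict], object], List[ConditionRule]]

class ResponseModule:
    """Mapping table from predefined-response names to actions."""

    def __init__(self) -> None:
        self._table: Dict[str, Action] = {}

    def register(self, name: str, action: Action) -> None:
        self._table[name] = action  # adds a new entry or modifies an existing one

    def delete(self, name: str) -> None:
        self._table.pop(name, None)

    def run(self, name: str, catalog: dict) -> object:
        action = self._table[name]  # KeyError if the response is not registered
        if callable(action):
            return action(catalog)  # execute the "script"
        # Another group of rules: fire each nested rule whose condition holds.
        return [self.run(r.response, catalog) for r in action if r.matches(catalog)]
```

For example, a rule built from `Condition("cpu", "temperature", ">", 80)` and the response name `"stop_forwarding"` matches a catalog reporting 85 °C but not one reporting 70 °C.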
The general steps of the autonomous management system are: read the rules corresponding to the plan from the knowledge base; the analysis module checks whether the rules are satisfied and triggers response events; these events are submitted to the event management module, which caches them and schedules their execution.
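A minimal Python sketch of this generic step, together with an event manager that caches events and dispatches them to worker threads, might look like the following. Assumptions: rules are plain `(name, predicate, response, priority)` tuples, a smaller number means higher priority, and the scheduling policy is a caller-supplied `can_execute` predicate; `EventManager` and `run_plan_once` are hypothetical names.

```python
import heapq
import itertools
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

class EventManager:
    """Caches events and dispatches executable ones to worker threads.

    Smaller number = higher priority (an assumption of this sketch).
    """

    def __init__(self, respond: Callable[[dict], object],
                 can_execute: Callable[[dict], bool] = lambda e: True,
                 workers: int = 4):
        self._queue: List[Tuple[int, int, dict]] = []  # heap of (priority, seq, event)
        self._seq = itertools.count()                  # FIFO order among equal priorities
        self._respond = respond                        # runs the predefined response
        self._can_execute = can_execute                # scheduling policy hook
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, event: dict, priority: int) -> None:
        heapq.heappush(self._queue, (priority, next(self._seq), event))

    def dispatch(self) -> List[Future]:
        """Hand every cached event the policy allows to the pool, in priority
        order; events the policy rejects stay cached for a later round."""
        deferred, futures = [], []
        while self._queue:
            item = heapq.heappop(self._queue)
            if self._can_execute(item[2]):
                futures.append(self._pool.submit(self._respond, item[2]))
            else:
                deferred.append(item)
        for item in deferred:
            heapq.heappush(self._queue, item)
        return futures

    def pending(self) -> int:
        return len(self._queue)

    def close(self) -> None:
        self._pool.shutdown(wait=True)

def run_plan_once(plan: str, rules: Dict[str, list], catalog: dict,
                  events: EventManager) -> int:
    """Read the plan's rules, check each one, and submit the triggered events."""
    triggered = 0
    for name, predicate, response, priority in rules.get(plan, []):
        if predicate(catalog):
            events.submit({"plan": plan, "rule": name, "response": response}, priority)
            triggered += 1
    return triggered
```

With several workers, events start in priority order but may finish in any order; the sequence counter only guarantees stable ordering inside the cache.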
The autonomous management system comprises multiple levels of autonomic elements. Functionally, autonomic elements are divided into the parameter level, component level, node level, partition level, and system level. The smallest autonomic element is defined at the parameter level and serves as the basis for the next level up; each level is built on the level below it, and so on, up to the top-level, system-level global autonomic element. A node-level autonomic element deployed on each server node manages that node and all of its components; a partition-level autonomic element deployed on the management node of each logical partition handles partition management; and a system-level autonomic element is deployed at the top. The autonomic elements communicate with one another through a suitable extension of the CIM description standard, forming a hierarchical autonomous management system that provides global, unified resource monitoring and management for the high-performance computer.
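The level structure and the two information flows (state information up, commands down) can be sketched as a small tree in Python. The class name, the strict one-level-below rule enforced by `add_child`, and the in-memory lists standing in for CIM-based messages are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LEVELS = ["parameter", "component", "node", "partition", "system"]

@dataclass
class AutonomicElement:
    name: str
    level: str
    # parent is excluded from repr/eq to avoid infinite recursion through the cycle.
    parent: Optional["AutonomicElement"] = field(default=None, repr=False, compare=False)
    children: List["AutonomicElement"] = field(default_factory=list)
    received: list = field(default_factory=list)  # state information from below
    executed: list = field(default_factory=list)  # commands received from above

    def add_child(self, child: "AutonomicElement") -> "AutonomicElement":
        # Each level is built on elements exactly one level below it.
        if LEVELS.index(child.level) != LEVELS.index(self.level) - 1:
            raise ValueError(f"{child.level} cannot sit directly under {self.level}")
        child.parent = self
        self.children.append(child)
        return child

    def report(self, status: dict) -> None:
        """Send state information to the higher-level element."""
        if self.parent is not None:
            self.parent.received.append((self.name, status))

    def command(self, child_name: str, order: str) -> bool:
        """Send a command down to a named child, which records it as executed."""
        for child in self.children:
            if child.name == child_name:
                child.executed.append(order)
                return True
        return False
```

In this sketch the lower element only reports and executes; the decision logic (rule checking) would live in the upper element, consistent with bottom-level elements having no knowledge base of their own.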
The basic components of an autonomic element include a knowledge base, which defines the rules that control system behaviour and stores relatively static planning knowledge, such as correlation rules and the static network topology. Passive learning is achieved by extending part of the planning knowledge in the knowledge base online. Because policies can be dynamically deployed, updated, or deleted, system behaviour can be changed and refined by dynamically updating policy rules, without modifying the software code or stopping the system. The autonomic element on a managed node collects status information about each resource on that node, sends it to the higher-level autonomic element, and executes the commands that element issues. Bottom-level autonomic elements have neither a global view nor a knowledge base of their own; the upper-level autonomic element alone judges whether predefined conditions are met and executes the corresponding responses.
To ensure the scalability of the management system, choosing a suitable logical partitioning strategy is a key issue. A correct partitioning strategy ensures that management nodes are not overloaded, and also avoids underloading them, which wastes management node resources. In each logical partition, one node must be selected as the management node during system initialization. In addition, if the management node fails while the system is running, another node in the partition must be selected to take over management. Partitioning a large-scale system must take several factors into account: (a) the number of nodes in the partition; (b) the processing capability, communication capability, and external-storage I/O capability of the management node; (c) the volume of management data generated by management operations within the partition. Taking these factors together, the index of the number of nodes that a logical partition can manage is calculated as follows:
In the above formula, $R_1, R_2, R_3, R_4, R_5$ are the weights of the different resource load values, with $\sum_i R_i = 1$. The knowledge base holds a separate formula for computing these weights, and the autonomic element adjusts them adaptively. The new coefficient $R_{new,i}$ is computed from $L_i$, the current load value of resource $i$, together with a constraint that $R_i$ must satisfy. If $R_{new,i}$ differs from the old coefficient $R_i$ by more than a preset threshold, the newly calculated coefficient replaces the old one; the threshold prevents oscillation. Within the resulting node-count limit, nodes are selected to form logical partitions according to policy: for example, by physical proximity, selecting nodes in the same rack or in several adjacent racks, or by node function, with some nodes dedicated to query services and others to compute-intensive tasks. A schematic of partitioning based on hierarchical autonomic elements is shown in Fig. 2.
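Because the node-count formula and the $R_{new,i}$ formula are not reproduced in this text, the following Python sketch only illustrates the parts that are stated: combining per-resource loads with weights that sum to 1, and replacing coefficients only when the change exceeds a threshold. How the candidate coefficients are computed is left to the caller; renormalizing them to sum to 1 and replacing the whole vector when any single coefficient moves past the threshold are assumptions of this sketch.

```python
from typing import List, Sequence

def weighted_load(loads: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-resource load values L_i with weights R_i (sum R_i = 1)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(r * l for r, l in zip(weights, loads))

def maybe_update(old: Sequence[float], candidate: Sequence[float],
                 threshold: float) -> List[float]:
    """Threshold rule: adopt new coefficients only if one of them moved by more
    than `threshold`; otherwise keep the old ones so that small load
    fluctuations do not make the weights oscillate."""
    total = sum(candidate)
    normalized = [c / total for c in candidate]  # keep sum R_i = 1 (assumption)
    if max(abs(n - o) for n, o in zip(normalized, old)) > threshold:
        return normalized
    return list(old)
```

The threshold acts as hysteresis: a candidate that differs only slightly from the current weights is ignored, so the partition's load index stays stable between genuine load shifts.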
A cloud-computing-oriented election algorithm (Cloud Computing based Election Algorithm, hereinafter the CCBE algorithm) is proposed. The algorithm executes efficiently and addresses the election-triggering problem that other algorithms rarely solve; it also adapts to node failures, link failures, and changes in the node set, giving it a degree of fault tolerance and dynamic behaviour.
The CCBE algorithm adopts the agent-node-group-based distributed network management mechanism commonly used in large-scale distributed network management [Lee 04]. This mechanism holds that, from a network management perspective, a managed network consists of basic managed elements, namely nodes. Suppose a logical partition contains n managed nodes. Each node has a node agent (Agent) with a globally unique identifier ($ID_i$), which is known a priori to the agents of the other nodes in the partition. Any two agents in the partition can exchange messages, forming a fully connected topology, and the set of all agents in the partition can be written as $\{ID_0, ID_1, ID_2, \ldots, ID_{n-1}\}$. In each logical partition, a special agent node, the leader agent (Leader Agent), manages the agent nodes in the partition. The leader node and the agent nodes cooperate in a centralized management mode [MZH99]: the leader node instructs agent nodes to perform specific operations or to provide specific information, and the agent nodes return the operation results or the requested information. The leader nodes, in turn, complete management tasks according to a distributed cooperation mode.
The CCBE algorithm has several phases and assumes that message transmission and response times between nodes are known. Phase one: based on the number of nodes in the partition and on each node's processing capability, communication capability, and external-storage I/O capability, a priority list of agent nodes is generated; the node with the highest priority in the partition is elected as the leader node and announced to all agents, completing the first election. Phase two: if any agent $ID_i$ in the partition detects through a timeout mechanism that the leader node has failed, it triggers an election. Using the agent priority list, $ID_i$ sends an election message to every agent with a higher priority and waits for a reply from any of them. If no reply arrives, it concludes that all agents with higher priority than $ID_i$ have failed, sets itself as leader, updates the agent priority list, and broadcasts it to the other agents. If one or more replies arrive within the specified time, it updates the agent priority list according to the priorities of the responding agents and sets the highest-priority one as leader. When an agent with higher priority than $ID_i$ receives the election message from $ID_i$, it replies to $ID_i$ and starts its own election by sending election messages to all agents with higher priority than itself. An agent that has the highest priority can immediately announce itself as leader and update the priority list. Phase two is repeated whenever a failure is detected.
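Phase two behaves like the classic bully election algorithm. The following synchronous Python simulation is a sketch under that reading, not the authors' implementation: a timeout is modelled as "no higher-priority agent is in the `alive` set", priorities are assumed distinct, and the function name and its return values are hypothetical.

```python
def bully_election(initiator, priority, alive):
    """Simulate phase two of CCBE as a bully-style election.

    priority: dict mapping agent id -> priority (higher wins, assumed distinct)
    alive:    set of agents that can still answer messages
    Returns (leader, agents_that_ran_an_election).
    """
    started = set()
    pending = [initiator]  # agents that still have to run their own election
    leader = None
    while pending:
        agent = pending.pop()
        if agent in started:
            continue
        started.add(agent)
        # Send ELECTION to every higher-priority agent; only live ones answer.
        answering = [a for a in priority
                     if priority[a] > priority[agent] and a in alive]
        if not answering:
            leader = agent  # no answer before the timeout: declare itself leader
        else:
            pending.extend(answering)  # each answering agent starts its own election
    return leader, started
```

For instance, with priorities `{"a": 1, "b": 2, "c": 3, "d": 4}` and the old leader `d` failed, an election started by `a` ends with `c` as leader after `a`, `b` and `c` have each run an election.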
Our autonomous management system uses time and space compression to exclude invalid alarms. Time compression requires choosing an effective time window: an overly large window admits invalid alarm information and degrades the accuracy of alarm analysis, while an overly small window misses valid alarm information and makes the analysis unreliable. Space compression applies several filtering rules, including the network topology, the service-logic topology, and the associations among components at each level (node level, device level, and so on).
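A minimal Python sketch of both compression steps is shown below. Assumptions: time compression keeps the first alarm of each (source, kind) pair and drops repeats within the window measured from the last kept alarm, and space compression uses only a parent map of the network topology to drop alarms whose upstream element is itself alarming. `Alarm`, `time_compress` and `space_compress` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Alarm:
    source: str  # e.g. "sw1" or "node3"
    kind: str    # e.g. "link_down"
    time: float  # seconds

def time_compress(alarms: List[Alarm], window: float) -> List[Alarm]:
    """Drop repeats of the same (source, kind) within `window` seconds
    of the last alarm of that pair that was kept."""
    kept, last_kept = [], {}
    for a in sorted(alarms, key=lambda a: a.time):
        key = (a.source, a.kind)
        if key in last_kept and a.time - last_kept[key] <= window:
            continue  # duplicate inside the window
        last_kept[key] = a.time
        kept.append(a)
    return kept

def space_compress(alarms: List[Alarm], parent_of: Dict[str, str]) -> List[Alarm]:
    """Drop alarms whose topological ancestor is itself alarming."""
    alarming = {a.source for a in alarms}

    def has_alarming_ancestor(src: str) -> bool:
        p = parent_of.get(src)
        while p is not None:
            if p in alarming:
                return True
            p = parent_of.get(p)
        return False

    return [a for a in alarms if not has_alarming_ancestor(a.source)]
```

The window size is the tuning knob discussed above: the larger it is, the more genuinely new alarms are absorbed into earlier ones.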
The autonomic element of each partition of the monitoring and management system performs alarm correlation reasoning based on Drools. Drools uses declarative programming that separates logic from data: data is kept in system objects and logic in rules. Based on the Rete and Leaps algorithms, it matches rules efficiently against system data objects and centralizes knowledge; by building an object model and a DSL (domain-specific language), rules can be edited in natural language, which also makes the reasoning easier to explain.
A unified monitoring and management policy means unified description and enforcement of monitoring and management policies. Its purpose is to hide the configuration details and differences of the specific monitoring and management policies of the various managed devices and applications, to present them to the user through a unified policy configuration interface, and to let users map high-performance computer monitoring and management policies described in natural language into policies that machines can enforce. Cloudview implements the unified monitoring and management policy mechanism in the following steps.
Policy classification: global monitoring and management policies are divided into several categories, including switches, disk arrays, operating systems, tape libraries, databases, and hardware information.
Policy abstraction: from the monitoring and management policies of products of the same type made by different vendors, the autonomic elements at each level abstract a unified monitoring and management policy format for that product type. For example, there are many disk array manufacturers, each with its own proprietary disk array MIB; a monitoring and management policy expressed in a single unified format over these is the unified disk array monitoring and management policy.
Policy description: on the basis of the above policy classification, the autonomic elements at each level describe all kinds of monitoring and management policies in a unified way.
Policy composition: monitoring and management policies are divided into direct policies and indirect policies. A direct policy can be applied directly to a specific device or application through policy conversion; it corresponds to the abstracted policy of a given product type described above. An indirect policy is composed of a group of direct or indirect policies. One benefit of indirect policies is that they make it convenient to manage a complete service directly. Managing a service often involves many devices and applications; we need only provide one abstract indirect policy description for the service, which the policy library then maps step by step into direct policies that are applied to specific devices or applications through policy conversion. (Besides the original configuration policy library of each monitoring and management component, the policy library can also derive service dependencies from the service topology to assist this generation, and operations performed under specific conditions can be turned into rules and stored in the knowledge base, in order to optimize the autonomous rules.)
Policy configuration: a policy processing module converts unified policies into device-specific policies, and the device monitoring drivers and agent modules that apply the device-specific policies are distributed to the devices or applications.
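Policy composition and conversion can be sketched in Python as follows: an indirect policy is expanded recursively into the direct policies it is built from, and each direct policy is then converted by a per-vendor converter into device-specific settings. The vendor names, setting keys and class names are placeholders invented for this illustration; they do not correspond to any real MIB or to the Cloudview implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

@dataclass(frozen=True)
class DirectPolicy:
    device_class: str  # policy class, e.g. "disk_array"
    vendor: str        # placeholder vendor name
    target: str        # concrete device or application
    metric: str        # unified metric name
    threshold: float

@dataclass(frozen=True)
class IndirectPolicy:
    name: str          # e.g. a service-level policy
    parts: tuple       # DirectPolicy or IndirectPolicy members

Policy = Union[DirectPolicy, IndirectPolicy]

def expand(policy: Policy) -> List[DirectPolicy]:
    """Map a policy step by step down to the direct policies it is built from."""
    if isinstance(policy, DirectPolicy):
        return [policy]
    out: List[DirectPolicy] = []
    for part in policy.parts:
        out.extend(expand(part))
    return out

# Per-vendor converters from the unified form to device-specific settings.
# Vendor names and setting keys are placeholders, not real MIB objects.
CONVERTERS: Dict[str, Callable[[DirectPolicy], dict]] = {
    "vendorA": lambda p: {"object": f"A-{p.metric}", "alarm_above": p.threshold},
    "vendorB": lambda p: {"name": f"b_{p.metric}", "limit_pct": round(p.threshold * 100)},
}

def convert(policy: DirectPolicy) -> dict:
    """Policy conversion: unified direct policy -> device-specific configuration."""
    if policy.vendor not in CONVERTERS:
        raise ValueError(f"no converter for vendor {policy.vendor!r}")
    return {"target": policy.target, **CONVERTERS[policy.vendor](policy)}
```

Adding support for another vendor then means registering one more converter, while the unified policies written by the user stay unchanged.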
Unified configuration of all kinds of policies, i.e. policy description, is performed through the "policy configuration interfaces" of the MC. The corresponding "policy enforcement modules" in the MC carry out policy conversion, turning policies into configuration information that can be applied to specific devices, and this information is distributed to each Managed Object through the DHC. In this way, users can uniformly configure and enforce an overall unified monitoring and management policy from policies described in natural language. The mechanism is shown in Figure 4.
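The conversion step can be pictured as one converter per vendor that rewrites a unified policy into that vendor's own configuration keys before the result is sent to each managed object. This is a hedged sketch: the vendor names, key names, and the `send` callback (standing in for distribution through the DHC) are all hypothetical, and no real MIB objects are used:

```python
from typing import Callable, Dict

# Hypothetical unified disk-array policy, as produced by a policy configuration interface.
UnifiedPolicy = Dict[str, int]
DeviceConfig = Dict[str, object]


def vendor_a_converter(policy: UnifiedPolicy) -> DeviceConfig:
    # Hypothetical vendor A key names; not taken from any real MIB.
    return {"capacityWarnPct": policy["free_space_alarm_pct"],
            "pollSeconds": policy["poll_interval_s"]}


def vendor_b_converter(policy: UnifiedPolicy) -> DeviceConfig:
    # Hypothetical vendor B uses a percentage string and milliseconds.
    return {"low_space_threshold": f"{policy['free_space_alarm_pct']}%",
            "interval_ms": policy["poll_interval_s"] * 1000}


# One converter per vendor plays the role of a "policy enforcement module".
CONVERTERS: Dict[str, Callable[[UnifiedPolicy], DeviceConfig]] = {
    "vendor-a": vendor_a_converter,
    "vendor-b": vendor_b_converter,
}


def distribute(policy: UnifiedPolicy,
               devices: Dict[str, str],
               send: Callable[[str, DeviceConfig], None]) -> None:
    """Convert one unified policy for every managed object and hand it to `send`."""
    for device_id, vendor in devices.items():
        converter = CONVERTERS.get(vendor)
        if converter is None:
            raise ValueError(f"no policy converter registered for {vendor!r}")
        send(device_id, converter(policy))


# Usage: `send` stands in for the distribution channel (the DHC in the text).
sent: Dict[str, DeviceConfig] = {}
distribute({"free_space_alarm_pct": 10, "poll_interval_s": 60},
           {"array-01": "vendor-a", "array-02": "vendor-b"},
           lambda device_id, config: sent.update({device_id: config}))
print(sent)
```

Supporting a new vendor then means registering one more converter, while the unified policy the user writes stays unchanged.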

Claims (6)

1. A distributed hierarchical autonomous management method for a cloud computing platform, characterized in that:
The cloud computing management system is divided into logical partitions; within each partition, autonomous management is realized by building multi-level Autonomic Elements; above the partitions, a higher-level Autonomic Element is built to realize system-level management; during autonomous management, the rules corresponding to an autonomous plan are read from the knowledge base, the analysis module checks whether the rules are satisfied and triggers response events, and the events are then submitted to the event management module, which caches them and schedules their execution;
The autonomous plans include a logical partitioning plan, an election plan, and an alarm correlation plan;
The Autonomic Element includes a knowledge base module, a resource monitoring module, an analysis module, an event management module, a response module, a parallel execution module, and an autonomous planning module;
The knowledge base module interacts with the user through a knowledge base interface and provides customizable autonomous rules;
The resource monitoring module maintains a storage standard for resources, receives resource information from the Autonomic Elements on managed nodes, and stores that information in a database in a standard directory format for use by other Autonomic Elements;
The analysis module is used by each autonomous plan to determine whether the information stored in the database satisfies the conditions of the rules in the knowledge base; when it does, the module generates an event to be executed, sets the event priority, and sends the event description to the event management module;
The event management module caches the event descriptions submitted by the analysis module and decides, according to a scheduling policy, whether a cached event can be executed; if it can, the module generates a scheduled event executed by a concurrent thread, which runs the response and completes the specific response action;
The response module provides a method for registering response actions as predefined responses and manages the mapping table between the two;
The parallel execution module executes the scripts or rules produced by the response module on multiple nodes simultaneously;
The autonomous planning module logically controls the other six modules and gives the management system its autonomic properties;
The knowledge base module can be extended online, enabling passive learning; rules can also be dynamically deployed, updated, and deleted, so that system behavior can be changed and refined without modifying the software code or halting the running system;
The Autonomic Elements are divided into the parameter level, component level, node level, partition level, and system level; the parameter level defines the smallest Autonomic Element, the component level is built on the parameter level, the node level on the component level, the partition level on the node level, and the system level on the partition level;
Node-level elements are deployed on server nodes and are responsible for managing all components within that node; a partition-level Autonomic Element deployed in the management node of each logical partition is responsible for partition management; a system-level Autonomic Element is deployed at the top level; the Autonomic Elements communicate with one another using an extended CIM (Common Information Model) description standard; the Autonomic Element on a managed node collects the status information of each resource on that node, sends it to the higher-level Autonomic Element, and executes the commands issued by that higher-level element; the lower-level Autonomic Elements have neither a global view nor a knowledge base of their own, so the upper-level Autonomic Element alone judges whether predefined conditions are met and executes the corresponding responses;
The election plan proceeds as follows:
A fully connected topology is established among all nodes;
A priority is assigned to each node;
The node with the highest priority is elected as the leader node, and the result is broadcast to the other nodes;
When another node detects that the leader node has failed, a new election is triggered;
A new leader node is elected according to priority and the result is broadcast again.
2. the method for claim 1 it is characterised in that:Described autonomous rule includes conditional plan and formula rule;
Described conditional plan includes predefined conditional plan and predefined rule of response;
Described predefined conditional plan is expression formula or the expression logic combination of various Resource Properties, described Resource Properties storage By monitoring resource module management in Resource Properties catalogue;
Described predefined rule of response is that action is registered generation, the trigger action when condition meets by respond module.
3. the method for claim 1 it is characterised in that:Described logical partition is dividing and need to calculate each point according to formula The nodes in area, computing formula is:
Wherein R in formula1、R2、R3、R4、R5For the weight of different resource load value, Σ Ri=1;During self-adaptative adjustment, new coefficient The computing formula of Rnewi is:Li represents the load value of Current resource, if RnewiWith old coefficients R i phase Ratio exceedes the threshold value of reservation, then the coefficient with newly calculating substitutes old coefficient.
4. the method for claim 1 it is characterised in that:Described alarm association plan adopts time and space compression method Exclude invalid alarm.
5. the method for claim 1 it is characterised in that:Described election is using towards cloud computing election algorithm, described face Employ the conventional distributed network based on agent node group in large-scale distributed network management to cloud computing election algorithm Network administrative mechanism, if being managed node number in a logical partition is n, each node all has a node-agent, this generation Li Youyige globally unique identifier, and as priori known to the agency of other nodes in this subregion, appoint in whole subregion Message can be transmitted mutually by message between meaning two agency, as entirely connect topological structure, the set of whole partitioned proxies can With with { ID0,ID1,ID2,……IDN-1Represent;In each logical partition, one leader agency of setting is to the agency in subregion Node is managed;Cooperate according to centralized management pattern between leader node and agent node, i.e. leader node instruction agency Node is specifically operated or is provided specific information, and agent node returns operating result or the information being required;Leader saves Then according to certain distributed collaboration Pattern completion management role between point.
6. the method for claim 1 it is characterised in that:Described cloud computing management system adopts unified monitoring to manage plan Slightly, described unified monitoring management strategy content is as follows:
Policy class:It is divided into some classifications according to global monitoring management strategy, including:Switch, disk array, operation system System, tape library, data base, hardware information;
Strategy is abstract:Each level Autonomic Element, from the monitoring management strategy of same type different vendor product, takes out such The unified monitoring management strategy form of type product;
Policy depiction:On the basis of above-mentioned monitoring management policy class, each level Autonomic Element realizes the prison to various species Control management strategy carries out Unify legislation;
Strategy combination:Monitoring management strategy is divided into direct strategy and two kinds of indirect strategies, wherein, direct strategy can be by Strategy conversion is directly implemented in concrete equipment or application, and indirect strategies are then combined by one group of direct strategy or indirect strategies Form;
Strategy configuration:Realize Unified Policy is converted to the monitoring management strategy processing module of concrete equipment strategy, in addition real again Now the equipment supervision that concrete equipment strategy is distributed on equipment or application is driven and proxy module.
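Claims 1 and 2 describe a loop in which the analysis module checks condition rules against stored resource attributes and raises prioritized events, and the event management module caches them and runs the registered responses on concurrent threads. The following Python sketch illustrates that flow under stated assumptions: a lower priority number means a more urgent event, every cached event is treated as executable (the scheduling policy is not specified here), and a thread pool stands in for the parallel execution module. All class and function names are hypothetical.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Attributes = Dict[str, float]  # resource attributes of one node


@dataclass
class Rule:
    name: str
    condition: Callable[[Attributes], bool]  # predefined condition rule
    response: str                            # name of a registered response
    priority: int                            # assumption: lower value = more urgent


class ResponseRegistry:
    """Response module: mapping table from response names to actions."""

    def __init__(self) -> None:
        self._table: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, action: Callable[[str], str]) -> None:
        self._table[name] = action

    def lookup(self, name: str) -> Callable[[str], str]:
        return self._table[name]


class EventManager:
    """Caches events by priority and executes them on a thread pool."""

    def __init__(self, registry: ResponseRegistry, workers: int = 4) -> None:
        self._queue: List[Tuple[int, int, str, str]] = []
        self._order = itertools.count()  # tie-breaker: FIFO among equal priorities
        self._registry = registry
        self._workers = workers

    def submit(self, priority: int, response: str, node: str) -> None:
        heapq.heappush(self._queue, (priority, next(self._order), response, node))

    def dispatch(self) -> List[str]:
        """Start every cached event in priority order; return results in that order."""
        with ThreadPoolExecutor(max_workers=self._workers) as pool:
            futures = []
            while self._queue:
                _, _, response, node = heapq.heappop(self._queue)
                futures.append(pool.submit(self._registry.lookup(response), node))
            return [f.result() for f in futures]


def analyze(rules: List[Rule], catalog: Dict[str, Attributes],
            manager: EventManager) -> None:
    """Analysis module: test every rule against every node's attributes."""
    for node, attributes in catalog.items():
        for rule in rules:
            if rule.condition(attributes):
                manager.submit(rule.priority, rule.response, node)


# Usage with two hypothetical rules and two nodes.
registry = ResponseRegistry()
registry.register("restart_service", lambda node: f"restart service on {node}")
registry.register("notify_admin", lambda node: f"notify admin about {node}")

rules = [
    # A logical combination of attribute expressions, in the spirit of claim 2.
    Rule("overload", lambda a: a["cpu"] > 0.9 and a["mem"] > 0.8, "notify_admin", 2),
    Rule("service_down", lambda a: a["svc_up"] == 0, "restart_service", 1),
]
catalog = {
    "node-1": {"cpu": 0.95, "mem": 0.85, "svc_up": 1},
    "node-2": {"cpu": 0.20, "mem": 0.30, "svc_up": 0},
}

manager = EventManager(registry)
analyze(rules, catalog, manager)
print(manager.dispatch())  # ['restart service on node-2', 'notify admin about node-1']
```

Because the rules are plain data held in a list and the responses are looked up by name at dispatch time, rules and responses can be added or removed while the loop keeps running, which mirrors the online-extension property claimed for the knowledge base module.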
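Claim 4 names time and space compression without defining them in this text. One common reading, assumed here, is that time compression drops repeats of the same alarm within a time window, and space compression drops alarms from components that depend on a component which has already raised an alarm (for example, nodes behind a failed switch). A minimal sketch under those assumptions; the topology map, window length, and alarm fields are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Alarm:
    time: float   # seconds since some reference point
    source: str   # node or component that raised the alarm
    kind: str     # alarm type, e.g. "unreachable"


def compress(alarms: List[Alarm], window: float,
             parent_of: Dict[str, str]) -> List[Alarm]:
    """Drop alarms judged redundant by time or space compression (assumed rules).

    Time compression: a repeat of the same (source, kind) within `window`
    seconds of the last kept one is dropped.
    Space compression: an alarm whose source depends on a component that
    raised a kept alarm within the last `window` seconds is treated as a
    consequence of that alarm and dropped.
    """
    kept: List[Alarm] = []
    last_kept: Dict[Tuple[str, str], float] = {}
    last_by_source: Dict[str, float] = {}
    for alarm in sorted(alarms, key=lambda a: a.time):
        key = (alarm.source, alarm.kind)
        if key in last_kept and alarm.time - last_kept[key] <= window:
            continue  # time compression
        parent = parent_of.get(alarm.source)
        if parent in last_by_source and alarm.time - last_by_source[parent] <= window:
            continue  # space compression
        kept.append(alarm)
        last_kept[key] = alarm.time
        last_by_source[alarm.source] = alarm.time
    return kept


# Usage: two nodes hang off one switch (hypothetical topology).
topology = {"node-1": "switch-1", "node-2": "switch-1"}
alarms = [
    Alarm(0.0, "switch-1", "link_down"),
    Alarm(2.0, "node-1", "unreachable"),    # caused by switch-1: space-compressed
    Alarm(3.0, "node-2", "unreachable"),    # caused by switch-1: space-compressed
    Alarm(5.0, "switch-1", "link_down"),    # repeat: time-compressed
    Alarm(100.0, "node-1", "unreachable"),  # outside the window: kept
]
for a in compress(alarms, window=30.0, parent_of=topology):
    print(a)
```

Only one level of dependency is checked here; a deeper topology would need to walk the whole chain of parents.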
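The election plan of claims 1 and 5 picks the live agent with the highest priority as leader and re-elects when the leader fails. The sketch below simulates one fully connected partition in a single process; real agents would exchange messages and detect failure through timeouts, which the hypothetical `check_leader` method only stands in for. The tie-breaking rule and all names are assumptions:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Agent:
    agent_id: int                  # globally unique, known to every agent in the partition
    priority: int                  # election priority assigned to the node
    alive: bool = True
    leader: Optional[int] = None   # ID of the leader this agent currently follows


class Partition:
    """One logical partition whose agents form a fully connected topology."""

    def __init__(self, agents: List[Agent]) -> None:
        self.agents: Dict[int, Agent] = {a.agent_id: a for a in agents}

    def elect(self) -> int:
        """Pick the live agent with the highest priority and broadcast it."""
        alive = [a for a in self.agents.values() if a.alive]
        if not alive:
            raise RuntimeError("no live agent left in the partition")
        # Assumption: ties in priority are broken by the larger agent ID.
        winner = max(alive, key=lambda a: (a.priority, a.agent_id))
        for agent in alive:  # broadcast: every live agent records the new leader
            agent.leader = winner.agent_id
        return winner.agent_id

    def check_leader(self) -> int:
        """Stand-in for failure detection: re-elect if any live agent's leader is gone."""
        for agent in self.agents.values():
            if agent.alive and (agent.leader is None
                                or not self.agents[agent.leader].alive):
                return self.elect()
        return next(a.leader for a in self.agents.values() if a.alive)


# Usage
partition = Partition([Agent(0, priority=5), Agent(1, priority=9), Agent(2, priority=7)])
print(partition.elect())           # 1
partition.agents[1].alive = False  # the leader fails
print(partition.check_leader())    # 2
```

Since every agent knows every other agent's identifier and priority in advance, any survivor can compute the same winner, so the broadcast only confirms a result that all agents already agree on.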
CN201210042033.3A 2012-02-23 2012-02-23 A kind of distributed levelization autonomous management method towards cloud computing platform Expired - Fee Related CN102724057B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210042033.3A CN102724057B (en) 2012-02-23 2012-02-23 A kind of distributed levelization autonomous management method towards cloud computing platform

Publications (2)

Publication Number Publication Date
CN102724057A CN102724057A (en) 2012-10-10
CN102724057B true CN102724057B (en) 2017-03-08

Family

ID=46949726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210042033.3A Expired - Fee Related CN102724057B (en) 2012-02-23 2012-02-23 A kind of distributed levelization autonomous management method towards cloud computing platform

Country Status (1)

Country Link
CN (1) CN102724057B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473062B (en) * 2013-09-13 2017-01-18 TCL Mobile Communication Technology (Ningbo) Co., Ltd. Method and system for mobile terminal customization based on user space file system
WO2016070375A1 2014-11-06 2016-05-12 Huawei Technologies Co., Ltd. Distributed storage replication system and method
CN105407334A (en) * 2015-12-29 2016-03-16 Shanghai University Self management method for multi-scenario monitoring videos
CN105427545B (en) * 2015-12-30 2018-07-17 Shandong Zhongchuang Software Commercial Middleware Co., Ltd. Device Alarm Management method and device based on drools
CN105872068A (en) * 2016-04-28 2016-08-17 Information and Communication Branch of State Grid Zhejiang Electric Power Company Cloud platform and automatic operation check method based on same
CN107707431A (en) * 2017-10-31 2018-02-16 Henan University of Science and Technology The data safety monitoring method and system of a kind of facing cloud platform
US10735529B2 (en) 2017-12-07 2020-08-04 At&T Intellectual Property I, L.P. Operations control of network services
CN108337315B (en) * 2018-02-07 2019-10-08 Ping An Technology (Shenzhen) Co., Ltd. Dispositions method, device, computer equipment and the storage medium of monitoring system
CN108847961B (en) * 2018-05-28 2021-07-16 The 54th Research Institute of China Electronics Technology Group Corporation Large-scale high-concurrency deterministic network system
CN111078399B (en) * 2019-11-29 2023-10-13 Zhuhai Kingsoft Digital Network Technology Co., Ltd. Resource analysis method and system based on distributed architecture
CN112379977A (en) * 2020-07-10 2021-02-19 Xi'an Flight Automatic Control Research Institute of Aviation Industry Corporation of China Task-level fault processing method based on time triggering
CN111711702B (en) * 2020-08-18 2020-12-18 Beijing Dongfangtong Technology Co., Ltd. Distributed cooperative interaction method and system based on communication topology

Citations (1)

Publication number Priority date Publication date Assignee Title
CN101118521A (en) * 2006-08-01 2008-02-06 International Business Machines Corporation System and method for spanning multiple logical sectorization to distributing virtual input-output operation

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
DE112010003594B4 (en) * 2009-10-19 2023-03-16 International Business Machines Corporation Apparatus, method and computer program for operating a distributed write storage network

Non-Patent Citations (3)

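Title
Design and implementation of cluster management software based on autonomic computing; Li Yunchun et al.; Journal of Sun Yat-sen University (Natural Science Edition); 2009-03-31; Vol. 48 (Suppl.); pp. 248-251 *
Autonomic computing based on adaptive control theory; Lü Ye et al.; Fujian Computer; 2009-04-30 (No. 4); pp. 101-102 *
Conceptual model and implementation methods of autonomic computing; Liao Beishui et al.; Journal of Software; 2008-04-30 (No. 4); pp. 779-802 *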

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170308