CN107426012A - Fault recovery method and device based on hyper-converged infrastructure - Google Patents

Fault recovery method and device based on hyper-converged infrastructure

Publication number
CN107426012A
Authority
CN
China
Prior art keywords
migration
tenant
host
event
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710392491.2A
Other languages
Chinese (zh)
Other versions
CN107426012B (en
Inventor
何盛杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN201710392491.2A
Publication of CN107426012A
Application granted
Publication of CN107426012B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Abstract

The invention discloses a fault recovery method and device based on hyper-converged infrastructure. The method includes: collecting the parameter information of each host on the hyper-converged infrastructure platform and the component information of each tenant; comparing the parameter information of the hosts with preset fault parameters and judging from the comparison result whether any host requires tenant migration; if so, generating a migration event that carries the identification information of the source host requiring tenant migration and of the destination host; selecting the corresponding tenants to be migrated from the source host according to the type of the migration event, treating all components of each tenant to be migrated as one migration object, and migrating each migration object as a whole to the destination host according to the identification information of the destination host. The invention can migrate the multiple components of a tenant to the same destination host, which avoids excessive traffic hops and the resulting increase in network transmission delay, giving high transmission efficiency.

Description

Fault recovery method and device based on hyper-converged infrastructure
Technical field
The invention relates to the technical field of hardware fault recovery, and in particular to a fault recovery method and device based on hyper-converged infrastructure.
Background art
HCI (Hyper-Converged Infrastructure): a hyper-converged infrastructure, also called a hyper-converged architecture, is one in which the same set of unit equipment (x86 servers) not only provides resources and technologies such as compute, network, storage and server virtualization, but also includes elements such as cache acceleration, data deduplication, online data compression, backup software and snapshot technology. Under this architecture, multiple host nodes are aggregated over the network to achieve modular, seamless scale-out and form a unified resource pool.
Tenant: a customer who uses the system or its computing resources. In multi-tenancy, a tenant comprises all the data in the system that can be identified as belonging to a specified user, such as accounts and statistical information (accounting data).
Virtualization is a broad term, generally meaning that computing elements run on a virtual rather than a real basis. Virtualization technology can expand hardware capacity, simplify software reconfiguration, and emulate multiple CPUs in parallel. It allows one platform to run multiple operating systems simultaneously, with applications running in mutually independent spaces without affecting one another, which significantly improves the working efficiency of the computer. Its advantages include reducing server over-provisioning, improving equipment utilization, reducing overall IT investment, increasing the flexibility of the IT environment, and enabling resource sharing.
On a virtualized HCI platform, when a network device or virtual machine becomes abnormal, the HCI high-availability function can automatically migrate the abnormal component to a new, normal node, so that normal business services continue and the customer's business keeps running.
Although the current HCI high-availability function can keep network devices or virtual machines running under abnormal conditions, this mechanism does not meet the needs of the security resource pool scenario well when a tenant contains multiple components (virtualized physical components). In an abnormal case, the built-in HCI high-availability function migrates the tenant's multiple components to normal HCI hosts to provide service, but it does not guarantee that all of these components are migrated to the same normal HCI host. If the components are migrated to different hosts, the following defects result:
(1) Longer network transmission path
When a tenant's components are distributed across different HCI hosts (for example, the virtual router vroute on host A and the virtual firewall vAF on host B), the tenant's traffic must cross hosts when it passes through vroute to reach vAF. The data is carried over vxlan at the underlying layer, and after vAF has processed it, the filtered traffic is sent back to vroute and then forwarded to other components for processing. This implicitly increases the hop count of the traffic.
(2) Higher network transmission delay
As the hop count increases, the network transmission delay also increases, and in extreme cases packets may be lost, which degrades the user experience and lowers transmission efficiency.
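The hop-count argument above can be made concrete with a small sketch. It counts how many consecutive steps of a tenant's traffic path cross from one host to another. The function name `cross_host_hops` and the component name `vapp` are hypothetical and are not taken from the patent; only `vroute` and `vAF` appear in the text.

```python
# Hypothetical illustration: count host crossings along a tenant's traffic path.
def cross_host_hops(chain, placement):
    """Count consecutive components in `chain` that sit on different hosts."""
    return sum(
        1
        for a, b in zip(chain, chain[1:])
        if placement[a] != placement[b]
    )

# Traffic path described in the text: vroute -> vAF -> vroute -> another component.
chain = ["vroute", "vAF", "vroute", "vapp"]  # "vapp" is an assumed component name

# Components split across hosts versus kept together on one host.
split = {"vroute": "host_a", "vAF": "host_b", "vapp": "host_a"}
together = {"vroute": "host_a", "vAF": "host_a", "vapp": "host_a"}

print(cross_host_hops(chain, split))     # 2 host crossings
print(cross_host_hops(chain, together))  # 0 host crossings
```

Under these assumptions, keeping the tenant's components on one host removes every cross-host hop on the path.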
Therefore, how to provide a fault recovery method and device based on hyper-converged infrastructure with high network transmission efficiency is a problem that those skilled in the art currently need to solve.
Summary of the invention
An object of the invention is to provide a fault recovery method and device based on hyper-converged infrastructure that can migrate the multiple components of a tenant to the same destination host, thereby avoiding excessive traffic hops and the resulting increase in network transmission delay, and achieving high transmission efficiency.
To solve the above technical problem, the invention provides a fault recovery method based on hyper-converged infrastructure, including:
collecting the parameter information of each host on the hyper-converged infrastructure platform and the component information of each tenant;
comparing the parameter information of the hosts with preset fault parameters, and judging from the comparison result whether any host requires tenant migration; if so, generating a migration event, the migration event carrying the identification information of the source host that requires tenant migration and of the destination host;
selecting the corresponding tenants to be migrated from the source host according to the type of the migration event, treating all components of each tenant to be migrated as one migration object, and migrating the migration object as a whole to the destination host according to the identification information of the destination host.
Preferably, the process of selecting the tenants to be migrated from the source host according to the migration event is specifically:
judging the event type of the migration event; if it is a host-failure migration event, the tenants to be migrated include all tenants running on the source host corresponding to the migration event;
if it is another type of migration event, then according to preset rules, either selecting several specified tenants running on the source host as the tenants to be migrated, or selecting the several tenants running on the source host that contain the fewest components as the tenants to be migrated.
Preferably, after the migration is completed, the method further includes:
storing a migration record, the migration record including the identification information of the source host and destination host of the migration, the migration time, and information on the migrated tenants.
Preferably, the process of generating the migration event specifically includes:
selecting, according to the pre-stored migration records of each host, the parameter information of each host, and preset rules, a host whose load condition permits migration and whose hardware condition is stable as the destination host;
generating the migration event according to the identification information of the destination host and the source host and the migration reason of the source host.
Preferably, after the migration is completed, the method further includes:
judging whether the migration succeeded; if not, repeating the preceding migration operation and recording the number of repetitions;
if the migration has still not succeeded after the number of repetitions reaches a preset retry count, sending alarm information for display.
Preferably, the method further includes:
updating the corresponding configuration information according to update messages input from outside, the configuration information including the preset fault parameters.
To solve the above technical problem, the invention also provides a fault recovery device based on hyper-converged infrastructure, including:
a metadata acquisition module, configured to collect the parameter information of each host on the hyper-converged infrastructure platform and the component information of each tenant;
a main control module, configured to compare the parameter information of the hosts with preset fault parameters and judge from the comparison result whether any host requires tenant migration, and if so, to generate a migration event, the migration event carrying the identification information of the source host that requires tenant migration and of the destination host;
an event processing module, configured to select the corresponding tenants to be migrated from the source host according to the type of the migration event, to treat all components of each tenant to be migrated as one migration object, and to migrate the migration object as a whole to the destination host according to the identification information of the destination host.
Preferably, the main control module further includes:
a cache unit, configured to store a migration record after the migration is completed, the migration record including the identification information of the source host and destination host of the migration, the migration time, and information on the migrated tenants.
Preferably, the main control module specifically includes:
a data analysis unit, configured to analyze the parameter information of the hosts to determine whether any host requires tenant migration and, if so, to trigger the event generation unit;
the event generation unit, configured to select, according to the pre-stored migration records of each host, the parameter information of each host, and preset rules, a host whose load condition permits migration and whose hardware condition is stable as the destination host, and to generate the migration event according to the identification information of the destination host and the source host and the migration reason of the source host.
Preferably, the device further includes:
a message processing module, configured to receive update messages input from outside and send them to the main control module so that the main control module updates the corresponding configuration information, the configuration information including the preset fault parameters.
The invention provides a fault recovery method and device based on hyper-converged infrastructure. After the hosts that require tenant migration are determined from the collected host parameter information and component information, a migration event is generated; the tenants to be migrated are then selected according to the type of the migration event, and all components of each tenant to be migrated are treated as one migration object and migrated as a whole to the destination host. The invention can thus migrate the multiple components of a tenant as a whole to the destination host, avoiding the excessive traffic hops caused by components being distributed across different destination hosts, which avoids an increase in network transmission delay and gives high network transmission efficiency.
Brief description of the drawings
To explain the technical solutions in the embodiments of the invention more clearly, the drawings needed for the prior art and the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a fault recovery method based on hyper-converged infrastructure provided by the invention;
Fig. 2 is a structural diagram of a fault recovery device based on hyper-converged infrastructure provided by the invention.
Detailed description of the embodiments
The core of the invention is to provide a fault recovery method and device based on hyper-converged infrastructure that can migrate the multiple components of a tenant to the same destination host, thereby avoiding excessive traffic hops and the resulting increase in network transmission delay, and achieving high transmission efficiency.
To make the purpose, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the invention.
The invention provides a fault recovery method based on hyper-converged infrastructure. Referring to Fig. 1, which is a flow chart of the method, the method includes:
Step s101: collecting the parameter information of each host on the hyper-converged infrastructure platform and the component information of each tenant;
The parameter information of a host includes the host's network port information and load information, and the load information includes the running-state information of the host's CPU and memory. The component information includes the component type and the host on which the component resides. Other information may of course also be included; the invention does not specifically limit this.
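The information collected in step s101 could be modelled roughly as follows. This is a minimal sketch under stated assumptions: the class names, field names and the `collect` helper are all hypothetical, and the patent does not prescribe any particular data layout.

```python
from dataclasses import dataclass, field

@dataclass
class HostParams:
    """Per-host parameter information (field names are assumptions)."""
    host_id: str
    storage_port_up: bool   # network port information
    data_port_up: bool
    cpu_usage: float        # load information, as a fraction in [0.0, 1.0]
    mem_usage: float

@dataclass
class Component:
    """Component information: its type and the host it resides on."""
    name: str
    comp_type: str
    host_id: str

@dataclass
class Tenant:
    tenant_id: str
    components: list = field(default_factory=list)

def collect(hosts, tenants):
    """Return host parameters keyed by host id, and tenant ids grouped by host."""
    params_by_host = {h.host_id: h for h in hosts}
    tenants_on_host = {}
    for tenant in tenants:
        for comp in tenant.components:
            tenants_on_host.setdefault(comp.host_id, set()).add(tenant.tenant_id)
    return params_by_host, tenants_on_host
```

Grouping tenant ids by host makes it straightforward for a later step to find which tenants are affected when a given host is flagged.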
Step s102: comparing the parameter information of the hosts with preset fault parameters, and judging from the comparison result whether any host requires tenant migration; if so, generating a migration event, the migration event carrying the identification information of the source host that requires tenant migration and of the destination host;
Here, the preset fault parameters are threshold data used to judge whether the relevant parameter information of a host meets requirements. When a host's parameter information does not satisfy the corresponding preset fault parameter, the host has the fault corresponding to that fault parameter, and a specific type of tenant migration is required. For example, if a host's memory usage is 80% and the corresponding high-load parameter is 70%, the memory usage exceeds the preset high-load parameter, so the host is in a high-load state and tenant migration is required. The content of the preset fault parameters depends on the host parameter information; the specific value of each fault parameter can be set as needed or updated according to update messages input from outside, and the invention does not specifically limit this.
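The threshold comparison in step s102 can be sketched as below, using the 80% versus 70% memory example from the text. The function name `check_host`, the dictionary keys, the event names and the CPU threshold value are assumptions made for illustration only.

```python
def check_host(params, thresholds):
    """Return the event types triggered by one host's parameter information.

    `params` and `thresholds` are plain dicts; their keys are hypothetical.
    """
    events = []
    if not params.get("online", True):
        events.append("host_offline")
    if params.get("mem_usage", 0.0) > thresholds["mem_high"]:
        events.append("mem_high_load")
    if params.get("cpu_usage", 0.0) > thresholds["cpu_high"]:
        events.append("cpu_high_load")
    return events

# Memory threshold from the text's example; the CPU threshold is an assumed value.
thresholds = {"mem_high": 0.70, "cpu_high": 0.90}

print(check_host({"mem_usage": 0.80, "cpu_usage": 0.30}, thresholds))
# ['mem_high_load']
```

A host whose parameters trigger no event needs no migration, so an empty list corresponds to the "no" branch of the judgment.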
In addition, the identification information here may be the IP address or ID of the host; the invention does not specifically limit this.
It should be noted that the number of destination hosts carried in a migration event is not limited to one; that is, it may include multiple destination hosts that meet the requirements.
Step s103: selecting the corresponding tenants to be migrated from the source host according to the type of the migration event, treating all components of each tenant to be migrated as one migration object, and migrating the migration object as a whole to the destination host according to the identification information of the destination host.
After the tenants to be migrated are determined, corresponding migration tasks need to be generated and added to a migration queue. The number of tenants to be migrated depends on the migration event type; it may be one, but in most cases there are several.
Preferably, when multiple tenants need to be migrated, a separate migration task may be set for each migration object, that is, one migration object is migrated at a time.
In another embodiment, all the migration objects to be migrated may instead be combined into one migration task, so that a single migration completes the migration of all tenants to be migrated.
In addition, if a separate migration task is generated for each migration object, the destination hosts of the migration objects may be the same or different; that is, the tenants to be migrated may be migrated to different destination hosts. The above embodiments are only preferred schemes; other implementations may be used in practice, and the invention does not specifically limit this.
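The two task layouts described above (one task per migration object, or one task for all objects) can be sketched as follows. The function name, the dictionary layout and the round-robin choice of destination are assumptions; the patent only requires that all components of one tenant travel together to a single destination host.

```python
def build_migration_tasks(tenants, destinations, one_task_per_tenant=True):
    """Build migration tasks from tenants to be migrated.

    tenants: dict mapping tenant id -> list of component names.
    destinations: list of candidate destination host ids (assumed non-empty).
    """
    # All components of a tenant form exactly one migration object.
    objects = [
        {"tenant": tenant_id, "components": list(components)}
        for tenant_id, components in tenants.items()
    ]
    if not one_task_per_tenant:
        # Single task: every migration object goes to the same destination.
        return [{"dest": destinations[0], "objects": objects}]
    # One task per object; spreading over destinations round-robin is an
    # assumed policy, since the text leaves the assignment open.
    return [
        {"dest": destinations[i % len(destinations)], "objects": [obj]}
        for i, obj in enumerate(objects)
    ]

tenants = {"t1": ["vroute", "vAF"], "t2": ["vroute"]}
for task in build_migration_tasks(tenants, ["host_c", "host_d"]):
    print(task["dest"], [o["tenant"] for o in task["objects"]])
```

In both layouts a tenant's components are never split across tasks, which is the property the method depends on.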
In addition, the migration events here include:
any of a storage port failure event, a host offline event, a data port failure event, a high-load event, and a low-load event. Other events that require tenant migration may also be included, depending on the actual situation.
Preferably, in step s103, the process of selecting the tenants to be migrated from the source host according to the migration event is specifically:
judging the event type of the migration event; if it is a host-failure migration event, the tenants to be migrated include all tenants running on the source host corresponding to the migration event;
if it is another type of migration event, then according to preset rules, either selecting several specified tenants running on the source host as the tenants to be migrated, or selecting the several tenants running on the source host that contain the fewest components as the tenants to be migrated.
It can be understood that a host-failure migration event indicates that the host has failed, so all tenants running on that host need to be migrated; the corresponding migration tasks are generated from the tenants' migration objects and added to the failure ready queue. Another type of migration event, such as a high-load event, indicates that the host needs to migrate one or some of its tenants. Depending on the event type, either some specified tenants or the several tenants containing the fewest components may be selected as the tenants to be migrated; the invention does not specifically limit which approach is used. The corresponding migration tasks are then generated from the tenants' migration objects and added to the common ready queue.
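The selection rule and the two ready queues described above might look like this in code. It is a sketch only: the event type string `"host_failure"`, the function names and the `count` parameter are hypothetical, and which concrete events belong to the host-failure class is left open by the text.

```python
from collections import deque

failure_ready_queue = deque()  # tasks from host-failure migration events
common_ready_queue = deque()   # tasks from all other migration events

def select_tenants(event_type, tenants_on_source, count=1, specified=None):
    """Pick tenants to migrate from the source host.

    tenants_on_source: dict mapping tenant id -> list of component names.
    """
    if event_type == "host_failure":
        # Host failed: every tenant running on it must move.
        return list(tenants_on_source)
    if specified:
        # Preset rule A: migrate the tenants that were explicitly specified.
        return [t for t in specified if t in tenants_on_source]
    # Preset rule B: migrate the tenants with the fewest components.
    ranked = sorted(tenants_on_source, key=lambda t: len(tenants_on_source[t]))
    return ranked[:count]

def enqueue(event_type, tenant_ids):
    """Append the selected tenants to the queue matching the event class."""
    queue = failure_ready_queue if event_type == "host_failure" else common_ready_queue
    queue.extend(tenant_ids)
    return queue
```

Choosing the tenants with the fewest components keeps each individual migration small, which is presumably why the text offers it as a rule for load-related events.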
Preferably, after the migration is completed, the method further includes:
storing a migration record, the migration record including the identification information of the source host and destination host of the migration, the migration time, and information on the migrated tenants.
It can be understood that storing migration records makes it easy to check later how tenants have moved into and out of each host, and thus whether each host's hardware is stable. It also lets each tenant review its own migration history and provides guidance for subsequent migration and host management.
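A migration record with the fields listed above could be stored as in the following sketch. The class names `MigrationRecord` and `MigrationLog` and the in-memory list are assumptions; the patent does not say how or where records are persisted.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class MigrationRecord:
    source_host: str     # identification of the source host
    dest_host: str       # identification of the destination host
    migrated_at: float   # migration time, as a Unix timestamp
    tenant_ids: tuple    # information on the migrated tenants

class MigrationLog:
    """In-memory record store (a stand-in for whatever cache is actually used)."""

    def __init__(self):
        self._records = []

    def add(self, source_host, dest_host, tenant_ids, migrated_at=None):
        stamp = time.time() if migrated_at is None else migrated_at
        record = MigrationRecord(source_host, dest_host, stamp, tuple(tenant_ids))
        self._records.append(record)
        return record

    def moved_out_of(self, host_id, since):
        """Records of tenants leaving `host_id` at or after time `since`."""
        return [
            r for r in self._records
            if r.source_host == host_id and r.migrated_at >= since
        ]
```

A query such as `moved_out_of` is what lets a later step judge whether a host has been shedding tenants recently.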
In step s102, the process of generating the migration event specifically includes:
selecting, according to the pre-stored migration records of each host, the parameter information of each host, and preset rules, a host whose load condition permits migration and whose hardware condition is stable as the destination host;
generating the migration event according to the identification information of the destination host and the source host and the migration reason of the source host.
It can be understood that prior-art migration schemes consider only load factors such as CPU and memory and do not consider unstable hardware. When hardware is unstable, the following can occur: after a component fails and is migrated to another normal HCI host, the original host recovers after a while (for example, ten minutes); because its load is low, the components previously migrated out are moved back; the original host's hardware is still unstable, so it fails again and the components are migrated out again. This causes back-and-forth migration, a ping-pong effect.
The preset rule here is preferably to select the hosts from which no tenant has migrated out within a past preset time period (for example, one hour), and then to select from these hosts a host with a suitable (relatively low) load as the destination host. This is only a preferred scheme; the preset rule should, as far as possible, use the migration records to select hosts with few recent out-migrations, so as to avoid tenant components migrating back and forth because of unstable hardware.
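The preferred rule above (exclude hosts with recent out-migrations, then take a lightly loaded host) can be sketched like this. The function name, the one-hour default window taken from the text's example, and the `max_load` cutoff are assumptions for illustration.

```python
def pick_destination(host_loads, out_migrations, now,
                     window=3600.0, max_load=0.70, exclude=()):
    """Choose a destination host while avoiding ping-pong migration.

    host_loads: dict mapping host id -> load as a fraction in [0.0, 1.0].
    out_migrations: iterable of (source_host_id, timestamp) pairs.
    Returns the chosen host id, or None if no host qualifies.
    """
    # Hosts that tenants left within the window are treated as unstable.
    recently_unstable = {
        host for host, stamp in out_migrations if now - stamp <= window
    }
    candidates = [
        (load, host)
        for host, load in host_loads.items()
        if host not in recently_unstable
        and host not in exclude
        and load <= max_load
    ]
    # Lowest load wins; ties are broken by host id.
    return min(candidates)[1] if candidates else None

loads = {"host_a": 0.20, "host_b": 0.40, "host_c": 0.90}
# host_a has the lowest load but shed tenants 100 seconds ago, so it is skipped.
print(pick_destination(loads, [("host_a", 900.0)], now=1000.0))  # host_b
```

Without the stability filter, `host_a` would be picked on load alone, which is exactly the situation the text describes as leading to back-and-forth migration.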
Preferably, after the migration is completed, the method further includes:
judging whether the migration succeeded; if not, repeating the preceding migration operation and recording the number of repetitions;
if the migration has still not succeeded after the number of repetitions reaches a preset retry count, sending alarm information for display.
Whether the migration succeeded may be judged from the response returned by the destination host, or by collecting each tenant's information after the migration and checking the host on which the tenant resides. Other ways may also be used, and the invention does not specifically limit this.
Further, in many cases multiple tenants need to be migrated, and some tenants may migrate successfully while others fail. Therefore, if success is judged from the response returned by the destination host, the response must carry the identification information of the tenant. If some tenants fail to migrate, only those tenants need to be migrated again.
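The retry behaviour described above, including retrying only the tenants that failed, can be sketched as follows. The function name, the callback `migrate_once` and its return convention (the set of tenant ids that succeeded) are hypothetical.

```python
def migrate_with_retry(tenant_ids, migrate_once, max_retries=3, alarm=print):
    """Migrate tenants, retrying only those that failed.

    migrate_once(ids) must return the ids that migrated successfully,
    mirroring a destination-host response that carries tenant identifiers.
    Returns the list of tenants that still failed (empty on full success).
    """
    pending = list(tenant_ids)
    retries = 0
    while True:
        succeeded = set(migrate_once(pending))
        pending = [t for t in pending if t not in succeeded]
        if not pending:
            return []
        if retries >= max_retries:
            # Retry budget exhausted: raise an alarm for display.
            alarm(f"migration failed after {retries} retries: {pending}")
            return pending
        retries += 1
```

With this convention a fully failing migration is attempted once plus `max_retries` times before the alarm is sent.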
Further, sending alarm information here may mean sending an alarm (or report file) to the display interface of the administrator (or the corresponding tenant) to remind the administrator (or the corresponding tenant) to handle it. Other alarm methods may also be used; any alarm method falls within the scope of protection of the invention.
Preferably, the method further includes:
updating the corresponding configuration information according to update messages input from outside, the configuration information including the preset fault parameters.
It can be understood that users may send update messages according to their needs, and these messages affect the basis for judging whether a host requires tenant migration. For example, if an update message changes the criterion by which HCI judges whether a host is in a high-load state, the conditions under which high-load events are generated change accordingly.
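How an update message changes the high-load judgment can be illustrated with a short sketch. The key names `mem_high` and `cpu_high`, the `apply_update` helper and the rule of ignoring unknown keys are assumptions; the patent does not define a message format.

```python
ALLOWED_KEYS = {"mem_high", "cpu_high"}  # hypothetical fault-parameter names

def apply_update(config, message):
    """Return a new config with recognised fault parameters overridden."""
    updates = {k: v for k, v in message.items() if k in ALLOWED_KEYS}
    return {**config, **updates}

def is_mem_high_load(mem_usage, config):
    return mem_usage > config["mem_high"]

config = {"mem_high": 0.70, "cpu_high": 0.90}
print(is_mem_high_load(0.75, config))    # True under the 70% threshold

config = apply_update(config, {"mem_high": 0.80})
print(is_mem_high_load(0.75, config))    # False once the threshold is 80%
```

The same host state thus stops generating a high-load event after the update, which is the effect the text describes.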
The invention provides a fault recovery method based on hyper-converged infrastructure. After the hosts that require tenant migration are determined from the collected host parameter information and component information, a migration event is generated; the tenants to be migrated are then selected according to the type of the migration event, and all components of each tenant to be migrated are treated as one migration object and migrated as a whole to the destination host. The invention can thus migrate the multiple components of a tenant as a whole to the destination host, avoiding the excessive traffic hops caused by components being distributed across different destination hosts, which avoids an increase in network transmission delay and gives high network transmission efficiency.
The invention also provides a fault recovery device based on hyper-converged infrastructure. Referring to Fig. 2, which is a structural diagram of the device, the device includes:
a metadata acquisition module 1, configured to collect the parameter information of each host on the hyper-converged infrastructure platform and the component information of each tenant;
a main control module 2, configured to compare the parameter information of the hosts with preset fault parameters and judge from the comparison result whether any host requires tenant migration, and if so, to generate a migration event, the migration event carrying the identification information of the source host that requires tenant migration and of the destination host;
an event processing module 3, configured to select the corresponding tenants to be migrated from the source host according to the type of the migration event, to treat all components of each tenant to be migrated as one migration object, and to migrate the migration object as a whole to the destination host according to the identification information of the destination host.
In addition to the tenant migration management described above, the event processing module 3 also performs operations such as event deduplication and priority queue management. It can be understood that one host is allowed only one CPU-too-high migration event and one memory-too-high migration event; any further events of the same kind are ignored, which is event deduplication. After the tenants to be migrated are selected, migration tasks are generated from the migration objects and added to the corresponding queues for processing; during processing, some special tasks may be handled with priority. The invention does not limit the criteria for identifying special tasks.
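Event deduplication as described above amounts to keeping at most one pending event per host and event type. A minimal sketch, with a hypothetical function name and event type strings:

```python
def dedupe(events):
    """Keep the first event per (host, event type) pair, preserving order.

    events: iterable of (host_id, event_type) pairs.
    """
    seen = set()
    kept = []
    for host_id, event_type in events:
        key = (host_id, event_type)
        if key in seen:
            continue  # an event of this kind is already pending for this host
        seen.add(key)
        kept.append(key)
    return kept

pending = [
    ("host_a", "cpu_high"),
    ("host_a", "mem_high"),
    ("host_a", "cpu_high"),   # duplicate, ignored
    ("host_b", "cpu_high"),
]
print(dedupe(pending))
```

A CPU event and a memory event on the same host are both kept, since the text allows one of each per host.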
Further, event processing module 3 is additionally operable to carry out inhibition of metastasis operation, i.e., according to the migration note prestored Record, the parameter information of each main frame and preset rules selection loading condition allow and the main frame of hardware condition stabilization is as purpose Main frame.Consider the historical failure record of destination host, avoid the occurrence of situation about migrating back and forth.
In addition, general each migration event can carry the identification information (such as host address) of a destination host, afterwards Obtained whole tenant to be migrated and its component form a migration task, migrate to destination host, but it is also possible to select One migration event carries the identification information of multiple destination hosts, and corresponding one of the component of each tenant to be migrated migrates task, Each migration tenant disperses to migrate to each destination host respectively, is specifically not especially limited using which kind of mode present invention.
The working process of event processing module 3 is as follows:
Step s11: Start the process of event processing module 3;
Step s12: Obtain a migration event from main control module 2;
Step s13: Determine whether the migration event was obtained successfully; if so, go to step s16; if not, go to step s14;
Step s14: Perform event de-duplication;
Step s15: Determine whether the migration time currently being processed is repeated; if so, sleep for a preset time and return to step s12; otherwise, go to step s16;
Step s16: Determine the tenants to be migrated and convert the obtained migration event into a migration task (migration tasks generated from host-failure (hostdown) class migration events are added to the failure ready queue; migration tasks generated from other classes of migration events are added to the normal ready queue);
Step s17: Execute the migration task;
Step s18: Determine whether to exit the process; if not, sleep for a preset time and return to step s12; otherwise, exit the process.
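The queue routing in step s16 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the task dictionary layout and the function names are hypothetical, and serving the failure ready queue before the normal ready queue is an assumed policy that the text does not spell out.

```python
from collections import deque

failure_ready = deque()  # tasks generated from hostdown-class events
normal_ready = deque()   # tasks generated from all other events


def enqueue_task(event_kind, tenants, destination):
    """Convert a migration event into a task and route it to a queue."""
    task = {"kind": event_kind, "tenants": list(tenants), "dst": destination}
    if event_kind == "hostdown":
        failure_ready.append(task)
    else:
        normal_ready.append(task)
    return task


def next_task():
    """Return the next task to run, or None if both queues are empty.

    Assumed policy: the failure queue is drained before the normal queue.
    """
    if failure_ready:
        return failure_ready.popleft()
    if normal_ready:
        return normal_ready.popleft()
    return None
```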
Preferably, main control module 2 further includes:
A cache unit, configured to store a migration record after a migration is completed, the migration record including the identification information of the source host and the destination host of the migration, the migration time, and information on the migrated tenants.
Specifically, main control module 2 includes:
A data analysis unit, configured to analyze the parameter information of the hosts to determine whether any host requires tenant migration and, if so, to trigger the event generation unit;
An event generation unit, configured to select, according to the pre-stored migration records of each host, the parameter information of each host, and preset rules, a host whose load condition permits the migration and whose hardware condition is stable as the destination host, and to generate a migration event according to the identification information of the destination host and the source host and the migration reason of the source host.
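The division of work between the data analysis unit and the event generation unit can be sketched in Python as below. The parameter keys (`cpu`, `mem`, `online`), the reason labels, and the event layout are assumptions chosen for illustration; the patent only states that host parameters are compared against preset fault parameters and that the event carries the source host, the destination host, and the migration reason.

```python
def analyse_host(params, thresholds):
    """Return the list of migration reasons for one host.

    An empty list means the host does not require tenant migration.
    """
    if not params.get("online", True):
        return ["hostdown"]  # an offline host overrides load checks
    reasons = []
    if params.get("cpu", 0.0) > thresholds["cpu"]:
        reasons.append("cpu_high")
    if params.get("mem", 0.0) > thresholds["mem"]:
        reasons.append("mem_high")
    return reasons


def make_event(source, destination, reason):
    """Build a migration event carrying both host identifiers and the reason."""
    return {"src": source, "dst": destination, "reason": reason}
```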
Preferably, the device further includes:
A message processing module 4, configured to receive an update message input from outside and send it to main control module 2 so that main control module 2 updates the corresponding configuration information, the configuration information including the preset fault parameters.
Further, message processing module 4 is also configured to, after receiving a status query message input from outside, query the data in main control module 2 for the state of the corresponding host or component and feed the query result back to the display interface. Examples include an externally input query as to whether the high-availability cluster is in a high-load state, and an externally input query about tenant placement adjustment.
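The two kinds of external message described above (configuration update and status query) can be sketched as a small dispatcher. This Python sketch is illustrative only: the message fields `type`, `settings`, and `target` and the reply layout are hypothetical and are not taken from the patent.

```python
def handle_message(msg, config, host_state):
    """Dispatch one external message and return a reply dictionary.

    config:     mutable dict of configuration values held by the main
                control module, e.g. the preset fault parameters.
    host_state: dict mapping a host or component name to its state.
    """
    if msg["type"] == "update":
        config.update(msg["settings"])  # apply the new configuration
        return {"ok": True}
    if msg["type"] == "status":
        # Unknown targets yield state None rather than an error.
        return {"ok": True, "state": host_state.get(msg["target"])}
    return {"ok": False, "error": "unknown message type"}
```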
The shared storage in Fig. 2 indicates that message processing module 4 needs to access the data held in the memory of main control module 2 in order to operate, and that metadata acquisition module 1 sends the data it collects to main control module 2.
In addition to the functions described above, namely receiving the data collected by metadata acquisition module 1 (data state monitoring), analyzing that data, storing and analyzing migration records, and generating migration events, main control module 2 also processes and responds to external messages sent by message processing module 4 (event listening operations such as configuration updates and tenant location detection) and monitors the state of the high-availability processes. The working process of main control module 2 is as follows:
Step s21: Start the high-availability process at boot;
Step s22: Initialize system resources;
Initialization specifically includes: reading the default high-availability configuration of the HCI platform, mainly by reading the high-availability configuration information from the database, including the configuration for failure migration and load balancing; registering signal handler functions, for signals such as SIGCHLD and SIGTERM, to handle signals sent from outside to the high-availability main process; and initializing metadata acquisition module 1, event processing module 3, and message processing module 4;
Step s23: Start the child process of metadata acquisition module 1;
Step s24: Start the child process of event processing module 3;
Step s25: Start the child process of message processing module 4;
Steps s23, s24, and s25 need not be performed in any strict order.
Step s26: The main process works in a loop;
In this loop, the main process updates the hosts' CPU, memory, and network card status information, offline status, cluster status information, and configuration files according to the external messages sent by message processing module 4, analyzes the collected data, triggers the generation of migration events, and so on.
Step s27: Detect whether the high-availability process is to exit; if so, go to step s28; otherwise, return to step s26;
Step s28: Exit the high-availability process.
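The main loop of steps s26 to s28, driven by an exit flag that a signal handler sets, can be sketched in Python as below. The class name `HaMaster` and the callables `work_once` and `shutdown` are hypothetical; the only library calls used are `signal.signal` and the standard handler signature `(signum, frame)`.

```python
import signal


class HaMaster:
    """Skeleton of the high-availability main process (illustrative only)."""

    def __init__(self):
        self.exit_flag = False

    def on_exit_signal(self, signum, frame):
        # Signal handler: only record the request; the loop reacts to it.
        self.exit_flag = True

    def install_handlers(self):
        # Part of step s22: register the handler for the exit signal.
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGTERM, self.on_exit_signal)

    def run(self, work_once, shutdown):
        # Steps s26 and s27: keep working until an exit was requested.
        while not self.exit_flag:
            work_once()
        # Step s28: leave the loop and stop the child processes.
        shutdown()
```

Keeping the handler down to setting a flag means the loop always finishes its current iteration before the shutdown routine runs.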
The process of detecting whether the high-availability process is to exit is specifically:
Determine whether an exit signal has been received; if so, update the exit flag and go to step s28.
Step s28 is specifically:
Send a term signal (exit signal) to all child processes, record the normal exit time of each child process, and detect whether each child process has exited;
If a child process fails to exit and its current normal exit time has not yet exceeded 2.5 s, sleep for a certain time (for example, 0.5 s) and then send the term signal to that child process again; if the accumulated normal exit time of the child process exceeds 2.5 s, send a kill signal (forced exit signal) to the child process and start recording the forced exit time;
Then detect whether the child process has exited successfully; if it still fails to exit and the current forced exit time has not yet exceeded 2.5 s, sleep for a certain time (for example, 0.5 s) and then send the kill signal to that child process again; if the accumulated forced exit time of the child process exceeds 2.5 s, end the exit operation for that child process.
Here, 2.5 s is the exit time threshold; other thresholds may also be set, and the present invention does not specifically limit this.
Moreover, the above is only one preferred exit scheme; other procedures may also be used in practice, and the present invention does not specifically limit this.
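The two-phase exit scheme above (term signal first, then kill signal, each bounded by the exit time threshold) can be sketched in Python as follows. The function name `stop_child` and its return values are assumptions; `proc` is expected to offer `terminate()`, `kill()`, and `is_alive()`, as `multiprocessing.Process` does. For simplicity, the sketch accumulates the time spent sleeping rather than measuring wall-clock time.

```python
import time


def stop_child(proc, limit=2.5, pause=0.5, sleep=time.sleep):
    """Stop one child process: polite phase first, forced phase second.

    Returns "term" if the child exited during the term phase, "kill" if
    it exited during the kill phase, and "gave_up" if it survived both.
    """
    for phase, send in (("term", proc.terminate), ("kill", proc.kill)):
        waited = 0.0  # accumulated sleep time for this phase
        while True:
            send()  # send (or resend) the signal for this phase
            if not proc.is_alive():
                return phase
            if waited >= limit:
                break  # threshold reached: escalate or give up
            sleep(pause)
            waited += pause
    return "gave_up"
```

The `sleep` parameter exists only so that the waiting can be replaced in tests; in normal use the default `time.sleep` applies.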
In addition, in Fig. 2, HCI is the human-computer interaction component, UI is the user interface, and Unix socket refers to inter-process communication.
The present invention provides a fault recovery device based on a hyper-converged architecture. After determining, from the collected host parameter information and component information, which hosts require tenant migration, the device generates a migration event, selects the tenants to be migrated according to the type of the migration event, and treats all components of each tenant to be migrated as one migration object that is migrated as a whole to the destination host. The invention can thus migrate the multiple components of a tenant to the destination host together as a unit, avoiding the increase in traffic hop count that results when components are scattered across different destination hosts, and thereby avoiding increased network transmission delay and keeping network transmission efficiency high.
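Tenant selection by event type and the grouping of a tenant's components into one migration object can be sketched in Python as below. The data layout (a mapping from tenant name to component names) and the function names are assumptions; of the two selection rules the claims allow for non-failure events, only the "fewest components" rule is shown.

```python
def select_tenants(event_kind, tenants_on_host, count=1):
    """Choose the tenants to migrate away from the source host.

    tenants_on_host maps a tenant name to its list of component names.
    """
    if event_kind == "hostdown":
        # Host failure: every tenant running on the host must move.
        return sorted(tenants_on_host)
    # Other events: move the `count` tenants with the fewest components
    # (ties broken by name so the result is deterministic).
    by_size = sorted(tenants_on_host,
                     key=lambda t: (len(tenants_on_host[t]), t))
    return by_size[:count]


def build_migration_objects(selected, tenants_on_host, destination):
    """Bundle all components of each tenant into one migration object."""
    return [
        {"tenant": t, "components": list(tenants_on_host[t]), "dst": destination}
        for t in selected
    ]
```

Because each object lists every component of its tenant together with a single destination, the components of one tenant are never split across hosts.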
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the parts that are the same or similar may be cross-referenced between embodiments. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
It should also be noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes that element.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

  1. A fault recovery method based on a hyper-converged architecture, characterized by comprising:
    collecting the parameter information of each host and the component information of each tenant on a hyper-converged architecture platform;
    comparing the parameter information of the hosts with preset fault parameters and determining from the comparison result whether any host requires tenant migration and, if so, generating a migration event, the migration event carrying the identification information of the source host requiring tenant migration and of the destination host;
    selecting the corresponding tenants to be migrated from the source host according to the type of the migration event, treating all components of each tenant to be migrated as one migration object, and migrating the migration object as a whole to the destination host according to the identification information of the destination host.
  2. The method according to claim 1, characterized in that selecting the tenants to be migrated from the source host according to the migration event is specifically:
    determining the event type of the migration event; if it is a host-failure (hostdown) class migration event, the tenants to be migrated include all tenants running on the source host corresponding to the migration event;
    if it is a migration event of another class, selecting, according to preset rules, several specified tenants running on the source host as the tenants to be migrated, or selecting the several tenants running on the source host that contain the fewest components as the tenants to be migrated.
  3. The method according to claim 1, characterized by further comprising, after the migration is completed:
    storing a migration record, the migration record including the identification information of the source host and the destination host of the migration, the migration time, and information on the migrated tenants.
  4. The method according to claim 3, characterized in that generating the migration event specifically comprises:
    selecting, according to the pre-stored migration records of each host, the parameter information of each host, and preset rules, a host whose load condition permits the migration and whose hardware condition is stable as the destination host;
    generating the migration event according to the identification information of the destination host and the source host and the migration reason of the source host.
  5. The method according to claim 3, characterized by further comprising, after the migration is completed:
    determining whether the migration succeeded and, if not, repeating the preceding migration operation and recording the number of repetitions;
    if the migration has still not succeeded after the number of repetitions reaches a preset number of retries, sending and displaying an alarm message.
  6. The method according to claim 3, characterized by further comprising:
    updating the corresponding configuration information according to an update message input from outside, the configuration information including the preset fault parameters.
  7. A fault recovery device based on a hyper-converged architecture, characterized by comprising:
    a metadata acquisition module, configured to collect the parameter information of each host and the component information of each tenant on a hyper-converged architecture platform;
    a main control module, configured to compare the parameter information of the hosts with preset fault parameters and determine from the comparison result whether any host requires tenant migration and, if so, to generate a migration event, the migration event carrying the identification information of the source host requiring tenant migration and of the destination host;
    an event processing module, configured to select the corresponding tenants to be migrated from the source host according to the type of the migration event, to treat all components of each tenant to be migrated as one migration object, and to migrate the migration object as a whole to the destination host according to the identification information of the destination host.
  8. The device according to claim 7, characterized in that the main control module further comprises:
    a cache unit, configured to store a migration record after the migration is completed, the migration record including the identification information of the source host and the destination host of the migration, the migration time, and information on the migrated tenants.
  9. The device according to claim 8, characterized in that the main control module specifically comprises:
    a data analysis unit, configured to analyze the parameter information of the hosts to determine whether any host requires tenant migration and, if so, to trigger an event generation unit;
    the event generation unit, configured to select, according to the pre-stored migration records of each host, the parameter information of each host, and preset rules, a host whose load condition permits the migration and whose hardware condition is stable as the destination host, and to generate the migration event according to the identification information of the destination host and the source host and the migration reason of the source host.
  10. The device according to claim 9, characterized by further comprising:
    a message processing module, configured to receive an update message input from outside and send it to the main control module so that the main control module updates the corresponding configuration information, the configuration information including the preset fault parameters.
CN201710392491.2A 2017-05-27 2017-05-27 Fault recovery method and device based on super-fusion architecture Active CN107426012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710392491.2A CN107426012B (en) 2017-05-27 2017-05-27 Fault recovery method and device based on super-fusion architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710392491.2A CN107426012B (en) 2017-05-27 2017-05-27 Fault recovery method and device based on super-fusion architecture

Publications (2)

Publication Number Publication Date
CN107426012A true CN107426012A (en) 2017-12-01
CN107426012B CN107426012B (en) 2020-06-09

Family

ID=60429236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710392491.2A Active CN107426012B (en) 2017-05-27 2017-05-27 Fault recovery method and device based on super-fusion architecture

Country Status (1)

Country Link
CN (1) CN107426012B (en)

Also Published As

Publication number Publication date
CN107426012B (en) 2020-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant