CN101207519A - Version server, operation maintenance unit and method for restoring failure - Google Patents

Info

Publication number
CN101207519A
Authority
CN
China
Prior art keywords
unit
operation
software
maintenance
maintenance unit
Prior art date
Application number
CN 200710172404
Other languages
Chinese (zh)
Inventor
江胜峰
Original Assignee
上海华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海华为技术有限公司
Priority to CN 200710172404
Publication of CN101207519A

Abstract

The invention discloses a version server, an operation and maintenance unit, and a method for recovering from faults of the operation and maintenance unit. The operation and maintenance unit comprises a basic input/output unit, a receiving unit, and a monitoring unit. The basic input/output unit provides the initial configuration for loading the software in the operation and maintenance unit, and sends a recovery request for the software to the version server when the operation and maintenance unit restarts. The receiving unit receives, after the basic input/output unit has sent the recovery request, the backup of the software delivered over the network, and installs it. The monitoring unit restarts the operation and maintenance unit when the software running in it becomes abnormal. The invention can restore the network element system automatically and efficiently, without manual near-end operation.

Description

Version server, operation and maintenance unit, and fault recovery method thereof

Technical Field

The present invention relates to the field of communications, and in particular to a version server, an operation and maintenance unit, and a method for recovering from faults of the operation and maintenance unit.

Background

A network is the physical carrier for transferring various kinds of information. It is built mainly from hardware and software such as terminal equipment, transmission equipment, switching equipment, and the corresponding support systems. The basic elements that make up a network are called network elements.

Referring to Figure 1, a network element generally consists of an Operation and Maintenance Unit (OMU) and service boards. The OMU provides the operation and maintenance functions of the network element, and the operating system (OS) on the OMU is generally Linux or Windows. In some network elements, the service boards may also be connected to the OMU through a switching board.

A network element encounters various abnormal conditions during operation, such as a service board reset or corrupted service board data. In these cases the service board can reload its data and programs from the OMU and continue running. However, if a software fault occurs on the OMU, such as corrupted OMU data or corrupted OMU version files, the OMU cannot work normally even after a reset, and the functions of the network element are seriously affected.

When the OMU cannot work normally because of a software problem, the prior art restores the OMU's system and version files manually. Specifically, a technician is dispatched to the site and uses dedicated equipment, such as a KVM switch (Keyboard, Video, Mouse; a device that integrates a keyboard, display, and mouse), to restore the OMU's OS, restore the OMU's version and data, and then start the OMU. The service boards then load data and programs from the OMU, and the network element returns to normal.

In the course of making the present invention, the inventor found that the above prior-art technique of manually restoring the OMU's version and data has at least the following problems:

OMU system recovery relies on a KVM. If every network element is equipped with a KVM, the cost of the network element system increases; if network elements are not equipped with a KVM, the operator must bring one, which complicates the operation.

Moreover, this recovery scheme responds slowly and requires manual near-end operation, so it does not meet the requirements of remote and automatic recovery of network elements.

Summary of the Invention

Embodiments of the present invention provide a method for recovering from an operation and maintenance unit fault, which can perform recovery automatically when the operation and maintenance unit fails.

An embodiment of the present invention provides a method for recovering from an operation and maintenance unit fault, comprising: detecting whether software running in the operation and maintenance unit of a network element is abnormal; after detecting that the software is abnormal, sending a recovery request for the software to a version server; receiving a backup of the software in the operation and maintenance unit delivered by the version server; and installing the backup software on the operation and maintenance unit.

An embodiment of the present invention further provides an operation and maintenance unit, comprising: a monitoring unit, configured to restart the operation and maintenance unit when the software running in the operation and maintenance unit becomes abnormal; a basic input/output unit, configured to provide the initial configuration for loading the software in the operation and maintenance unit, and to send a recovery request for the software to a version server when the operation and maintenance unit restarts; and a receiving unit, configured to receive, after the basic input/output unit sends the recovery request for the software, a backup of the software in the operation and maintenance unit delivered over the network.

An embodiment of the present invention further provides a version server, comprising: a backup unit, configured to back up the software of the operation and maintenance unit in a network element; and a delivery unit, configured to deliver the software backed up in the backup unit to the network element after receiving a recovery request for the software from the network element.

An embodiment of the present invention further provides a system, comprising an operation and maintenance unit and a version server. The operation and maintenance unit comprises: a monitoring unit, configured to restart the operation and maintenance unit when the software running in the operation and maintenance unit becomes abnormal; a basic input/output unit, configured to provide the initial configuration for loading the software in the operation and maintenance unit, and to send a recovery request for the software to the version server when the operation and maintenance unit restarts; and a receiving unit, configured to receive, after the basic input/output unit sends the recovery request for the software, a backup of the software in the operation and maintenance unit delivered over the network. The version server comprises: a backup unit, configured to back up the software of the operation and maintenance unit in the network element; and a delivery unit, configured to deliver the software backed up in the backup unit to the network element after receiving a recovery request for the software from the network element.

As can be seen from the above technical solutions, a monitoring unit is added to the operation and maintenance unit to monitor the software running in it. When an abnormality occurs, the monitoring unit restarts the operation and maintenance unit, which triggers the basic input/output unit to run again and send a recovery request for the software to the version server. The receiving unit then receives and installs the backup of the software, and the operation and maintenance unit returns to normal. In the prior art, a failure of the operation and maintenance unit's operating system requires on-site manual repair, which is inefficient. By contrast, this embodiment can automatically and efficiently recover from operation and maintenance unit faults, and thereby restore the network element system, without manual near-end operation.

Brief Description of the Drawings

Figure 1 is a block diagram of a prior-art network element;
Figure 2 is a structural block diagram of a first embodiment of the operation and maintenance unit provided by the present invention;
Figure 3 is a block diagram of a first embodiment of the version server provided by the present invention;
Figure 4 is a block diagram of a first embodiment of the system provided by the present invention;
Figure 5 is a flowchart of a first embodiment of the operation and maintenance unit fault recovery method of the present invention.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the accompanying drawings.

One aspect of the embodiments of the present invention is to add a version server connected to the operation and maintenance unit, and to restore the operation and maintenance unit by remote OS deployment from the version server. When the operation and maintenance unit detects a software fault, it can automatically restore its version and data from the version server.

Referring to Figure 2, the present invention provides a first embodiment of the operation and maintenance unit, comprising:

a monitoring unit, configured to restart the operation and maintenance unit when the software running in the operation and maintenance unit becomes abnormal;

a basic input/output unit, configured to provide the initial configuration for loading the software in the operation and maintenance unit, and to send a recovery request for the software to the version server when the operation and maintenance unit restarts;

a receiving unit, configured to receive and install, after the basic input/output unit sends the recovery request for the software, the backup of the software in the operation and maintenance unit delivered over the network.

As can be seen from the above embodiment, a monitoring unit is added to the operation and maintenance unit to monitor the software running in it. When an abnormality occurs, the monitoring unit restarts the operation and maintenance unit, which triggers the basic input/output unit to restart and send a recovery request for the software to the version server. The receiving unit then receives and installs the backup of the software, so that the service boards in the network element can load data and programs from the operation and maintenance unit and the network element returns to normal. In the prior art, a failure of the network element's operating system requires on-site manual repair, which is inefficient. By contrast, this embodiment can automatically and efficiently recover from operation and maintenance unit faults, and thereby restore the network element system, without manual near-end operation.

Referring to Figure 3, the present invention provides a first embodiment of the version server, comprising:

a backup unit, configured to back up the software of the operation and maintenance unit in a network element;

a delivery unit, configured to deliver the software backed up in the backup unit to the network element after receiving a recovery request for the software from the operation and maintenance unit.

This embodiment corresponds to the first embodiment of the operation and maintenance unit described above. After receiving a software recovery request from the operation and maintenance unit, the delivery unit delivers the software backed up in the backup unit to the operation and maintenance unit, helping it restore its system automatically and improving efficiency.
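The division of labor between the backup unit and the delivery unit can be illustrated with a minimal in-memory sketch. It is not the patent's implementation: the class name `VersionServer`, the method names, and the `omu_id` key are hypothetical, and the network transport is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VersionServer:
    """Toy in-memory model of the version server (all names are hypothetical)."""

    # Backup unit storage: maps an OMU identifier to its latest software backup.
    backups: Dict[str, bytes] = field(default_factory=dict)

    def back_up(self, omu_id: str, software: bytes) -> None:
        """Backup unit: keep a copy of the OMU's software."""
        self.backups[omu_id] = software

    def handle_recovery_request(self, omu_id: str) -> bytes:
        """Delivery unit: return the stored backup for the requesting OMU."""
        if omu_id not in self.backups:
            raise KeyError(f"no backup stored for {omu_id}")
        return self.backups[omu_id]


server = VersionServer()
server.back_up("omu-1", b"omu-software-v1")
restored = server.handle_recovery_request("omu-1")
```

Keying the store by an OMU identifier is an assumption made so that one server could serve several network elements; the source does not specify how backups are indexed.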

Referring to Figure 4, the present invention further provides a first embodiment of a system, comprising an operation and maintenance unit and a version server.

The version server comprises:

a backup unit, configured to back up the software of the operation and maintenance unit in the network element; it may further back up the network element's OS installation package, network element information, the network element's version, installation configuration information, patch versions, configuration data, and so on.

a delivery unit, configured to deliver the software backed up in the backup unit to the operation and maintenance unit after receiving a recovery request for the software from the operation and maintenance unit.

The operation and maintenance unit provides the operation and maintenance functions of the network element, and the operating system (OS) on it may be Linux or Windows. The operation and maintenance unit comprises:

a basic input/output unit, configured to provide the initial configuration for loading the software in the operation and maintenance unit, and to send a recovery request for the software to the version server when the operation and maintenance unit restarts; in practice, the basic input/output unit may be a BIOS (Basic Input Output System);

a receiving unit, configured to receive, after the basic input/output unit sends the recovery request for the software, the backup of the software in the operation and maintenance unit delivered over the network; once the backup has been fully received, the installation program automatically installs the software backup;

a monitoring unit, configured to directly restart the operation and maintenance unit when the software running in it becomes abnormal; in a more specific embodiment, the monitoring unit may be monitoring software that is registered as an OS service, so that the service starts when the OS boots; the monitoring software is responsible for starting and monitoring the software running in the operation and maintenance unit;

In other embodiments, the monitoring unit may further comprise:

a monitoring module, configured to monitor whether the software running in the operation and maintenance unit is abnormal;

a board management control unit, which maintains a heartbeat connection with the monitoring unit and restarts the operation and maintenance unit when it cannot detect the heartbeat;

wherein the monitoring module disconnects the heartbeat connection with the board management control unit when the software runs abnormally.

In this way, even when the monitoring software itself becomes abnormal, the board management control unit can still restore the entire OMU system. The board management control unit may also exist in the form of software.
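The heartbeat arrangement behaves like a watchdog: as long as heartbeats keep arriving, nothing happens, and when they stop (because the monitoring module cut them off or because the monitoring software itself died), the board management control unit resets the OMU. The sketch below is only an illustration under assumptions: the class name, the timeout value, and the `reset_omu` callback are hypothetical, and timestamps are passed in explicitly so the behavior is deterministic.

```python
class BoardManagementController:
    """Hypothetical watchdog: resets the OMU when heartbeats stop arriving."""

    def __init__(self, timeout_s: float, reset_omu) -> None:
        self.timeout_s = timeout_s
        self.reset_omu = reset_omu  # callback standing in for a hardware reset
        self.last_beat = 0.0

    def on_heartbeat(self, now: float) -> None:
        """Called whenever the monitoring software sends a heartbeat."""
        self.last_beat = now

    def poll(self, now: float) -> bool:
        """Reset the OMU if no heartbeat arrived within the timeout."""
        if now - self.last_beat > self.timeout_s:
            self.reset_omu()
            return True
        return False


resets = []
bmc = BoardManagementController(timeout_s=3.0, reset_omu=lambda: resets.append("reset"))
bmc.on_heartbeat(now=10.0)
healthy = bmc.poll(now=12.0)    # heartbeat is 2 s old: no reset
timed_out = bmc.poll(now=14.5)  # heartbeat is 4.5 s old: reset is triggered
```

Because the reset decision depends only on the absence of heartbeats, the same path covers both a deliberate disconnect and a crash of the monitoring software, which is the property the paragraph above relies on.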

In other embodiments, the monitoring unit may further comprise:

a monitoring module, configured to monitor whether the software running in the operation and maintenance unit is abnormal;

an abnormality preprocessing unit, configured to accumulate the number of consecutive abnormalities once the software starts to run abnormally;

wherein the monitoring module restarts the operation and maintenance unit after the accumulated number of consecutive abnormalities reaches a predetermined number.
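The counting rule can be sketched as a small class. This is an illustration rather than the patent's design: the name `AbnormalityCounter` and the threshold of 3 are hypothetical, and resetting the count after a successful check is an assumption drawn from the word "consecutive".

```python
class AbnormalityCounter:
    """Counts consecutive abnormal checks (hypothetical helper)."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.consecutive = 0

    def record(self, software_ok: bool) -> bool:
        """Record one check result; return True when a restart should be triggered."""
        if software_ok:
            # Assumption: a healthy check breaks the run of consecutive abnormalities.
            self.consecutive = 0
            return False
        self.consecutive += 1
        return self.consecutive >= self.threshold


counter = AbnormalityCounter(threshold=3)
decisions = [counter.record(ok) for ok in (False, False, True, False, False, False)]
```

Here only the final check requests a restart, because the healthy third check interrupts the first run of two abnormal results.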

In addition, the operation and maintenance unit may further comprise:

a sending unit, configured to send, after the software is installed, patched, or upgraded, a backup of the installed, patched, or upgraded software to the version server in the network;

the sending unit may also be configured to periodically send the data stored in the network element to the version server.

After a version is installed or upgraded on the operation and maintenance unit, a copy of the installed or upgraded software version is also saved on the version server. After the operation and maintenance unit is patched, a copy of the patch package is likewise saved on the version server. The operation and maintenance unit periodically backs up its own data and transfers the backup files to a designated directory on the version server.

In a more specific embodiment, the board management control unit may be board management control (Baseboard Management Controller) software;

The software running in the operation and maintenance unit provides functions such as configuration, performance, alarm, and security management. The board management control software provides hardware management functions for the operation and maintenance unit, such as powering it on and off and monitoring its running state. The BIOS provides the initial configuration when the operation and maintenance unit is reset, and sends a recovery request to the version server on the network at startup.

The monitoring software and the software running in the operation and maintenance unit are placed in different directories. The monitoring software may be installed in the system directory of the operation and maintenance unit's OS, so that it is not affected when data or software in the operation and maintenance unit is corrupted. The board management control software exists within the OS in a similar form.

When the operation and maintenance unit is running normally, a heartbeat is maintained between the monitoring software and the board management control software.

In one embodiment of an operation and maintenance unit that includes both the abnormality preprocessing unit and the board management control unit, operation proceeds as follows:

When data or software of the operation and maintenance unit is corrupted, the operation and maintenance unit first attempts to restore the corrupted software or data. If the abnormality preprocessing unit cannot restore it, the software running in the operation and maintenance unit is considered abnormal. After detecting the abnormality, the abnormality preprocessing unit reloads the software running in the operation and maintenance unit. If the abnormality still cannot be cleared, the software is considered to be continuously abnormal, and the abnormality preprocessing unit records the number of abnormalities.

When the abnormality preprocessing unit detects that the software running in the operation and maintenance unit has been continuously abnormal more than the predetermined number of times, it stops starting that software and disconnects the heartbeat between the monitoring software and the board management control software. The board management control software, no longer detecting the heartbeat, resets the operation and maintenance unit, and the BIOS runs again.

The BIOS then sends a software recovery request to the version server, downloads the backup software and the installation program from the version server, and runs the operating system pre-boot environment. The automated OS installation configuration script is then set up.

The operation and maintenance unit then restarts, and the installation program begins the automated installation of the operating system. After the operating system is installed, the version, data, and OS service settings of the operation and maintenance unit are restored; a deregistration message is sent to the version server to deregister this machine from the automated OS deployment service on the version server; and the operation and maintenance unit is restarted. The monitoring software is then started when the OS boots.

After this restart the operation and maintenance unit returns to normal, the service boards load data and programs from the operation and maintenance unit, and the network element returns to normal.
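The recovery sequence described above is strictly ordered, which a short sketch can make explicit. The phase names below paraphrase the text; the `run_recovery` helper, its handler dictionary, and the choice to stop at the first failing phase are assumptions made for illustration, since the source does not say what happens when a phase fails.

```python
from enum import Enum, auto


class Phase(Enum):
    """Recovery phases in the order the description gives them."""

    BIOS_SENDS_RECOVERY_REQUEST = auto()
    DOWNLOAD_BACKUP_AND_INSTALLER = auto()
    RUN_PREBOOT_ENVIRONMENT = auto()
    RESTART_AND_INSTALL_OS = auto()
    RESTORE_VERSION_DATA_SERVICES = auto()
    DEREGISTER_FROM_VERSION_SERVER = auto()
    FINAL_RESTART = auto()
    MONITORING_SOFTWARE_STARTED = auto()


def run_recovery(handlers):
    """Run each phase in declaration order; stop at the first failing phase.

    `handlers` maps a Phase to a zero-argument callable returning True on
    success. Phases without a handler are treated as succeeding.
    """
    completed = []
    for phase in Phase:
        handler = handlers.get(phase, lambda: True)
        if not handler():
            break  # assumption: later phases depend on earlier ones
        completed.append(phase)
    return completed


full_run = run_recovery({})
partial_run = run_recovery({Phase.RESTART_AND_INSTALL_OS: lambda: False})
```

In the second call the OS installation phase fails, so only the three phases before it are reported as completed.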

Because this embodiment restores the network element system automatically over the remote network, without manual near-end operation through a KVM, it is highly efficient and relatively low in cost.

It should be noted that the units in the first embodiment of the operation and maintenance unit described above may be integrated into one processing module. Likewise, the units in the other embodiments of the operation and maintenance unit, or the units in the version server, may be integrated into one processing module; alternatively, any two or more of the units in the foregoing embodiments may be integrated into one processing module.

Referring to Figure 5, the first embodiment of the operation and maintenance unit fault recovery method of the present invention comprises the following steps:

Step 501: detect whether the software running in the operation and maintenance unit of the network element is abnormal;

Step 502: after detecting that the software is abnormal, send a recovery request for the software to the version server;

Step 503: receive the backup of the software in the operation and maintenance unit delivered by the version server;

Step 504: install the backup software on the operation and maintenance unit.
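Steps 501 to 504 can be sketched as a single function whose collaborators are injected as callables. The function name `recover_omu` and its three parameters are hypothetical; real detection, network transfer, and installation are outside the scope of this sketch.

```python
from typing import Callable


def recover_omu(
    software_is_abnormal: Callable[[], bool],
    request_backup: Callable[[], bytes],
    install: Callable[[bytes], None],
) -> bool:
    """Run steps 501-504 once; return True if a recovery was performed."""
    # Step 501: detect whether the OMU software is abnormal.
    if not software_is_abnormal():
        return False
    # Steps 502-503: send the recovery request and receive the backup.
    backup = request_backup()
    # Step 504: install the received backup on the OMU.
    install(backup)
    return True


installed = []
recovered = recover_omu(
    software_is_abnormal=lambda: True,
    request_backup=lambda: b"omu-software-v1",
    install=installed.append,
)
```

Passing the three operations in as callables keeps the control flow of the method separate from how detection and transfer are actually carried out, which the source leaves to the individual embodiments.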

Before the recovery request for the software is sent to the version server, the method may further include:

A. establishing a heartbeat connection between the board management control unit and the monitoring unit in the network element system;

B. when the board management control unit cannot detect the heartbeat, determining that the software is running abnormally, and disconnecting the heartbeat connection between the board management control unit and the monitoring unit.

In addition, before the recovery request for the software is sent to the version server, the method may further include: restarting the operation and maintenance unit.

Furthermore, before the operation and maintenance unit is restarted, the method may further include: accumulating the number of consecutive abnormalities in the running of the software, and restarting the operation and maintenance unit and its basic input/output unit once the predetermined number of abnormalities is reached.

A person of ordinary skill in the art will understand that all or some of the steps in the above method embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the content of the foregoing method embodiments of the present invention. The storage medium referred to here may be, for example, ROM/RAM, a magnetic disk, or an optical disc.

In summary, the present invention can produce at least the following technical effects:

1. The network element system is restored over the network, so no KVM is needed.

2. The network element system is restored automatically by remote OS deployment, which is efficient and low in cost and requires no manual near-end operation.

The version server, operation and maintenance unit, and fault recovery method provided by the present invention have been described in detail above through specific embodiments. The description of these embodiments is intended only to help in understanding the method and idea of the present invention. Any variation or replacement that a person of ordinary skill in the art could readily conceive after understanding the disclosed technical content shall fall within the protection scope of the present invention. In summary, the content of this specification should not be construed as limiting the protection scope of the present invention.

Claims (10)

1. An operation and maintenance unit, characterized by comprising: a monitoring unit, configured to restart the operation and maintenance unit when the software running in the operation and maintenance unit becomes abnormal; a basic input/output unit, configured to provide the initial configuration for loading the software in the operation and maintenance unit, and to send a recovery request for the software to a version server when the operation and maintenance unit restarts; and a receiving unit, configured to receive and install, after the basic input/output unit sends the recovery request for the software, the backup of the software in the operation and maintenance unit delivered over the network.
2. The operation and maintenance unit according to claim 1, characterized in that the monitoring unit further comprises: a monitoring module, configured to monitor whether the software running in the operation and maintenance unit is abnormal; and a board management control module, which maintains a heartbeat connection with the monitoring unit and restarts the operation and maintenance unit when it cannot detect the heartbeat; wherein the monitoring module disconnects the heartbeat connection with the board management control unit when the software runs abnormally.
3. The operation and maintenance unit according to claim 1, characterized in that the monitoring unit further comprises: a monitoring module, configured to monitor whether the software running in the operation and maintenance unit is abnormal; and an abnormality preprocessing unit, configured to accumulate the number of consecutive abnormalities once the software starts to run abnormally; wherein the monitoring module restarts the operation and maintenance unit after the accumulated number of consecutive abnormalities reaches a predetermined number.
4. The operation and maintenance unit according to claim 1, characterized by further comprising: a sending unit, configured to send, after the software is installed, patched, or upgraded, a backup of the installed, patched, or upgraded software to the version server.
5. A version server, characterized by comprising: a backup unit, configured to back up the software of the operation and maintenance unit in a network element; and a delivery unit, configured to deliver the software backed up in the backup unit to the network element after receiving a recovery request for the software from the network element.
6. A system, comprising an operation and maintenance unit and a version server, wherein the operation and maintenance unit comprises: a monitoring unit, configured to restart the operation and maintenance unit when the software running in the operation and maintenance unit becomes abnormal; a basic input/output unit, configured to provide an initial configuration for loading the software in the operation and maintenance unit, and to send a recovery request for the software to the version server when the operation and maintenance unit restarts; and a receiving unit, configured to receive, after the basic input/output unit sends the recovery request for the software, the backup of the software in the operation and maintenance unit delivered over the network; and wherein the version server comprises: a backup unit, configured to back up the software of the operation and maintenance unit in the network element; and a delivery unit, configured to deliver the software backed up in the backup unit to the network element after receiving the recovery request for the software from the network element.
7. A method for recovering from a failure of an operation and maintenance unit, comprising: detecting whether the software running in the operation and maintenance unit of a network element is abnormal; after detecting that the software is abnormal, sending a recovery request for the software to a version server; receiving a backup of the software in the operation and maintenance unit delivered by the version server; and installing the backed-up software on the operation and maintenance unit.
8. The method for recovering from a failure of an operation and maintenance unit according to claim 7, wherein before sending the recovery request for the software to the version server, the method further comprises: establishing a heartbeat connection between a board management control unit and a monitoring unit; and after detecting that the software is abnormal, disconnecting the heartbeat connection between the board management control unit and the monitoring unit.
9. The method for recovering from a failure of an operation and maintenance unit according to claim 7, wherein before sending the recovery request for the software to the version server, the method further comprises: restarting the operation and maintenance unit.
10. The method for recovering from a failure of an operation and maintenance unit according to claim 9, wherein before restarting the operation and maintenance unit, the method further comprises: accumulating the number of consecutive abnormalities in the running of the software, and restarting the operation and maintenance unit after a predetermined number of abnormalities is reached.
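The recovery flow described in claims 2 to 10 can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class names `VersionServer` and `OMUnit`, the method names, the threshold of three consecutive abnormalities, and the decision to combine the heartbeat mechanism with the abnormality counter in one object are all assumptions made for illustration, and the network transfer is replaced by a direct method call.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VersionServer:
    """Hypothetical stand-in for the version server of claim 5."""
    backups: Dict[str, bytes] = field(default_factory=dict)

    def store_backup(self, ne_id: str, image: bytes) -> None:
        # Backup unit: keep the latest software image per network element.
        self.backups[ne_id] = image

    def handle_recovery_request(self, ne_id: str) -> Optional[bytes]:
        # Delivery unit: return the stored backup, or None if there is none.
        return self.backups.get(ne_id)


@dataclass
class OMUnit:
    """Hypothetical operation and maintenance unit of a network element."""
    ne_id: str
    server: VersionServer
    max_consecutive_faults: int = 3  # assumed threshold, not from the source
    software: bytes = b""
    consecutive_faults: int = 0
    heartbeat_connected: bool = True
    restarts: int = 0

    def install(self, image: bytes) -> None:
        # Claim 4: after install, patch, or upgrade, send a backup to the server.
        self.software = image
        self.server.store_backup(self.ne_id, image)

    def report_health(self, ok: bool) -> None:
        # Monitoring module plus abnormality counter (claims 3 and 10).
        if ok:
            self.consecutive_faults = 0
            return
        self.consecutive_faults += 1
        if self.consecutive_faults >= self.max_consecutive_faults:
            # Claims 2 and 8: drop the heartbeat so the board controller reacts.
            self.heartbeat_connected = False

    def board_controller_poll(self) -> bool:
        # Board management controller: restart when the heartbeat is missing.
        if self.heartbeat_connected:
            return False
        self.restart()
        return True

    def restart(self) -> None:
        # On restart, the basic input/output unit requests the backup,
        # and the receiving unit installs it (claims 6 and 7).
        self.restarts += 1
        image = self.server.handle_recovery_request(self.ne_id)
        if image is not None:
            self.software = image
        self.consecutive_faults = 0
        self.heartbeat_connected = True


server = VersionServer()
unit = OMUnit("ne-1", server)
unit.install(b"v1.0")
unit.software = b"corrupted"      # simulate damaged software
for _ in range(3):
    unit.report_health(ok=False)  # three consecutive abnormalities
unit.board_controller_poll()      # heartbeat missing, so restart and restore
```

Resetting the counter on any healthy report means that only consecutive abnormalities trigger a restart, which matches the wording of claims 3 and 10; a single transient fault does not cause recovery.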
CN 200710172404 2007-12-13 2007-12-13 Version server, operation maintenance unit and method for restoring failure CN101207519A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200710172404 CN101207519A (en) 2007-12-13 2007-12-13 Version server, operation maintenance unit and method for restoring failure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200710172404 CN101207519A (en) 2007-12-13 2007-12-13 Version server, operation maintenance unit and method for restoring failure

Publications (1)

Publication Number Publication Date
CN101207519A true CN101207519A (en) 2008-06-25

Family

ID=39567422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200710172404 CN101207519A (en) 2007-12-13 2007-12-13 Version server, operation maintenance unit and method for restoring failure

Country Status (1)

Country Link
CN (1) CN101207519A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103229535A (en) * 2010-11-19 2013-07-31 阿尔卡特朗讯公司 A method and system for cell recovery in telecommunication networks
CN103229535B (en) * 2010-11-19 2016-10-19 阿尔卡特朗讯公司 Method and system for cell recovery in telecommunication networks
CN102024109B (en) 2010-12-02 2012-07-18 清华大学 Method for checking security of operating system based on Meta operating system (MetaOS) technology
CN102024109A (en) * 2010-12-02 2011-04-20 清华大学 Method for checking security of operating system based on Meta operating system (MetaOS) technology
CN102957563A (en) * 2011-08-16 2013-03-06 中国石油化工股份有限公司 Linux cluster fault automatic recovery method and Linux cluster fault automatic recovery system
CN102957563B (en) * 2011-08-16 2016-07-06 中国石油化工股份有限公司 Linux cluster fault automatic recovery method and Linux cluster fault automatic recovery system
CN103188714A (en) * 2011-12-29 2013-07-03 中兴通讯股份有限公司 Real-time data acquisition method, acquisition system and acquisition network element
CN102521116A (en) * 2011-12-29 2012-06-27 苏州佰思迈信息咨询有限公司 Fault monitoring software
CN104461866A (en) * 2014-11-04 2015-03-25 中国广核电力股份有限公司 Method and system for detecting abnormal version of software object
CN104461866B (en) * 2014-11-04 2017-08-29 中国广核电力股份有限公司 Method and system for detecting abnormal version of software object
CN105978721A (en) * 2016-05-11 2016-09-28 中国农业银行股份有限公司 Method, device and system for monitoring operation state of services in clustering system
CN105978721B (en) * 2016-05-11 2019-04-12 中国农业银行股份有限公司 Method, device and system for monitoring operation state of services in clustering system
CN106598819A (en) * 2016-12-12 2017-04-26 世纪龙信息网络有限责任公司 Monitoring processing method and monitoring processing system used for loading client patch
CN106598819B (en) * 2016-12-12 2019-07-26 世纪龙信息网络有限责任公司 Monitoring processing method and monitoring processing system used for loading client patch

Similar Documents

Publication Publication Date Title
US8726078B1 (en) Method and system for providing high availability to computer applications
US6564336B1 (en) Fault tolerant database for picture archiving and communication systems
US9270752B2 (en) Flexible remote data mirroring
US7213246B1 (en) Failing over a virtual machine
US20010056554A1 (en) System for clustering software applications
Huang et al. Software implemented fault tolerance: Technologies and experience
JP4870915B2 (en) Storage devices
EP2130126B1 (en) Self-managed processing device
EP1650653B1 (en) Remote enterprise management of high availability systems
US6266781B1 (en) Method and apparatus for providing failure detection and recovery with predetermined replication style for distributed applications in a network
US7809836B2 (en) System and method for automating bios firmware image recovery using a non-host processor and platform policy to select a donor system
US20060143497A1 (en) System, method and circuit for mirroring data
CN102782656B (en) Systems and methods for failing over cluster unaware applications in a clustered system
DE102006048115B4 (en) System and method for recording recoverable errors
US8156490B2 (en) Dynamic migration of virtual machine computer programs upon satisfaction of conditions
US7174547B2 (en) Method for updating and restoring operating software in an active region of a network element
US8484431B1 (en) Method and apparatus for synchronizing a physical machine with a virtual machine while the virtual machine is operational
US9424021B2 (en) Capturing updates to applications and operating systems
AU752846B2 (en) Method and apparatus for providing failure detection and recovery with predetermined degree of replication for distributed applications in a network
CN102314369B (en) Self-upgrade method for equipment in remote online monitoring system
US6324692B1 (en) Upgrade of a program
US7487343B1 (en) Method and apparatus for boot image selection and recovery via a remote management module
US7096381B2 (en) On-the-fly repair of a computer
US7194652B2 (en) High availability synchronization architecture
JP4345334B2 (en) Fault computer system, a program parallel execution method and program

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C12 Rejection of an application for a patent