WO2010135966A1 - Upgrade method and device for components in paired redundancy structure

Publication number
WO2010135966A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
upgrade
target device
redundant
code
Application number
PCT/CN2010/073032
Other languages
French (fr)
Chinese (zh)
Inventor
李晓初
Original Assignee
成都市华为赛门铁克科技有限公司
Priority date
2009-05-25
Application filed by 成都市华为赛门铁克科技有限公司
Publication of WO2010135966A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0813 Configuration setting characterised by the conditions triggering a change of settings
    • H04L41/082 Configuration setting characterised by the conditions triggering a change of settings, the condition being updates or upgrades of network functionality

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Stored Programmes (AREA)

Abstract

Embodiments of the present invention provide an upgrade method and device for components in a paired redundancy structure. The method includes: using the second redundant unit of the redundant pair to write upgrade code into a target component of the first redundant unit, with which it is paired, and notifying the first redundant unit to restart; checking whether the target component of the first redundant unit was upgraded successfully; and, if it was not, rewriting the pre-upgrade version of the code into the target component to restore it. The embodiments separate the ownership of a component's service function from the ownership of its maintenance function in the paired redundancy structure, thereby improving the reliability of component upgrades without additional cost.

Description

Method and Device for Upgrading Components in a Paired Redundant Structure
This application claims priority to Chinese Patent Application No. 200910141658.3, filed with the Chinese Patent Office on May 25, 2009 and entitled "Method and Device for Upgrading Components in a Paired Redundant Structure", which is incorporated herein by reference in its entirety.

Technical Field
The present invention relates to the field of electronic technologies, and in particular to a method and device for upgrading components in a paired redundant structure.

Background
In fields such as communications and storage, where system availability and reliability requirements are high, a system made up of many working nodes generally contains pairs of mutually redundant nodes. The number of working nodes in such a system is generally even, and the nodes are connected by communication channels that carry user services. In some systems, the two nodes of a redundant pair are also connected by a highly available, low-data-rate channel. Such a low-level channel is structurally simple and easy to implement; common examples are an RS-232 serial port and a JTAG debug port.
During the life of the system, working nodes may be upgraded in hardware, for example by replacing an existing board with a higher-performance one. This is called a hard upgrade. The other kind is a soft upgrade: after a working node is put into service, its code may need to be upgraded during its working life. The upgraded code may be the program code that the system runs, or the logic code of various programmable devices. This document concerns soft upgrades of nodes.
In a system made up of multiple redundant pairs as described above, if the whole system must be upgraded, the two nodes of a pair cannot be upgraded at the same time; they must be upgraded one after the other. This is because in fields such as communications and storage, the user services carried by a redundant pair cannot be interrupted casually, or cannot be interrupted at all. If the upgrade of the first node in a pair fails, the second node can no longer "risk" an upgrade.
Each working node in a redundant pair is a microcontroller system that generally includes, but is not limited to, the following parts: a CPU for processing data, various memories that hold the system's running code, communication interfaces for sending and receiving data, a basic input/output system (BIOS), a management interface for interacting with user terminals, and various programmable logic devices (PLDs) that collect board information and handle low-level reset and interrupt signals.
For a node in a redundant pair, the existing upgrade method is as follows: the user sends the upgrade code to the CPU through the management interface and issues an upgrade command. The CPU parses the command and writes the code, in a format that the specific component accepts, into the memory or logic device through the corresponding programming interface.
As this process shows, a node's upgrade is carried out entirely by the node's own CPU.
With this method, if some problem in a target component such as a memory or programmable device causes the CPU to confirm that the code has been written while the code actually written is incomplete or contains errors, the board will hang after what the CPU considers a normal post-upgrade restart. Moreover, such target components play important roles in the system; a PLD, for example, usually performs key functions such as controlling power-on and reset for the whole board. A failed upgrade of such a component therefore leaves the whole board unable to respond to user commands again.
The failed upgrade of this node means that the other node of the pair can no longer "risk" an upgrade; otherwise the entire redundant pair might hang and user services would be interrupted.
To solve this problem, another existing technique adds an error recovery controller to the board. Its node upgrade method is the same as above, but if the node hangs after restarting, the error recovery controller enters a system repair mode, reflashes the target component with default values, and restarts the board again, after which the system returns to normal. Although this technique solves the problem that a board cannot repair itself after a low-level fault during an upgrade, it introduces a new controller module, which increases both the cost and the complexity of the system.

Summary of the Invention
The present invention provides a method and device for upgrading components in a paired redundant structure, so as to address the prior-art problems that a failed upgrade cannot be repaired in software and that upgrades are costly.
To achieve this objective, an embodiment of the present invention provides a method for upgrading a component in a paired redundant structure. The method includes:
using the second redundant unit of the redundant pair to write upgrade code into a target component of the first redundant unit, with which it forms the redundant pair, and notifying the first redundant unit to restart;
checking whether the target component of the first redundant unit has been upgraded successfully; and
if the upgrade is unsuccessful, rewriting the pre-upgrade version of the code into the target component to restore it.
The present invention further provides an upgrade device for a target component, where the target component and the upgrade device are located in the first redundant unit and the second redundant unit of a redundant pair, respectively. The upgrade device includes:
a code writing unit, configured to write upgrade code into the target component and notify the first redundant unit to restart;
a checking unit, configured to check whether the target component has been upgraded successfully; and
a recovery unit, configured to rewrite the pre-upgrade version of the code into the target component to restore it when the upgrade of the target component is unsuccessful.
In the embodiments of the present invention, the target component in one redundant unit is upgraded by the other redundant unit of the pair, which improves the reliability of component upgrades and reduces the additional upgrade cost.

Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of upgrading a component in Embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the interaction between the local board and the peer board of a redundant pair according to an embodiment of the present invention;
FIG. 3 is a flowchart of upgrading a component in Embodiment 2 of the present invention;
FIG. 4 is a structural block diagram of an upgrade device for a target component according to an embodiment of the present invention;
FIG. 5 is a structural block diagram of an upgrade device for a target component according to another embodiment of the present invention;
FIG. 6 is a structural block diagram of the checking unit in FIG. 5.

Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, specific embodiments are described in detail below with reference to the drawings. The illustrative embodiments and their descriptions are intended to explain the present invention, not to limit it.

Embodiment 1
An embodiment of the present invention provides a method for upgrading a component in a paired redundant structure. As shown in FIG. 1, the method includes the following steps:
Step 110: The second redundant unit of the redundant pair writes upgrade code into a target component of the first redundant unit, with which it forms the pair, and notifies the first redundant unit to restart.
In this embodiment, the first and second redundant units form a redundant pair. They may be boards that each constitute a working node, or mutually redundant modules within a single board or node.
After the second redundant unit has written all of the upgrade code into the target component of the first redundant unit, it notifies the first redundant unit to restart so that the upgrade takes effect.
In this embodiment, the target component may be a system module such as a BIOS, or a programmable logic device (PLD), including a complex programmable logic device (CPLD).
Step 120: Check whether the target component of the first redundant unit has been upgraded successfully.
Normally, the first redundant unit runs the new version of the code after restarting. However, if the written code is incomplete or contains errors, as described for the prior art, the second redundant unit will believe that the upgrade succeeded when the target component was in fact not upgraded successfully, and the first redundant unit will hang as soon as it restarts. Therefore, after the first redundant unit is notified to restart, a check is made within a set time as to whether the target component was upgraded successfully. The set time may be the maximum time that the system needs for a normal restart, but the present invention is not limited to this.
Step 130: If the upgrade is unsuccessful, the second redundant unit rewrites the pre-upgrade version of the code into the target component of the first redundant unit to restore the target component.
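Steps 110 to 130 can be sketched in a few lines of Python, seen from the second redundant unit. This is only an illustrative sketch: the function name and the three callables (`write_code`, `notify_restart`, `upgraded_ok`) are hypothetical stand-ins for the programming channel, the restart notification, and the timed check, none of which the text specifies at this level.

```python
from typing import Callable


def upgrade_peer_component(
    write_code: Callable[[bytes], None],  # hypothetical: writes into the first unit's target component
    notify_restart: Callable[[], None],   # hypothetical: tells the first unit to restart
    upgraded_ok: Callable[[], bool],      # hypothetical: the check made within the set time
    new_code: bytes,
    old_code: bytes,
) -> bool:
    """Run steps 110-130 from the second redundant unit's point of view."""
    write_code(new_code)   # step 110: write the upgrade code ...
    notify_restart()       # ... and notify the first unit to restart
    if upgraded_ok():      # step 120: did the upgrade take effect?
        return True
    write_code(old_code)   # step 130: restore the pre-upgrade version
    return False
```

The first redundant unit never writes to its own target component here; every write goes through callables owned by the second unit, which is the point of the method.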
The upgrade method above separates the ownership of a component's service function from the ownership of its maintenance function in a paired redundant structure. In the prior art, both functions generally belong to the controller of the redundant unit that contains the component. In this embodiment, the service function of a component belongs to the controller of its own redundant unit, while its maintenance function belongs to the other redundant unit of the pair, for example to that unit's controller. The components in the first redundant unit thus serve only that unit's services, and if a soft fault in these components causes a system abnormality, the second redundant unit can perform maintenance on them. This improves the reliability of system upgrades and reduces implementation complexity without adding cost.

Embodiment 2
This embodiment provides another method for upgrading a component in a paired redundant structure. Here the paired redundant structure is a pair of boards, called node A and node B. The board to be upgraded is called the local board (node A), and the other board is called the peer board (node B).
FIG. 2 is a schematic structural diagram of the peer board upgrading the local board in a redundant pair according to this embodiment. The directed arrows inside and between the nodes in FIG. 2 represent the data flow during a board upgrade; the data flow during normal operation is not shown. As FIG. 2 shows, the programming channels of components such as the BIOS and PLD on node A are actually provided and controlled by node B, although in normal operation these components serve node A. In terms of service function, components such as the PLD on node A therefore belong to node A, whereas in terms of the soft upgrade function, that is, the maintenance function, they belong to node B.
As shown in FIG. 3, the soft upgrade procedure for a target component in node A includes the following steps:
Step 310: Node B remotely receives the code package used to upgrade the target component of node A. The package may contain both the current version of the target component's code and the new version, that is, the upgrade code.
This step is performed when node B does not already store the code for the target component of node A that is to be upgraded. In that case, the user can connect remotely through node B's management interface and upload the code package to node B by FTP or another method.
If node B already stores the code for the target component of node A that is to be upgraded, this step is omitted.
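The conditional nature of step 310 (receive the package only if node B does not already hold it) can be illustrated as follows. This is a minimal sketch under assumptions: `CodePackage`, `ensure_package`, the dictionary used as node B's local store, and the `receive_upload` callable standing in for the FTP or other transfer are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CodePackage:
    current: bytes  # code of the version now on the target component (used for recovery)
    new: bytes      # upgrade code


def ensure_package(
    store: Dict[str, CodePackage],                 # hypothetical local store on node B
    component: str,
    receive_upload: Callable[[str], CodePackage],  # hypothetical: remote upload, e.g. over FTP
) -> CodePackage:
    """Step 310: receive the package only when node B does not already hold it."""
    if component not in store:
        store[component] = receive_upload(component)
    return store[component]
```

Keeping the current version alongside the new one matters later: it is what node B writes back if the upgrade fails.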
Step 320 (optional): When the target component of node A is to be upgraded, the user may connect remotely to node A and instruct it to shut down its services.
In this embodiment, the target component may be a system module such as a BIOS, or a programmable logic device (PLD), including a complex programmable logic device (CPLD).
Step 330: Node B sends a message about upgrading the target component to node A through the communication interface between them, notifying node A that a target component on node A, such as a PLD (including a CPLD), is about to be upgraded.
To upgrade the target component of node A, the user may issue a command through the management interface instructing node B to upgrade A's target component. Node B forwards the command to node A over the communication interface between the two nodes, notifying node A that its target component is about to be upgraded.
Step 340: Node A responds to the message from node B by shutting down its running services. Once all services are shut down, it sends an acknowledgement (ACK) message to node B, indicating that it is ready for the upgrade.
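The notification and acknowledgement exchange of steps 330 and 340 might look like the sketch below. The message strings, the `NodeA` class, and `peer_is_ready` are hypothetical; the text defines only the behaviour (notify, shut down services, reply with ACK), not a wire format.

```python
from typing import List, Optional

UPGRADE_NOTICE = "UPGRADE_TARGET_COMPONENT"  # hypothetical message name
ACK = "ACK"


class NodeA:
    """Local board: shuts down its services when told that an upgrade is coming."""

    def __init__(self, services: List[str]) -> None:
        self.running = list(services)

    def handle(self, message: str) -> Optional[str]:
        if message != UPGRADE_NOTICE:
            return None            # unrelated message: services keep running
        while self.running:
            self.running.pop()     # step 340: shut down every running service
        return ACK                 # reply only after all services are down


def peer_is_ready(node_a: NodeA) -> bool:
    """Step 330 on node B: send the notice and wait for node A's ACK."""
    return node_a.handle(UPGRADE_NOTICE) == ACK
```

In this sketch node B would proceed to write the upgrade code only when `peer_is_ready` returns `True`.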
Step 350: After receiving node A's ACK message indicating that it is ready, node B writes the upgrade code into the target component of node A and notifies node A to restart.
Node B can write the upgrade code into the target component through its programming channel to node A, in the data format that the target component accepts. For example, node B writes the upgrade code into node A's CPLD through a JTAG programming channel in the data format specified by JTAG.
After confirming that the upgrade code has been completely written into the target component of node A, node B notifies node A to restart so that the upgrade takes effect. For example, node B can verify through the programming channel, such as the JTAG programming channel, that the upgrade code has been fully written, and then command node A through the communication interface to restart.
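A rough sketch of step 350 follows. The text says only that node B verifies over the programming channel that the code was fully written; comparing a read-back against the image is an assumption made here for illustration, and `FakeChannel` is a hypothetical in-memory stand-in for a real JTAG-style programming channel.

```python
from typing import Callable


class FakeChannel:
    """Hypothetical in-memory stand-in for node B's programming channel to node A."""

    def __init__(self, drop_last_byte: bool = False) -> None:
        self.contents = b""
        self.drop_last_byte = drop_last_byte  # simulates an incomplete write

    def write(self, image: bytes) -> None:
        self.contents = image[:-1] if self.drop_last_byte else image

    def read_back(self) -> bytes:
        return self.contents


def write_and_restart(
    channel: FakeChannel,
    image: bytes,
    command_restart: Callable[[], None],  # hypothetical: restart command over the communication interface
) -> bool:
    """Step 350: write the upgrade code, verify the write, then command a restart."""
    channel.write(image)
    if channel.read_back() != image:  # assumed verification method: read back and compare
        return False                  # do not restart onto a write known to be bad
    command_restart()
    return True
```

A successful write check does not prove that node A will come back up, which is why the procedure continues with a separate liveness check after the restart.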
Step 360: Check whether node A has been upgraded successfully.
Normally, node A runs the new version of the code after restarting. However, if the written upgrade code is incomplete or contains errors, node B may believe that node A was upgraded successfully when it was not, and node A will hang as soon as it restarts. Therefore, node B checks whether node A was upgraded successfully within a set time after notifying node A to restart.
Specifically, node B may use a timeout policy: it starts a peer-board upgrade timeout timer, set for example to 5 minutes, and sends a query command to node A every few seconds to query its state. If node A is running, it returns an acknowledgement (ACK) message to node B after receiving the query. Normally, node A sends the ACK before the timer expires, which shows that the upgrade succeeded. If node A does not answer before the timer expires, node A has hung after restarting.
In this step, node B therefore checks whether node A was upgraded successfully as follows: if it receives a response message from node A within the timer period (for example, 5 minutes), node A has been upgraded successfully; if node A does not respond within that period, the upgrade of the target component on node A has failed. The timer period may be set to the maximum time that the system needs to shut down normally and restart into its normal working state.
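The timeout policy of step 360 amounts to polling with a deadline. The sketch below assumes a `query` callable that returns `True` when node A answers with an ACK; the function name, the 5-second default interval, and the injectable `clock` and `sleep` parameters (added so the logic can be exercised without waiting) are illustrative choices, not part of the described method.

```python
import time
from typing import Callable


def wait_for_peer(
    query: Callable[[], bool],           # hypothetical: True if node A ACKs the query
    timeout_s: float = 300.0,            # e.g. 5 minutes, as in the text
    interval_s: float = 5.0,             # "every few seconds"; the value is an assumption
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Step 360: poll node A until it answers or the timeout timer expires."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if query():
            return True      # ACK received in time: upgrade succeeded
        sleep(interval_s)
    return False             # no answer before the deadline: node A has hung
```

A monotonic clock is used so that wall-clock adjustments on node B cannot shorten or extend the timer.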
Step 380: If the upgrade of the target component on node A is unsuccessful, node B rewrites the pre-upgrade version of the code into the target component of node A to restore it.
For example, if node B determines that the upgrade of the target component on node A failed, it directly enters a forced-recovery procedure for the target component, rewrites the old pre-upgrade code into the target component, and then forcibly restarts node A.
Step 390 (optional): After restarting node A, node B may again use the timeout policy, that is, start the timeout timer, to check whether the target component has been restored successfully.
Step 400: If the recovery is unsuccessful, step 380 may be performed again to restore the target component. If node A still does not respond after several recovery attempts (for example, 2), step 410 is performed.
Step 410: Node B may raise an alarm to the user through the management interface.
If the check in step 390 shows that the target component was restored successfully, node B may perform step 350 again to retry the upgrade of the target component in node A. In this case, step 370 is performed before step 380 to limit the number of upgrade attempts.
Step 370: Determine whether the number of upgrade attempts exceeds a set value, for example 2. If it does, the upgrade has failed twice, which suggests that the system may have a fault that cannot be repaired in software. Node B may then send alarm information to the user through the management interface so that a technician can be dispatched for maintenance.
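The interplay of steps 350 to 410 can be summarised as a bounded retry loop. The sketch below is one reading of the flow, not a definitive implementation: the four callables and both limits are hypothetical, and the exact point at which the upgrade counter is compared with the set value is an assumption, since the text leaves it open.

```python
from typing import Callable


def upgrade_with_retries(
    upgrade_and_restart: Callable[[], None],   # hypothetical: step 350
    peer_alive: Callable[[], bool],            # hypothetical: timed check, steps 360 and 390
    rollback_and_restart: Callable[[], None],  # hypothetical: step 380
    alarm: Callable[[str], None],              # hypothetical: step 410, via the management interface
    max_upgrades: int = 2,
    max_recoveries: int = 2,
) -> bool:
    failed_upgrades = 0
    while True:
        upgrade_and_restart()
        if peer_alive():
            return True                        # upgrade took effect
        failed_upgrades += 1
        if failed_upgrades >= max_upgrades:    # step 370: too many failed upgrades
            alarm("upgrade failed repeatedly")
            return False
        for _ in range(max_recoveries):        # steps 380-400: restore the old code
            rollback_and_restart()
            if peer_alive():
                break                          # recovered, so retry the upgrade
        else:
            alarm("recovery failed")           # step 410: recovery attempts exhausted
            return False
```

Following the order given in the text (step 370 before step 380), this sketch raises the alarm on the last failed upgrade without attempting another rollback; an implementation could equally choose to roll back first.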
If node A is upgraded successfully, node B can be upgraded by the same steps, except that node B is now the local board and node A is the peer board. The details are not repeated here.
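The role swap can be expressed as a short loop over the two boards. This is a sketch only: `upgrade_via_peer` is a hypothetical callable that runs the whole procedure for one board using the other as its helper, and stopping after the first failure reflects the rule, stated in the background, that the second node must not risk an upgrade once the first has failed.

```python
from typing import Callable, List


def upgrade_pair(upgrade_via_peer: Callable[[str, str], bool]) -> List[str]:
    """Upgrade node A using node B, then node B using node A; stop at the first failure."""
    upgraded: List[str] = []
    for local, peer in (("A", "B"), ("B", "A")):
        if not upgrade_via_peer(local, peer):
            break                 # keep the other node on its known-good code
        upgraded.append(local)
    return upgraded
```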
The method above, in which the peer board of a redundant pair upgrades the local board, separates the ownership of a component's service function from the ownership of its maintenance function. Two functions that normally both belong to one controller (such as the CPU of node A) are assigned to two independent controllers on the local board and the peer board (for example, the CPU of node A and the controller of node B, respectively). The components on the local board serve only the local board's services, and if a soft fault in these components causes an abnormality in the local board, the controller on the normally working peer board can perform maintenance on them. This greatly improves the reliability of upgrading a redundant pair and reduces implementation complexity without adding cost.
Moreover, the upgrade method of this embodiment is not limited to a pair of redundant boards; it applies equally to a pair of redundant modules on a single board, which can greatly improve the robustness of the modules.
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disc.

Embodiment 3
本实施例提供一种目标器件的升级设备,所述目标器件和升级设备分别位 于成冗余对的第一冗余单元和第二冗余单元中, 如图 4所示, 该升级设备 400 包括:  The embodiment provides an upgrade device of the target device, where the target device and the upgrade device are respectively located in the first redundant unit and the second redundant unit of the redundant pair. As shown in FIG. 4, the upgrade device 400 includes :
代码写入单元 410, 用于向所述目标器件中写入升级代码, 并通知所述第 一冗余单元重启。  The code writing unit 410 is configured to write an upgrade code to the target device, and notify the first redundant unit to restart.
检查单元 420 , 用于在所述第一冗余单元重启后设定时间内检查所述目标 器件是否升级成功。  The checking unit 420 is configured to check whether the target device is successfully upgraded within a set time after the restarting of the first redundant unit.
恢复单元 430 , 用于在所述目标器件升级不成功时, 将所述目标器件升级 前的版本代码重新写入该目标器件, 进行目标器件的恢复。  The recovery unit 430 is configured to rewrite the version code before the target device is upgraded to the target device to perform recovery of the target device when the target device is unsuccessfully upgraded.
本发明另一实施例中, 如图 5所示, 该升级设备 400还包括:  In another embodiment of the present invention, as shown in FIG. 5, the upgrade device 400 further includes:
获取单元 440 , 用于在代码写入单元向所述目标器件中写入升级代码前获 取所述目标器件的升级前的版本代码和所述升级代码。  The obtaining unit 440 is configured to obtain a pre-upgrade version code and the upgrade code of the target device before the code writing unit writes the upgrade code to the target device.
发送单元 450 , 用于在代码写入单元向所述目标器件中写入升级代码前向 所述第一冗余单元发送升级目标器件的命令。  The sending unit 450 is configured to send a command for upgrading the target device to the first redundant unit before the code writing unit writes the upgrade code to the target device.
本发明另一实施例中, 所述升级设备还包括:  In another embodiment of the present invention, the upgrading device further includes:
调用单元 460 , 用于在所述目标器件的恢复不成功时, 调用所述恢复单元 进行目标器件的恢复, 直至恢复成功或恢复达到设定的次数;  The calling unit 460 is configured to invoke the recovery unit to perform recovery of the target device when the recovery of the target device is unsuccessful, until the recovery is successful or the recovery reaches a set number of times;
报警单元 470 ,用于在恢复达到设定的次数后还未成功时,发出告警信息。 本发明实施例中, 所述报警单元也可以连接第一检查单元, 用于在对第一 冗余单元的目标器件进行若干次升级都不成功时, 发出报警信息。  The alarm unit 470 is configured to issue an alarm message when the recovery has not been successful after the set number of times has elapsed. In the embodiment of the present invention, the alarm unit may also be connected to the first checking unit, and is configured to issue an alarm message when the target device of the first redundant unit is not successfully upgraded several times.
In another embodiment of the present invention, as shown in FIG. 6, the checking unit 420 includes:
A query unit 610, configured to periodically send a query message to the first redundant unit after the first redundant unit restarts;
A confirmation unit 620, configured to confirm that the upgrade succeeded if a response message from the first redundant unit is received within a set time, and otherwise to confirm that the upgrade failed.
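The query unit and confirmation unit together amount to polling with a deadline. The following is a minimal sketch of that idea, not the patented implementation: `check_upgrade`, `send_query`, the 60-second timeout, and the 5-second interval are all assumed names and values, and the clock and sleep functions are injectable only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def check_upgrade(
    send_query: Callable[[], bool],
    timeout_s: float = 60.0,   # hypothetical "set time"
    interval_s: float = 5.0,   # hypothetical query period
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the restarted first redundant unit until it answers or time runs out.

    send_query sends one query message and returns True if a response
    message was received from the first redundant unit.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if send_query():
            return True   # response received in time: upgrade confirmed
        sleep(interval_s)
    return False          # no response within the set time: upgrade failed
```

A monotonic clock is used so that wall-clock adjustments on the second redundant unit cannot shorten or lengthen the waiting window.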
The upgrade device for the target device in this embodiment of the present invention separates the ownership of service functions from the ownership of maintenance functions for devices in a paired redundant structure. In the prior art, both the service functions and the maintenance functions of a device in a redundant unit generally belong to the controller of that same redundant unit. In the embodiments of the present invention, the service functions of a device in each redundant unit still belong to the controller of its own redundant unit, but the maintenance functions of the device belong to the other redundant unit of the redundant pair, for example to the controller of that other redundant unit. The devices in the first redundant unit therefore serve only the services of their own redundant unit; if a soft fault in these devices causes a system abnormality, the second redundant unit can perform maintenance on the devices of the first redundant unit. This reduces implementation complexity and greatly improves the reliability of upgrading the redundant pair without adding extra cost.
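The division of responsibility described above, with the second redundant unit owning maintenance of the first unit's device, can be illustrated with a small sketch. Everything here is hypothetical: `TargetDevice`, `PeerUpgrader`, the callables `restart_peer` and `upgrade_succeeded`, and the log strings are assumptions introduced for illustration, and the device is modelled as a simple in-memory object rather than real hardware.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TargetDevice:
    """Stand-in for the upgradable device on the first redundant unit."""
    code: bytes


@dataclass
class PeerUpgrader:
    """Runs on the second redundant unit and maintains the peer's device."""
    device: TargetDevice
    restart_peer: Callable[[], None]        # notifies the first unit to restart
    upgrade_succeeded: Callable[[], bool]   # result of the post-restart check
    log: List[str] = field(default_factory=list)

    def upgrade(self, new_code: bytes) -> bool:
        old_code = self.device.code         # obtain the pre-upgrade version first
        self.log.append("send upgrade command")
        self.device.code = new_code         # write the upgrade code
        self.restart_peer()
        if self.upgrade_succeeded():
            self.log.append("upgrade ok")
            return True
        self.device.code = old_code         # rewrite the pre-upgrade version
        self.log.append("rolled back")
        return False
```

Because the upgrader holds the old code on the second redundant unit, a failed upgrade that leaves the first unit unable to run does not prevent the rollback.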
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
A person of ordinary skill in the art will understand that all or part of the processes of the methods in the foregoing embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
In summary, the foregoing are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for upgrading a device in a paired redundant structure, comprising:
using a second redundant unit in the paired redundant structure to write upgrade code into a target device of a first redundant unit that forms a redundant pair with the second redundant unit, and notifying the first redundant unit to restart;
checking whether the target device of the first redundant unit has been upgraded successfully; and
if the upgrade is unsuccessful, using the second redundant unit to rewrite the pre-upgrade version code of the target device into the target device so as to restore the target device.
2. The method according to claim 1, wherein, before the second redundant unit in the redundant pair structure is used to write the upgrade code into the target device of the first redundant unit that forms a redundant pair with it, the method comprises:
sending, to the first redundant unit, a command to upgrade its target device.
3. The method according to claim 1, wherein, before the second redundant unit in the redundant pair structure is used to write the upgrade code into the target device of the first redundant unit that forms a redundant pair with it, the method comprises:
obtaining, by the second redundant unit, the pre-upgrade version code of the target device and the upgrade code.
4. The method according to claim 1, further comprising:
checking whether the recovery of the target device is successful;
if it is unsuccessful, repeating the recovery of the target device until recovery succeeds or the number of recovery attempts reaches a set number; and
if recovery has still not succeeded after the set number of attempts, issuing an alarm message from the second redundant unit.
5. The method according to claim 1, wherein checking whether the target device of the first redundant unit has been upgraded successfully comprises:
after the first redundant unit restarts, periodically sending a query message to the first redundant unit; if a response message from the first redundant unit is received within a set time, confirming that the upgrade succeeded; otherwise, confirming that the upgrade failed.
6. An upgrade device for a target device, wherein the target device and the upgrade device are located respectively in a first redundant unit and a second redundant unit that form a redundant pair, the upgrade device comprising:
a code writing unit, configured to write upgrade code into the target device and notify the first redundant unit to restart;
a checking unit, configured to check whether the target device has been upgraded successfully; and
a recovery unit, configured to, when the upgrade of the target device is unsuccessful, rewrite the pre-upgrade version code of the target device into the target device so as to restore the target device.
7. The device according to claim 6, further comprising:
a sending unit, configured to send a command to upgrade the target device to the first redundant unit before the code writing unit writes the upgrade code into the target device.
8. The device according to claim 6, further comprising:
an obtaining unit, configured to obtain the pre-upgrade version code of the target device and the upgrade code before the code writing unit writes the upgrade code into the target device.
9. The device according to claim 8, further comprising:
a calling unit, configured to, when recovery of the target device is unsuccessful, invoke the recovery unit to recover the target device again, until recovery succeeds or the number of recovery attempts reaches a set number; and
an alarm unit, configured to issue an alarm message when recovery has still not succeeded after the set number of attempts.
10. The device according to claim 8, wherein the checking unit comprises:
a query unit, configured to periodically send a query message to the first redundant unit after the first redundant unit restarts; and
a confirmation unit, configured to confirm that the upgrade succeeded if a response message from the first redundant unit is received within a set time, and otherwise to confirm that the upgrade failed.
PCT/CN2010/073032 2009-05-25 2010-05-21 Upgrade method and device for components in paired redundancy structure WO2010135966A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200910141658.3 2009-05-25
CN200910141658A CN101556542B (en) 2009-05-25 2009-05-25 Method and equipment for upgrading device in paired redundant structure

Publications (1)

Publication Number Publication Date
WO2010135966A1 true WO2010135966A1 (en) 2010-12-02

Family

ID=41174667

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/073032 WO2010135966A1 (en) 2009-05-25 2010-05-21 Upgrade method and device for components in paired redundancy structure

Country Status (2)

Country Link
CN (1) CN101556542B (en)
WO (1) WO2010135966A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101556542B (en) * 2009-05-25 2012-10-03 成都市华为赛门铁克科技有限公司 Method and equipment for upgrading device in paired redundant structure
CN102346692A (en) * 2010-08-03 2012-02-08 深圳Tcl新技术有限公司 Verification method of IPTV (Internet Protocol Television) updating files
CN103595564A (en) * 2013-11-12 2014-02-19 浪潮集团有限公司 Method for restoring default settings through redundancy management units in cloud system
CN106372538A (en) * 2016-08-30 2017-02-01 苏州国芯科技有限公司 Firmware protection method based on SoC (System on Chip)
CN108762784A (en) * 2018-05-28 2018-11-06 青岛海信网络科技股份有限公司 A kind of traffic signal controlling machine remote upgrade method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040103340A1 (en) * 2002-11-21 2004-05-27 Texas Instruments Incorporated Upgrading of firmware with tolerance to failures
CN101101553A (en) * 2006-07-05 2008-01-09 乐金电子(昆山)电脑有限公司 Firmware upgrading restoring method
CN101114935A (en) * 2007-07-27 2008-01-30 华为技术有限公司 System upgrading method, upgrading system and monitoring entity
CN101556542A (en) * 2009-05-25 2009-10-14 成都市华为赛门铁克科技有限公司 Method and equipment for upgrading device in paired redundant structure


Also Published As

Publication number Publication date
CN101556542A (en) 2009-10-14
CN101556542B (en) 2012-10-03

Similar Documents

Publication Publication Date Title
JP6291248B2 (en) Firmware upgrade error detection and automatic rollback
CN109677455B (en) Train auxiliary driving system
US8713553B2 (en) Disk array apparatus and firmware update method therefor
US20050160257A1 (en) System and method for updating device firmware
JP5183542B2 (en) Computer system and setting management method
US11848889B2 (en) Systems and methods for improved uptime for network devices
WO2010135966A1 (en) Upgrade method and device for components in paired redundancy structure
CN111182033B (en) Method and equipment for restoring switch
CN103713925A (en) Method and device for avoiding service interruption of storage array in upgrading process
US7499987B2 (en) Deterministically electing an active node
WO2012001780A1 (en) System control device, information processing system, and data migration and restoration method for information processing system
TW200426571A (en) Policy-based response to system errors occurring during os runtime
WO2021101698A1 (en) Detecting and recovering from fatal storage errors
CN111917576B (en) Storage cluster control method and device, computer readable storage medium and processor
JP5879246B2 (en) Network relay device
CN111158963A (en) Server firmware redundancy starting method and server
WO2011158367A1 (en) Technology for updating active program
CN114860286A (en) CPLD (Complex programmable logic device) noninductive upgrading method, system, storage medium and equipment
Cisco Operational Traps

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10780047

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112 (1) EPC

122 Ep: pct application non-entry in european phase

Ref document number: 10780047

Country of ref document: EP

Kind code of ref document: A1