CN111858187A - An electronic device and service switching method and device - Google Patents
- Publication number: CN111858187A
- Application number: CN201910353386.7A
- Authority: CN (China)
- Prior art keywords: controller, chip, extended processing, main controller, extended
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
- G06F11/2028—Failover techniques eliminating a faulty processor or activating a spare
- G06F11/2033—Failover techniques switching over of hardware resources
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Hardware Redundancy (AREA)
Abstract
Embodiments of the present invention provide an electronic device and a service switching method and apparatus. The electronic device includes two controllers, one serving as the main controller and the other as the standby controller. While neither controller has failed, the main controller processes services and the standby controller monitors whether the main controller has failed. If the main controller fails, the standby controller switches to the main controller role to process services and, while processing services, opens its own intranet port to communicate with the multiple extended processing chips. Thus, even if one controller fails, the other controller can take over its services and communicate with the extended processing chips, so the chips can continue to process data and service interruptions are reduced.
Description
Technical Field
The present invention relates to the technical field of hardware devices, and in particular to an electronic device and a service switching method and apparatus.
Background
At present, to improve the data processing capability of an electronic device, multiple processing chips are often added to the device so that the controller in the device can send service data to be processed to each chip. For example, multiple GPUs (Graphics Processing Units) are added to an intelligent analysis device; the mainboard controller of the device obtains image data to be processed and sends it to the GPUs, which analyze the image data and feed the results back to the mainboard controller.
Existing devices usually contain multiple extended processing chips and a single controller. If that controller fails, it can no longer send service data to the extended processing chips, the chips cannot analyze or process the service data, and the service is interrupted.
Summary of the Invention
The purpose of the embodiments of the present invention is to provide an electronic device and a service switching method and apparatus that reduce service interruptions.
To achieve the above purpose, an embodiment of the present invention provides an electronic device including a main controller, a standby controller, and multiple extended processing chips. Each controller includes an intranet port and is connected to the multiple extended processing chips through its own intranet port;
while processing services, the main controller communicates with the multiple extended processing chips through the main controller's intranet port;
the standby controller detects whether the main controller has failed; if so, the standby controller switches to the main controller role to process services; the switched main controller opens its own intranet port while processing services and communicates with the multiple extended processing chips through that intranet port.
Optionally, the device further includes a first network switch chip and a backplane;
the first network switch chip is arranged on the backplane; the intranet port of each controller is connected to the first network switch chip; and the first network switch chip is connected to the multiple extended processing chips through first-type interfaces on the backplane.
Optionally, the multiple extended processing chips are arranged on a tray on which a second network switch chip is also arranged; the tray is inserted into a first-type interface on the backplane; the multiple extended processing chips are connected to the second network switch chip; and the second network switch chip is connected to the first-type interface on the backplane.
Optionally, the multiple extended processing chips are arranged on multiple trays, each tray carrying a second network switch chip and one or more extended processing chips; each tray is inserted into one first-type interface on the backplane; each extended processing chip is connected to the second network switch chip on its own tray; and each second network switch chip is connected to the first-type interface into which its tray is inserted.
Optionally, the device further includes a hard disk connected to a second-type interface on the backplane; each controller includes an adapter and is connected to the second-type interface on the backplane through its own adapter.
Optionally, the main controller writes the extended processing chips' data processing state parameters into a designated area of the hard disk; if the main controller fails, the standby controller, after switching to the main controller role, accesses the designated area, obtains the processing state parameters, and processes services based on them.
Optionally, each controller includes a heartbeat port, and the main controller and the standby controller detect through the heartbeat ports whether the peer controller has failed.
Optionally, after switching to the main controller role and opening its own intranet port, the standby controller controls the multiple extended processing chips to reset.
Optionally, the extended processing chips are GPUs, and the first-type and second-type interfaces are both SFF-8639 interfaces.
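The optional tray and backplane arrangement above can be pictured as a fixed chain of hops between a controller and a chip. The following Python sketch is only an illustration under assumptions: the `Tray` class, the slot numbering, and the hop labels are hypothetical names introduced here, not terms defined by the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Tray:
    """A tray plugged into one first-type backplane interface."""
    slot: int                                   # hypothetical index of the first-type interface
    chips: list[str] = field(default_factory=list)


def path_to_chip(controller: str, trays: list[Tray], chip: str) -> list[str]:
    """Return the hops from a controller's intranet port to the given chip."""
    for tray in trays:
        if chip in tray.chips:
            return [
                f"{controller}.intranet_port",
                "first_switch_chip",                      # sits on the backplane
                f"backplane.first_type_if[{tray.slot}]",
                f"tray[{tray.slot}].second_switch_chip",  # one per tray
                chip,
            ]
    raise KeyError(f"{chip} is not installed on any tray")


trays = [Tray(0, ["gpu0", "gpu1"]), Tray(1, ["gpu2"])]
route = path_to_chip("main", trays, "gpu2")
print(" -> ".join(route))
```

Both controllers share every hop after their own intranet port, which is why opening or closing that one port is enough to decide which controller can reach the chips.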
To achieve the above purpose, an embodiment of the present invention further provides a service switching method applied to a first controller in an electronic device. The device further includes a second controller and multiple extended processing chips; each controller includes an intranet port and is connected to the multiple extended processing chips through its own intranet port. The method includes:
if the first controller is the main controller, communicating with the multiple extended processing chips through the first controller's intranet port while the first controller processes services;
if the first controller is the standby controller, detecting whether the second controller, which is acting as the main controller, has failed; if so, switching the first controller to the main controller role to process services; while processing services, opening the first controller's intranet port; and communicating with the multiple extended processing chips through the first controller's intranet port.
Optionally, communicating with the multiple extended processing chips through the first controller's intranet port includes:
sending service data to be processed to the multiple extended processing chips through the first controller's intranet port;
receiving, through the first controller's intranet port, the data processing results sent by the multiple extended processing chips.
Optionally, after communicating with the multiple extended processing chips through the first controller's intranet port, the method further includes:
writing the extended processing chips' data processing state parameters into a preset storage area;
and, when a failure of the second controller acting as the main controller is detected, the method further includes:
accessing the preset storage area to obtain the processing state parameters, and processing services based on the processing state parameters.
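The write-then-recover steps above can be sketched as follows. This is a minimal illustration, assuming a JSON file stands in for the preset storage area and that a processing state parameter is simply the index of the image each chip is working on; the patent specifies neither the format nor the atomic-rename technique used here, and the names `write_state` and `read_state` are hypothetical.

```python
import json
import os
import tempfile


def write_state(path: str, state: dict) -> None:
    """Main controller: persist per-chip progress without exposing partial writes."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename: a reader sees the old or the new file


def read_state(path: str) -> dict:
    """New main controller: load the last persisted progress, or {} if none exists."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


state_dir = tempfile.mkdtemp()                         # stand-in for the shared hard disk
state_path = os.path.join(state_dir, "chip_state.json")
write_state(state_path, {"chip1": 7, "chip2": 8, "chip3": 9})
recovered = read_state(state_path)                     # done by the standby after takeover
```

Writing to a temporary file and renaming it is one way to keep the stored state readable even if the main controller fails in the middle of a write; other schemes would serve equally well.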
Optionally, when a failure of the second controller acting as the main controller is detected, and after the first controller's intranet port is opened, the method further includes:
controlling one or more of the multiple extended processing chips to reset.
To achieve the above purpose, an embodiment of the present invention further provides a service switching apparatus applied to a first controller in an electronic device. The device further includes a second controller and multiple extended processing chips; each controller includes an intranet port and is connected to the multiple extended processing chips through its own intranet port. The apparatus includes:
a first processing module, configured to, when the first controller is the main controller, communicate with the multiple extended processing chips through the first controller's intranet port while the first controller processes services;
a second processing module, configured to, when the first controller is the standby controller, detect whether the second controller acting as the main controller has failed; if so, switch the first controller to the main controller role to process services; while processing services, open the first controller's intranet port; and communicate with the multiple extended processing chips through the first controller's intranet port.
Optionally, the first processing module is further configured to:
when the first controller is the main controller, send service data to be processed to the multiple extended processing chips through the first controller's intranet port, and receive, through the first controller's intranet port, the data processing results sent by the multiple extended processing chips;
the second processing module is further configured to:
when the first controller is the standby controller and a failure of the second controller acting as the main controller is detected, send service data to be processed to the multiple extended processing chips through the first controller's intranet port, and receive, through the first controller's intranet port, the data processing results sent by the multiple extended processing chips.
Optionally, the apparatus further includes:
a writing module, configured to write the extended processing chips' data processing state parameters into a preset storage area after communicating with the multiple extended processing chips through the first controller's intranet port;
an obtaining module, configured to, when the detection module detects a failure of the main controller, access the preset storage area to obtain the processing state parameters and process services based on the processing state parameters.
Optionally, the apparatus further includes:
a reset module, configured to, when a failure of the second controller acting as the main controller is detected and after the first controller's intranet port is opened, control one or more of the multiple extended processing chips to reset.
The electronic device provided by the embodiments of the present invention includes two controllers, one serving as the main controller and the other as the standby controller. While neither controller has failed, the main controller processes services and the standby controller monitors whether the main controller has failed. If the main controller fails, the standby controller switches to the main controller role to process services and, while processing services, opens its own intranet port to communicate with the multiple extended processing chips. Thus, in a first aspect, even if one controller fails, the other controller can take over its services and communicate with the extended processing chips, so the chips can continue to process data and service interruptions are reduced.
In a second aspect, some related active-standby switching schemes include a master node, a standby node, and computing nodes, where the computing nodes analyze and process service data. While the master node has not failed, it manages and communicates with the computing nodes; if the master node fails, the standby node takes over management of the computing nodes and communicates with them. The master node and the standby node switch roles through a software communication mechanism. In other words, the communication link between the master node and the computing nodes and the link between the standby node and the computing nodes are both connected in hardware, and each node uses its own software mechanism (its own service logic) to control whether it communicates with the computing nodes. In such a scheme, the master node and the standby node must use different addresses to communicate with the computing nodes, otherwise the addresses conflict. As a result, a computing node has to switch its destination address after an active-standby switch before it can communicate normally, which makes the computing node's logic more complex.
In the present scheme, by contrast, a controller controls communication with the extended processing chips by opening its own intranet port in hardware. In other words, only one controller's communication link to the extended processing chips is connected in hardware at any time; only when one controller fails does the other controller connect its own hardware link to the extended processing chips. The two controllers can therefore use the same address to communicate with the extended processing chips without an address conflict. The extended processing chips do not need to switch destination addresses, which simplifies their logic and makes the active-standby switch transparent to them.
In a third aspect, the schemes described above reduce address conflicts between the master node and the standby node by having the computing nodes switch destination addresses. Other related schemes reduce such conflicts as follows: while the master node has not failed, the master node and the standby node use different addresses; if the master node fails, the standby node changes its own address to the master node's address. This is also called address drift, that is, the master node's address drifts to the standby node. In such a scheme the computing nodes do not need to switch destination addresses, but the standby node has to change its own address, which makes the standby node's logic more complex.
In the present scheme, by contrast, a controller controls communication with the extended processing chips by opening its own intranet port in hardware. Only one controller's communication link to the extended processing chips is connected in hardware at any time; only when one controller fails does the other controller connect its own hardware link. The two controllers can therefore use the same address without an address conflict, so the standby controller does not need to switch addresses, which simplifies the standby controller's logic.
Of course, a product or method implementing the present invention need not achieve all of the above advantages at the same time.
Brief Description of the Drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a first schematic structural diagram of an electronic device provided by an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a controller provided by an embodiment of the present invention;
FIG. 3 is a second schematic structural diagram of an electronic device provided by an embodiment of the present invention;
FIG. 4 is a third schematic structural diagram of an electronic device provided by an embodiment of the present invention;
FIG. 5 is a fourth schematic structural diagram of an electronic device provided by an embodiment of the present invention;
FIG. 6 is a fifth schematic structural diagram of an electronic device provided by an embodiment of the present invention;
FIG. 7 is a schematic flowchart of a service switching method provided by an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a service switching apparatus provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
To solve the above technical problem, the embodiments of the present invention provide an electronic device and a service switching method and apparatus. The electronic device is described in detail first.
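FIG. 1 is a first schematic structural diagram of the electronic device provided by an embodiment of the present invention. The device includes a main controller 100, a standby controller 200, and multiple extended processing chips 300 (extended processing chip 1, extended processing chip 2, ..., extended processing chip N, where N is a positive integer greater than 1). Each controller includes an intranet port and is connected to the multiple extended processing chips through its own intranet port. For ease of description, the intranet port of the main controller is denoted 110 and the intranet port of the standby controller is denoted 210.
While processing services, the main controller 100 communicates with the multiple extended processing chips 300 through its intranet port 110. The standby controller 200 detects whether the main controller 100 has failed; if so, the standby controller 200 switches to the main controller role to process services. The switched main controller (the former standby controller) opens its own intranet port 210 while processing services and communicates with the multiple extended processing chips 300 through that port.
In general, the controller obtains the service data to be processed and sends it to the multiple extended processing chips; each chip processes the service data it receives and feeds the result back to the controller. In the embodiments of the present invention, communication between the controller and an extended processing chip may include the controller sending service data to the chip and the chip sending data processing results to the controller. The controller may also control the reset and startup of the extended processing chips, monitor their data processing state, and so on; this is not specifically limited.
The electronic device provided in this embodiment includes two controllers, a main controller and a standby controller. In one case, the two controllers may be identical in hardware. For example, as shown in FIG. 2, the main controller 100 may include a mainboard and a controller circuit arranged on the mainboard. The controller circuit includes a CPU (Central Processing Unit), memory, an HBA (Host Bus Adapter), an external network card providing an external network port, an intranet card providing the intranet port 110, and a heartbeat network card providing a heartbeat port. The standby controller 200 may likewise include a mainboard and a controller circuit arranged on the mainboard, the circuit including a CPU, memory, an HBA, an external network card providing an external network port, an intranet card providing the intranet port 210, and a heartbeat network card providing a heartbeat port. The intranet port is used for communication inside the electronic device; for example, a controller communicates with the extended processing chips through its intranet port. The external network port is used for communication between the electronic device and external devices; for example, the electronic device obtains the service data to be processed through the external network port. The HBA can be connected to a hard disk and exchange data with it. The heartbeat ports of the two controllers are connected to each other, and each controller can detect through the heartbeat port whether the other has failed.
In this embodiment, one controller in the electronic device processes services and the other serves as a backup. The controller that processes services is called the main controller and the backup controller is called the standby controller; the two can switch roles. While the main controller 100 processes services, it communicates with the multiple extended processing chips 300. In this state the standby controller's intranet port 210 is closed, and the standby controller 200 cannot communicate with the extended processing chips 300. If the main controller fails, the standby controller 200 switches to the main controller role to process services and, while processing services, opens its own intranet port 210 to communicate with the extended processing chips 300. In other words, if the main controller fails, the standby controller 200 takes over service processing from the main controller 100.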
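The switchover described above can be sketched as a small state model. This is a simplified illustration under assumptions, not the patent's implementation: the `Controller` class, its `healthy` flag, and `standby_step` are hypothetical names, and the failed controller's port is simply modeled as closed.

```python
class Controller:
    def __init__(self, name: str, role: str):
        self.name = name
        self.role = role                                # "main" or "standby"
        self.healthy = True
        self.intranet_port_open = (role == "main")      # the standby keeps its port closed

    def send(self, chip: str, data: str) -> str:
        """Send service data to a chip; only possible through an open intranet port."""
        if not self.intranet_port_open:
            raise RuntimeError(f"{self.name}: intranet port is closed")
        return f"{chip} <- {data} (via {self.name})"


def standby_step(standby: Controller, main: Controller) -> bool:
    """One monitoring step of the standby; returns True if it took over."""
    if main.healthy:
        return False
    main.intranet_port_open = False     # modeling assumption: a failed controller is off the link
    standby.role = "main"
    standby.intranet_port_open = True   # open the port only after the role switch
    return True


a = Controller("A", "main")
b = Controller("B", "standby")
took_over_early = standby_step(b, a)    # main is healthy, nothing happens
a.healthy = False                       # simulate a failure of the main controller
took_over = standby_step(b, a)
message = b.send("chip1", "frame-7")
```

The point of the model is that at most one intranet port is open at a time, so the chips never see two controllers on the link simultaneously.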
For example, when the electronic device starts, the two controllers can negotiate through the heartbeat ports which of them becomes the main controller. The two controllers usually finish starting up at different times after power-on; the controller that finishes first can act as the main controller, and the other acts as the standby controller. Specifically, after a controller finishes powering on and starting up, it can check through the heartbeat port whether the peer controller has also finished; if the peer has not, the controller that has finished becomes the main controller.
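The startup negotiation above amounts to "first to finish booting becomes main". A minimal sketch follows, assuming each controller's boot completion time is known; the function name `assign_roles` and the tie-break by controller name are assumptions added here, since the text does not say what happens when both finish at the same instant.

```python
def assign_roles(boot_done_at: dict[str, float]) -> dict[str, str]:
    """Map each controller name to 'main' or 'standby' from boot completion times."""
    # The earliest finisher becomes main; ties fall back to name order (assumption).
    first = min(boot_done_at, key=lambda name: (boot_done_at[name], name))
    return {name: ("main" if name == first else "standby") for name in boot_done_at}


roles = assign_roles({"A": 12.4, "B": 9.8})   # seconds after power-on, illustrative values
print(roles)
```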
Each controller is configured with two kinds of service logic: main controller service logic and standby controller service logic. Each controller determines whether it is the main controller or the standby controller by checking whether a main controller already exists: if none exists, it determines that it is the main controller and enters the main controller service logic; if one exists, it determines that it is the standby controller and enters the standby controller service logic. The main controller service logic is to process services and, while doing so, open its own intranet port and communicate with the multiple extended processing chips through it. The standby controller service logic is to detect whether the main controller has failed and, if so, switch to the main controller role to process services, opening its own intranet port while processing services and communicating with the multiple extended processing chips through it.
In one implementation, as shown in FIG. 1 and FIG. 2, each controller includes a heartbeat port, and the main controller 100 and the standby controller 200 detect through the heartbeat ports whether the peer controller has failed.
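The text does not say how a failure is recognized over the heartbeat port. One common approach, shown here purely as an assumption, is to treat the peer as failed when no heartbeat has arrived within a timeout; the `HeartbeatMonitor` class and the 3-second value are hypothetical.

```python
from typing import Optional


class HeartbeatMonitor:
    """Tracks the last heartbeat received from the peer controller."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen: Optional[float] = None

    def beat(self, now: float) -> None:
        """Record a heartbeat received at time `now` (seconds)."""
        self.last_seen = now

    def peer_failed(self, now: float) -> bool:
        """True once the peer has been silent for longer than the timeout."""
        if self.last_seen is None:
            return False    # nothing received yet; startup negotiation covers this case
        return now - self.last_seen > self.timeout_s


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.beat(now=10.0)
still_alive = not monitor.peer_failed(now=12.0)   # 2.0 s of silence
failed = monitor.peer_failed(now=13.5)            # 3.5 s of silence
```

Passing the clock value in explicitly keeps the sketch deterministic; a real monitor would read a monotonic clock instead.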
扩展处理芯片也就是不同于控制器内部的芯片,或者说是在控制器外部扩展出来的处理芯片。举例来说,本实施例中的扩展处理芯片可以为GPU、CPU、APU(AcceleratedProcessing Unit,加速处理器)、或者TPU(Tensor Processing Unit,张量处理单元)等等,具体不做限定。扩展处理芯片的数量可以根据实际情况进行设定,比如3-9个、10-20个等等,具体数量不做限定。The extended processing chip is different from the chip inside the controller, or a processing chip extended outside the controller. For example, the extended processing chip in this embodiment may be a GPU, a CPU, an APU (Accelerated Processing Unit, accelerated processor), or a TPU (Tensor Processing Unit, tensor processing unit), etc., which is not specifically limited. The number of expansion processing chips can be set according to the actual situation, such as 3-9, 10-20, etc. The specific number is not limited.
As shown in FIG. 1 and FIG. 2, each controller may include one intranet port, and the controller may open or close its own intranet port.
In one case, the main controller and the standby controller may use the same IP (Internet Protocol) address. For example, while the main controller is working normally, it uses this IP address to communicate with the extended processing chips. Suppose the main controller fails: after the standby controller detects the failure through the heartbeat network port, the standby controller switches to being the main controller and opens its own intranet port, and the new main controller (the former standby controller) still uses the same IP address to communicate with the extended processing chips. In this way, from the point of view of the extended processing chips, the IP address of the communication peer does not change, and the extended processing chips do not need to switch the peer IP address.
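In one implementation, after the standby controller 200 switches to being the main controller to process services and opens its own intranet port 210, it controls the plurality of extended processing chips 300 to reset.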
For example, suppose there are three extended processing chips: extended processing chip 1, extended processing chip 2, and extended processing chip 3. Suppose the main controller obtains 15 images to be processed and sends images 1, 4, 7, 10, and 13 to extended processing chip 1, images 2, 5, 8, 11, and 14 to extended processing chip 2, and images 3, 6, 9, 12, and 15 to extended processing chip 3. Suppose the main controller fails while extended processing chip 1 is processing image 7, extended processing chip 2 is processing image 8, and extended processing chip 3 is processing image 9. After the standby controller 200 detects the failure of the main controller 100 through the heartbeat network port, the standby controller 200 switches to being the main controller to process services, opens its own intranet port 210, and then controls the three extended processing chips to reset. After the reset, extended processing chip 1 processes image 7 again, extended processing chip 2 processes image 8 again, and extended processing chip 3 processes image 9 again.
In the above example, if the controller is switched while an extended processing chip is processing an image, the switch is likely to interfere with the processing of that image and produce an incorrect result for it. In this implementation, the extended processing chips are reset after the controller switch and the image is processed again, which reduces such incorrect results and improves the accuracy of the processing results.
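The distribution and reset behaviour in the 15-image example can be sketched as follows. This is only an illustrative sketch: the function names `distribute_round_robin` and `images_after_reset` are hypothetical, and the round-robin rule is inferred from the image numbers given in the example rather than stated as a requirement.

```python
def distribute_round_robin(num_images, num_chips):
    """Assign image numbers 1..num_images to chips 1..num_chips in turn."""
    assignment = {chip: [] for chip in range(1, num_chips + 1)}
    for image in range(1, num_images + 1):
        chip = (image - 1) % num_chips + 1
        assignment[chip].append(image)
    return assignment


def images_after_reset(assignment, in_flight):
    """After a reset, each chip reprocesses its in-flight image, then continues."""
    remaining = {}
    for chip, images in assignment.items():
        start = images.index(in_flight[chip])
        remaining[chip] = images[start:]
    return remaining


assignment = distribute_round_robin(15, 3)
# At the moment of failure, chips 1, 2, 3 are working on images 7, 8, 9.
remaining = images_after_reset(assignment, {1: 7, 2: 8, 3: 9})
```

Here `remaining` lists, per chip, the image that is redone after the reset followed by the images still waiting for that chip.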
In FIG. 2, each controller includes an independent mainboard; in this case, the mainboards of the two controllers may be connected through a backplane. Alternatively, in other embodiments, the two controller circuits may be arranged on the same mainboard, which is not specifically limited.
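As one implementation, referring to FIG. 3, the electronic device may further include a first network switch chip 400 and a backplane 500. The first network switch chip 400 is arranged on the backplane 500; the intranet port of each controller is connected to the first network switch chip 400; and the first network switch chip 400 is connected to the plurality of extended processing chips 300 through first-type interfaces on the backplane 500.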
The backplane may be provided with multiple interfaces. For example, an interface may be an SFF-8639 interface, which can be connected directly to a hard disk or can connect a network switch chip to the extended processing chips. To distinguish them in the description, in this implementation the backplane interfaces connected to the network switch chip are called first-type interfaces.
In this implementation, the controller is connected to the first network switch chip. The first network switch chip is arranged on the backplane and can therefore exchange data with the first-type interfaces on the backplane, and the first-type interfaces on the backplane are connected to the extended processing chips. In this way, once the controller opens its own intranet port, it can communicate with the extended processing chips.
In one case, the backplane may have an independent power supply function; in other words, the backplane may be connected to an independent power source and supply power to the two controllers, the extended processing chips, the first network switch chip, and other components. In this way, if the main controller fails, the power supply function of the backplane is not affected, and the extended processing chips, the standby controller, the first network switch chip, and other components can continue to work.
Still referring to FIG. 3, this implementation can be understood as the two controllers forming one integral controller through the backplane, with the first-type interfaces on the backplane being the interfaces provided by that integral controller.
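As one implementation, the plurality of extended processing chips 300 included in the electronic device are arranged on a tray 600, and a second network switch chip 700 is also arranged on the tray 600. The tray 600 is inserted into a first-type interface on the backplane 500, the plurality of extended processing chips 300 are connected to the second network switch chip 700, and the second network switch chip 700 is connected to the first-type interface on the backplane 500.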
To distinguish them in the description, the network switch chip on the backplane is called the first network switch chip, and the network switch chip on the tray is called the second network switch chip. There may be one or more trays; the specific number is not limited.
In this implementation, the controller is connected to the first network switch chip. The first network switch chip is arranged on the backplane and can therefore exchange data with the first-type interfaces on the backplane; a first-type interface on the backplane is connected to the second network switch chip in the tray; and the second network switch chip in the tray is connected to the extended processing chips. In this way, once the controller opens its own intranet port, it can communicate with the extended processing chips.
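In one implementation, the plurality of extended processing chips 300 are arranged on a plurality of trays 600. Each tray 600 carries a second network switch chip 700 and one or more extended processing chips 300, and each tray 600 is inserted into one first-type interface on the backplane 500. Each extended processing chip 300 is connected to the second network switch chip 700 on its own tray 600, and each second network switch chip 700 is connected to the first-type interface into which its tray 600 is inserted.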
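Referring to FIG. 4, suppose the electronic device includes a plurality of trays 600 and the extended processing chips are GPUs; each tray carries two GPUs and one second network switch chip, and each tray is inserted into one first-type interface.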
This implementation is suited to scenarios with many extended processing chips. In other words, if there are many extended processing chips, they can be divided into multiple groups, with each group arranged on the same tray. Each tray also carries a second network switch chip, and all extended processing chips on the tray are connected to a first-type interface on the backplane through that second network switch chip.
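The grouping of extended processing chips onto trays can be illustrated with a small sketch. The function name `group_into_trays` and the chip labels are hypothetical; the group size of two follows the FIG. 4 example of two GPUs per tray, and the source does not fix any particular grouping rule.

```python
def group_into_trays(chips, chips_per_tray):
    """Split a list of chips into consecutive groups, one group per tray."""
    return [chips[i:i + chips_per_tray]
            for i in range(0, len(chips), chips_per_tray)]


# Six GPUs, two per tray as in the FIG. 4 example (labels are illustrative).
gpus = ["GPU1", "GPU2", "GPU3", "GPU4", "GPU5", "GPU6"]
trays = group_into_trays(gpus, 2)
```

Each inner list corresponds to one tray, which would carry its own second network switch chip and occupy one first-type interface.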
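In one implementation, the electronic device may further include a hard disk 800 connected to a second-type interface on the backplane 500. Each controller includes an adapter and is connected to the second-type interface on the backplane 500 through its own adapter. For ease of description, the adapter of the main controller is denoted 120 and the adapter of the standby controller is denoted 220.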
For example, the electronic device may be a storage device and may include multiple hard disks. As described above, the backplane may be provided with multiple interfaces, such as SFF-8639 interfaces, which can be connected directly to a hard disk or can connect a network switch chip to the extended processing chips. To distinguish them in the description, the backplane interfaces connected to the network switch chip are called first-type interfaces, and the backplane interfaces connected to hard disks are called second-type interfaces.
The adapter may be an HBA (Host Bus Adapter). In one case, the HBA of a controller may be connected to the hard disk directly through a second-type interface. Referring to FIG. 5, the HBA of each controller (120 and 220) exchanges data with the second-type interface on the backplane, and the second-type interface on the backplane is connected to the hard disk 800. In this way, the HBA of the controller and the hard disk can exchange data.
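Alternatively, in another case, if there are many hard disks, an expansion component 900 may be arranged on the backplane, for example an expander chip used to increase the number of attachable hard disks. Referring to FIG. 6, the HBA of each controller (120 and 220) is connected to the expansion component 900, the expansion component 900 can exchange data with the second-type interfaces on the backplane, and the second-type interfaces on the backplane are connected to the hard disks 800. In this way, the HBA of the controller and the hard disks can exchange data.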
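In one implementation, the main controller 100 writes the processing state parameters of the plurality of extended processing chips 300 into a designated area of the hard disk 800. If the main controller 100 fails, the standby controller 200, after switching to being the main controller to process services, accesses the designated area, obtains the processing state parameters, and processes services based on them.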
As described above, the controller can monitor the data processing state of the extended processing chips. In this implementation, the main controller writes the monitored processing state parameters of the extended processing chips to the hard disk. Continuing the above example (in which each extended processing chip processes 5 images) and taking extended processing chip 1 as an example: after extended processing chip 1 finishes processing image M (M being a positive integer between 1 and 15), it feeds the result for image M back to the controller, and the controller can record a processing state parameter for extended processing chip 1 that carries the information "extended processing chip 1 has finished processing image M".
The main controller may write the recorded processing state parameters to the designated area of the hard disk periodically or aperiodically. The designated area can be understood as a shared area of the two controllers, which both controllers can access. For example, the controller may write the recorded processing state parameters to the designated area once per minute; then, when the main controller fails and the standby controller has switched to being the main controller, the new main controller can read the processing state parameters from the designated area. This process can be understood as migrating the services of the controller: the other controller takes over the running services based on the processing state parameters it has read, so that the services continue to run and the extended processing chips continue to process data.
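The write-then-read handover through a shared area might look like the following sketch. It makes several assumptions that are not in the source: the designated area is modelled as a JSON file, the names `write_state`, `read_state`, and `chip_state.json` are hypothetical, and a temporary directory stands in for the shared hard disk region.

```python
import json
import os
import tempfile


def write_state(path, state):
    """Main controller: persist the latest processing state parameters."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Replace the old file in one step so a reader never sees a partial write.
    os.replace(tmp_path, path)


def read_state(path):
    """New main controller: load the state, or an empty dict if none exists."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


shared_dir = tempfile.mkdtemp()  # stand-in for the designated area
state_path = os.path.join(shared_dir, "chip_state.json")

# Chip number -> last image completed (JSON object keys are strings).
write_state(state_path, {"1": 4, "2": 5, "3": 6})
recovered = read_state(state_path)
```

In a real device the main controller would call something like `write_state` on a timer (the text mentions once per minute as one option), and the standby controller would call the read side only after taking over.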
The electronic device provided by this embodiment of the present invention includes two controllers, one being the main controller and the other the standby controller. When neither controller is faulty, the main controller processes services and the standby controller detects whether the main controller is faulty. If the main controller fails, the standby controller switches to being the main controller to process services and, while processing services, opens its own intranet port to communicate with the plurality of extended processing chips. It can be seen that, first, even if one controller fails, the other controller can take over the services of the failed controller and can communicate with the plurality of extended processing chips, so that the extended processing chips can still process data, which reduces service interruptions. Second, the controller controls communication with the extended processing chips through its own intranet port, so no external IP addresses need to be allocated to the extended processing chips, which saves external IP resources and simplifies configuration. Third, because the controller controls communication with the extended processing chips through its own intranet port, the extended processing chips are isolated from the external network, which improves security. Fourth, intranet communication addresses are generally fixed, whereas external IP addresses are assigned by the user, so for the device, intranet communication addresses are easier to manage than external ones.
Some related active-standby switching schemes typically include a master node, a standby node, and computing nodes, where a computing node is a node that analyzes and processes service data. When the master node is not faulty, the master node manages the computing nodes and communicates with them; if the master node fails, the standby node takes over management of the computing nodes and communicates with them. The master node and the standby node perform the switchover through a software communication mechanism. In other words, the communication link between the master node and the computing nodes is connected in hardware, the communication link between the standby node and the computing nodes is also connected in hardware, and the master node and the standby node use their own software communication mechanism (their own service logic) to control whether they communicate with the computing nodes. In such a scheme, the master node and the standby node must use different addresses to communicate with the computing nodes, otherwise an address conflict occurs. Consequently, when a switchover takes place, the computing nodes must switch their destination address to communicate normally, which makes the logic of the computing nodes more complex.
In the present scheme, by contrast, the controller controls communication with the extended processing chips by opening its own intranet port in hardware. In other words, only one controller has a hardware-connected communication link to the extended processing chips at any time; only when one controller fails does the other controller connect its own hardware communication link to the extended processing chips. As a result, no address conflict occurs even though the two controllers use the same address to communicate with the extended processing chips. The extended processing chips therefore do not need to switch their destination address, which simplifies their logic and achieves an active-standby switchover that is transparent to the extended processing chips.
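The idea that only one hardware link is up at a time, so that both controllers can share one address, can be modelled with a toy sketch. Everything here is hypothetical: the address `10.0.0.1`, the `ControllerPort` class, and the `receivers` helper are illustrative stand-ins, not part of the described device.

```python
SHARED_IP = "10.0.0.1"  # hypothetical intranet address used by both controllers


class ControllerPort:
    """An intranet port that only carries traffic while it is enabled."""

    def __init__(self, owner, ip, enabled):
        self.owner = owner
        self.ip = ip
        self.enabled = enabled


def receivers(ports, dst_ip):
    """Return the owners that would receive a chip's packet sent to dst_ip."""
    return [p.owner for p in ports if p.enabled and p.ip == dst_ip]


main_port = ControllerPort("controller 100", SHARED_IP, enabled=True)
standby_port = ControllerPort("controller 200", SHARED_IP, enabled=False)
ports = [main_port, standby_port]

before = receivers(ports, SHARED_IP)

# Main controller fails; the standby controller opens its own port.
main_port.enabled = False
standby_port.enabled = True

after = receivers(ports, SHARED_IP)
```

The chip sends to the same `SHARED_IP` before and after the switchover, and in both cases exactly one controller receives the traffic, which is why no address conflict arises in this model.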
In some of the switching schemes described above, address conflicts between the master node and the standby node are avoided by having the computing nodes switch their destination address. In other related schemes, address conflicts are avoided as follows: while the master node is not faulty, the master node and the standby node use different addresses; if the master node fails, the standby node changes its own address to the address of the master node. This is also called address drift, meaning that the address of the master node drifts to the standby node. In such a scheme the computing nodes do not need to switch their destination address, but the standby node must change its own address, which makes the logic of the standby node more complex.
In the present scheme, by contrast, the controller controls communication with the extended processing chips by opening its own intranet port in hardware. In other words, only one controller has a hardware-connected communication link to the extended processing chips at any time; only when one controller fails does the other controller connect its own hardware communication link to the extended processing chips. As a result, no address conflict occurs even though the two controllers use the same address to communicate with the extended processing chips. The standby controller therefore does not need to change its address, which simplifies the logic of the standby controller.
An embodiment of the present invention further provides a service switching method and apparatus, which may be applied to a first controller in an electronic device. The device further includes a second controller and a plurality of extended processing chips; each controller includes an intranet port and is connected to the plurality of extended processing chips through its own intranet port. The service switching method is described below with reference to FIG. 7.
FIG. 7 is a flowchart of a service switching method provided by an embodiment of the present invention, including:
S701: Determine whether the first controller is the main controller; if yes, perform S702; if no, perform S703.
The first controller in S701 is the entity that executes the method embodiment, and it may be either controller in the electronic device.
For example, when the electronic device starts, the two controllers may negotiate through the heartbeat network port which of them becomes the main controller. When the electronic device starts, the two controllers usually differ in how quickly they finish booting after power-on; the controller that finishes booting first may act as the main controller, and the other acts as the standby controller. Specifically, after a controller has powered on and finished booting, it may use the heartbeat network port to detect whether the peer controller has finished booting; if the peer has not, the controller that has already finished booting becomes the main controller.
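The "first to finish booting becomes main" rule can be sketched as below. The function name `negotiate_roles` and the boot times are hypothetical, and the tie-break by controller identifier is an added assumption, since the source does not say what happens when both controllers finish at the same moment.

```python
def negotiate_roles(boot_seconds):
    """Pick the controller that finished booting first as the main controller.

    boot_seconds maps a controller id to the time it took to finish booting.
    Ties are broken by the smaller id (an assumption, not from the source).
    """
    main = min(boot_seconds, key=lambda cid: (boot_seconds[cid], cid))
    return {cid: ("main" if cid == main else "standby") for cid in boot_seconds}


roles = negotiate_roles({"A": 12.5, "B": 9.0})
```

In the device itself the comparison would happen implicitly: whichever controller finds over the heartbeat port that its peer has not yet finished booting takes the main role.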
Each controller is configured with two sets of service logic: main controller service logic and standby controller service logic. Each controller determines whether it is the main controller or the standby controller; if it is the main controller, it enters the main controller service logic, and if it is the standby controller, it enters the standby controller service logic. The main controller service logic corresponds to performing S702, and the standby controller service logic corresponds to performing S703 and S704.
S702: While the first controller is processing services, communicate with the plurality of extended processing chips through the intranet port of the first controller.
S702 and the subsequent S704 may include: while the first controller is processing services, sending the service data to be processed to the plurality of extended processing chips through the intranet port of the first controller; and receiving, through the intranet port of the first controller, the data processing results sent by the plurality of extended processing chips.
Generally, the controller obtains the service data to be processed and sends it to the plurality of extended processing chips; each extended processing chip processes the service data it receives and feeds the processing result back to the controller. In this embodiment of the present invention, communication between the controller and an extended processing chip may include the controller sending service data to the extended processing chip and the extended processing chip sending data processing results to the controller. In addition, the controller may control the reset and startup of the extended processing chips, may monitor their data processing state, and so on, which is not specifically limited.
S703: Detect whether the second controller, acting as the main controller, is faulty; if it is faulty, perform S704; if not, continue detecting.
For example, each controller may include a heartbeat network port, and the main controller and the standby controller use the heartbeat network ports to detect whether the peer controller is faulty.
S704: Switch to being the main controller to process services; while processing services, open the intranet port of the first controller and communicate with the plurality of extended processing chips through that port.
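Steps S701 to S704 can be read as a small state machine, sketched below under stated assumptions. The `Controller` class, its `step` method, and the `peer_alive` flag are hypothetical; heartbeat detection is reduced to a boolean passed in by the caller, and actual chip communication is replaced by log entries.

```python
class Controller:
    """Toy model of the first controller running S701 to S704."""

    def __init__(self, name, is_main):
        self.name = name
        self.is_main = is_main
        # Only the main controller has its intranet port open.
        self.intranet_port_open = is_main
        self.log = []

    def step(self, peer_alive):
        # S701: am I the main controller?
        if self.is_main:
            # S702: process services and talk to the chips over the intranet port.
            self.log.append("S702")
            return
        # S703: as standby, check the main controller via the heartbeat result.
        if peer_alive:
            self.log.append("S703")
            return
        # S704: take over as main controller and open the intranet port.
        self.is_main = True
        self.intranet_port_open = True
        self.log.append("S704")


standby = Controller("controller 200", is_main=False)
standby.step(peer_alive=True)    # main is healthy: keep monitoring
standby.step(peer_alive=False)   # main has failed: take over
standby.step(peer_alive=False)   # now acting as main controller
```

After the three calls the log reads S703, S704, S702, matching the path a standby controller takes through FIG. 7 when the main controller fails.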
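The electronic device provided in this embodiment includes two controllers; in one case, the two controllers may be identical hardware components. For example, as shown in FIG. 2, the main controller 100 may include a mainboard and a controller circuit arranged on the mainboard, the controller circuit including a CPU (Central Processing Unit), memory, an HBA (Host Bus Adapter), an external network card providing an external network port, an intranet card providing the intranet port 110, and a heartbeat network card providing the heartbeat network port. The standby controller 200 may likewise include a mainboard and a controller circuit arranged on the mainboard, the controller circuit including a CPU, memory, an HBA, an external network card providing an external network port, an intranet card providing the intranet port 210, and a heartbeat network card providing the heartbeat network port. The intranet port is used for communication inside the electronic device; for example, the controller communicates with the extended processing chips through the intranet port. The external network port is used for communication between the electronic device and external devices; for example, the electronic device obtains the service data to be processed through the external network port. The HBA may be connected to the hard disk and exchange data with it. The heartbeat network ports of the two controllers are connected to each other, and each controller can use them to detect whether the other is faulty.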
In this embodiment, one controller in the electronic device processes services and the other serves as a backup. The controller that processes services is called the main controller and the backup controller is called the standby controller; the two can perform an active-standby switchover. While the main controller is processing services, it communicates with the plurality of extended processing chips; during this time the intranet port of the standby controller is closed, and the standby controller cannot communicate with the extended processing chips. If the main controller fails, the standby controller switches to being the main controller to process services and, while processing services, opens its own intranet port to communicate with the plurality of extended processing chips. In other words, if the main controller fails, the standby controller takes over service processing from the main controller.
After S704, the first controller may control one or more of the plurality of extended processing chips to reset.
For example, suppose there are three extended processing chips: extended processing chip 1, extended processing chip 2, and extended processing chip 3. Suppose the main controller obtains 15 images to be processed and sends images 1, 4, 7, 10, and 13 to extended processing chip 1, images 2, 5, 8, 11, and 14 to extended processing chip 2, and images 3, 6, 9, 12, and 15 to extended processing chip 3. Suppose the main controller fails while extended processing chip 1 is processing image 7, extended processing chip 2 is processing image 8, and extended processing chip 3 is processing image 9. After the standby controller 200 detects the failure of the main controller 100 through the heartbeat network port, the standby controller 200 switches to being the main controller to process services, opens its own intranet port 210, and then controls the three extended processing chips to reset. After the reset, extended processing chip 1 processes image 7 again, extended processing chip 2 processes image 8 again, and extended processing chip 3 processes image 9 again.
In the above example, if the controller is switched while an extended processing chip is processing an image, the switch is likely to interfere with the processing of that image and produce an incorrect result for it. In this implementation, the extended processing chips are reset after the controller switch and the image is processed again, which reduces such incorrect results and improves the accuracy of the processing results.
In one implementation, after S702 and/or S704, the first controller may write the processing state parameters of the plurality of extended processing chips into a preset storage area. In this implementation, when it is detected that the second controller acting as the main controller is faulty, the first controller can access the preset storage area, obtain the processing state parameters, and process services based on them.
As described above, the controller can monitor the data processing state of the extended processing chips. In this implementation, the main controller writes the monitored processing state parameters of the extended processing chips to the hard disk. Continuing the above example (in which each extended processing chip processes 5 images) and taking extended processing chip 1 as an example: after extended processing chip 1 finishes processing image M (M being a positive integer between 1 and 15), it feeds the result for image M back to the controller, and the controller can record a processing state parameter for extended processing chip 1 that carries the information "extended processing chip 1 has finished processing image M".
The main controller may write the recorded processing state parameters to the designated area of the hard disk periodically or aperiodically. The designated area can be understood as a shared area of the two controllers, which both controllers can access. For example, the main controller may write the recorded processing state parameters to the designated area once per minute; then, when the main controller fails and the standby controller has switched to being the main controller, the new main controller can read the processing state parameters from the designated area. This process can be understood as migrating the services of the controller: after one controller fails, the other controller takes over the running services based on the processing state parameters it has read, so that the services continue to run, that is, the extended processing chips continue to process data.
Continuing the above example, suppose the second controller, acting as the main controller, writes the processing state parameter "extended processing chip 1 has finished processing image M" to the designated area of the hard disk and then fails. After the first controller detects the failure of the second controller, it switches itself to being the main controller, reads the processing state parameter from the designated area, and, based on that parameter, continues to allocate service data to extended processing chip 1. For example, if M is 13, indicating that extended processing chip 1 has finished processing image 13, the first controller can prepare to send the next image to extended processing chip 1.
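How the new main controller might use the recovered parameter to decide what to send next can be sketched as follows. The function name `next_image_for_chip` is hypothetical, and the list of images assigned to chip 1 is taken from the 15-image example; the source does not prescribe this lookup.

```python
def next_image_for_chip(assigned_images, last_completed):
    """Return the next image to send after the last completed one, or None."""
    if last_completed is None:
        return assigned_images[0] if assigned_images else None
    position = assigned_images.index(last_completed)
    if position + 1 < len(assigned_images):
        return assigned_images[position + 1]
    return None  # nothing left in this batch for the chip


chip1_images = [1, 4, 7, 10, 13]  # from the 15-image example

after_7 = next_image_for_chip(chip1_images, 7)
after_13 = next_image_for_chip(chip1_images, 13)
```

With M equal to 7 the next image for chip 1 is image 10. With M equal to 13 the initial batch holds nothing further for chip 1, so in this model the "next image" would have to come from newly obtained service data.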
With the embodiments of the present invention, the electronic device includes two controllers, and each controller determines whether it is the main controller or the standby controller. A main controller processes services and communicates with the extended processing chips through its own intranet network port. A standby controller monitors the main controller for failure; if the main controller fails, the standby controller switches itself to main controller to process services and opens its own intranet network port to communicate with the multiple extended processing chips. This has four advantages. First, even if one controller fails, the other can take over its services and communicate with the multiple extended processing chips, so the chips can still process data and service interruptions are reduced. Second, because a controller controls communication with the extended processing chips through its own intranet network port, no external IP addresses need to be allocated to the extended processing chips, which saves external IP resources and simplifies configuration. Third, this arrangement isolates the extended processing chips from the external network, which improves security. Fourth, intranet communication addresses are generally fixed, whereas external IP addresses are assigned by the user, so for the device, intranet communication addresses are easier to manage than external ones.
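The role check described above can be sketched as a small state machine. The `Controller` class, its fields, and the `fail` helper are hypothetical names introduced only for illustration, and the sketch assumes that a failed controller no longer drives its intranet port.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    name: str
    is_main: bool
    healthy: bool = True
    port_open: bool = False  # state of this controller's intranet network port

    def fail(self) -> None:
        # Assumption: a failed controller no longer drives its intranet port.
        self.healthy = False
        self.port_open = False

    def step(self, peer: "Controller") -> None:
        """One iteration of the main/standby check for this controller."""
        if not self.healthy:
            return
        if self.is_main:
            # Main controller: process services through its own intranet port.
            self.port_open = True
        elif not peer.healthy:
            # Standby controller saw the main controller fail: take over.
            self.is_main = True
            self.port_open = True
```

While both controllers are healthy, only the main controller's port is open. After `fail()` is called on the main controller, the next `step` of the standby controller promotes it and opens its port.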
Some related active-standby switching schemes typically involve a master node, a standby node, and computing nodes, where a computing node is a node that analyzes and processes service data. While the master node is healthy, it manages the computing nodes and communicates with them; if the master node fails, the standby node takes over management of the computing nodes and communicates with them instead. In such schemes the switchover is carried out through a software communication mechanism. That is, the communication link between the master node and the computing nodes is connected in hardware, the link between the standby node and the computing nodes is also connected in hardware, and each node uses its own software communication mechanism (its own service logic) to control whether it communicates with the computing nodes. The master node and the standby node must therefore use different addresses to communicate with the computing nodes, otherwise an address conflict occurs. As a result, on switchover a computing node has to change its destination address before it can communicate normally, which complicates the computing node's logic.
In the present scheme, by contrast, a controller controls communication with the extended processing chips by opening its own intranet network port in hardware. In other words, only one controller's communication link to the extended processing chips is connected in hardware at any time; only when one controller fails does the other controller connect its own hardware link to the extended processing chips. The two controllers can therefore use the same address to communicate with the extended processing chips without any address conflict. The extended processing chips do not need to change their destination address, which simplifies their logic and makes the active-standby switchover transparent to them.
In the schemes above, address conflicts between the master node and the standby node are avoided by having the computing node change its destination address. Other related schemes avoid them differently: while the master node is healthy, the master and standby nodes use different addresses; if the master node fails, the standby node changes its own address to the master node's address. This is known as address drift, meaning that the master node's address drifts to the standby node. In these schemes the computing nodes do not need to change their destination address, but the standby node has to change its own address, which complicates the standby node's logic.
In the present scheme, by contrast, a controller controls communication with the extended processing chips by opening its own intranet network port in hardware. In other words, only one controller's communication link to the extended processing chips is connected in hardware at any time; only when one controller fails does the other controller connect its own hardware link to the extended processing chips. The two controllers can therefore use the same address to communicate with the extended processing chips without any address conflict. The standby controller does not need to change its address, which simplifies the standby controller's logic.
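The shared-address idea can be illustrated with a toy model of the hardware links. The address value `CONTROLLER_ADDR`, the `Link` class, and the `deliver` function are hypothetical; the sketch only shows that the chip side keeps sending to one destination while the enabled link decides which controller receives the data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical fixed intranet address shared by both controllers.
CONTROLLER_ADDR = "192.168.0.1"


@dataclass
class Link:
    owner: str             # which controller this hardware link belongs to
    addr: str              # address the controller answers on
    enabled: bool = False  # True once the intranet port is opened in hardware


def deliver(links: List[Link], dst: str, payload: str) -> Optional[Tuple[str, str]]:
    """Send a chip's result to `dst`; only hardware-enabled links can receive it."""
    live = [link for link in links if link.enabled and link.addr == dst]
    if len(live) > 1:
        # Both links enabled with the same address: the conflict the scheme avoids.
        raise RuntimeError("address conflict: two enabled links share " + dst)
    return (live[0].owner, payload) if live else None
```

In this model the chip always calls `deliver(links, CONTROLLER_ADDR, ...)`. Disabling one link and enabling the other changes the receiver without any change on the chip side, whereas enabling both at once raises the conflict error.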
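Corresponding to the method embodiments above, an embodiment of the present invention further provides a communication apparatus, as shown in FIG. 8. The apparatus includes:
a judging module 801, configured to determine whether the first controller is the main controller; if so, to trigger the first processing module 802, and if not, to trigger the second processing module 803;
a first processing module 802, configured to communicate with the multiple extended processing chips through the intranet network port of the first controller while the first controller is processing services;
a second processing module 803, configured to detect whether the second controller, acting as the main controller, has failed; if it has, to switch the first controller to main controller to process services; while processing services, to open the intranet network port of the first controller; and to communicate with the multiple extended processing chips through the intranet network port of the first controller.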
In one implementation, the first processing module 802 is further configured to:
when the first controller is the main controller, send the service data to be processed to the multiple extended processing chips through the intranet network port of the first controller, and receive the data processing results sent by the multiple extended processing chips through the intranet network port of the first controller;
and the second processing module 803 is further configured to:
when the first controller is the standby controller and a failure of the second controller, acting as the main controller, has been detected, send the service data to be processed to the multiple extended processing chips through the intranet network port of the first controller, and receive the data processing results sent by the multiple extended processing chips through the intranet network port of the first controller.
In one implementation, the apparatus further includes a writing module and an obtaining module (not shown in the figure), wherein:
the writing module is configured to, after the communication with the multiple extended processing chips through the intranet network port of the first controller, write the data processing state parameters of the multiple extended processing chips into a preset storage area;
the obtaining module is configured to, when the detection module detects that the main controller has failed, obtain the processing state parameters by accessing the preset storage area and process services based on the processing state parameters.
In one implementation, the apparatus further includes:
a reset module (not shown in the figure), configured to, when a failure of the second controller acting as the main controller has been detected, and after the intranet network port of the first controller has been opened, control one or more of the multiple extended processing chips to reset.
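The way the modules fit together can be sketched in a single dispatch function. The function name `service_switch`, the callback parameters, and the returned strings are hypothetical; only the ordering (judge the role, detect the failure, open the port, then reset chips) is taken from the description above.

```python
from typing import Callable, List


def service_switch(
    is_main: bool,
    main_failed: Callable[[], bool],
    open_port: Callable[[], None],
    reset_chip: Callable[[int], None],
    chips_to_reset: List[int],
) -> str:
    """Dispatch in the style of judging module 801 and modules 802/803."""
    if is_main:
        # 801 -> 802: the first controller is already the main controller.
        return "process-as-main"
    if not main_failed():
        # 803: the main controller is healthy, so keep monitoring.
        return "standby"
    # 803: take over and open the first controller's intranet network port.
    open_port()
    # Reset module: runs only after the port has been opened.
    for chip in chips_to_reset:
        reset_chip(chip)
    return "took-over"
```

Passing the port and reset operations in as callbacks keeps the sketch independent of any particular hardware interface, which the description does not specify.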
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any one of the service switching methods described above.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in relation to one another; for identical or similar parts, the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, because the method, apparatus, and computer-readable storage medium embodiments are substantially similar to the electronic device embodiment, they are described briefly; for relevant details, see the corresponding description of the electronic device embodiment.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.
Claims (11)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910353386.7A CN111858187A (en) | 2019-04-29 | 2019-04-29 | An electronic device and service switching method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111858187A (en) | 2020-10-30 |
Family
ID=72966290
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201910353386.7A, CN111858187A (en) | Pending | 2019-04-29 | 2019-04-29 | An electronic device and service switching method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111858187A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050102549A1 (en) * | 2003-04-23 | 2005-05-12 | Dot Hill Systems Corporation | Network storage appliance with an integrated switch |
CN101192960A (en) * | 2006-11-28 | 2008-06-04 | 中兴通讯股份有限公司 | Main/slave switching detection and control device and method in distributed system |
CN101340315A (en) * | 2008-08-26 | 2009-01-07 | 中兴通讯股份有限公司 | End-to-end Ethernet protection method and communication apparatus adopting the same |
CN101877631A (en) * | 2010-06-28 | 2010-11-03 | 中兴通讯股份有限公司 | Server and business switching method thereof |
CN103605346A (en) * | 2013-11-22 | 2014-02-26 | 曙光信息产业(北京)有限公司 | Master-slave management module failure automatic switching system and realization method thereof |
CN104572534A (en) * | 2014-12-06 | 2015-04-29 | 呼和浩特铁路局科研所 | Locomotive information monitoring equipment and operating method thereof |
CN105760260A (en) * | 2016-02-19 | 2016-07-13 | 浙江大华系统工程有限公司 | Backup system and backup method |
CN106160841A (en) * | 2015-04-09 | 2016-11-23 | 中兴通讯股份有限公司 | Backboard, the method and apparatus of analysis message, the method and apparatus of realization communication |
CN206249296U (en) * | 2016-09-30 | 2017-06-13 | 浙江宇视科技有限公司 | A kind of dual control storage server |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112286851A (en) * | 2020-11-06 | 2021-01-29 | 百度在线网络技术(北京)有限公司 | Server mainboard, server, control method, electronic device and readable medium |
CN112286851B (en) * | 2020-11-06 | 2023-06-23 | 百度在线网络技术(北京)有限公司 | Server main board, server, control method, electronic device and readable medium |
CN115396252A (en) * | 2022-07-13 | 2022-11-25 | 陕西千山航空电子有限责任公司 | Method for switching main BC equipment and standby BC equipment |
CN115291814A (en) * | 2022-10-09 | 2022-11-04 | 深圳市安信达存储技术有限公司 | Embedded memory core data storage method, embedded memory chip and memory system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9489274B2 (en) | System and method for performing efficient failover and virtual machine (VM) migration in virtual desktop infrastructure (VDI) | |
US7904906B2 (en) | Tracking modified pages on a computer system | |
US9423956B2 (en) | Emulating a stretched storage device using a shared storage device | |
TW201720103A (en) | PCIe network system with failover capability and operation method thereof | |
JP2002287999A (en) | Server duplexing method, duplex server system, and duplex database server | |
US9378103B2 (en) | Coordination techniques for redundant array of independent disks storage controllers | |
US9442811B2 (en) | Emulating a stretched storage device using a shared replicated storage device | |
CN113342262A (en) | Method and apparatus for disk management for full flash memory array server | |
US20070180301A1 (en) | Logical partitioning in redundant systems | |
CN111858187A (en) | An electronic device and service switching method and device | |
JPH11161625A (en) | Computer system | |
CN109783280A (en) | Shared memory systems and shared storage method | |
US9798615B2 (en) | System and method for providing a RAID plus copy model for a storage network | |
TWI773152B (en) | Server and control method of server | |
US11341073B2 (en) | Redundant paths to single port storage devices | |
US11210034B2 (en) | Method and apparatus for performing high availability management of all flash array server | |
US20100023801A1 (en) | Method to recover from ungrouped logical path failures | |
US9952941B2 (en) | Elastic virtual multipath resource access using sequestered partitions | |
CN113342257B (en) | Server and related control method | |
JP2012014239A (en) | Fault tolerant calculator system, switch device connected to multiple physical servers and storage device, and server synchronous control method | |
TWI766590B (en) | Server and control method thereof | |
CN113342260B (en) | Server and control method applied to server | |
EP4237954B1 (en) | Expanded availability computing system | |
CN119629209A (en) | Industrial control system and redundant data communication method based on fused memory architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20201030 |