CN111858187A - Electronic equipment and service switching method and device - Google Patents


Publication number
CN111858187A
Authority
CN
China
Prior art keywords
controller
service
extended processing
main
intranet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910353386.7A
Other languages
Chinese (zh)
Inventor
乔勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910353386.7A
Publication of CN111858187A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16: Error detection or correction of the data by redundancy in hardware
    • G06F11/20: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202: Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023: Failover techniques
    • G06F11/2028: Failover techniques eliminating a faulty processor or activating a spare
    • G06F11/2033: Failover techniques switching over of hardware resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

An embodiment of the invention provides an electronic device and a service switching method and apparatus. The electronic device comprises two controllers, one serving as the main controller and the other as the standby controller. While neither controller has failed, the main controller processes the service and the standby controller detects whether the main controller has failed. If the main controller fails, the standby controller switches to the main-controller role to process the service and, while processing the service, opens its own intranet port to communicate with a plurality of extended processing chips. Therefore, even if one controller fails, the other controller can take over the service of the failed controller and communicate with the extended processing chips, so that the chips can continue to process data and service interruptions are reduced.

Description

Electronic equipment and service switching method and device
Technical Field
The present invention relates to the technical field of hardware devices, and in particular, to an electronic device and a service switching method and apparatus.
Background
At present, to improve the data processing capability of an electronic device, a plurality of processing chips are usually added to the device, so that a controller in the device can send service data to be processed to each processing chip. For example, a plurality of GPUs (Graphics Processing Units) are added to an intelligent analysis device; a motherboard controller in the device acquires image data to be processed and sends it to the added GPUs, and the GPUs analyze and process the image data and feed the processing results back to the motherboard controller.
An existing device usually contains a plurality of extended processing chips and a single controller. If the controller fails, it cannot send service data to the extended processing chips, the chips cannot analyze and process the service data, and the service is interrupted.
Disclosure of Invention
The embodiment of the invention aims to provide electronic equipment, a service switching method and a service switching device, so as to reduce the service interruption.
To achieve the above object, an embodiment of the present invention provides an electronic device, including: a main controller, a standby controller and a plurality of extended processing chips; each controller comprises an intranet network port, and each controller is connected with the plurality of extended processing chips through its own intranet network port;
the main controller communicates with the plurality of extended processing chips through an intranet port of the main controller in the process of processing the service;
the standby controller detects whether the main controller fails; if a fault occurs, the standby controller switches to the main-controller role to process the service; while processing the service, the switched main controller opens its own intranet port and communicates with the plurality of extended processing chips through that intranet port.
Optionally, the apparatus further includes a first network switching chip and a backplane;
the first network switching chip is arranged on the back plate; the intranet net port of each controller is connected with the first network exchange chip; the first network exchange chip is connected with the plurality of extended processing chips through a first type interface on the backboard.
Optionally, the plurality of extended processing chips are disposed on a tray, a second network switching chip is further disposed on the tray, the tray is inserted into the first type of interface on the backplane, the plurality of extended processing chips are connected to the second network switching chip, and the second network switching chip is connected to the first type of interface on the backplane.
Optionally, the plurality of expansion processing chips are disposed on a plurality of trays, each tray is disposed with a second network switching chip and one or more expansion processing chips, each tray is inserted into one first type of interface on the backplane, each expansion processing chip is connected to the second network switching chip on the tray where the expansion processing chip is located, and each second network switching chip is connected to the first type of interface into which the tray where the expansion processing chip is located is inserted.
Optionally, the device further includes a hard disk, where the hard disk is connected to the second type of interface on the backplane; each controller comprises an adapter, and each controller is connected with the second type interface on the back panel through the adapter of the controller.
Optionally, the main controller writes processing state parameters, which describe the state of data processing by the plurality of extended processing chips, into a designated area of the hard disk; if the main controller fails, the standby controller, after switching to the main-controller role to process the service, accesses the designated area, acquires the processing state parameters, and processes the service based on them.
Optionally, each controller includes a heartbeat network port, and the master controller and the slave controller detect whether the opposite-end controller fails through the heartbeat network port.
Optionally, after the standby controller switches to the main-controller role to process the service and opens its own intranet port, it controls the plurality of extended processing chips to reset.
Optionally, the expansion processing chip is a GPU, and the first type interface and the second type interface are both SFF-8639 interfaces.
In order to achieve the above object, an embodiment of the present invention further provides a service switching method, which is applied to a first controller in an electronic device, where the device further includes a second controller and a plurality of extended processing chips, each controller includes an intranet port, and each controller is connected to the plurality of extended processing chips through its own intranet port; the method comprises the following steps:
If the first controller is a main controller, communicating with the plurality of extended processing chips through an intranet port of the first controller in the process that the first controller processes the service;
if the first controller is a standby controller, detecting whether a second controller serving as the main controller fails; if a fault occurs, switching the first controller to the main-controller role to process the service; in the process of processing the service, opening the intranet port of the first controller, and communicating with the plurality of extended processing chips through the intranet port of the first controller.
Optionally, the communicating with the multiple extended processing chips through the intranet port of the first controller includes:
sending the service data to be processed to the plurality of extended processing chips through the intranet port of the first controller;
and receiving the data processing result sent by the plurality of extended processing chips through the intranet net port of the first controller.
Optionally, after the communication with the plurality of extended processing chips through the intranet port of the first controller, the method further includes:
writing processing state parameters, which describe the state of data processing by the plurality of extended processing chips, into a preset storage area;
in the case where a failure of the second controller serving as the main controller is detected, the method further includes:
acquiring the processing state parameters by accessing the preset storage area, and processing the service based on the processing state parameters.
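The hand-over of processing state through a shared storage area can be sketched as follows. This is only an illustration under stated assumptions: the JSON file format, the file name `chip_state.json`, the key names and the helper functions `write_state` and `read_state` are hypothetical and do not come from the embodiment, and a temporary directory stands in for the preset storage area.

```python
import json
import os
import tempfile


def write_state(path: str, state: dict[str, int]) -> None:
    """Main controller side: persist the processing state parameters."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
    # Replace the old file in one step so a reader never sees a half-written file.
    os.replace(tmp, path)


def read_state(path: str) -> dict[str, int]:
    """New main controller side: load the last recorded state after takeover."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# A temporary directory stands in for the preset storage area (assumption).
with tempfile.TemporaryDirectory() as area:
    state_file = os.path.join(area, "chip_state.json")
    # Hypothetical parameters: the image index each chip is currently processing.
    write_state(state_file, {"chip-1": 7, "chip-2": 8, "chip-3": 9})
    recovered = read_state(state_file)
```

Because both controllers can reach the same storage area, the controller that takes over needs no direct state transfer from the failed one.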
Optionally, in the case that a failure of the second controller as the main controller is detected, and after the intranet port of the first controller is opened, the method further includes:
and controlling one or more extended processing chips in the plurality of extended processing chips to reset.
In order to achieve the above object, an embodiment of the present invention further provides a service switching apparatus, which is applied to a first controller in an electronic device, where the device further includes a second controller and a plurality of extended processing chips, each controller includes an intranet port, and each controller is connected to the plurality of extended processing chips through its own intranet port; the device comprises:
the first processing module is used for communicating with the plurality of extended processing chips through the intranet net port of the first controller in the process that the first controller processes the business under the condition that the first controller is a main controller;
the second processing module is used for detecting, under the condition that the first controller is a standby controller, whether a second controller serving as the main controller fails; if a fault occurs, switching the first controller to the main-controller role to process the service; in the process of processing the service, opening the intranet port of the first controller, and communicating with the plurality of extended processing chips through the intranet port of the first controller.
Optionally, the first processing module is further configured to:
under the condition that the first controller is a main controller, sending service data to be processed to the plurality of extended processing chips through an intranet network port of the first controller; receiving data processing results sent by the plurality of extended processing chips through an intranet port of the first controller;
the second processing module is further configured to:
under the condition that the first controller is a standby controller and a failure of the second controller serving as the main controller is detected, sending service data to be processed to the plurality of extended processing chips through the intranet port of the first controller; and receiving the data processing results sent by the plurality of extended processing chips through the intranet port of the first controller.
Optionally, the apparatus further comprises:
the writing module is used for writing the processing state parameters of the data of the plurality of extended processing chips into a preset storage area after the communication is carried out between the intranet network port of the first controller and the plurality of extended processing chips;
and the acquisition module is used for acquiring the processing state parameters by accessing the preset storage area under the condition that the detection module detects the fault of the main controller, and processing the service based on the processing state parameters.
Optionally, the apparatus further comprises:
and the resetting module is used for controlling one or more extended processing chips in the plurality of extended processing chips to reset under the condition that the second controller serving as the main controller is detected to be in fault and after the intranet network port of the first controller is started.
The electronic device provided by the embodiment of the invention comprises two controllers, one serving as the main controller and the other as the standby controller. While neither controller has failed, the main controller processes the service and the standby controller detects whether the main controller has failed; if the main controller fails, the standby controller switches to the main-controller role to process the service and, while processing the service, opens its own intranet port to communicate with the plurality of extended processing chips. Therefore, in a first aspect, even if one controller fails, the other controller can take over the service of the failed controller and communicate with the extended processing chips, so that the chips can continue to process data and service interruptions are reduced.
In a second aspect, some related active/standby switching schemes generally include a main node, a standby node and a computing node, where the computing node analyzes and processes service data. While the main node has not failed, it manages and communicates with the computing node; if the main node fails, the standby node takes over management of the computing node and communicates with it. In these schemes the main node and the standby node switch over through a software communication mechanism: the communication link between the main node and the computing node and the link between the standby node and the computing node are both connected in hardware, and each node decides in software (its own service logic) whether to communicate with the computing node. The main node and the standby node must therefore use different addresses to communicate with the computing node, since otherwise an address conflict occurs; as a result, the computing node has to switch its destination address at every main/standby switchover in order to communicate normally, which makes the logic of the computing node complex.
In the present scheme, by contrast, each controller controls communication with the extended processing chips by opening or closing its intranet port in hardware; in other words, at any time only the link between one controller and the extended processing chips is connected in hardware. When one controller fails, the other controller connects its own hardware link to the extended processing chips. The two controllers can thus use the same address to communicate with the extended processing chips without any address conflict, so the extended processing chips do not need to switch destination addresses, their logic is simplified, and the main/standby switchover is transparent to them.
In a third aspect, in some of the related active/standby switching schemes described above, address conflicts between the main node and the standby node are avoided by having the computing node switch its destination address. In other related schemes, they are avoided as follows: while the main node has not failed, the main node and the standby node use different addresses; if the main node fails, the standby node changes its own address to that of the main node. This is also called address drift, because the address of the main node drifts to the standby node. In such schemes the computing node does not need to switch its destination address, but the standby node needs to switch its own address, which makes the logic of the standby node more complex.
In the present scheme, each controller controls communication with the extended processing chips by opening or closing its intranet port in hardware; in other words, at any time only the link between one controller and the extended processing chips is connected in hardware. When one controller fails, the other controller connects its own hardware link to the extended processing chips. The two controllers can thus use the same address to communicate with the extended processing chips without any address conflict, so neither controller needs to switch its own address, and the logic of the controllers is simplified.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a first schematic structural diagram of an electronic device according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a controller according to an embodiment of the present invention;
Fig. 3 is a second schematic structural diagram of the electronic device according to an embodiment of the present invention;
Fig. 4 is a third schematic structural diagram of the electronic device according to an embodiment of the present invention;
Fig. 5 is a fourth schematic structural diagram of the electronic device according to an embodiment of the present invention;
Fig. 6 is a fifth schematic structural diagram of the electronic device according to an embodiment of the present invention;
Fig. 7 is a flowchart of a service switching method according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a service switching apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve the above technical problem, embodiments of the present invention provide an electronic device, and a method and an apparatus for service switching.
Fig. 1 is a first schematic structural diagram of an electronic device according to an embodiment of the present invention. The device includes a main controller 100, a standby controller 200 and a plurality of extended processing chips 300 (extended processing chip 1, extended processing chip 2, ..., extended processing chip N, where N is a positive integer greater than 1). Each controller comprises an intranet port, and each controller is connected with the plurality of extended processing chips through its own intranet port. For convenience of description, the intranet port of the main controller is denoted as 110, and the intranet port of the standby controller is denoted as 210.
The main controller 100 communicates with the plurality of extended processing chips 300 through its intranet port 110 while processing the service. The standby controller 200 detects whether the main controller 100 fails; if so, the standby controller 200 switches to the main-controller role to process the service. While processing the service, the switched main controller (the original standby controller) opens its intranet port 210 and communicates with the plurality of extended processing chips 300 through the intranet port 210.
Generally, a controller acquires service data to be processed and sends the service data to a plurality of extended processing chips, and each extended processing chip processes the service data received by itself and feeds back a processing result to the controller. In this embodiment of the present invention, a communication process between the controller and the extended processing chip may include: the controller sends service data to the extended processing chip, and the extended processing chip sends a data processing result to the controller; in addition, the controller may also control resetting and starting of the extended processing chip, and the controller may also monitor a processing state of the extended processing chip on data, and the like, which is not limited specifically.
The electronic device provided in this embodiment includes two controllers: a main controller and a standby controller. In one case, the two controllers may be identical in hardware. For example, as shown in fig. 2, the main controller 100 may include a main board and a controller circuit disposed on the main board; the controller circuit includes: a CPU (Central Processing Unit), a memory, an HBA (Host Bus Adapter), an extranet network card providing an extranet port, an intranet network card providing the intranet port 110, and a heartbeat network card providing a heartbeat port. The standby controller 200 may likewise include a main board and a controller circuit disposed on the main board; the controller circuit includes: a CPU, a memory, an HBA, an extranet network card providing an extranet port, an intranet network card providing the intranet port 210, and a heartbeat network card providing a heartbeat port. The intranet port is used for communication inside the electronic device; for example, a controller communicates with the extended processing chips through its intranet port. The extranet port is used for communication between the electronic device and external devices; for example, the electronic device obtains service data to be processed through the extranet port. The HBA can be connected with a hard disk and exchanges data with it. The heartbeat ports of the two controllers are connected to each other, and each controller can detect through the heartbeat port whether the other has failed.
In this embodiment, one controller in the electronic device processes the service and the other serves as a backup. The controller that processes the service is called the main controller and the controller that serves as a backup is called the standby controller; the two controllers can swap roles. While the main controller 100 processes the service, it communicates with the plurality of extended processing chips 300; during this time the intranet port 210 of the standby controller is closed, so the standby controller 200 cannot communicate with the plurality of extended processing chips 300. If the main controller fails, the standby controller 200 switches to the main-controller role to process the service and, while processing the service, opens the intranet port 210 to communicate with the plurality of extended processing chips 300. In other words, if the main controller fails, the standby controller 200 takes over from the main controller 100 to process the service.
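The takeover behaviour described above can be sketched as a small state model. This is a minimal sketch, not the embodiment's implementation: the class `Controller`, the function `standby_check` and the controller names are hypothetical, and closing the failed controller's port in software is a modelling assumption.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    """Hypothetical model of one controller; all names are illustrative."""
    name: str
    role: str                       # "main" or "standby"
    healthy: bool = True
    intranet_port_open: bool = False

    def can_reach_chips(self) -> bool:
        # Only a healthy controller with an open intranet port can talk to the chips.
        return self.healthy and self.intranet_port_open


def standby_check(standby: Controller, main: Controller) -> None:
    """One pass of the standby logic: take over only if the main controller has failed."""
    if standby.role != "standby" or main.healthy:
        return
    standby.role = "main"                # switch to the main-controller role
    standby.intranet_port_open = True    # open its own intranet port
    main.intranet_port_open = False      # modelling assumption: failed side no longer drives the link


main = Controller("controller-100", role="main", intranet_port_open=True)
standby = Controller("controller-200", role="standby")

standby_check(standby, main)             # main is healthy, so nothing changes
before = (main.can_reach_chips(), standby.can_reach_chips())

main.healthy = False                     # simulate a main-controller fault
standby_check(standby, main)
after = (main.can_reach_chips(), standby.can_reach_chips())
```

At every point in this model exactly one controller can reach the extended processing chips, which is the property the port gating is meant to provide.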
For example, when the electronic device starts, the two controllers can negotiate through the heartbeat port which of them becomes the main controller. The two controllers usually complete power-on startup at different speeds, so the controller that completes startup first may serve as the main controller and the other as the standby controller. Specifically, after a controller completes power-on startup, it detects through the heartbeat port whether the opposite-end controller has completed power-on startup; if the opposite-end controller has not, the controller that has completed startup becomes the main controller.
Each controller is provided with two kinds of service logic: main controller service logic and standby controller service logic. Each controller determines whether a main controller already exists; if not, it determines that it is itself the main controller and enters the main controller service logic; if one exists, it determines that it is the standby controller and enters the standby controller service logic. The main controller service logic is: processing the service and, while processing the service, opening its own intranet port and communicating with the plurality of extended processing chips through that port. The standby controller service logic is: detecting whether the main controller fails and, if so, switching to the main-controller role to process the service; while processing the service, opening its own intranet port and communicating with the plurality of extended processing chips through that port.
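The role decision at startup can be illustrated with a short sketch. It assumes that the heartbeat exchange boils down to one yes/no answer ("does a main controller already exist?"); the names `Role`, `decide_role` and `boot_order_to_roles` are hypothetical, and how the answer is obtained over the heartbeat port is not modelled.

```python
from enum import Enum


class Role(Enum):
    MAIN = "main"
    STANDBY = "standby"


def decide_role(main_already_exists: bool) -> Role:
    """Role check run once after power-on startup completes.

    `main_already_exists` stands for the answer obtained over the
    heartbeat port (assumption: a single boolean is enough).
    """
    return Role.STANDBY if main_already_exists else Role.MAIN


def boot_order_to_roles(boot_order: list[str]) -> dict[str, Role]:
    """Assign roles given the order in which controllers finish starting."""
    roles: dict[str, Role] = {}
    for name in boot_order:
        main_exists = Role.MAIN in roles.values()
        roles[name] = decide_role(main_exists)
    return roles


# Controller 200 happens to finish startup first in this run.
roles = boot_order_to_roles(["controller-200", "controller-100"])
```

In this run the controller that starts first takes the main role regardless of its label, which matches the statement that the two controllers can swap roles.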
In one embodiment, as shown in fig. 1 and 2, each controller includes a heartbeat network port, and the master controller 100 and the slave controller 200 detect whether the opposite controller fails through the heartbeat network port.
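The embodiment does not state how a fault is recognised over the heartbeat port. One common approach, shown here purely as an assumption, is a timeout: the peer is treated as failed when no heartbeat has arrived for longer than a configured interval. The class `HeartbeatMonitor` and the 3-second timeout are hypothetical, and timestamps are passed in explicitly so the sketch is deterministic.

```python
class HeartbeatMonitor:
    """Treats the peer as failed when no heartbeat arrives within `timeout_s` seconds."""

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_seen = now

    def on_heartbeat(self, now: float) -> None:
        # Record the arrival time of the most recent heartbeat from the peer.
        self.last_seen = now

    def peer_failed(self, now: float) -> bool:
        # The peer counts as failed only once the silence exceeds the timeout.
        return (now - self.last_seen) > self.timeout_s


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.on_heartbeat(now=1.0)
failed_at_3 = monitor.peer_failed(now=3.0)   # 2.0 s of silence, within the timeout
failed_at_5 = monitor.peer_failed(now=5.0)   # 4.0 s of silence, exceeds the timeout
```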
The extended processing chip is a chip distinct from the chips inside the controller, that is, a processing chip added outside the controller. For example, the extended processing chip in this embodiment may be a GPU, a CPU, an APU (Accelerated Processing Unit), a TPU (Tensor Processing Unit), or the like, which is not specifically limited. The number of extended processing chips can be set according to the actual situation, for example 3 to 9 or 10 to 20; the specific number is not limited.
As shown in fig. 1 and 2, each controller may include an intranet port, and the controller may open or close its intranet port.
In one case, the main controller and the standby controller may use the same IP (Internet Protocol) address. For example, during normal operation the main controller uses this IP address to communicate with the extended processing chips. If the main controller fails, then after the standby controller detects the fault through the heartbeat port, the standby controller switches to the main-controller role and opens its own intranet port, and the switched main controller (the original standby controller) still uses this IP address to communicate with the extended processing chips. Thus, from the point of view of the extended processing chips, the IP address of the communication peer does not change, and they do not need to switch the IP address of the communication peer.
In one embodiment, after switching to the main-controller role to process the service and opening its intranet port 210, the standby controller 200 controls the plurality of extended processing chips 300 to reset.
For example, assume that there are three extended processing chips: extended processing chip 1, extended processing chip 2 and extended processing chip 3. Assume that the main controller acquires 15 images to be processed and sends the 1st, 4th, 7th, 10th and 13th images to extended processing chip 1, the 2nd, 5th, 8th, 11th and 14th images to extended processing chip 2, and the 3rd, 6th, 9th, 12th and 15th images to extended processing chip 3. Assume that when the main controller fails, extended processing chip 1 is processing the 7th image, extended processing chip 2 is processing the 8th image, and extended processing chip 3 is processing the 9th image. After the standby controller 200 detects through the heartbeat port that the main controller 100 has failed, the standby controller 200 switches to the main-controller role to process the service, opens its intranet port 210, and then controls the three extended processing chips to reset. After the reset, extended processing chip 1 processes the 7th image again, extended processing chip 2 processes the 8th image again, and extended processing chip 3 processes the 9th image again.
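The effect of sharing one IP address can be illustrated with a toy delivery model. This is a sketch under stated assumptions: `ControllerPort` and `deliver` are hypothetical names, the address 192.0.2.10 is taken from the range reserved for documentation, and real network behaviour is reduced to "a message reaches every open port that owns the destination address".

```python
SHARED_IP = "192.0.2.10"   # assumed address, from the range reserved for documentation


class ControllerPort:
    """Hypothetical model of one controller's intranet port."""

    def __init__(self, name: str, port_open: bool) -> None:
        self.name = name
        self.ip = SHARED_IP            # both controllers are configured with the same address
        self.port_open = port_open
        self.received: list[str] = []


def deliver(dest_ip: str, payload: str, ports: list[ControllerPort]) -> list[str]:
    """Deliver a chip's message to every open port that owns `dest_ip`."""
    receivers = [p for p in ports if p.port_open and p.ip == dest_ip]
    for p in receivers:
        p.received.append(payload)
    return [p.name for p in receivers]


main = ControllerPort("controller-100", port_open=True)
standby = ControllerPort("controller-200", port_open=False)
chip_dest = SHARED_IP                  # the chip's configured destination never changes

first = deliver(chip_dest, "result-1", [main, standby])

main.port_open = False                 # main controller fails
standby.port_open = True               # standby takes over and opens its port
second = deliver(chip_dest, "result-2", [main, standby])
```

The chip sends to the same destination before and after the switchover; only the port that is open, and therefore the receiving controller, changes.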
In the above example, if the controller is switched during the process of processing a certain image by the expansion processing chip, it is likely that the expansion processing chip interferes with the processing of the certain image, resulting in an error in the processing result of the certain image; in the embodiment, after the controller is switched, the extended processing chip is reset, and the image is processed again, so that the error of the processing result is reduced, and the accuracy of the processing result is improved.
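The 15-image example can be reproduced with a short sketch. The function names `distribute` and `resume_after_reset` are hypothetical, and the sketch assumes that the new main controller knows which image each chip was processing when the fault occurred.

```python
def distribute(num_images: int, num_chips: int) -> dict:
    """Round-robin assignment: image i goes to chip ((i - 1) % num_chips) + 1."""
    queues = {chip: [] for chip in range(1, num_chips + 1)}
    for image in range(1, num_images + 1):
        queues[(image - 1) % num_chips + 1].append(image)
    return queues


def resume_after_reset(queues: dict, in_flight: dict) -> dict:
    """After a reset, each chip restarts from the image it was processing."""
    return {chip: [img for img in images if img >= in_flight[chip]]
            for chip, images in queues.items()}


queues = distribute(15, 3)
in_flight = {1: 7, 2: 8, 3: 9}   # images being processed when the fault occurred
remaining = resume_after_reset(queues, in_flight)
print(queues)
print(remaining)
```

Here `remaining` starts at the 7th, 8th, and 9th images respectively, so each interrupted image is processed again rather than skipped.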
In fig. 2, each controller includes a separate motherboard, in which case the motherboards of the two controllers may be connected by a backplane. Alternatively, in other embodiments, the two controller circuits may be disposed on the same motherboard, which is not limited specifically.
As an embodiment, referring to fig. 3, the electronic device may further include a first network switch chip 400 and a backplane 500; the first network switch chip 400 is disposed on the backplane 500; the intranet port of each controller is connected to the first network switch chip 400; the first network switch chip 400 is connected to the plurality of extended processing chips 300 through a first type interface on the backplane 500.
For example, the interface may be an SFF-8639 interface, and the interface may be directly connected to a hard disk, or may be connected to the expansion processing chip through a network switching chip.
In this embodiment, the controller is connected to the first network switch chip; the first network switch chip is disposed on the backplane, so that it can exchange data with the first type interface on the backplane; and the first type interface on the backplane is connected to the extended processing chips. Therefore, once a controller opens its intranet port, it can communicate with the extended processing chips.
In one case, the backplane may have an independent power supply function, or the backplane may be connected to an independent power supply, so as to supply power to the two controllers, the extended processing chips, the first network switch chip, and other devices. Therefore, if the main controller fails, the power supply function of the backplane is not affected, and devices such as the extended processing chips, the standby controller, and the first network switch chip can continue to work.
Still referring to fig. 3, this embodiment may be understood as the two controllers forming one integral controller through the backplane, and the first type interface on the backplane may be understood as an interface provided by the integral controller.
In one embodiment, a plurality of extended processing chips 300 included in the electronic device are disposed on the tray 600, the tray 600 is further disposed with a second network switch chip 700, the tray 600 is inserted into the first type interface on the backplane 500, the plurality of extended processing chips 300 are connected to the second network switch chip 700, and the second network switch chip 700 is connected to the first type interface on the backplane 500.
For the sake of description differentiation, the network switch chip on the backplane is referred to as a first network switch chip, and the network switch chip on the tray is referred to as a second network switch chip. The number of the trays may be one or more, and the specific number is not limited.
In this embodiment, the controller is connected to the first network switch chip; the first network switch chip is disposed on the backplane, so that it can exchange data with the first type interface on the backplane; the first type interface on the backplane is connected to the second network switch chip in the tray; and the second network switch chip in the tray is connected to the extended processing chips. Therefore, once a controller opens its intranet port, it can communicate with the extended processing chips.
In one embodiment, the plurality of extended processing chips 300 are disposed on a plurality of trays 600, each tray 600 is disposed with a second network switching chip 700 and one or more extended processing chips 300, each tray 600 is inserted into a first type interface on the backplane 500, each extended processing chip 300 is connected to the second network switching chip 700 on the tray 600 where it is located, and each second network switching chip 700 is connected to the first type interface inserted into the tray 600 where it is located.
Referring to fig. 4, it is assumed that a plurality of trays 600 are included in the electronic device, and the extended processing chips are GPUs, two GPUs and one second network switching chip are disposed in each tray, and each tray is inserted into one first-type interface.
This embodiment can be applied to scenarios with a larger number of extended processing chips. In other words, if there are many extended processing chips, they can be divided into several groups, with each group disposed in the same tray. A second network switch chip is also disposed in each tray, and the extended processing chips in the tray are all connected to a first type interface on the backplane through that second network switch chip.
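The multi-tray connection hierarchy can be sketched as a small adjacency map. The node names below (for example `switch-400` and `tray-1/switch-700`) are hypothetical labels derived from the reference numerals; the sketch only illustrates the hop sequence from a controller to a chip and makes no claim about the actual wiring.

```python
def build_topology(num_trays: int, gpus_per_tray: int = 2) -> dict:
    """Return an adjacency map: parent node -> list of child nodes."""
    links = {"controller": ["switch-400"], "switch-400": []}
    for t in range(1, num_trays + 1):
        iface = f"interface-{t}"              # first type interface on the backplane
        tray_switch = f"tray-{t}/switch-700"  # second network switch chip in the tray
        links["switch-400"].append(iface)
        links[iface] = [tray_switch]
        links[tray_switch] = [f"tray-{t}/gpu-{g}"
                              for g in range(1, gpus_per_tray + 1)]
    return links


def path_to(links: dict, target: str, node: str = "controller") -> list:
    """Depth-first search for the hop sequence from the controller to a chip."""
    if node == target:
        return [node]
    for child in links.get(node, []):
        rest = path_to(links, target, child)
        if rest:
            return [node] + rest
    return []


topology = build_topology(num_trays=2)
route = path_to(topology, "tray-2/gpu-1")
print(route)
```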
In one embodiment, the electronic device may further include a hard disk 800, where the hard disk 800 is connected to a second type interface on the backplane 500; each controller includes an adapter, and each controller is connected to the second type interface on the backplane 500 through its own adapter. For convenience of description, the adapter of the main controller is denoted as 120, and the adapter of the standby controller is denoted as 220.
For example, the electronic device may be a storage device, and the electronic device may include a plurality of hard disks. As described above, a plurality of interfaces may be provided in the backplane, and the interfaces may be SFF-8639 interfaces, and the interfaces may be directly connected to the hard disk or may be connected to the expansion processing chip through the network switch chip.
The adapter may be an HBA (Host Bus Adapter), and in one case the HBA of a controller may be directly connected to the hard disk through the second type interface. Referring to fig. 5, the HBA of each controller (120 and 220) exchanges data with the second type interface on the backplane, and the second type interface on the backplane is connected to the hard disk 800; therefore, data can be exchanged between the HBA of the controller and the hard disk.
Alternatively, in another case, if the number of hard disks is large, an expansion component 900 may be disposed on the backplane, for example an expander chip that increases the number of hard disks that can be connected. Referring to fig. 6, the HBA of each controller (120 and 220) is connected to the expansion component 900, and the expansion component 900 can exchange data with the second type interface on the backplane; the second type interface on the backplane is connected to the hard disk 800; therefore, data can be exchanged between the HBA of the controller and the hard disk.
In one embodiment, the main controller 100 writes processing state parameters, which describe the state of data processing on the plurality of extended processing chips 300, into a designated area of the hard disk 800. If the main controller 100 fails, the standby controller 200, after switching to act as the main controller to process the service, accesses the designated area, acquires the processing state parameters, and processes the service based on them.
As described above, the controller may monitor the state of data processing on the extended processing chips; in this embodiment, the main controller writes the monitored processing state parameters into the hard disk. Continuing with the above example (in which each extended processing chip processes 5 images) and taking extended processing chip 1 as an example: after extended processing chip 1 finishes processing the M-th image (M being a positive integer between 1 and 15), it feeds back the processing result of the M-th image to the controller, and the controller may record a processing state parameter for extended processing chip 1 that carries the information "extended processing chip 1 has finished processing the M-th image".
The main controller may write the recorded processing state parameters into a designated area of the hard disk periodically or aperiodically. The designated area can be understood as an area shared by the two controllers, which both controllers can access. For example, the main controller may write the recorded processing state parameters into the designated area once every minute, so that after the main controller fails and the standby controller switches to act as the main controller, the new main controller can read the processing state parameters from the designated area. This process can be understood as migrating the service between controllers: the other controller takes over the running service based on the processing state parameters it has read, so that the service continues to run and the extended processing chips continue to process data.
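The write-then-read pattern for the shared designated area can be sketched as follows. This is only an illustration under stated assumptions: a JSON file in a temporary directory stands in for the designated area of the hard disk, the key names such as `chip-1` are hypothetical, and the write-to-temporary-file-then-rename step is a common safeguard added here, not something the source describes.

```python
import json
import os
import tempfile


def write_state(path: str, state: dict) -> None:
    """Main controller: persist state (write a temp file, then rename it)."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def read_state(path: str) -> dict:
    """New main controller: load the last persisted state, or {} if none exists."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


shared_dir = tempfile.mkdtemp()   # stands in for the designated area
state_file = os.path.join(shared_dir, "processing_state.json")

# Value = number of the last image each chip has finished processing.
write_state(state_file, {"chip-1": 4, "chip-2": 5, "chip-3": 3})
recovered = read_state(state_file)
print(recovered)
```

Because the state is written at intervals, the recovered values may lag behind the chips' true progress by up to one interval; the reset-and-reprocess embodiment described earlier is one way to tolerate that gap.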
The electronic device provided by the embodiment of the present invention includes two controllers, one serving as the main controller and the other as the standby controller. When neither controller has failed, the main controller processes the service and the standby controller detects whether the main controller has failed. If the main controller fails, the standby controller switches to act as the main controller to process the service and, while processing the service, opens its own intranet port to communicate with the plurality of extended processing chips. This scheme therefore has the following advantages. First, even if one controller fails, the other controller can take over its service and communicate with the plurality of extended processing chips, so the chips can continue to process data, which reduces service interruption. Second, the controller communicates with the extended processing chips through its intranet port, so no external IP addresses need to be allocated to the extended processing chips, which saves external IP resources and simplifies configuration. Third, because the controller communicates with the extended processing chips through its intranet port, the chips are isolated from the external network, which improves security. Fourth, intranet communication addresses are generally fixed whereas external IP addresses are assigned by the user, so intranet addresses are easier for the device to manage than external addresses.
Some related main/standby switching schemes generally include a main node, a standby node, and computing nodes, where a computing node is a node that analyzes and processes service data. When the main node has not failed, it manages the computing nodes and communicates with them; if the main node fails, the standby node takes over management of the computing nodes and communicates with them. In these schemes, switching between the main node and the standby node is performed through a software communication mechanism. In other words, the communication link between the main node and the computing nodes and the communication link between the standby node and the computing nodes are both connected in hardware, and each node decides through its own software (its own service logic) whether to communicate with the computing nodes. The main node and the standby node therefore have to use different addresses to communicate with the computing nodes, otherwise an address conflict occurs. As a result, the computing nodes must switch their destination address when the main node and the standby node are switched, which makes the logic of the computing nodes complex.
In the present scheme, by contrast, each controller controls its communication with the extended processing chips in hardware, by opening its own intranet port. In other words, at any time only the link between one controller and the extended processing chips is connected in hardware; when that controller fails, the other controller connects its own hardware link to the extended processing chips. The two controllers can therefore use the same address to communicate with the extended processing chips without any address conflict. The extended processing chips do not need to switch their destination address, their logic is simplified, and the main/standby switchover is transparent to them.
In some of the related main/standby switching schemes described above, address conflicts between the main node and the standby node are avoided by having the computing nodes switch their destination address. In other related schemes, they are avoided as follows: while the main node has not failed, the main node and the standby node use different addresses; if the main node fails, the standby node changes its own address to the address of the main node. This is also called address drift, that is, the address of the main node drifts to the standby node. In such a scheme the computing nodes do not need to switch their destination address, but the standby node has to switch its own address, which makes the logic of the standby node more complex.
In the present scheme, each controller controls its communication with the extended processing chips in hardware, by opening its own intranet port. In other words, at any time only the link between one controller and the extended processing chips is connected in hardware; when that controller fails, the other controller connects its own hardware link to the extended processing chips. The two controllers can therefore use the same address to communicate with the extended processing chips without any address conflict, so neither controller needs to change its own address, and the logic of the controllers is simplified.
The embodiment of the invention also provides a service switching method and a device, wherein the method and the device can be applied to a first controller in electronic equipment, the equipment also comprises a second controller and a plurality of extended processing chips, each controller comprises an intranet network port, and each controller is connected with the plurality of extended processing chips through the intranet network port of the controller. The service switching method is described below with reference to fig. 7.
Fig. 7 is a flowchart of a service switching method according to an embodiment of the present invention, where the method includes:
S701: judging whether the first controller is the main controller; if yes, S702 is executed, and if no, S703 is executed.
The first controller in S701 is an execution subject of the method embodiment, and the first controller may be any one of the controllers in the electronic device.
For example, when the electronic device starts, the two controllers can negotiate through the heartbeat network port which of them becomes the main controller. The two controllers usually differ in how quickly they finish starting up after power-on, so the controller that finishes starting up first may serve as the main controller and the other as the standby controller. Specifically, after a controller has powered on and finished starting up, it detects through the heartbeat network port whether the peer controller has finished starting up; if the peer has not, the controller that has finished starting up becomes the main controller.
Each controller is provided with two kinds of service logic: main controller service logic and standby controller service logic. Each controller judges whether it is the main controller or the standby controller; if it is the main controller, it enters the main controller service logic, and if it is the standby controller, it enters the standby controller service logic. The main controller service logic executes S702, and the standby controller service logic executes S703 and S704.
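The first-to-start negotiation described for S701 can be sketched as below. The function names and startup times are hypothetical, and the heartbeat query is reduced to a boolean. The source does not say what happens when both controllers finish starting at the same instant; in this sketch the controller listed first wins, which is purely an assumption.

```python
def negotiate_role(peer_started: bool) -> str:
    """A controller that finishes starting before its peer becomes the main controller."""
    return "standby" if peer_started else "main"


def elect(startup_done_at: dict) -> dict:
    """Simulate both controllers finishing startup at the given times (seconds)."""
    roles = {}
    started = set()
    # Visit controllers in the order in which they finish starting up.
    for name in sorted(startup_done_at, key=startup_done_at.get):
        peer_started = len(started) > 0   # result of the heartbeat-port check
        roles[name] = negotiate_role(peer_started)
        started.add(name)
    return roles


roles = elect({"controller-100": 41.0, "controller-200": 44.5})
print(roles)
```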
S702: while the first controller is processing the service, communicating with the plurality of extended processing chips through the intranet port of the first controller.
S702 and the subsequent S704 may each include: while the first controller is processing the service, sending the service data to be processed to the plurality of extended processing chips through the intranet port of the first controller, and receiving, through the same intranet port, the data processing results sent by the plurality of extended processing chips.
Generally, a controller acquires service data to be processed and sends the service data to a plurality of extended processing chips, and each extended processing chip processes the service data received by itself and feeds back a processing result to the controller. In this embodiment of the present invention, a communication process between the controller and the extended processing chip may include: the controller sends service data to the extended processing chip, and the extended processing chip sends a data processing result to the controller; in addition, the controller may also control resetting and starting of the extended processing chip, and the controller may also monitor a processing state of the extended processing chip on data, and the like, which is not limited specifically.
S703: detecting whether a second controller as a main controller fails; if there is a failure, S704 is executed, and if there is no failure, the detection is continued.
For example, each controller may include a heartbeat network port, and the main controller and the standby controller detect through the heartbeat network ports whether the peer controller has failed.
S704: switching to act as the main controller to process the service; while processing the service, opening the intranet port of the first controller and communicating with the plurality of extended processing chips through that intranet port.
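Steps S701 to S704 can be summarized in a minimal sketch. The class `FirstController` and its methods are hypothetical, and the sketch assumes that a fault is inferred when no heartbeat has arrived within a timeout; the source only states that the fault is detected through the heartbeat network port, not how.

```python
class FirstController:
    def __init__(self, is_main: bool, heartbeat_timeout: float = 3.0):
        self.is_main = is_main
        self.intranet_port_open = is_main      # only the main controller's port is open
        self.heartbeat_timeout = heartbeat_timeout
        self.last_peer_heartbeat = 0.0

    def on_heartbeat(self, now: float) -> None:
        """Record the arrival time of a heartbeat from the peer controller."""
        self.last_peer_heartbeat = now

    def peer_failed(self, now: float) -> bool:
        """Assumed fault criterion: no heartbeat within the timeout."""
        return now - self.last_peer_heartbeat > self.heartbeat_timeout

    def step(self, now: float) -> str:
        if self.is_main:                       # S701 yes -> S702
            return "S702: process service over intranet port"
        if not self.peer_failed(now):          # S703, no fault detected
            return "S703: peer healthy, keep detecting"
        self.is_main = True                    # S704: take over the service
        self.intranet_port_open = True         # and open the intranet port
        return "S704: switched to main, intranet port opened"


ctrl = FirstController(is_main=False)
ctrl.on_heartbeat(now=10.0)
trace = [ctrl.step(now=11.0), ctrl.step(now=14.5), ctrl.step(now=15.0)]
print(trace)
```

The third call returns the S702 branch because, after the switchover, the same controller follows the main controller service logic.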
The electronic device provided in this embodiment includes two controllers, and in one case the two controllers may be identical in hardware. For example, as shown in fig. 2, the main controller 100 may include a motherboard and a controller circuit disposed on the motherboard, where the controller circuit includes a CPU (Central Processing Unit), a memory, an HBA (Host Bus Adapter), an extranet network card providing an extranet port, an intranet network card providing the intranet port 110, and a heartbeat network card providing a heartbeat network port. The standby controller 200 may likewise include a motherboard and a controller circuit disposed on the motherboard, where the controller circuit includes a CPU, a memory, an HBA, an extranet network card providing an extranet port, an intranet network card providing the intranet port 210, and a heartbeat network card providing a heartbeat network port. The intranet port is used for communication inside the electronic device; for example, the controller communicates with the extended processing chips through the intranet port. The extranet port is used for communication between the electronic device and external devices; for example, the electronic device obtains the service data to be processed through the extranet port. The HBA can be connected to the hard disk and exchange data with it. The heartbeat network ports of the two controllers are connected to each other, and each controller can detect through them whether the other has failed.
In this embodiment, one controller in the electronic device processes the service and the other serves as a backup. The controller that processes the service is called the main controller and the backup is called the standby controller; the two roles can be switched between the controllers. While the main controller processes the service, it communicates with the plurality of extended processing chips; in this case, the intranet port of the standby controller is closed, and the standby controller cannot communicate with the plurality of extended processing chips. If the main controller fails, the standby controller switches to act as the main controller to process the service and, while processing the service, opens its own intranet port to communicate with the plurality of extended processing chips. In other words, if the main controller fails, the standby controller takes over from the main controller to process the service.
After S704, the first controller may control one or more extended processing chips of the plurality of extended processing chips to perform reset.
For example, assume that there are three extended processing chips: extended processing chip 1, extended processing chip 2, and extended processing chip 3. Assume that the main controller acquires 15 images to be processed and sends the 1st, 4th, 7th, 10th, and 13th images to extended processing chip 1, the 2nd, 5th, 8th, 11th, and 14th images to extended processing chip 2, and the 3rd, 6th, 9th, 12th, and 15th images to extended processing chip 3. Assume further that when the main controller fails, extended processing chip 1 is processing the 7th image, extended processing chip 2 is processing the 8th image, and extended processing chip 3 is processing the 9th image. After the standby controller 200 detects through the heartbeat network port that the main controller 100 has failed, it switches to act as the main controller to process the service, opens its intranet port 210, and then controls the three extended processing chips to reset. After the reset, extended processing chip 1 processes the 7th image again, extended processing chip 2 processes the 8th image again, and extended processing chip 3 processes the 9th image again.
In the above example, if the controller switchover occurs while an extended processing chip is processing an image, the switchover is likely to interfere with the processing of that image and cause an error in its processing result. In this embodiment, the extended processing chips are reset after the controller switchover and process the image again, which reduces errors in the processing results and improves their accuracy.
In one embodiment, after S702 and/or S704, the first controller may write processing state parameters, which describe the state of data processing on the plurality of extended processing chips, into a preset storage area. In this embodiment, when the first controller detects that the second controller serving as the main controller has failed, it may acquire the processing state parameters by accessing the preset storage area and process the service based on them.
As described above, the controller may monitor the state of data processing on the extended processing chips; in this embodiment, the main controller writes the monitored processing state parameters into the hard disk. Continuing with the above example (in which each extended processing chip processes 5 images) and taking extended processing chip 1 as an example: after extended processing chip 1 finishes processing the M-th image (M being a positive integer between 1 and 15), it feeds back the processing result of the M-th image to the controller, and the controller may record a processing state parameter for extended processing chip 1 that carries the information "extended processing chip 1 has finished processing the M-th image".
The main controller may write the recorded processing state parameters into a designated area of the hard disk periodically or aperiodically. The designated area can be understood as an area shared by the two controllers, which both controllers can access. For example, the main controller may write the recorded processing state parameters into the designated area once every minute, so that after the main controller fails and the standby controller switches to act as the main controller, the new main controller can read the processing state parameters from the designated area. This process can be understood as migrating the service between controllers: after one controller fails, the other controller takes over the running service based on the processing state parameters it has read, so that the service continues to run, that is, the extended processing chips continue to process data.
Continuing with the above example, assume that the second controller, serving as the main controller, writes the processing state parameter "extended processing chip 1 has finished processing the M-th image" into the designated area of the hard disk and then fails. After the first controller detects the failure of the second controller, it switches itself to act as the main controller, reads the processing state parameter from the designated area, and continues to allocate service data to extended processing chip 1 based on that parameter. For example, if M is 13, which indicates that extended processing chip 1 has finished processing the 13th image, the first controller may prepare to send the next image to extended processing chip 1.
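How the new main controller might turn a recovered value of M into the next allocation can be sketched as follows. The helper `next_image_for` is hypothetical, and the sketch assumes that the round-robin allocation of the earlier 15-image example simply continues; whether a further image actually exists depends on the service data the controller has acquired.

```python
def next_image_for(chip: int, last_done, num_chips: int = 3) -> int:
    """Next image number for `chip` under round-robin allocation.

    last_done is M from the recovered processing state parameter,
    or None if the chip has not finished any image yet.
    """
    if last_done is None:
        return chip                 # the chip's first image is its own index
    return last_done + num_chips    # round-robin successor of image M


recovered = {1: 13, 2: 8, 3: None}  # hypothetical state read from the shared area
plan = {chip: next_image_for(chip, m) for chip, m in recovered.items()}
print(plan)
```

With M equal to 13 for chip 1, the sketch yields image 16, that is, the first image beyond the original batch of 15.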
By applying the embodiment of the invention, the electronic device comprises two controllers, and each controller judges whether the controller is a main controller or a controller; if the host controller is the main controller, processing the service, and communicating with the extended processing chip through the intranet port of the host controller; if the host controller is the controller, detecting whether the host controller fails, switching the host controller into the host controller to process services if the host controller fails, and starting an intranet port of the host controller to communicate with the plurality of extended processing chips; therefore, in the first aspect, even if one controller fails, the other controller can take over the service of the failed controller and can communicate with the multiple extended processing chips, and then the multiple extended processing chips can still perform data processing, thereby reducing the service interruption condition; in the second aspect, the controller controls the communication with the extended processing chip through the intranet port thereof, and does not need to allocate an external IP address to the extended processing chip, thereby saving external IP resources and simplifying configuration; in the third aspect, the controller controls the communication with the expansion processing chip through the intranet port of the controller, and the expansion processing chip is isolated from the extranet, so that the safety is improved; in a fourth aspect, the intranet communication address is generally fixed and the extranet IP address is user-assigned, such that the intranet communication address is more manageable for the device than the extranet communication address.
Some related active/standby switching schemes generally include: the system comprises a main node, a standby node and a computing node, wherein the computing node is a node for analyzing and processing service data, under the condition that the main node does not fail, the main node manages the computing node, the main node and the computing node communicate with each other, if the main node fails, the standby node takes over the main node to manage the computing node, and the standby node and the computing node communicate with each other. The main node and the standby node are switched between each other through a software communication mechanism, or a communication link between the main node and the computing node is communicated on hardware, a communication link between the standby node and the computing node is also communicated on hardware, and the main node and the standby node control whether to communicate with the computing node or not through a software communication mechanism (self service logic) of the main node and the standby node. In the scheme, the main node and the standby node need to use different addresses to communicate with the computing node, otherwise, the address conflict situation occurs, so that the computing node needs to switch the destination address to normally communicate when the main node and the standby node are switched, and the logic of the computing node is complex.
In the scheme, the controllers control the communication with the extended processing chip by opening the intranet ports thereof on hardware, in other words, only one communication link between one controller and the extended processing chip is communicated on hardware, when one controller fails, the other controller is communicated with the hardware communication link between the other controller and the extended processing chip, so that the two controllers use the same address to communicate with the extended processing chip, and the condition of address conflict cannot occur, therefore, the extended processing chip does not need to switch the destination address, the logic of the extended processing chip is simplified, and the master-standby switching of the extended processing chip without knowing the conditions is realized.
In some of the above solutions for switching between the master node and the slave node, in order to reduce the address conflict between the master node and the slave node, the following measures are adopted: switching the destination address by the computing node; in other related main/standby switching schemes, in order to reduce the address conflict between the main node and the standby node, the adopted means is as follows: under the condition that the main node does not fail, the main node and the standby node use different addresses, if the main node fails, the standby node modifies the address of the standby node into the address of the main node, which is also called address drift, namely the address of the main node is drifted to the standby node; in this scheme, the computing node does not need to switch the destination address, but the standby node needs to switch its own address, which makes the logic of the standby node more complex.
In the present scheme, each controller controls communication with the extended processing chips by opening its intranet port in hardware. In other words, only one controller's communication link to the extended processing chips is connected in hardware at any time, and when one controller fails, the other controller connects its own hardware link to the extended processing chips. The two controllers can therefore use the same address to communicate with the extended processing chips without any address conflict, so the controllers do not need to switch addresses, which simplifies the logic of the controllers.
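The shared-address idea can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Controller`, `Switch` and `fail_over` names and the address `192.168.0.1` are hypothetical and do not come from the specification; the point is merely that a chip keeps sending to one fixed destination while the open intranet port decides which controller receives the data.

```python
from dataclasses import dataclass, field

# Hypothetical intranet address shared by both controllers (an assumption).
INTRANET_ADDR = "192.168.0.1"


@dataclass
class Controller:
    name: str
    port_open: bool = False  # models the hardware state of the intranet port
    inbox: list = field(default_factory=list)


class Switch:
    """Toy stand-in for the network switching chip between controllers and chips."""

    def __init__(self, controllers):
        self.controllers = controllers

    def deliver(self, dst_addr, payload):
        # Only controllers whose intranet port is open are reachable in hardware.
        reachable = [c for c in self.controllers if c.port_open]
        if dst_addr != INTRANET_ADDR or not reachable:
            return None
        if len(reachable) > 1:
            raise RuntimeError("address conflict: two open ports share one address")
        reachable[0].inbox.append(payload)
        return reachable[0].name


def fail_over(failed, backup):
    """The failed controller's link goes down; the backup opens its own port."""
    failed.port_open = False
    backup.port_open = True
```

In this model an extended processing chip never changes its destination: it calls `deliver(INTRANET_ADDR, ...)` before and after `fail_over`, and the data simply arrives at whichever controller currently has its port open.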
Corresponding to the above method embodiment, an embodiment of the present invention further provides a communication apparatus, as shown in Fig. 8. The apparatus comprises:
a judging module 801, configured to judge whether the first controller is the main controller; if so, the first processing module 802 is triggered, and if not, the second processing module 803 is triggered;
a first processing module 802, configured to communicate with the plurality of extended processing chips through the intranet port of the first controller while the first controller processes the service;
a second processing module 803, configured to detect whether the second controller, which is the main controller, has failed; if it has failed, to switch the first controller to be the main controller to process the service; to open the intranet port of the first controller while processing the service; and to communicate with the plurality of extended processing chips through the intranet port of the first controller.
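The interplay of modules 801 to 803 can be sketched as a single decision step. This is a minimal illustration, not the patented implementation: the `Role` enum, the `FirstController` class and the `second_controller_failed` flag are hypothetical names, and failure detection itself is reduced to a boolean input.

```python
from enum import Enum


class Role(Enum):
    MAIN = "main"
    BACKUP = "backup"


class FirstController:
    def __init__(self, role):
        self.role = role
        # Assumption: only the main controller starts with its intranet port open.
        self.intranet_port_open = role is Role.MAIN
        self.log = []

    def communicate_with_chips(self):
        if not self.intranet_port_open:
            raise RuntimeError("intranet port is closed")
        self.log.append("communicated with extended processing chips")

    def step(self, second_controller_failed):
        """One pass through the judging and processing modules."""
        if self.role is Role.MAIN:          # judging module 801
            self.communicate_with_chips()   # first processing module 802
            return "processing"
        # Second processing module 803: watch the main controller.
        if not second_controller_failed:
            return "standing by"
        self.role = Role.MAIN               # switch to main controller
        self.intranet_port_open = True      # open the intranet port
        self.communicate_with_chips()
        return "took over"
```

A backup controller created with `FirstController(Role.BACKUP)` keeps returning `"standing by"` until `step(True)` is called, after which it behaves as the main controller.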
As an embodiment, the first processing module 802 is further configured to:
when the first controller is the main controller, send service data to be processed to the plurality of extended processing chips through the intranet port of the first controller, and receive the data processing results sent by the plurality of extended processing chips through the intranet port of the first controller;
The second processing module 803 is further configured to:
when the first controller is the backup controller and a failure of the second controller serving as the main controller is detected, send service data to be processed to the plurality of extended processing chips through the intranet port of the first controller, and receive the data processing results sent by the plurality of extended processing chips through the intranet port of the first controller.
As an embodiment, the apparatus further comprises: a write module and an acquisition module (not shown), wherein,
the writing module is configured to write parameters describing the state of data processing by the plurality of extended processing chips into a preset storage area after communicating with the plurality of extended processing chips through the intranet port of the first controller;
and the acquisition module is configured to, when a failure of the main controller is detected, acquire the processing state parameters by accessing the preset storage area and process the service based on the processing state parameters.
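The write and acquisition modules can be sketched with a file standing in for the preset storage area. Everything specific here is an assumption made for illustration: the JSON encoding, the task-to-status layout of the state, and the function names `write_state`, `acquire_state` and `pending_tasks` are not taken from the specification.

```python
import json
import os
import tempfile


def write_state(path, state):
    """Write module: persist processing state parameters atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Replace the old file in one step so a reader never sees a partial write.
    os.replace(tmp_path, path)


def acquire_state(path):
    """Acquisition module: read the parameters, or an empty state if none exist."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def pending_tasks(state):
    """Tasks the new main controller still has to drive to completion."""
    return sorted(task for task, status in state.items() if status != "done")
```

The atomic replace is a design choice of this sketch: the main controller may fail at any moment, so the backup controller should find either the previous complete state or the new complete state in the storage area.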
As an embodiment, the apparatus further comprises:
a reset module (not shown in the figure), configured to, when a failure of the second controller serving as the main controller is detected, control one or more of the plurality of extended processing chips to reset after the intranet port of the first controller has been opened.
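The ordering constraint of the reset module (reset only after the intranet port is open) can be sketched as follows. The `Chip` class, its `busy` flag and the `only_busy` selection rule are hypothetical; the specification only states that one or more chips are reset.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    chip_id: int
    busy: bool = False  # assumed flag: still holds work from the failed controller
    resets: int = 0


def reset_chips(intranet_port_open, chips, only_busy=False):
    """Reset all chips, or only the busy ones, once the intranet port is open."""
    if not intranet_port_open:
        raise RuntimeError("open the intranet port before resetting chips")
    selected = [c for c in chips if c.busy or not only_busy]
    for chip in selected:
        chip.busy = False
        chip.resets += 1
    return [c.chip_id for c in selected]
```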
By applying the embodiments of the present invention, the electronic device comprises two controllers, and each controller judges whether it is the main controller or the backup controller. If it is the main controller, it processes the service and communicates with the extended processing chips through its intranet port. If it is the backup controller, it detects whether the main controller has failed; if so, it switches to being the main controller to process the service and opens its intranet port to communicate with the plurality of extended processing chips. In a first aspect, even if one controller fails, the other controller can take over the service of the failed controller and communicate with the plurality of extended processing chips, so the chips can continue to process data and service interruptions are reduced. In a second aspect, each controller communicates with the extended processing chips through its intranet port, so no external IP address needs to be allocated to the extended processing chips, which saves external IP resources and simplifies configuration. In a third aspect, because the controllers communicate with the extended processing chips through their intranet ports, the extended processing chips are isolated from the external network, which improves security. In a fourth aspect, intranet communication addresses are generally fixed whereas external IP addresses are assigned by the user, so intranet communication addresses are easier for the device to manage than external ones.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements any of the service switching methods described above.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in an interrelated manner: for identical or similar parts the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the method embodiment, the apparatus embodiment and the computer-readable storage medium embodiment are described relatively briefly because they are substantially similar to the electronic device embodiment; for the relevant points, refer to the corresponding description of the electronic device embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (11)

1. An electronic device, comprising: a main controller, a backup controller and a plurality of extended processing chips; each controller comprises an intranet port, and each controller is connected to the plurality of extended processing chips through its intranet port;
the main controller communicates with the plurality of extended processing chips through the intranet port of the main controller in the process of processing the service;
the backup controller detects whether the main controller fails; if the main controller fails, the backup controller switches to being the main controller to process the service; the switched main controller opens its intranet port in the process of processing the service, and communicates with the plurality of extended processing chips through the intranet port of the switched main controller.
2. The apparatus of claim 1, further comprising a first network switch chip and a backplane;
the first network switching chip is arranged on the back plate; the intranet net port of each controller is connected with the first network exchange chip; the first network exchange chip is connected with the plurality of extended processing chips through a first type interface on the backboard.
3. The device of claim 2, wherein the plurality of extended processing chips are disposed on a tray, the tray further having a second network switch chip disposed thereon, the tray being inserted into the first type of interface on the backplane, the plurality of extended processing chips being connected to the second network switch chip, the second network switch chip being connected to the first type of interface on the backplane.
4. The device of claim 3, wherein the plurality of extended processing chips are disposed on a plurality of trays, each tray is disposed with a second network switching chip and one or more extended processing chips, each tray is inserted into one of the first type interfaces on the backplane, each extended processing chip is connected to the second network switching chip on the tray where the extended processing chip is located, and each second network switching chip is connected to the first type interface into which the tray where the extended processing chip is located is inserted.
5. The device of claim 2, further comprising a hard disk, the hard disk being connected to the second type of interface on the backplane; each controller comprises an adapter, and each controller is connected with the second type interface on the back panel through the adapter of the controller.
6. The apparatus according to claim 5, wherein the main controller writes parameters describing the state of data processing by the plurality of extended processing chips into a designated area of the hard disk; and if the main controller fails, the backup controller, after switching to being the main controller to process the service, accesses the designated area, acquires the processing state parameters, and processes the service based on the processing state parameters.
7. The device according to claim 1, wherein each controller comprises a heartbeat network port, and the main controller and the backup controller each detect whether the opposite controller has failed through the heartbeat network port.
8. The device according to claim 1, wherein the backup controller controls the plurality of extended processing chips to reset after switching to being the main controller to process the service and opening its intranet port.
9. The device of claim 5, wherein the extended processing chip is a GPU, and the first type of interface and the second type of interface are both SFF-8639 interfaces.
10. A service switching method, characterized in that the method is applied to a first controller in an electronic device, the device further comprises a second controller and a plurality of extended processing chips, each controller comprises an intranet port, and each controller is connected to the plurality of extended processing chips through its intranet port; the method comprises the following steps:
if the first controller is the main controller, communicating with the plurality of extended processing chips through the intranet port of the first controller in the process in which the first controller processes the service;
if the first controller is the backup controller, detecting whether the second controller serving as the main controller fails; if it fails, switching the first controller to be the main controller to process the service; opening the intranet port of the first controller in the process of processing the service; and communicating with the plurality of extended processing chips through the intranet port of the first controller.
11. A service switching apparatus, characterized in that the apparatus is applied to a first controller in an electronic device, the device further comprises a second controller and a plurality of extended processing chips, each controller comprises an intranet port, and each controller is connected to the plurality of extended processing chips through its intranet port; the apparatus comprises:
a first processing module, configured to, when the first controller is the main controller, communicate with the plurality of extended processing chips through the intranet port of the first controller in the process in which the first controller processes the service;
a second processing module, configured to, when the first controller is the backup controller, detect whether the second controller serving as the main controller fails; if it fails, switch the first controller to be the main controller to process the service; open the intranet port of the first controller in the process of processing the service; and communicate with the plurality of extended processing chips through the intranet port of the first controller.
CN201910353386.7A 2019-04-29 2019-04-29 Electronic equipment and service switching method and device Pending CN111858187A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910353386.7A CN111858187A (en) 2019-04-29 2019-04-29 Electronic equipment and service switching method and device

Publications (1)

Publication Number Publication Date
CN111858187A true CN111858187A (en) 2020-10-30

Family

ID=72966290

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910353386.7A Pending CN111858187A (en) 2019-04-29 2019-04-29 Electronic equipment and service switching method and device

Country Status (1)

Country Link
CN (1) CN111858187A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050102549A1 (en) * 2003-04-23 2005-05-12 Dot Hill Systems Corporation Network storage appliance with an integrated switch
CN101192960A (en) * 2006-11-28 2008-06-04 中兴通讯股份有限公司 Main/slave switching detection and control device and method in distributed system
CN101340315A (en) * 2008-08-26 2009-01-07 中兴通讯股份有限公司 End-to-end Ethernet protection method and communication apparatus adopting the same
CN101877631A (en) * 2010-06-28 2010-11-03 中兴通讯股份有限公司 Server and business switching method thereof
CN103605346A (en) * 2013-11-22 2014-02-26 曙光信息产业(北京)有限公司 Master-slave management module failure automatic switching system and realization method thereof
CN104572534A (en) * 2014-12-06 2015-04-29 呼和浩特铁路局科研所 Locomotive information monitoring equipment and operating method thereof
CN105760260A (en) * 2016-02-19 2016-07-13 浙江大华系统工程有限公司 Backup system and backup method
CN106160841A (en) * 2015-04-09 2016-11-23 中兴通讯股份有限公司 Backboard, the method and apparatus of analysis message, the method and apparatus of realization communication
CN206249296U (en) * 2016-09-30 2017-06-13 浙江宇视科技有限公司 A kind of dual control storage server

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112286851A (en) * 2020-11-06 2021-01-29 百度在线网络技术(北京)有限公司 Server mainboard, server, control method, electronic device and readable medium
CN112286851B (en) * 2020-11-06 2023-06-23 百度在线网络技术(北京)有限公司 Server main board, server, control method, electronic device and readable medium
CN115291814A (en) * 2022-10-09 2022-11-04 深圳市安信达存储技术有限公司 Embedded memory core data storage method, embedded memory chip and memory system

Similar Documents

Publication Publication Date Title
CN106776159B (en) Fast peripheral component interconnect network system with failover and method of operation
US6934878B2 (en) Failure detection and failure handling in cluster controller networks
US8498967B1 (en) Two-node high availability cluster storage solution using an intelligent initiator to avoid split brain syndrome
CN110807064B (en) Data recovery device in RAC distributed database cluster system
US8484416B2 (en) High availability raid using low-cost direct attached raid controllers
US20070174719A1 (en) Storage control device, and error information management method for storage control device
CN113342262B (en) Method and apparatus for disk management of an all-flash memory array server
CN112199240B (en) Method for switching nodes during node failure and related equipment
CN105204977A (en) System exception capturing method, main system, shadow system and intelligent equipment
CN109783280A (en) Shared memory systems and shared storage method
CN111581043A (en) Server power consumption monitoring method and device and server
CN111858187A (en) Electronic equipment and service switching method and device
CN107145304B (en) Server, storage system and related method
US20070294600A1 (en) Method of detecting heartbeats and device thereof
US7996707B2 (en) Method to recover from ungrouped logical path failures
CN114610551A (en) Method for realizing dual-computer hot standby system based on FPGA fault detection
CN118214648A (en) Dual-computer hot standby management method and computing equipment
CN109358982A (en) Hard disk self-healing device, method and hard disk
JP6135403B2 (en) Information processing system and information processing system failure processing method
JP2011076344A (en) Information processing apparatus, method of controlling information processing apparatus and control program
US7725761B2 (en) Computer system, fault tolerant system using the same and operation control method and program thereof
CN107147516B (en) Server, storage system and related method
CN113342593B (en) Method and apparatus for high availability management of full flash memory array servers
JP4495248B2 (en) Information processing apparatus and failure processing method
US20100023802A1 (en) Method to recover from logical path failures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination