CN116150068A - Extended chip management method and device, storage medium and electronic equipment - Google Patents

Extended chip management method and device, storage medium and electronic equipment

Info

Publication number
CN116150068A
Authority
CN
China
Prior art keywords
chip
expansion
expansion chip
fault
normal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310071561.XA
Other languages
Chinese (zh)
Inventor
唐传贞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202310071561.XA priority Critical patent/CN116150068A/en
Publication of CN116150068A publication Critical patent/CN116150068A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G06F13/4022 Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3031 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a motherboard or an expansion card
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the present application provide an expansion chip management method and device, a storage medium, and electronic equipment. The method comprises: acquiring a monitoring signal of each expansion chip, and judging from the monitoring signal whether the expansion chip has failed; in the case that an expansion chip has failed, determining that expansion chip to be a faulty chip, and determining the normal expansion chip that monitors the faulty chip; updating the reserved downlink port of the normal expansion chip from a closed state to an enabled state, wherein the reserved downlink port is used to connect the devices that were communicatively connected to the faulty chip; and updating the port information of the normal expansion chip and, based on the updated port information, controlling the devices connected to the normal expansion chip and the devices connected to the faulty chip to be communicatively connected to the normal expansion chip. The application solves the problem that service is interrupted for a period of time when an expansion chip fails, and achieves the effect that service is not interrupted when an expansion chip fails.

Description

Extended chip management method and device, storage medium and electronic equipment
Technical Field
The embodiment of the application relates to the field of computers, in particular to an extended chip management method, an extended chip management device, a storage medium and electronic equipment.
Background
JBOD (Just a Bunch Of Disks, i.e. a disk cluster) products are connected through high-speed cables to a host that has a computing unit. When a user needs to access data in the JBOD, the host fetches the relevant data from the JBOD over the high-speed cable and uploads it to the cloud for the user. When a JBOD stores a large volume of data, each expansion chip can expand only a limited number of hard disks, because its number of expansion ports is limited. Taking high product reliability into account as well, a JBOD often needs several expansion chips that serve as redundant backups for one another to connect all of the hard disks, with each expansion chip responsible for a portion of them. The host is connected to the expansion chips in the JBOD through multiple cables so that it can access all hard disks in the JBOD.
However, when one of the expansion chips fails, the host can no longer access the hard disks connected below that chip. In the related art, if the JBOD has only one expansion chip interconnected with the host, there is no redundant backup: when that chip fails, every hard disk in the JBOD becomes inaccessible, data is lost for a short time, and service is interrupted. Commands must then be sent to the machine manually to make adjustments, and because operators cannot monitor all machines around the clock, the hard disks under the failed device may remain inaccessible for a long time.
If the same JBOD has mutually redundant expansion chips, the faulty chip is detected by a monitoring algorithm in upper-layer software. The hard disks connected to the redundant expansion chips need to be dual-port SAS (Serial Attached SCSI) hard disks, so that when one SAS port becomes abnormal the disk can continue to operate through the other SAS port. However, because the configuration parameters cannot be switched automatically, the configuration has to be changed by manual remote operation, and service is still interrupted for a period of time.
Disclosure of Invention
The embodiments of the present application provide an expansion chip management method and device, a storage medium, and electronic equipment, so as to at least solve the problem in the related art that service is interrupted for a period of time when an expansion chip fails.
According to one embodiment of the present application, there is provided an expansion chip management method, including: acquiring a monitoring signal of each expansion chip, and judging from the monitoring signal whether the expansion chip has failed, wherein the expansion chips enable a host to access a disk cluster, the disk cluster comprises a plurality of disks, each expansion chip is communicatively connected to a preset number of devices, and each device is a disk or a next-level expansion chip; in the case that an expansion chip has failed, determining that expansion chip to be a faulty chip, and determining the normal expansion chip that monitors the faulty chip, wherein the faulty chip is an expansion chip that cannot communicate with its devices, and the normal expansion chip and the faulty chip are expansion chips of the same level; updating the reserved downlink port of the normal expansion chip from a closed state to an enabled state, wherein the reserved downlink port is used to connect the devices that were communicatively connected to the faulty chip; and updating the port information of the normal expansion chip and, based on the updated port information, controlling the devices connected to the normal expansion chip and the devices connected to the faulty chip to be communicatively connected to the normal expansion chip.
In an exemplary embodiment, optionally, after controlling, based on the updated port information, the devices connected to the normal expansion chip and to the faulty chip to be communicatively connected to the normal expansion chip, the method further comprises: sending fault information to the host, wherein the fault information is used to notify a user of the location of the faulty chip; and receiving a repair instruction from the user for the faulty chip, and repairing the faulty chip according to the repair instruction and the fault information.
In an exemplary embodiment, optionally, after repairing the faulty chip according to the repair instruction and the fault information, the method further comprises: acquiring a monitoring signal of the repaired chip, and judging from the monitoring signal whether the repaired chip works normally, wherein working normally means that the repaired chip has recovered its ability to communicate with its devices; in the case that the repaired chip works normally, updating the reserved downlink port from the enabled state to the closed state; and updating the port information of the normal expansion chip and, based on the updated port information, controlling the devices connected to the normal expansion chip to be communicatively connected to the normal expansion chip.
In an exemplary embodiment, optionally, acquiring the monitoring signal of each expansion chip and judging from the monitoring signal whether the expansion chip has failed comprises: acquiring the working state information of each expansion chip, and judging whether the working state information is preset information, wherein the preset information is one of the following: a preset heartbeat signal, or a return value of a preset register; in the case that the working state information is the preset information, determining that the expansion chip has not failed; and in the case that the working state information is not the preset information, determining that the expansion chip has failed.
In an exemplary embodiment, optionally, the expansion chips are organized in multiple levels: an expansion chip communicatively connected to the host is a first-level expansion chip, an expansion chip communicatively connected to an Nth-level expansion chip is an (N+1)th-level expansion chip, and the (N+1)th-level expansion chip is connected to a preset number of disks, where N is a positive integer (N ≥ 1).
In an exemplary embodiment, optionally, a monitoring line is provided between expansion chips of the same level, and each expansion chip acquires the monitoring signal of the peer expansion chip it monitors through the monitoring line.
In an exemplary embodiment, optionally, updating the reserved downlink port of the normal expansion chip from the closed state to the enabled state comprises: acquiring a configuration file of the normal expansion chip; modifying the port configuration parameters in the configuration file to obtain updated port configuration parameters, wherein the port configuration parameters are used to configure information about the disks managed by each port; and updating the reserved downlink port from the closed state to the enabled state by means of the updated port configuration parameters.
According to another embodiment of the present application, there is provided an expansion chip management apparatus, including: an acquisition unit, configured to acquire a monitoring signal of each expansion chip and judge from the monitoring signal whether the expansion chip has failed, wherein the expansion chips enable a host to access a disk cluster, the disk cluster comprises a plurality of disks, each expansion chip is communicatively connected to a preset number of devices, and each device is a disk or a next-level expansion chip; a determining unit, configured to, in the case that an expansion chip has failed, determine that expansion chip to be a faulty chip and determine the normal expansion chip that monitors the faulty chip, wherein the faulty chip is an expansion chip that cannot communicate with its devices, and the normal expansion chip and the faulty chip are expansion chips of the same level; a first updating unit, configured to update the reserved downlink port of the normal expansion chip from a closed state to an enabled state, wherein the reserved downlink port is used to connect the devices that were communicatively connected to the faulty chip; and a second updating unit, configured to update the port information of the normal expansion chip and, based on the updated port information, control the devices connected to the normal expansion chip and the devices connected to the faulty chip to be communicatively connected to the normal expansion chip.
According to a further embodiment of the present application, there is also provided a computer readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the present application, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the present application, a monitoring line is provided between expansion chips of the same level so that they monitor one another through monitoring signals. When a faulty chip is found, the port configuration of the normal expansion chip that monitors the faulty chip is updated, and the normal expansion chip connects through its reserved downlink port to the devices connected to the faulty chip. This ensures that the devices connected to the faulty chip can still communicate with the host, which solves the problem that service is interrupted for a period of time when an expansion chip fails and achieves the effect that service is not interrupted when an expansion chip fails.
Drawings
FIG. 1 is a hardware block diagram of a mobile terminal running an expansion chip management method according to an embodiment of the present application;
FIG. 2 is a flowchart of an expansion chip management method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the connection between a host and a disk cluster according to an embodiment of the present application;
FIG. 4 is a structural block diagram of an expansion chip management apparatus according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided in the embodiments of the present application may be performed in a mobile terminal, a computer terminal or similar computing device. Taking the mobile terminal as an example, fig. 1 is a block diagram of a hardware structure of the mobile terminal of an extended chip management method according to an embodiment of the present application. As shown in fig. 1, a mobile terminal may include one or more (only one is shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA) and a memory 104 for storing data, wherein the mobile terminal may also include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and not limiting of the structure of the mobile terminal described above. For example, the mobile terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
The memory 104 may be used to store computer programs, such as software programs and modules of application software, such as computer programs corresponding to the extended chip management method in the embodiments of the present application, and the processor 102 executes the computer programs stored in the memory 104 to perform various functional applications and data processing, that is, implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the mobile terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
In this embodiment, an expansion chip management method running on the above mobile terminal is provided. FIG. 2 is a flowchart of the expansion chip management method according to an embodiment of the present application. As shown in FIG. 2, the flow includes the following steps:
Step S202: acquire a monitoring signal of each expansion chip, and judge from the monitoring signal whether the expansion chip has failed, wherein the expansion chips enable a host to access a disk cluster, the disk cluster comprises a plurality of disks, each expansion chip is communicatively connected to a preset number of devices, and each device is a disk or a next-level expansion chip.
Specifically, the monitoring signal may be a signal, such as a heartbeat signal, that a peer expansion chip obtains through the monitoring line. Whether an expansion chip has failed is determined by checking whether its monitoring signal matches the signal it produces in the normal working state.
It should be noted that, with the continuing development of cloud computing, the demand for storage servers keeps growing as people upload more data to the cloud. A large proportion of the uploaded data is rarely accessed after upload; this is cold data. To cope with the growth of cold data, manufacturers have introduced JBOD products for cold storage. These products have no computing unit; they concentrate a large number of hard disks in one enclosure for unified management, forming something like a single large-capacity hard disk, that is, a disk cluster. Because its number of ports is limited, the host cannot connect directly to every disk in the disk cluster. Instead, each expansion chip is communicatively connected to a certain number of disks and the host is communicatively connected to the expansion chips, so that the host can access all disks in the disk cluster through the expansion chips.
Step S204: in the case that an expansion chip has failed, determine that expansion chip to be a faulty chip, and determine the normal expansion chip that monitors the faulty chip, wherein the faulty chip is an expansion chip that cannot communicate with its devices, and the normal expansion chip and the faulty chip are expansion chips of the same level.
Specifically, if the monitoring signal of an expansion chip differs from the signal it produces in the normal working state, that expansion chip is a faulty chip. Because the faulty chip cannot communicate with the devices connected to it, the normal expansion chip that monitors the faulty chip must be determined so that it can take over those devices and avoid a service interruption caused by the host being unable to access the disk cluster.
Step S206: update the reserved downlink port of the normal expansion chip from the closed state to the enabled state, wherein the reserved downlink port is used to connect the devices that were communicatively connected to the faulty chip.
Specifically, once the normal expansion chip that monitors the faulty chip has been determined, its reserved downlink port is updated from the closed state to the enabled state so that the normal expansion chip can take over the devices connected to the faulty chip.
Step S208: update the port information of the normal expansion chip and, based on the updated port information, control the devices connected to the normal expansion chip and the devices connected to the faulty chip to be communicatively connected to the normal expansion chip.
Specifically, after the state of the reserved downlink port has been modified, the normal expansion chip re-identifies its own ports, that is, it updates its port information and reads the hard disk information again from the newly enabled reserved downlink port, thereby taking over all downlink devices connected to the faulty chip. The host can then access, through the normal expansion chip, all downlink devices connected to the normal expansion chip and to the faulty chip.
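Steps S204 to S208 can be illustrated with a minimal Python sketch. It is only a software model under stated assumptions, not the firmware of any real expander: the class `ExpansionChip`, its fields, and the function `take_over` are hypothetical names introduced here, and the disks are represented by plain strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExpansionChip:
    """Hypothetical model of one expansion chip (names are illustrative)."""
    name: str
    own_devices: List[str]        # devices on the regular downlink ports
    reserved_devices: List[str]   # peer's devices, wired to the reserved downlink port
    healthy: bool = True
    reserved_port_enabled: bool = False   # the reserved port is closed by default
    port_info: List[str] = field(default_factory=list)

    def refresh_port_info(self) -> None:
        """Re-scan the ports; the reserved port is only scanned when enabled."""
        self.port_info = list(self.own_devices)
        if self.reserved_port_enabled:
            self.port_info.extend(self.reserved_devices)


def take_over(normal: ExpansionChip, monitored: ExpansionChip) -> bool:
    """Return True if the normal chip took over the monitored chip's devices."""
    if monitored.healthy:                 # S204: nothing to do without a fault
        return False
    normal.reserved_port_enabled = True   # S206: closed state -> enabled state
    normal.refresh_port_info()            # S208: update port information
    return True


chip1 = ExpansionChip("chip1", ["disk0", "disk1"], ["disk2", "disk3"])
chip2 = ExpansionChip("chip2", ["disk2", "disk3"], ["disk0", "disk1"], healthy=False)
take_over(chip1, chip2)
print(chip1.port_info)  # ['disk0', 'disk1', 'disk2', 'disk3']
```

After the call, the normal chip's port information lists both its own devices and those previously served by the faulty chip, which is the condition under which the host can reach every disk through the normal chip alone.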
Through the above steps, the problem that service is interrupted for a period of time when an expansion chip fails is solved. Mutual monitoring signals are added between expansion chips of the same level, so that each expansion chip can monitor the other expansion chips that serve as its redundant backups. When an expansion chip detects that another expansion chip is abnormal, it adjusts its own configuration parameters so that its own downlink port forcibly takes over all hard disks of the other chip. At the same time, the normal expansion chip sends information about the abnormal expansion chip to the host, so that the host issues its subsequent operation instructions through the normal expansion chip. In this way the data channel is switched quickly and transparently, and traffic continues seamlessly. Loss of service data over a period of time is also avoided, the number of product maintenance operations is reduced, service is not interrupted when an expansion chip fails, and service processing efficiency is improved.
The main execution body of the above steps may be a server, but is not limited thereto.
The execution order of step S102 and step S104 may be interchanged, i.e. step S104 may be executed first and then step S102 may be executed.
After the faulty chip is found, fault information also needs to be sent to the host to prompt the user to repair it. In an exemplary embodiment, optionally, after controlling, based on the updated port information, the devices connected to the normal expansion chip and to the faulty chip to be communicatively connected to the normal expansion chip, the method further includes: sending fault information to the host, wherein the fault information is used to notify a user of the location of the faulty chip; and receiving a repair instruction from the user for the faulty chip, and repairing the faulty chip according to the repair instruction and the fault information.
Specifically, the normal expansion chip sends fault information to the host, informing it that the other expansion chip has failed, so that all hard disk access operations subsequently issued by the host go through the normal expansion chip, and reminding the user to repair the faulty chip. After obtaining the fault information, the user sends a remote repair instruction to the faulty chip through the host to repair it. Together with the fault information sent to the host, the peer expansion chips' ability to monitor one another and quickly modify their configuration to switch the data link ensures that service is not interrupted.
For example, normal expansion chip 1 notifies the host over the SAS link that faulty chip 2 is currently in a failed state and that all devices connected downstream of faulty chip 2 have been taken over by normal expansion chip 1. After the host receives the fault information, all subsequent hard disk access instructions are issued through normal expansion chip 1. Through this operation, the normal expansion chip automatically adjusts its configuration and takes over all service operations of the faulty chip.
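On the host side, the effect of the fault information can be pictured as a change in which expansion chip each disk is reached through. The following Python sketch is an illustration only: the class `HostRouter`, its method names, and the disk labels are hypothetical, and the text does not describe how the host actually stores its routes.

```python
from typing import Dict


class HostRouter:
    """Hypothetical host-side table mapping each disk to the chip used to reach it."""

    def __init__(self, routes: Dict[str, str]) -> None:
        self.routes = dict(routes)

    def on_fault_info(self, faulty_chip: str, takeover_chip: str) -> None:
        """Redirect every disk behind the faulty chip to the chip that took over."""
        for disk, chip in self.routes.items():
            if chip == faulty_chip:
                self.routes[disk] = takeover_chip

    def chip_for(self, disk: str) -> str:
        return self.routes[disk]


router = HostRouter({"disk0": "chip1", "disk1": "chip1",
                     "disk2": "chip2", "disk3": "chip2"})
# Fault information received over the SAS link: chip2 failed, chip1 took over.
router.on_fault_info(faulty_chip="chip2", takeover_chip="chip1")
print(router.chip_for("disk2"))  # chip1
```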
After the host has repaired the faulty chip, the connection state between the normal expansion chip and the devices needs to be updated. In an exemplary embodiment, optionally, after repairing the faulty chip according to the repair instruction and the fault information, the method further includes: acquiring a monitoring signal of the repaired chip, and judging from the monitoring signal whether the repaired chip works normally, wherein working normally means that the repaired chip has recovered its ability to communicate with its devices; in the case that the repaired chip works normally, updating the reserved downlink port from the enabled state to the closed state; and updating the port information of the normal expansion chip and, based on the updated port information, controlling the devices connected to the normal expansion chip to be communicatively connected to the normal expansion chip.
For example, after taking over all devices of faulty chip 2, normal expansion chip 1 continues to monitor the state of faulty chip 2 through the monitoring signal. If the user subsequently repairs faulty chip 2 remotely by command and faulty chip 2 works normally again, normal expansion chip 1 detects through the monitoring signal that faulty chip 2 has recovered. Normal expansion chip 1 then modifies its own settings and closes the reserved downlink port, namely expansion channel 3, and notifies the host over the SAS link that all devices on the downlink of faulty chip 2 have restored their communication connection with that chip.
Subsequently, when the host accesses the hard disks below the repaired chip, the access goes through the repaired chip rather than through the normal expansion chip. Through this operation, the normal expansion chip automatically adjusts its configuration so that the repaired chip resumes all of its service operations. By adding mutual monitoring lines between peer expansion chips, peers monitor one another's fault state, adjust their own configuration parameters, and forcibly take over all downlink devices of a faulty chip, so that the service link is switched automatically.
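The recovery path can be sketched in the same spirit. This is a minimal illustration, assuming the normal chip's state is kept in a plain dictionary; the function `fail_back` and the dictionary keys are hypothetical names, not part of the described method.

```python
from typing import Dict, List, Union

ChipState = Dict[str, Union[bool, List[str]]]


def fail_back(normal_chip: ChipState, repaired_chip_ok: bool) -> bool:
    """Close the reserved downlink port once the repaired chip works normally.

    Returns True if the port was closed and the port information refreshed.
    """
    if not repaired_chip_ok:
        # The repaired chip is still abnormal: keep serving its devices.
        return False
    normal_chip["reserved_port_enabled"] = False            # enabled -> closed
    normal_chip["port_info"] = list(normal_chip["own_devices"])
    return True


# State of normal expansion chip 1 while it is covering for faulty chip 2.
chip1_state: ChipState = {
    "own_devices": ["disk0", "disk1"],
    "reserved_port_enabled": True,
    "port_info": ["disk0", "disk1", "disk2", "disk3"],
}
fail_back(chip1_state, repaired_chip_ok=True)
print(chip1_state["port_info"])  # ['disk0', 'disk1']
```

Keeping the reserved port enabled until the monitoring signal confirms recovery means the taken-over disks are never left without a path to the host.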
Whether an expansion chip is a faulty chip is judged by checking whether its working state information is the preset information. In an exemplary embodiment, optionally, acquiring the monitoring signal of each expansion chip and judging from the monitoring signal whether the expansion chip has failed includes: acquiring the working state information of each expansion chip, and judging whether the working state information is preset information, wherein the preset information is one of the following: a preset heartbeat signal, or a return value of a preset register; in the case that the working state information is the preset information, determining that the expansion chip has not failed; and in the case that the working state information is not the preset information, determining that the expansion chip has failed.
Specifically, the working state information may be obtained over a heartbeat link between the expansion chips; for example, the heartbeat link of expansion chip 2 is connected to expansion chip 1, and the heartbeat link of expansion chip 1 is connected to expansion chip 2. Software in each expansion chip monitors the heartbeat signal of the other expansion chip in real time. If the heartbeat signal differs from the preset heartbeat signal, the monitored expansion chip is a faulty chip. The working state information may also be the return value of a preset register read over I2C (Inter-Integrated Circuit, a two-wire serial bus) or another active monitoring link. For example, expansion chip 1 actively reads the working state register in expansion chip 2 over the I2C link, and expansion chip 2 actively reads the working state register in expansion chip 1 over the I2C link. If the return value of the working state register differs from the preset value, the monitored expansion chip is a faulty chip. By judging whether an expansion chip is a faulty chip, the connection state between the expansion chips and the devices is adjusted in time, so that the service link is switched automatically and service interruption is avoided.
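The comparison against preset information can be written down as a small check. The sketch below is an assumption-laden illustration: the constants `EXPECTED_HEARTBEAT` and `EXPECTED_STATUS_REGISTER` are made-up values, the class `WorkingState` is hypothetical, and treating a completely missing signal as a fault is a choice of this sketch rather than something the text specifies.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical preset information; the text gives no concrete values.
EXPECTED_HEARTBEAT = 0xA5
EXPECTED_STATUS_REGISTER = 0x01


@dataclass
class WorkingState:
    """Working state information sampled from the monitored peer chip."""
    heartbeat: Optional[int] = None         # value seen on the heartbeat link
    status_register: Optional[int] = None   # value returned by a register read (e.g. over I2C)


def has_fault(state: WorkingState) -> bool:
    """Return True if the working state information is not the preset information."""
    if state.heartbeat is not None:
        return state.heartbeat != EXPECTED_HEARTBEAT
    if state.status_register is not None:
        return state.status_register != EXPECTED_STATUS_REGISTER
    # Assumption of this sketch: no signal at all counts as a fault.
    return True


print(has_fault(WorkingState(heartbeat=0xA5)))        # False
print(has_fault(WorkingState(status_register=0x00)))  # True
```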
In an exemplary embodiment, optionally, the expansion chips are arranged in multiple levels: an expansion chip communicatively connected to the host side is a first-level expansion chip, an expansion chip communicatively connected to an Nth-level expansion chip is an (N+1)th-level expansion chip, and the (N+1)th-level expansion chip is connected to a preset number of disks, where N is a positive integer greater than or equal to 1.
Specifically, when the host accesses a JBOD, it does so through more than one expansion chip, for example two. The downstream ports of each expansion chip expand to a preset number of hard disks, i.e., they are communicatively connected to a preset number of hard disks. Together, the peer expansion chips expand to all hard disks in the JBOD. In addition, each expansion chip reserves an extra expansion line, i.e., the line corresponding to its reserved downstream port. The reserved expansion line of the first expansion chip is connected to the hard disks attached to the downstream expansion port of the second expansion chip, and the second expansion chip likewise reserves an expansion line to the hard disks attached to the downstream expansion port of the first expansion chip. By default, the reserved downstream expansion lines do not carry actual data traffic. The two expansion chips connected to the host are called peer expansion chips, and monitoring lines are provided between the peer expansion chips so that all peer expansion chips monitor one another.
For example, FIG. 3 is a schematic diagram of the connection between a host and a disk cluster according to an embodiment of the present application. As shown in FIG. 3, the JBOD contains four expansion chips. Expansion chip 1 and expansion chip 2 are directly connected to the host through two cables and communicate with it over SAS links; these two first-level expansion chips are referred to as peer expansion chips. Below the first-level expansion chips there are two second-level expansion chips, expansion chip 3 and expansion chip 4, each with a certain number of hard disks attached below it. Expansion chip 1 is connected to expansion chip 3 through expansion channel 1 and can access all hard disks under expansion chip 3. Expansion chip 2 is connected to expansion chip 4 through expansion channel 2 and can access all hard disks under expansion chip 4. In addition, expansion chip 1 reserves expansion channel 3, which is connected to expansion chip 4, and expansion chip 2 reserves expansion channel 4, which is connected to expansion chip 3.
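The FIG. 3 topology can be written down as a small data structure, which makes the distinction between active and reserved channels explicit. The `Channel` class and the `downstream_chips` helper are illustrative names, not part of the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Channel:
    name: str
    upstream: str      # first-level expansion chip
    downstream: str    # second-level expansion chip
    reserved: bool     # True for channels that are closed by default


# Channels as described for FIG. 3.
CHANNELS = [
    Channel("expansion channel 1", "expansion chip 1", "expansion chip 3", reserved=False),
    Channel("expansion channel 2", "expansion chip 2", "expansion chip 4", reserved=False),
    Channel("expansion channel 3", "expansion chip 1", "expansion chip 4", reserved=True),
    Channel("expansion channel 4", "expansion chip 2", "expansion chip 3", reserved=True),
]


def downstream_chips(chip: str, include_reserved: bool = False) -> List[str]:
    """List the second-level chips reachable from a first-level chip."""
    return [c.downstream for c in CHANNELS
            if c.upstream == chip and (include_reserved or not c.reserved)]
```

With the reserved channels excluded, expansion chip 1 reaches only expansion chip 3; including them adds expansion chip 4, which is the path used after a failure of expansion chip 2.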
In an exemplary embodiment, optionally, a monitoring line is disposed between the expansion chips belonging to the same stage, and the expansion chips of the same stage acquire the monitoring signal of the monitored expansion chip through the monitoring line.
For example, as shown in FIG. 3, a mutual monitoring line is reserved between the two first-level expansion chips, expansion chip 1 and expansion chip 2. When the host and the JBOD are operating normally, expansion channel 1 and expansion channel 2 are in the normal communication state, and expansion channel 3 and expansion channel 4 are closed. Expansion chip 1 can access only the hard disks under expansion chip 3, and expansion chip 2 can access only the hard disks under expansion chip 4. The host reaches the two first-level expansion chips over two SAS links and thereby accesses all hard disks in the JBOD. When one of the first-level expansion chips develops a problem, for example when expansion chip 2 fails, expansion chip 1 detects the failure of expansion chip 2 from the monitoring signal.
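Mutual monitoring between peer chips can be sketched as a loop in which each healthy peer checks every other peer. The function name `find_failed_peers` and the `is_healthy` callback are assumptions made for illustration; in the application the check is performed over the monitoring line by the chips themselves.

```python
from typing import Callable, Dict, List


def find_failed_peers(peers: List[str],
                      is_healthy: Callable[[str], bool]) -> Dict[str, str]:
    """Map each failed chip to the normal peer chip that detected the failure."""
    failures: Dict[str, str] = {}
    for monitor in peers:
        if not is_healthy(monitor):
            continue  # a failed chip cannot act as the monitoring chip
        for target in peers:
            if target != monitor and not is_healthy(target):
                # Keep the first normal peer that noticed the failure.
                failures.setdefault(target, monitor)
    return failures
```

In the FIG. 3 scenario where expansion chip 2 fails, the result pairs expansion chip 2 with expansion chip 1, the normal chip that will take over its devices.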
In one exemplary embodiment, optionally, updating the reserved downstream port of the normal expansion chip from the closed state to the enabled state comprises: acquiring a configuration file of the normal expansion chip; modifying the port configuration parameters in the configuration file to obtain updated port configuration parameters, wherein the port configuration parameters are used to configure, for each port, information about the disks that the port manages; and updating the reserved downstream port from the closed state to the enabled state through the updated port configuration parameters.
Specifically, when the first expansion chip detects that the second expansion chip has failed, the first expansion chip, as the normal expansion chip, immediately acquires its own configuration file and modifies the port configuration parameters in it. The reserved downstream port of the normal expansion chip, which leads to the devices of the failed chip, is opened; that is, the reserved downstream port is changed in the configuration file from the closed state to the enabled state.
For example, as shown in FIG. 3, when expansion chip 1 finds that expansion chip 2 has failed, expansion chip 1 adjusts its own port configuration parameters, changing the reserved downstream port in the configuration file from the disabled state to the downstream state, which opens the link of expansion channel 3. After the modification is complete, expansion chip 1 re-identifies its own ports and reads the hard disk information again from the newly enabled downstream port, thereby taking over all downstream devices of the failed expansion chip 2. At this point, expansion chip 1 can access not only all hard disks downstream of expansion chip 3 but also all hard disks downstream of expansion chip 4.
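The takeover step can be sketched as an edit to a port configuration followed by a re-scan of the enabled ports. The configuration layout, the state strings `"disabled"` and `"downstream"`, the port keys, and the disk names below are all hypothetical; the application does not describe the format of the configuration file.

```python
import copy
from typing import Dict, List


def enable_reserved_port(config: Dict, port: str) -> Dict:
    """Return a copy of the configuration with the reserved port switched to downstream."""
    updated = copy.deepcopy(config)
    updated["ports"][port]["state"] = "downstream"
    return updated


def visible_disks(config: Dict, disks_behind_port: Dict[str, List[str]]) -> List[str]:
    """Re-read the disks reachable through every port currently in the downstream state."""
    disks: List[str] = []
    for port, settings in config["ports"].items():
        if settings["state"] == "downstream":
            disks.extend(disks_behind_port.get(port, []))
    return disks


# Hypothetical configuration of expansion chip 1 before the failure of expansion chip 2.
chip1_config = {"ports": {"channel_1": {"state": "downstream"},
                          "channel_3": {"state": "disabled"}}}
# Hypothetical disks behind each channel (via expansion chips 3 and 4).
disks_behind = {"channel_1": ["disk0", "disk1"], "channel_3": ["disk2", "disk3"]}

after_failover = enable_reserved_port(chip1_config, "channel_3")
```

Working on a copy keeps the original configuration available, which is convenient if the reserved port has to be closed again once the failed chip is repaired.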
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the method described in the embodiments of the present application.
This embodiment further provides an extended chip management apparatus, which is used to implement the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 4 is a block diagram of an extended chip management apparatus according to an embodiment of the present application. As shown in FIG. 4, the apparatus includes:
an acquisition unit 10, configured to acquire a monitoring signal of each expansion chip and judge from the monitoring signal whether the expansion chip has failed, wherein the expansion chips enable the host side to access a disk cluster, the disk cluster comprises a plurality of disks, each expansion chip is communicatively connected to a preset number of devices, and each device is a disk or a next-level expansion chip;
a determining unit 20, configured to, when the expansion chip has failed, determine the expansion chip to be a failed chip and determine a normal expansion chip that monitors the failed chip, wherein the failed chip is an expansion chip that cannot be communicatively connected to its devices, and the normal expansion chip and the failed chip are expansion chips of the same level;
a first updating unit 30, configured to update a reserved downstream port of the normal expansion chip from a closed state to an enabled state, wherein the reserved downstream port is used to connect the devices that are communicatively connected to the failed chip;
and a second updating unit 40, configured to update port information of the normal expansion chip and, based on the updated port information, control the devices connected to the normal expansion chip and to the failed chip to be communicatively connected to the normal expansion chip.
In an exemplary embodiment, optionally, the apparatus further comprises: a sending unit, configured to send fault information to the host side, wherein the fault information is used to notify a user of the location of the failed chip; and a receiving unit, configured to receive the user's repair instruction for the failed chip and to repair the failed chip according to the repair instruction and the fault information.
In an exemplary embodiment, optionally, the apparatus further comprises: a judging unit, configured to acquire the monitoring signal of the repaired failed chip and judge from the monitoring signal whether the repaired failed chip works normally, wherein working normally means that the repaired failed chip has recovered the function of communicating with its devices; a third updating unit, configured to update the reserved downstream port from the enabled state to the closed state when the repaired failed chip works normally; and a fourth updating unit, configured to update the port information of the normal expansion chip and, based on the updated port information, control the devices connected to the normal expansion chip to be communicatively connected to the normal expansion chip.
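The switch-back after a repair can be sketched as the inverse of the takeover: the reserved downstream port is closed again only once the repaired chip is confirmed to work normally. The function name `fail_back`, the port key, and the state strings are illustrative assumptions.

```python
from typing import Dict


def fail_back(port_states: Dict[str, str],
              reserved_port: str,
              repaired_chip_ok: bool) -> Dict[str, str]:
    """Return the port states of the normal chip after a repair attempt on its peer."""
    updated = dict(port_states)
    if repaired_chip_ok and updated.get(reserved_port) == "enabled":
        # The repaired chip serves its own devices again, so the reserved
        # downstream port returns to its default closed state.
        updated[reserved_port] = "closed"
    return updated
```

If the repaired chip still fails its monitoring check, the port states are returned unchanged, so the normal chip keeps serving the devices it took over.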
In one exemplary embodiment, optionally, the acquisition unit 10 includes: a first acquisition module, configured to acquire the working state information of each expansion chip and judge whether the working state information matches preset information, wherein the preset information is one of the following: a preset heartbeat signal, or a return value of a preset register; a first determining module, configured to determine that the expansion chip has not failed when the working state information matches the preset information; and a second determining module, configured to determine that the expansion chip has failed when the working state information does not match the preset information.
In an exemplary embodiment, optionally, the expansion chips are arranged in multiple levels: an expansion chip communicatively connected to the host side is a first-level expansion chip, an expansion chip communicatively connected to an Nth-level expansion chip is an (N+1)th-level expansion chip, and the (N+1)th-level expansion chip is connected to a preset number of disks, where N is a positive integer greater than or equal to 1.
In an exemplary embodiment, optionally, a monitoring line is disposed between the expansion chips belonging to the same stage, and the expansion chips of the same stage acquire the monitoring signal of the monitored expansion chip through the monitoring line.
In one exemplary embodiment, optionally, the first updating unit 30 includes: a second acquisition module, configured to acquire the configuration file of the normal expansion chip; a modification module, configured to modify the port configuration parameters in the configuration file to obtain updated port configuration parameters, wherein the port configuration parameters are used to configure, for each port, information about the disks that the port manages; and an updating module, configured to update the reserved downstream port from the closed state to the enabled state through the updated port configuration parameters.
It should be noted that each of the above modules may be implemented in software or in hardware. In the latter case, the implementation may be, but is not limited to, the following: the modules are all located in the same processor; or the modules are located in different processors in any combination.
Embodiments of the present application also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
In one exemplary embodiment, the computer readable storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing a computer program.
Embodiments of the present application also provide an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
In an exemplary embodiment, the electronic device may further include a transmission device connected to the processor, and an input/output device connected to the processor.
For specific examples of this embodiment, refer to the examples described in the foregoing embodiments and exemplary implementations; they are not repeated here.
Those skilled in the art will appreciate that the modules or steps of the present application described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of computing devices. They may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that given here. Alternatively, they may be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The foregoing describes only the preferred embodiments of the present application and is not intended to limit it; those skilled in the art may make various modifications and variations. Any modification, equivalent replacement, or improvement made within the principles of the present application shall fall within the protection scope of the present application.

Claims (10)

1. An extended chip management method, comprising:
acquiring a monitoring signal of each expansion chip, and judging whether the expansion chip fails or not through the monitoring signal, wherein the expansion chip is used for realizing access of a disk cluster at a host end, the disk cluster comprises a plurality of disks, each expansion chip is in communication connection with a preset number of devices, and the devices are disks or next-stage expansion chips;
under the condition that the expansion chip fails, determining the expansion chip as a failure chip, and determining a normal expansion chip for monitoring the failure chip, wherein the failure chip is an expansion chip which cannot be in communication connection with equipment, and the normal expansion chip and the failure chip are the same-level expansion chip;
updating a reserved downlink port of the normal expansion chip from a closed state to an activated state, wherein the reserved downlink port is used for connecting equipment in communication connection with the fault chip;
updating port information of the normal expansion chip, and controlling equipment connected with the normal expansion chip and the fault chip to be in communication connection with the normal expansion chip based on the updated port information.
2. The method of claim 1, wherein after controlling the normal expansion chip and the device connected to the failed chip to be communicatively connected to the normal expansion chip based on the updated port information, the method further comprises:
sending fault information to the host, wherein the fault information is used for notifying a user of the position information of the fault chip;
and receiving a repairing instruction of the user to the fault chip, and repairing the fault chip through the repairing instruction and the fault information.
3. The method of claim 2, wherein after repairing the failed chip by the repair instruction and the failure information, the method further comprises:
acquiring a monitoring signal of the repaired fault chip, and judging whether the repaired fault chip works normally or not through the monitoring signal, wherein normal work represents the function of recovering the communication connection between the repaired fault chip and equipment;
under the condition that the repaired fault chip works normally, updating the reserved downlink port from a starting state to a closing state;
updating port information of the normal expansion chip, and controlling equipment connected with the normal expansion chip to be in communication connection with the normal expansion chip based on the updated port information.
4. The method of claim 1, wherein obtaining a monitor signal for each expansion chip and determining whether the expansion chip has failed based on the monitor signal comprises:
acquiring working state information of each expansion chip, and judging whether the working state information is preset information, wherein the preset information is one of the following: a preset heartbeat signal, or a return value of a preset register;
under the condition that the working state information is the preset information, determining that the expansion chip has no fault;
and under the condition that the working state information is not the preset information, determining that the expansion chip fails.
5. The method of claim 1, wherein the expansion chip comprises a plurality of levels, the expansion chip communicatively connected to the host side belongs to a first-level expansion chip, the expansion chip communicatively connected to an Nth-level expansion chip belongs to an (N+1)th-level expansion chip, and the (N+1)th-level expansion chip is connected to the preset number of disks, wherein N is a positive integer greater than or equal to 1.
6. The method according to claim 5, wherein a monitor line is provided between the expansion chips belonging to the same stage, and the expansion chips of the same stage acquire monitor signals of the monitored expansion chips through the monitor line.
7. The method of claim 1, wherein updating the reserved downstream port of the normal expansion chip from an off state to an on state comprises:
acquiring a configuration file of the normal expansion chip;
modifying the port configuration parameters in the configuration file to obtain updated port configuration parameters, wherein the port configuration parameters are used to configure, for each port, information about the disks that the port manages;
and updating the reserved downlink port from the closed state to the started state through the updated port configuration parameters.
8. An extended chip management apparatus, comprising:
an acquisition unit, configured to acquire a monitoring signal of each expansion chip and judge whether the expansion chip fails through the monitoring signal, wherein the expansion chip is used for enabling a host to access a disk cluster, the disk cluster comprises a plurality of disks, each expansion chip is in communication connection with a preset number of devices, and the devices are disks or next-stage expansion chips;
a determining unit, configured to determine the expansion chip as a fault chip and determine a normal expansion chip for monitoring the fault chip under the condition that the expansion chip fails, wherein the fault chip is an expansion chip which cannot be in communication connection with equipment, and the normal expansion chip and the fault chip are expansion chips of the same level;
the first updating unit is used for updating the reserved downlink port of the normal expansion chip from a closed state to a starting state, wherein the reserved downlink port is used for connecting equipment in communication connection with the fault chip;
and the second updating unit is used for updating the port information of the normal expansion chip and controlling the equipment connected with the normal expansion chip and the fault chip to be in communication connection with the normal expansion chip based on the updated port information.
9. A computer readable storage medium, characterized in that a computer program is stored in the computer readable storage medium, wherein the computer program, when being executed by a processor, implements the steps of the method according to any of the claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when the computer program is executed.
CN202310071561.XA 2023-01-30 2023-01-30 Extended chip management method and device, storage medium and electronic equipment Pending CN116150068A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310071561.XA CN116150068A (en) 2023-01-30 2023-01-30 Extended chip management method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310071561.XA CN116150068A (en) 2023-01-30 2023-01-30 Extended chip management method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116150068A true CN116150068A (en) 2023-05-23

Family

ID=86355684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310071561.XA Pending CN116150068A (en) 2023-01-30 2023-01-30 Extended chip management method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116150068A (en)

Similar Documents

Publication Publication Date Title
EP3617886B1 (en) Hot backup system, hot backup method, and computer device
CN101853172B (en) Device and method for dynamically upgrading complex programmable logic device (CPLD)
CN111294845B (en) Node switching method, device, computer equipment and storage medium
CN105430327A (en) NVR cluster backup method and device
CN112653734B (en) Real-time master-slave control and data synchronization system and method for server cluster
CN110967969A (en) High availability industrial automation system and method for transmitting information through the same
CN114510378A (en) Parameter backup method and device for air conditioning unit and electronic equipment
CN103001802B (en) Ether port failure self-repairing method and system
CN111309132B (en) Method for multi-gear power supply redundancy of server
US11874786B2 (en) Automatic switching system and method for front end processor
CN108667640B (en) Communication method and device, and network access system
CN116150068A (en) Extended chip management method and device, storage medium and electronic equipment
CN103152209A (en) Application service operation processing system based on multi-machine hot backup
CN103164297B (en) Storage system, control device, and storage system control method of controlling storage system
CN108984602A (en) A kind of database control method and Database Systems
CN112667428A (en) BMC fault processing circuit, method and device, electronic equipment and storage medium
CN110633176B (en) Working system switching method, cube star and switching device
CN112751688B (en) Flow control processing method of OTN (optical transport network) equipment, electronic equipment and storage medium
CN115705267A (en) Monitoring acquisition equipment, and main/standby switching method and system based on monitoring acquisition equipment
KR20130129566A (en) Apparatus and method for monitoring for plc system
CN114069600A (en) Power supply method, device and system
US7661024B2 (en) Bus terminator/monitor/bridge systems and methods
US9584131B2 (en) Programmable device, information processing device, and control method for processing circuit of programmable device
CN113991827A (en) SSD power-down protection method, device and system and electronic equipment
CN111026586B (en) Main and standby state switching method and device of cluster equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination