Online replacement method and system for server management module
Technical Field
The invention belongs to the technical field of server data processing, and particularly relates to an online replacement method and system for a server management module.
Background
In current mainstream server designs, the mainboard and the management module (i.e., the BMC) are designed as separate parts for reasons of reliability, maintainability, and the like. However, when a management module (management board) fails or is damaged and a new one is installed, an operator usually has to connect a separate management computer to the network port of the management module and manually reconfigure the relevant parameters before the module can be used again, which is inefficient.
Therefore, an online replacement method and system for a server management module are needed to solve the above problem.
Disclosure of Invention
The invention aims to provide an online replacement method for a server management module to solve the following technical problem in the prior art: when a management module (management board) fails or is damaged and a new one is installed, an operator usually has to connect a separate management computer to the network port of the management module and manually reconfigure the relevant parameters before the module can be used again, which is inefficient.
To achieve this purpose, the technical solution of the invention is as follows:
An online replacement method for a server management module comprises the following steps:
S1: configuring the user's custom configuration information on the BMC and storing it there;
S2: when the operating system of the server starts, the FPGA reads the custom configuration information stored by the BMC from the BMC;
S3: if the FPGA judges that the Flash of the current BMC does not hold any information, the FPGA stores the custom configuration information into a nonvolatile storage medium;
S4: while the server operating system is running normally, if the BMC fails and needs to be replaced, removing the faulty BMC and inserting a new BMC;
S5: the FPGA again acquires the custom configuration information of the newly inserted BMC;
S6: the FPGA compares the historical custom configuration information stored in the nonvolatile storage medium with the custom configuration information just acquired;
S7: if the SN information matches, skipping the configuration recovery process;
S8: if the SN information does not match, the BMC is judged to be a newly replaced BMC, and the FPGA reads the stored historical custom configuration information from the nonvolatile medium and writes it into the newly replaced BMC;
S9: the newly replaced BMC works with the written custom configuration information, so that seamless switching is realized.
Further, the custom configuration information includes, but is not limited to: a user name, a password, the IP address of the management network port, and SN information.
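As an illustration only, the custom configuration information listed above could be held in a small record that is serialized before being written to a nonvolatile medium. The class name `BmcConfig`, its field names, and the JSON encoding are assumptions made for this sketch and are not taken from the invention; an actual FPGA design would more likely use a compact binary layout.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BmcConfig:
    """Hypothetical record of the custom configuration information."""
    user_name: str
    password: str
    mgmt_ip: str  # IP address of the management network port
    sn: str       # SN information of the management module

    def to_bytes(self) -> bytes:
        # Serialize to a stable byte string suitable for a nonvolatile store.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "BmcConfig":
        # Rebuild the record from previously stored bytes.
        return cls(**json.loads(raw.decode("utf-8")))


# Example values are made up for the demonstration.
cfg = BmcConfig("admin", "s3cret", "192.168.1.10", "SN-0001")
restored = BmcConfig.from_bytes(cfg.to_bytes())
```

The round trip through `to_bytes` and `from_bytes` stands in for the save and later write-back of the configuration.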
Further, when the custom configuration information of the BMC is changed while the server operating system is running, the changed custom configuration information is synchronously written to the nonvolatile medium managed by the FPGA on the mainboard.
Further, in step S3, if the FPGA judges that the Flash of the current BMC holds no information, the FPGA first keeps the current state without proceeding to the next step and repeats the judgment; if it again judges that the Flash of the current BMC holds no information, the FPGA stores the custom configuration information into the nonvolatile storage medium managed by the FPGA.
Further, when the FPGA again judges that the Flash of the current BMC holds no information, the server operating system judges that the BMC failed to store the custom configuration information in step S1.
Further, when the server operating system judges that the BMC failed to store the custom configuration information in step S1, the BMC program that stores the custom configuration information in step S1 is restarted; after the restart is complete, if the FPGA judges in step S3 that the Flash of the restarted BMC still holds no information, the server operating system judges that the BMC is faulty.
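The judgment sequence described above (judge, repeat the judgment, restart the storing program, judge once more) can be sketched as follows. This is one possible reading of the text, not a definitive implementation: the function `judge_bmc`, the `Verdict` enum, and the `FakeBmc` test double are hypothetical names, and the sketch models only the fault judgment, not the storing of the configuration itself.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    OK = "configuration present"
    BMC_FAULT = "BMC judged faulty"


def judge_bmc(flash_is_empty: Callable[[], bool],
              restart_store_program: Callable[[], None]) -> Verdict:
    # First judgment: the Flash holds data, nothing more to do.
    if not flash_is_empty():
        return Verdict.OK
    # Keep the current state and repeat the judgment once.
    if not flash_is_empty():
        return Verdict.OK
    # Two empty readings: storing in S1 is judged to have failed,
    # so the BMC program that stores the configuration is restarted.
    restart_store_program()
    # One more reading after the restart decides the verdict.
    return Verdict.BMC_FAULT if flash_is_empty() else Verdict.OK


class FakeBmc:
    """Test double: the Flash stays empty unless the restart repairs it."""

    def __init__(self, repaired_by_restart: bool) -> None:
        self.flash = b""
        self.repaired_by_restart = repaired_by_restart

    def flash_is_empty(self) -> bool:
        return len(self.flash) == 0

    def restart_store_program(self) -> None:
        if self.repaired_by_restart:
            self.flash = b"custom-config"


recoverable = FakeBmc(repaired_by_restart=True)
broken = FakeBmc(repaired_by_restart=False)
result_recoverable = judge_bmc(recoverable.flash_is_empty,
                               recoverable.restart_store_program)
result_broken = judge_bmc(broken.flash_is_empty,
                          broken.restart_store_program)
```

Passing the two operations as callables keeps the judgment logic independent of how the Flash is actually read, which the text does not specify.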
An online replacement system for a server management module uses the above online replacement method to replace the server management module online.
Compared with the prior art, the invention has the following beneficial effects:
One innovation of the solution is that it automatically judges whether the management module has been changed and, if so, automatically restores the configuration of the management module, which avoids manual configuration and improves efficiency and usability.
Drawings
Fig. 1 is a schematic flow chart illustrating steps according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to Fig. 1 and Fig. 2. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Embodiment:
As shown in Fig. 1 and Fig. 2, this embodiment proposes a more general method for automatically restoring the configuration of the management board, so that the management module can be replaced online without human intervention. The management module and the FPGA on the mainboard are connected through an interface such as I2C or SPI.
During normal operation of the system, the FPGA on the mainboard stores all key parameters of the management module (such as the IP address, user name, password, key configuration items, and SN information) and updates the stored values whenever the corresponding BMC parameters change. When a management module fails, it is replaced with a new one. The FPGA detects that a new management module has been connected and automatically loads the stored key parameters into the configuration of the newly installed management module, so that they take effect without manual intervention, which improves efficiency.
A BMC usually stores a considerable amount of custom configuration information, such as the user name, password, management IP address, and SN information of the management subsystem.
By default, this custom configuration information is stored on the management module.
When the system is started, the FPGA reads SN information and other custom configuration information of the management module from the management module.
If the FPGA judges that the current Flash holds no information, the FPGA stores the SN information and the other custom configuration information into a nonvolatile storage medium such as Flash.
If the FPGA judges that the Flash of the current BMC holds no information, it first keeps the current state without proceeding to the next step and repeats the judgment; if it again judges that the Flash of the current BMC holds no information, the FPGA stores the custom configuration information into the nonvolatile storage medium managed by the FPGA. Repeating the judgment avoids subsequent operation errors caused by a system bug.
When the FPGA again judges that the Flash of the current BMC holds no information, the server operating system judges that the BMC failed to store the custom configuration information in step S1.
When the server operating system judges that the BMC failed to store the custom configuration information, the BMC program that stores the custom configuration information is restarted; after the restart is complete, if the FPGA judges that the Flash of the restarted BMC still holds no information, the server operating system judges that the BMC is faulty. The fault judgment of the BMC is thereby completed.
The system starts to operate according to a normal state.
When the management module needs to be replaced because of a failure or for another reason, the faulty management module is removed and a new management module is inserted.
The FPGA again acquires the SN information of the newly inserted management module.
The FPGA compares the stored historical SN information with the SN information just acquired.
If the SN information matches, the recovery configuration process is skipped.
If the SN information does not match, it indicates that it is a newly replaced management module.
The FPGA reads the stored custom configuration information from the nonvolatile medium, such as Flash, and writes it into the management module.
The management module works with the restored custom configuration information, so that seamless switching is realized and no manual configuration is needed.
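The replacement detection and recovery described above can be sketched as a small simulation. The classes `Bmc` and `Fpga` and their methods are hypothetical stand-ins, with plain dictionaries modeling the Flash contents. The sketch also assumes that the SN itself is not overwritten on the new module and that the FPGA records the new SN after a restore; the text does not state either point.

```python
from typing import Dict, Optional

Config = Dict[str, str]


class Bmc:
    """Stand-in for a management module; `config` models its Flash."""

    def __init__(self, sn: str, config: Optional[Config] = None) -> None:
        self.sn = sn
        self.config: Config = dict(config or {})


class Fpga:
    """Stand-in for the mainboard FPGA and the nonvolatile store it manages."""

    def __init__(self) -> None:
        self.stored: Optional[Config] = None

    def capture(self, bmc: Bmc) -> None:
        # At start-up: save the SN and the other custom configuration.
        self.stored = {**bmc.config, "sn": bmc.sn}

    def on_bmc_inserted(self, bmc: Bmc) -> bool:
        """Return True if the stored configuration was written to `bmc`."""
        if self.stored is None or bmc.sn == self.stored["sn"]:
            return False  # SN matches (or nothing stored): skip recovery
        # SN differs: treat the module as newly replaced and write back.
        bmc.config = {k: v for k, v in self.stored.items() if k != "sn"}
        self.stored["sn"] = bmc.sn  # assumption: remember the new module's SN
        return True


fpga = Fpga()
old = Bmc("SN-OLD", {"user_name": "admin",
                     "password": "s3cret",
                     "mgmt_ip": "192.168.1.10"})
fpga.capture(old)

new = Bmc("SN-NEW")                       # replacement module, empty Flash
was_restored = fpga.on_bmc_inserted(new)  # SN mismatch triggers the restore
again = fpga.on_bmc_inserted(new)         # same SN now: recovery is skipped
```

Comparing only the SN keeps the check cheap; the full configuration is read from the store only when a replacement is actually detected.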
In particular, if the custom configuration information of the management module is changed while the system is running, the changed information must be synchronously written to the nonvolatile medium, such as Flash, managed by the FPGA on the mainboard.
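The synchronous refresh can be pictured as a write-through update: every change on the management module is pushed to the FPGA-managed store immediately. The sketch below is an assumption-laden illustration; `BmcSettings`, `FpgaStore`, and the listener mechanism are hypothetical and only model the behavior in software.

```python
from typing import Callable, Dict, List

Config = Dict[str, str]
Listener = Callable[[Config], None]


class BmcSettings:
    """Stand-in for the BMC configuration with change notification."""

    def __init__(self, initial: Config) -> None:
        self._config = dict(initial)
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def update(self, key: str, value: str) -> None:
        if self._config.get(key) == value:
            return  # unchanged: nothing to synchronize
        self._config[key] = value
        for listener in self._listeners:
            listener(dict(self._config))  # pass a copy of the new state


class FpgaStore:
    """Stand-in for the nonvolatile medium managed by the FPGA."""

    def __init__(self) -> None:
        self.saved: Config = {}
        self.writes = 0

    def refresh(self, config: Config) -> None:
        self.saved = config
        self.writes += 1


settings = BmcSettings({"mgmt_ip": "192.168.1.10"})
store = FpgaStore()
settings.subscribe(store.refresh)
settings.update("mgmt_ip", "192.168.1.20")
settings.update("mgmt_ip", "192.168.1.20")  # no change, no extra write
```

Skipping writes for unchanged values is a design choice of this sketch that would reduce wear on a Flash-based store.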
The above are preferred embodiments of the present invention. All changes made according to the technical solution of the present invention whose functional effects do not go beyond the scope of that technical solution fall within the protection scope of the present invention.