CN115408064A - Method, server and related equipment for supporting kernel online update - Google Patents

Publication number
CN115408064A
Authority
CN
China
Prior art keywords
kernel
memory
code
pci
core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110592524.4A
Other languages
Chinese (zh)
Inventor
周健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd filed Critical Huawei Cloud Computing Technologies Co Ltd
Priority to CN202110592524.4A priority Critical patent/CN115408064A/en
Publication of CN115408064A publication Critical patent/CN115408064A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/4401 Bootstrapping
    • G06F9/4403 Processor initialisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates
    • G06F8/656 Updates while running

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

A server supporting online kernel update comprises a processor, a memory, and a plurality of PCI devices, where the memory stores code of a first kernel. The processor executes the code of the first kernel to start the first kernel, scans the PCI devices once the first kernel is started, and records topology information of the PCI devices at a preset address of the memory. While executing the code of the first kernel, the processor may receive a kernel jump command, stop executing the code of the first kernel according to the command, execute code of a second kernel to start the second kernel, and, once the second kernel is started, obtain the topology information from the preset address of the memory. The server thereby avoids rescanning the PCI devices during a kernel jump, which shortens the service interruption during the jump and improves user experience.

Description

Method, server and related equipment for supporting kernel online update
Technical Field
The present application relates to the field of computers, and in particular, to a method, a server, and a related device for supporting online kernel update.
Background
The kernel of the server is the core of the operating system and is responsible for managing the processes, memory, device drivers, files, and network systems of the system. In order to maintain the stability of the system, the server needs to update or repair the kernel at intervals, and the kernel update or repair needs to restart the server, which causes service interruption in the server and has a great influence on users.
To avoid restarting the server during kernel update and repair, kernel jump technology was developed. In a kernel jump, the first kernel to be updated loads the boot parameters and boot files of the updated second kernel and is then shut down; the second kernel is initialized and takes over the work of the first kernel, which completes the update. However, loading and initializing the second kernel during the jump still takes a long time, causing a service interruption of tens of seconds that affects running services and degrades user experience.
Disclosure of Invention
The present application provides a method, a server, and related devices for supporting online kernel update, to solve the problem of poor user experience caused by the long service interruption that a kernel jump incurs during kernel update and repair.
In a first aspect, a server supporting online kernel update is provided. The server may include a processor, a memory, and a plurality of Peripheral Component Interconnect (PCI) devices, where the memory stores code of a first kernel. The processor is configured to execute the code of the first kernel to start the first kernel, scan the plurality of PCI devices once the first kernel is started, and record topology information of the plurality of PCI devices at a preset address of the memory. While executing the code of the first kernel, the processor may further receive a kernel jump command, stop executing the code of the first kernel according to the kernel jump command, execute code of a second kernel to start the second kernel, and, once the second kernel is started, obtain the topology information from the preset address of the memory.
In a specific implementation, the processor may be formed by at least one general-purpose processor, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The memory may be a volatile memory (volatile memory), such as a Random Access Memory (RAM), a dynamic random access memory (dynamic RAM, DRAM), a static random access memory (static RAM, SRAM), a Synchronous Dynamic RAM (SDRAM), a double data rate RAM (DDR), a cache (cache), and so on, and the memory may further include a combination of the above types.
The PCI devices may be devices connected to expansion slots of a PCI bus, such as a sound card, a network interface card (NIC), a Universal Serial Bus (USB) card, an Integrated Drive Electronics (IDE) interface card, a redundant array of independent disks (RAID) card, a video capture card, and the like, which is not limited in this application. The topology information of the plurality of PCI devices is the bus topology established when the devices are initialized. It describes the topology of the device system formed by the plurality of PCI devices and may specifically be a linked-list data structure.
Note that the PCI device may also include a PCIe (peripheral component interconnect express) device.
Optionally, the preset address of the memory used to store the topology information may be the address of a segment of storage space allocated in the memory before the processor executes the code of the first kernel, or the address of a segment of storage space allocated in the memory before the processor receives the kernel jump command and stops executing the code of the first kernel, which is not limited in this application. The kernel jump command may specifically be a kexec command, which instructs the currently running kernel to jump to a new kernel, or may be another kernel jump command, which is not limited in this application.
In the server described in the first aspect, the processor in the server may execute the code of the first kernel in the memory, store the topology information of the multiple PCI devices in the preset address of the memory, and after the processor stops executing the code of the first kernel and executes the code of the second kernel, obtain the topology information from the preset address of the memory, thereby avoiding repeated scanning of the multiple PCI devices in the kernel jumping process, reducing the service interruption duration in the kernel jumping process, and improving the user experience.
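As an illustration only, the hand-off of topology information through a preset memory address can be sketched as follows. This is a minimal simulation under stated assumptions, not the implementation of the application: the memory is modeled as a byte array, and the offset `PRESET_ADDR`, the `PCIT` tag, the record layout, and the device values are all hypothetical.

```python
import struct

PRESET_ADDR = 0x1000                 # hypothetical preset offset into "memory"
HEADER = struct.Struct("<4sI")       # assumed layout: magic tag + record count
RECORD = struct.Struct("<BBBHH")     # assumed layout: bus, device, function, vendor ID, device ID


def save_topology(memory, devices):
    """First kernel: write the scanned topology at the preset address."""
    HEADER.pack_into(memory, PRESET_ADDR, b"PCIT", len(devices))
    offset = PRESET_ADDR + HEADER.size
    for dev in devices:
        RECORD.pack_into(memory, offset, *dev)
        offset += RECORD.size


def load_topology(memory):
    """Second kernel: read the topology back instead of rescanning."""
    magic, count = HEADER.unpack_from(memory, PRESET_ADDR)
    if magic != b"PCIT":
        return None  # nothing was preserved, so a full scan would be needed
    offset = PRESET_ADDR + HEADER.size
    devices = []
    for _ in range(count):
        devices.append(RECORD.unpack_from(memory, offset))
        offset += RECORD.size
    return devices


memory = bytearray(64 * 1024)        # stands in for the server memory
scanned = [(0, 3, 0, 0x8086, 0x10FB), (1, 0, 0, 0x15B3, 0x1017)]  # arbitrary example values
save_topology(memory, scanned)       # done while the first kernel runs
restored = load_topology(memory)     # done after the second kernel starts
```

The tag check lets the second kernel fall back to a normal scan when nothing was preserved, which is one possible design choice rather than something the application prescribes.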
In a possible implementation manner of the first aspect, the code of the second kernel is a code obtained by performing bug fixing or function upgrading on the code of the first kernel, and the processor is further configured to obtain the code of the second kernel according to the kernel jump command, and load the code of the second kernel to the memory.
Optionally, the processor may obtain the code of the second kernel from a predetermined location at which a system administrator has placed it in advance, for example, a certain address of the memory or a certain network file server.
Specifically, after loading the code of the second kernel into the memory, the processor may store the start and end addresses of the code of the second kernel in the memory in a page table. The processor then obtains the start and end addresses from the page table, fetches the code of the second kernel from the memory according to those addresses, and executes it to start the second kernel.
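The address bookkeeping described above can be sketched as follows. This is a simplified illustration: the "page table" is modeled as a plain dictionary from a kernel name to a (start, end) pair, whereas a real page table maps virtual pages to physical frames. The function names and the load address are hypothetical.

```python
def load_kernel_image(memory, page_table, name, image, load_addr):
    """Copy a kernel image into memory and record its start and end addresses."""
    end = load_addr + len(image)
    if end > len(memory):
        raise ValueError("image does not fit in memory")
    memory[load_addr:end] = image
    page_table[name] = (load_addr, end)   # end address is exclusive


def fetch_kernel_image(memory, page_table, name):
    """Look up the recorded addresses and return the image to be executed."""
    start, end = page_table[name]
    return bytes(memory[start:end])


memory = bytearray(16 * 1024)
page_table = {}
second_kernel = b"\x7fKERNEL-V2-CODE"      # placeholder bytes, not a real kernel image
load_kernel_image(memory, page_table, "second", second_kernel, 0x2000)
fetched = fetch_kernel_image(memory, page_table, "second")
```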
Optionally, after the processor stops executing the code of the first kernel and executes the code of the second kernel to start the second kernel, the memory space where the code of the first kernel is located may be recovered, so as to reduce the occupied space of the memory and improve the memory utilization rate. If the kernel update is also required in the running process of the second kernel, a new kernel jump command may be received, the updated code of the third kernel may be received according to the new kernel jump command, and the code of the third kernel may be loaded to the recovered memory space, and of course, the code of the third kernel may also be loaded to other memory spaces, which is not limited in this application.
According to the implementation mode, after the processor receives the kernel jump command, the updated code of the second kernel is loaded into the memory, the processor executes the code of the second kernel, so that not only can the online updating of the kernel be realized, and the inconvenience brought to a user due to the interruption of the server be avoided, but also after the processor executes the code of the second kernel, the topology information can be obtained from the preset address of the memory, the repeated scanning of a plurality of PCI devices in the kernel jump process is avoided, the service interruption duration in the kernel jump process is reduced, and the use experience of the user is improved.
In a possible implementation manner of the first aspect, the processor is further configured to instruct the plurality of PCI devices to retain their own device state information before stopping execution of the code of the first kernel according to the kernel jump command. The processor is further configured to read the device state information from the plurality of PCI devices after executing the code of the second kernel and obtaining the topology information from the preset address of the memory.
In a specific implementation, the device state information of a PCI device may include the state of a functional feature of the PCI device, for example, whether SR-IOV of the PCI device is on or off, and the processor may instruct the PCI devices to store the device state information in a local register. After the processor stops executing the code of the first kernel, executes the code of the second kernel to start the second kernel, and obtains the topology information from the preset address of the memory, the processor can communicate with the plurality of PCI devices to obtain the device state information of each PCI device.
By implementing the above implementation manner, the device state information of each PCI device is retained locally on the PCI device, so that the processor can recover the plurality of PCI devices by state inheritance after executing the code of the second kernel, and the PCI devices do not need to be restarted and initialized.
In a possible implementation manner of the first aspect, a virtual instance is deployed on the first kernel, and one or any combination of the plurality of PCI devices is passed through directly to the virtual instance, where the virtual instance is a virtual machine or a container. In other words, the server supporting online kernel update provided by the present application can also be deployed in a cloud data center. It should be noted that, before the server performs the online kernel update, the virtual instance on the first kernel can be migrated to another server, and the online kernel update is then performed on the server, which avoids the service interruption that the kernel update would otherwise cause to the user and improves user experience.
Optionally, the PCI devices may be physical functions (PFs) or virtual functions (VFs) in a hardware device supporting the single root I/O virtualization (SR-IOV) standard. SR-IOV is a hardware-based virtualization solution: the standard can virtualize one PCI device into a plurality of PCI virtual devices, each of which can be allocated to a different virtual machine.
According to the implementation mode, the cloud data center generally has massive servers, the requirement for kernel skipping is large, the server supporting kernel online updating provided by the application is used, the server does not need to be restarted in the kernel updating process, heavy operation and maintenance caused by the server restarting of the cloud data center are avoided, the time required by kernel updating can be effectively reduced, long-time service interruption of the cloud data center is avoided, and the use experience of a user is improved.
In a second aspect, an apparatus for supporting online kernel update is provided, the apparatus comprising: the first kernel is used for scanning the PCI devices under the starting condition and recording the topology information of the PCI devices into a preset address of the memory; the first kernel is also used for receiving a kernel jump command and starting the second kernel according to the kernel jump command; and the second kernel acquires topology information from the preset address of the memory under the starting condition.
By implementing the apparatus described in the second aspect, when the first kernel exits, the topology information of the PCI devices may be stored in the memory and each PCI device may be instructed to store its own device state information locally on the device, so that the second kernel does not need to rescan or reinitialize the PCI devices.
Optionally, the second kernel is a kernel obtained by performing bug fixing or function upgrading on the first kernel.
Optionally, the first kernel is further configured to instruct, according to the kernel jump command, the plurality of PCI devices to retain their own device state information.
Optionally, the second kernel is further configured to read the device state information from the plurality of PCI devices after obtaining the topology information from the preset address of the memory.
Optionally, a virtual instance is deployed on the first kernel, and one or any combination of the plurality of PCI devices is passed through directly to the virtual instance.
Optionally, the virtual instance is a virtual machine or a container.
In a third aspect, a method for supporting online updating of a kernel is provided, which includes the following steps: the method comprises the steps that a first kernel scans a plurality of PCI devices under the condition of starting, and topology information of the PCI devices is recorded into a preset address of a memory; the first kernel receives a kernel jump command, and starts a second kernel according to the kernel jump command; and the second kernel acquires topology information from a preset address of the memory under the condition of starting.
By implementing the method described in the third aspect, when the first kernel exits, the topology information of the PCI devices may be stored in the memory, and each PCI device is instructed to store its own device state information locally on the device. When the second kernel is loaded and initialized, it can obtain the topology information from the memory without rescanning the PCI devices, which reduces the time required to load and initialize the second kernel. The second kernel can also obtain the device state information from each PCI device, so that the PCI devices inherit their state and do not need to be restarted and initialized, which further reduces the service interruption during the kernel jump and improves user experience.
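The three steps of the method can be sketched end to end as follows. This is a toy model under stated assumptions: the class `Server`, its attribute names, and the dictionary standing in for the preset memory address are hypothetical, and the sketch only counts how many scans occur rather than touching real hardware.

```python
class Server:
    """Toy model of the scan-once, reuse-after-jump flow."""

    def __init__(self, pci_devices):
        self.pci_devices = pci_devices
        self.preserved = {}     # stands in for the preset address of the memory
        self.scan_count = 0     # how many full PCI scans have been performed
        self.running = None     # which kernel is currently running

    def scan(self):
        """Expensive step: enumerate all PCI devices."""
        self.scan_count += 1
        return sorted(self.pci_devices)

    def boot_first_kernel(self):
        """Step 1: the first kernel starts, scans, and records the topology."""
        self.running = "first"
        self.preserved["topology"] = self.scan()

    def kernel_jump(self):
        """Steps 2 and 3: jump to the second kernel, which reads the record."""
        self.running = "second"
        topology = self.preserved.get("topology")
        if topology is None:
            topology = self.scan()   # fallback when nothing was preserved
        return topology


server = Server(["nic0", "raid0"])
server.boot_first_kernel()
topology = server.kernel_jump()
```

After the jump, `server.scan_count` is still 1, which is the point of the method: the second kernel reuses the record instead of scanning again.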
Optionally, the second kernel is a kernel obtained by performing bug fixing or function upgrading on the first kernel.
Optionally, before the first kernel starts the second kernel according to the kernel jump command, the first kernel may stop the plurality of PCI devices and instruct the plurality of PCI devices to retain their own device state information.
Optionally, after the second kernel obtains the topology information from the preset address of the memory, the second kernel may also read device state information from the plurality of PCI devices.
Optionally, a virtual instance is deployed on the first kernel, and one or any combination of the plurality of PCI devices is passed through directly to the virtual instance.
Optionally, the virtual instance is a virtual machine or a container.
In a fourth aspect, a computer program product is provided which, when run on a computer, causes the computer to perform the method of the above aspects.
In a fifth aspect, a computer-readable storage medium is provided, having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the above aspects.
In a sixth aspect, a computing device is provided that includes a processor configured to perform the method described in the above aspects.
The present application can further combine to provide more implementations on the basis of the implementations provided by the above aspects.
Drawings
FIG. 1 is a schematic structural diagram of a server supporting online kernel update provided in the present application;
FIG. 2 is a schematic diagram of physical and virtual functions of a PCI device;
FIG. 3 is an architecture diagram of a data center of a public cloud provided herein;
FIG. 4 is a schematic structural diagram of a device supporting online kernel update provided in the present application;
FIG. 5 is a flowchart illustrating steps of a method for supporting online kernel update according to the present application;
FIG. 6 is a schematic structural diagram of a PCI device system provided by the present application;
FIG. 7 is a flowchart illustrating steps of a kernel jump provided by the present application;
FIG. 8 is a schematic structural diagram of a computing device provided in the present application.
Detailed Description
To facilitate understanding of the technical solution of the present application, the "kernel jump" application scenario to which the application relates is explained first.
The kernel is system software that provides functions such as a hardware abstraction layer, disk and file system control, and multitasking. It is responsible for managing the processes, memory, device drivers, files, and network systems of the system, and it determines the performance and stability of the system. Operating directly on the hardware would be very complicated for an operating system; the kernel provides a hardware abstraction through which these operations are completed, which makes programming simpler. For this reason, the kernel is also called the core of the operating system.
In general, various security vulnerabilities may appear while an operating system is running. To maintain the stability of the system, the server needs to update or repair the kernel at intervals, and a kernel update or repair requires restarting the server, which greatly affects running services. To avoid restarting the server during kernel update and repair, kernel jump technology was developed. A kernel jump shuts down the first kernel to be repaired, loads the updated second kernel, and then initializes the second kernel, which takes over the work of the first kernel to complete the update.
However, in the kernel jump process, a long time is also required for loading and initializing the second kernel, for example, when the first kernel is exited, all PCI devices connected to the server need to be shut down, and when the second kernel is loaded and initialized, all PCI devices need to be rescanned and initialized.
In order to solve the problem that the service interruption of the kernel jump process for tens of seconds has an influence on service operation, the application provides a server supporting kernel online update, a processor in the server can execute a code of a first kernel in a memory, topology information of a plurality of PCI devices is stored in a preset address of the memory, and when the processor stops executing the code of the first kernel and executes a code of a second kernel, the topology information can be acquired from the preset address of the memory, so that repeated scanning of the PCI devices in the kernel jump process is avoided, the service interruption time in the kernel jump process is shortened, and the use experience of a user is improved.
FIG. 1 shows a server 100 supporting online kernel update provided in the present application. The server 100 may include a processor 110, a memory 120, a plurality of PCI devices 130, and a bus 140, where the processor 110, the memory 120, and the plurality of PCI devices 130 may be connected to each other through the bus 140, or may communicate through other means such as wireless transmission. In one embodiment, the bus 140 may be a PCI bus. It should be understood that the bus 140 is represented by a single thick line in FIG. 1, but this does not mean that there is only one bus or one type of bus.
The server 100 may be a physical server, such as an X86 server, an ARM server, or the like, or may be a Virtual Machine (VM) implemented based on a general physical server and a Network Function Virtualization (NFV) technology, where the VM refers to a complete computer system that has a complete hardware system function and runs in a completely isolated environment through software simulation, such as a virtual device in cloud computing, and the present application is not limited specifically.
The processor 110 may be formed of at least one general-purpose processor, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The processor 110 is configured to execute various types of digitally stored instructions, such as code stored in the memory 120, that enable the server 100 to provide a variety of services.
The memory 120 may be a volatile memory, such as a random access memory (RAM), a dynamic RAM (DRAM), a static RAM (SRAM), a synchronous dynamic RAM (SDRAM), a double data rate RAM (DDR), a cache, and so on, and the memory 120 may further include a combination of the above types.
The memory 120 is configured to store code for the processor 110 to execute, which may include at least the code of the first kernel, the code of the second kernel, and the topology information. The code of the first kernel and the code of the second kernel may include the program code required to start, initialize, and run kernel functions, and may specifically include boot parameters, kernel files, and the like, which is not limited in this application. The topology information is the bus topology established when the PCI devices 130 are initialized. It describes the topology of the device system formed by the PCI devices 130 and may specifically be a linked-list data structure. When the first kernel scans the PCI devices 130, it may search for all PCI devices on the PCI bus and obtain the topology information after numbering and locating each PCI device in sequence. The memory 120 may further include more code and information; for example, if the processor 110 is a multi-core processor, the memory 120 may further include the code of more kernels, which is not specifically limited in this application.
In a specific implementation, the processor 110 executes the code of the first kernel in the memory 120 to start the first kernel, scans the PCI devices 130 once the first kernel is started, and records the topology information of the PCI devices 130 at a preset address of the memory 120. While executing the code of the first kernel, the processor 110 may receive a kernel jump command, stop executing the code of the first kernel according to the kernel jump command, execute the code of the second kernel to start the second kernel, and obtain the topology information from the preset address of the memory 120 once the second kernel is started. The kernel jump command may specifically be a kexec command, which instructs the currently running kernel to jump to a new kernel, or may be another kernel jump command, which is not limited in this application.
It can be understood that FIG. 1 shows only one PCI bus 140, but in an actual environment the number of buses is very large, and the buses can be divided into a primary bus, a secondary bus, a tertiary bus, and so on. Each bus has PCI devices connected to it, so the topology formed by the PCI devices is very complex, and the processor 110 takes a lot of time to scan the plurality of PCI devices 130 to obtain the topology information. In the server 100 provided by the present application, after the processor 110 executes the code of the second kernel, the topology information is obtained from the preset address of the memory 120 and the plurality of PCI devices 130 do not need to be scanned again. This avoids the large amount of time consumed by rescanning the PCI devices, reduces the service interruption during the kernel jump, and improves user experience.
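To illustrate why scanning a multi-level bus hierarchy is costly, the following sketch enumerates a small bus tree depth first, numbering each bus and locating each device in sequence. It is a simplification under stated assumptions: the nested-dictionary layout, the device names, and the numbering scheme (root bus 0, bridges numbered in visiting order) are hypothetical, and real PCI enumeration also involves reading configuration space on each device.

```python
from itertools import count


def scan_pci_tree(bus_node, next_bus=None, bus_no=0, found=None):
    """Depth-first walk that returns (bus number, slot, device name) tuples."""
    if found is None:
        found = []
        next_bus = count(1)   # bus 0 is the root; child buses get 1, 2, ...
    for slot, name in enumerate(bus_node.get("devices", [])):
        found.append((bus_no, slot, name))
    for bridge in bus_node.get("bridges", []):
        child_bus = next(next_bus)
        scan_pci_tree(bridge, next_bus, child_bus, found)
    return found


# Hypothetical three-level hierarchy: a root bus, two child buses, one grandchild bus.
root = {
    "devices": ["nic0"],
    "bridges": [
        {"devices": ["raid0", "usb0"], "bridges": [{"devices": ["nic1"]}]},
        {"devices": ["capture0"]},
    ],
}
topology = scan_pci_tree(root)
```

Every bus and every device is visited once per scan, so the cost grows with the size of the hierarchy; preserving the resulting list across the kernel jump removes the second traversal.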
In an embodiment, the code of the second kernel is obtained after bug fixing or function upgrading is performed on the code of the first kernel, so that after the processor executes the code of the second kernel and obtains the topology information from the preset address of the memory 120, online updating of the kernel can be achieved without stopping the whole server to update the first kernel, and the use experience of the user is improved.
It should be understood that although the memory 120 in FIG. 1 includes both the code of the first kernel and the code of the second kernel, in a specific implementation the code of the second kernel is loaded into the memory 120 only after the processor receives the kernel jump command and obtains the code of the second kernel. Specifically, the processor 110 may obtain the code of the second kernel according to the kernel jump command and then load it into the memory 120. After stopping execution of the code of the first kernel, the processor 110 may execute the code of the second kernel to start the second kernel, and obtain the topology information from the preset address of the memory 120 once the second kernel is started. In a specific implementation, after the processor 110 loads the code of the second kernel into the memory 120, the start and end addresses of the code of the second kernel in the memory 120 are stored in a page table. The processor 110 obtains the start and end addresses from the page table, then fetches the code of the second kernel from the memory 120 according to those addresses and executes it to start the second kernel.
Optionally, after the processor 110 stops executing the code of the first kernel and executes the code of the second kernel to start the second kernel, the memory space occupied by the code of the first kernel may be reclaimed, so as to reduce memory usage in the memory 120 and improve memory utilization. If a kernel update is again required while the second kernel is running, a new kernel jump command may be received, the updated code of a third kernel may be obtained according to the new kernel jump command, and the code of the third kernel may be loaded into the reclaimed memory space. The code of the third kernel may of course also be loaded into other memory space, which is not limited in this application.
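The reclaim-and-reuse behavior can be sketched with a toy region allocator. This is an illustration under stated assumptions: the class `KernelMemory`, the first-fit placement policy, and the sizes are hypothetical, and the application does not prescribe any particular allocation strategy.

```python
class KernelMemory:
    """Toy first-fit allocator tracking where each kernel image is loaded."""

    def __init__(self, size):
        self.free = [(0, size)]   # sorted list of free (start, end) regions
        self.loaded = {}          # kernel name -> (start, end)

    def load(self, name, length):
        """Place an image in the first free region that is large enough."""
        for i, (start, end) in enumerate(self.free):
            if end - start >= length:
                self.loaded[name] = (start, start + length)
                if start + length < end:
                    self.free[i] = (start + length, end)
                else:
                    del self.free[i]
                return self.loaded[name]
        raise MemoryError("no free region large enough")

    def reclaim(self, name):
        """Return the region of a stopped kernel to the free list."""
        self.free.append(self.loaded.pop(name))
        self.free.sort()


mem = KernelMemory(100)
first = mem.load("first", 40)     # occupies (0, 40)
second = mem.load("second", 40)   # occupies (40, 80)
mem.reclaim("first")              # first kernel has stopped executing
third = mem.load("third", 30)     # lands in the reclaimed space
```

In this sketch the third kernel reuses the start of the region released by the first kernel; loading it elsewhere, as the text also allows, would simply correspond to a different placement policy.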
It should be noted that the preset address of the memory 120 used to store the topology information may be the address of a segment of memory space allocated in the memory 120 before the processor 110 executes the code of the first kernel, or the address of a segment of memory space allocated in the memory 120 before the processor 110 receives the kernel jump command and stops executing the code of the first kernel, which is not limited in this application.
The PCI devices 130 may be devices connected to expansion slots of a PCI bus, such as a sound card, a network interface card (NIC), a Universal Serial Bus (USB) card, an Integrated Drive Electronics (IDE) interface card, a redundant array of independent disks (RAID) card, a video capture card, and the like, which is not limited in this application. FIG. 1 shows N PCI devices 130 as an example; in a specific implementation, the number and types of the PCI devices 130 are not limited in this application.
The PCI devices 130 may also be Physical Functions (PFs) or Virtual Functions (VFs) in hardware devices supporting a single root I/O virtualization (SR-IOV) standard. The SR-IOV is a hardware-based virtualization solution, and the standard can virtualize one PCI device into a plurality of PCI virtual devices, wherein each PCI virtual device can be allocated to a different virtual machine for use.
Specifically, a hardware device supporting the SR-IOV standard may be divided into at least one PF and at least one VF, where a PF is a PCI function supported by the hardware device, one PF may extend multiple VFs, and a VF is an instance virtualized by the hardware device. For example, FIG. 2 is a schematic diagram of the physical functions and virtual functions of a PCI device. PCI device 1 has SR-IOV enabled and may include a plurality of physical functions, of which FIG. 2 shows PF1 and PF2 as examples. Each physical function may extend a plurality of virtual functions; in FIG. 2, PF1 extends VF11 to VF13 and PF2 extends VF21 to VF23. Assuming that PCI device 1 shown in FIG. 2 is a physical network card supporting the SR-IOV standard, a PF may be a PCI function supported by the physical network card, and each VF is presented to a virtual machine as an independent network card. It should be understood that FIG. 2 is only an example, and the number of VFs and PFs per PCI device is not limited in this application.
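The PF and VF relationship in the FIG. 2 example can be modeled with a small data structure. This is a sketch under stated assumptions: the class names, the `extend` method, and the virtual machine name `vm-a` are hypothetical, and only the naming pattern PF1 to VF11 through VF13 comes from the example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VirtualFunction:
    name: str
    assigned_vm: Optional[str] = None   # which virtual machine uses this VF, if any


@dataclass
class PhysicalFunction:
    name: str
    vfs: List[VirtualFunction] = field(default_factory=list)

    def extend(self, vf_count):
        """Create vf_count virtual functions named after this PF."""
        index = self.name[2:]           # "PF1" -> "1"
        self.vfs = [VirtualFunction(f"VF{index}{i}") for i in range(1, vf_count + 1)]
        return self.vfs


pf1, pf2 = PhysicalFunction("PF1"), PhysicalFunction("PF2")
pf1.extend(3)
pf2.extend(3)
pf1.vfs[0].assigned_vm = "vm-a"         # each VF appears to a VM as its own network card
```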
In one embodiment, the processor 110 may instruct the plurality of PCI devices 130 to retain their respective device state information before it stops executing the code of the first kernel in accordance with the kernel jump command. The device state information of a PCI device may include the state of a functional feature of the PCI device, for example, whether SR-IOV of the PCI device is on or off. The processor 110 may instruct the plurality of PCI devices 130 to store the device state information in a local register.
In an embodiment, the processor 110 may be configured to execute the code of the second kernel and, after obtaining the topology information from the preset address of the memory, read the device state information from the plurality of PCI devices. In a specific implementation, after the processor 110 executes the code of the second kernel and obtains the topology information from the preset address of the memory 120, it may communicate with the plurality of PCI devices 130 so that each PCI device obtains its device state information from its local register. Because the plurality of PCI devices 130 are not turned off before the processor 110 stops executing the code of the first kernel, they may inherit their device state according to the state information, and device startup and device initialization do not need to be performed again.
It can be understood that starting and initializing a PCI device usually takes on the order of seconds, whereas recovering the device from retained device state takes on the order of milliseconds. By retaining the device state information locally in each PCI device, the present application allows the plurality of PCI devices to be recovered by state inheritance after the processor 110 executes the code of the second kernel, thereby reducing the service interruption duration during the kernel jump and improving the user experience.
For example, suppose a PCI device among the PCI devices 130 needs to turn on the SR-IOV function, and device initialization takes about 1 second according to the PCI specification (PCI spec). With the server supporting kernel online update provided in the present application, the PCI device may be instructed to keep its device state information in the local register before the processor 110 stops executing the code of the first kernel. In this way, after the processor 110 executes the code of the second kernel and obtains the topology information from the preset address of the memory 120, the device state information may be read from the PCI device, so that the PCI device can inherit its state according to the device state information and the SR-IOV function does not need to be turned on again, thereby shortening the service interruption time of the kernel jump process by at least 1 second. If the server is connected to hundreds of PCI devices, even more service interruption time may be saved, improving the user experience.
It should be understood that the server 100 provided in the present application may be divided into modules in many ways, and fig. 1 shows one exemplary division. The server 100 may further include more modules, for example, an external memory, a communication interface, and the like; the modules may be combined into fewer modules or split into more modules; and the positional relationship between the modules in the server 100 does not constitute any limitation.
The following describes a deployment scenario of the server 100 provided in the present application.
The server 100 provided by the present application to support kernel online update may be deployed in a single physical server, for example, the server 100 shown in fig. 1 is a physical machine, and the PCI devices 130 are physical devices connected to the physical machine. In this scenario, the server 100 provided by the present application may implement online update of the kernel, reduce the time for service interruption during online update of the kernel, and improve the user experience.
The server 100 provided by the present application may also be deployed in a cloud environment. For example, fig. 3 is an architecture diagram of a public cloud data center, and the server 100 may be deployed in the public cloud data center 300 shown in fig. 3, which may include a cloud management node 310 and a hardware resource pool 320.
The cloud management node 310 may be implemented by a general physical server, such as an ARM server or an X86 server, or may be a virtual machine implemented by the NFV technology, and the cloud management node 310 may also be a virtual machine or a physical server in the hardware resource pool 320, which is not limited in this application.
The hardware resource pool 320 may include at least one physical machine (fig. 3 takes a resource pool including a physical machine 1, a physical machine 2, a physical machine 3, and a physical machine 4 as an example), where the physical machine may be a general physical server, such as an ARM server or an X86 server, and the present application is not limited in particular. The physical machines in the hardware resource pool 320 may communicate with other physical machines or the cloud management node 310 through an internal network. Each physical machine includes at least hardware resources (for example, the physical machine 1 includes the hardware resources 1, and the physical machine 2 includes the hardware resources 2) and an operating system (for example, the operating system 1 and the operating system 2). Some physical machines may further include multiple virtual instances, where the virtual instances may be containers (for example, the physical machine 1 includes the container 11 and the container 12) or virtual machines (for example, the virtual machine 21 and the virtual machine 22), and the multiple virtual instances in a physical machine may share the operating system and the hardware resources of the physical machine. The hardware resources (e.g., the hardware resources 1 and the hardware resources 2) may include various available hardware resources of the server, such as the processor 1, the memory 1, and the multiple PCI devices 1, and may also include other hardware resources that the user may need, which is not specifically limited in this application. The operating systems (such as the operating system 1 and the operating system 2) may be operating systems suitable for containers, virtual machines, or physical machines, such as an Android operating system, a Windows operating system, a Linux operating system, and the like, and the present application is not limited in particular.
It should be noted that the operating system may be an official complete operating system, or an operating system obtained by modifying an individual driver module of the official complete operating system to adapt to the operation mode of the server, and the application is not limited in particular. The number of physical machines, the number of virtual machines, the number of containers, and the types and numbers of hardware resources shown in fig. 3 are only for illustration, and the present application is not particularly limited.
Through the cloud management node 310, a user can rent virtual instances or bare metal servers (BMSs) of various specifications for a fee according to his or her needs. Specifically, the cloud management node 310 may receive a lease request sent by a user, where the lease request carries a specification requirement, and, according to the specification requirement and the idle resources of the hardware resource pool 320, create in the hardware resource pool 320 a virtual machine, a container, or a BMS that meets the specification requirement and lease it to the user. For example, if the user requests to pay for leasing a virtual machine configured with a network card, the cloud management node 310 may create a VM1 in the hardware resource pool 320 according to the specification requirement and the idle resources of the hardware resource pool 320, and allocate the VF1 of the network card to the virtual machine for use. It is to be understood that the above description is illustrative, and that the present application is not limited to this description.
In a specific implementation, if a user requests to create a container with a specification X, then after receiving the lease request sent by the user, the cloud management node 310 may determine the physical machine on which the container is to be created (for example, the physical machine 1 in fig. 3) according to the specification X carried in the request and the idle resources of the hardware resource pool 320, and then send a container creation request carrying the specification X to a cloud management agent node on that physical machine (for example, the cloud management agent node 1 in the physical machine 1 in fig. 3). The cloud management agent node 1 may create the container with the specification X (for example, the container 11 in fig. 3) according to the container creation request. Similarly, if a user requests to create a virtual machine with a specification Y, then after receiving the lease request sent by the user, the cloud management node 310 may determine the physical machine on which the virtual machine is to be created (for example, the physical machine 2 in fig. 3) according to the specification Y carried in the request and the idle resources of the hardware resource pool 320, and then send a virtual machine creation request carrying the specification Y to a virtual machine manager on that physical machine (for example, the virtual machine manager 2 in fig. 3). The virtual machine manager 2 may create the virtual machine with the specification Y (for example, the virtual machine 21 in fig. 3) according to the virtual machine creation request. It is to be understood that the foregoing is illustrative and that this application is not intended to be limiting.
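The placement decision described above, in which a physical machine is chosen according to the requested specification and the idle resources of the pool, might look like the following Python sketch. The application does not say how the cloud management node selects a machine; the first-fit policy, the `place` function, and the `cpus` and `mem_gb` fields are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PhysicalMachine:
    name: str
    idle_cpus: int
    idle_mem_gb: int


def place(spec: Dict[str, int], pool: List[PhysicalMachine]) -> Optional[PhysicalMachine]:
    """First-fit: return the first machine with enough idle resources and reserve them."""
    for pm in pool:
        if pm.idle_cpus >= spec["cpus"] and pm.idle_mem_gb >= spec["mem_gb"]:
            pm.idle_cpus -= spec["cpus"]
            pm.idle_mem_gb -= spec["mem_gb"]
            return pm
    return None  # no machine in the pool can satisfy the specification


pool = [
    PhysicalMachine("physical machine 1", idle_cpus=2, idle_mem_gb=4),
    PhysicalMachine("physical machine 2", idle_cpus=8, idle_mem_gb=32),
]
spec_y = {"cpus": 4, "mem_gb": 16}  # hypothetical "specification Y"
chosen = place(spec_y, pool)
print(chosen.name if chosen else "no capacity")
```

After a machine is chosen, the creation request would be forwarded to the agent or virtual machine manager on that machine, which this sketch leaves out.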
Alternatively, the server 100 supporting online kernel update provided by the present application may be a physical machine in the hardware resource pool 320. The processor 110, the memory 120, and the PCI devices 130 in the server 100 may be hardware resources of the physical machine; for example, the processor 110 may be the processor 1, the memory 120 may be the memory 1, and the PCI devices 130 may be the PCI devices 1. Fig. 3 is only an example, and this is not limited in this application.
In the application scenario shown in fig. 3, the code of the first kernel, the code of the second kernel, and the topology information may be stored in a memory (e.g., the memory 1 in fig. 3). A processor (e.g., the processor 1 in fig. 3) may start the first kernel by executing the code of the first kernel in the memory, where a virtual instance is set in the first kernel, and one or any combination of the multiple PCI devices is passed through directly to the virtual instance.
It should be noted that before the server 100 performs online kernel update, the virtual instance in the first kernel may be migrated to another server, and then the server 100 performs online kernel update, so that service interruption caused by kernel update to the user is avoided, and the user experience is improved.
Alternatively, the server 100 may also be a virtual instance in a physical machine. It should be understood that multiple virtual instances in the same physical machine may share the operating system and hardware resources of the physical machine, so when the server is a virtual instance, the online update is in essence performed on the kernel of the physical machine in which the virtual instance is located. For example, assume that the server 100 is a container, such as the container 11 in fig. 3, and that the container 11 and the container 12 share the operating system and physical resources of the physical machine 1. In this application scenario, the code of the first kernel may be stored in the memory 1, and the processor 1 may start the first kernel by executing the code of the first kernel, scan the PCI devices 1 when the first kernel is started, and record the topology information of the PCI devices into a preset address of the memory 1. When the processor 1 receives a kernel jump command while executing the code of the first kernel, it receives the code of the second kernel according to the kernel jump command and loads the code into the memory 1, then stops executing the code of the first kernel and executes the code of the second kernel to start the second kernel, and obtains the topology information from the preset address of the memory 1 when the second kernel is started. The first kernel is thereby updated online: the entire kernel update process does not need to shut down and restart the physical machine 1 and does not need to scan the PCI devices 1 again, which reduces the service interruption time for users.
Further, as can be seen with reference to the embodiment of fig. 2, the plurality of PCI devices 130 passed through to the container 11 may be physical devices, physical functions, or virtual functions. Assume that, in the above example, the PCI device passed through to the container 11 is the physical function PF1 of the PCI device 1 shown in fig. 2. In this application scenario, after receiving the kernel jump command, the processor 1 may instruct the PF1 to retain its device state information by storing the state information in a local register. After the processor 1 executes the code of the second kernel to start the second kernel and obtains the topology information from the preset address of the memory 1, it may read the device state information of the PF1, so that the PF1 may inherit its device state according to the state information without performing device startup and device initialization again. It is to be understood that the above description is illustrative, and that the present application is not limited to this description.
It can be understood that the cloud data center 300 shown in fig. 3 has a large number of servers and therefore a large demand for kernel jumps. With the server supporting kernel online update provided by the present application, the server does not need to be restarted during the kernel update, which avoids the heavy operation and maintenance workload that server restarts impose on the cloud data center, effectively reduces the time required for the kernel update, avoids long service interruptions of the cloud data center, and improves the user experience.
In summary, according to the server supporting the online update of the kernel provided by the application, the processor in the server can execute the code of the first kernel in the memory, store the topology information of the PCI devices in the preset address of the memory, and after the processor stops executing the code of the first kernel and executes the code of the second kernel, the topology information can be obtained from the preset address of the memory, so that repeated scanning of the PCI devices in the kernel jump process is avoided, the service interruption time in the kernel jump process is reduced, and the use experience of a user is improved.
Fig. 4 is a schematic structural diagram of a device supporting kernel online update provided in the present application. The device 200 may be the server 100 in the embodiments of fig. 1 to fig. 3. The device 200 may include a first kernel 210, a second kernel 220, a memory 230, and a plurality of PCI devices 240, which may be connected to each other through an internal bus or through other means such as wireless transmission. The bus may be divided into an address bus, a data bus, a control bus, and the like, and may be a PCI bus. It should be understood that fig. 4 shows an exemplary division; the module units may be combined or divided into more or fewer module units, which is not specifically limited in the present application, and the positional relationship between the devices and the modules shown in fig. 4 does not constitute any limitation.
The device 200 may be a physical server, such as an X86 server, an ARM server, or the like, or may be a Virtual Machine (VM) implemented based on a general physical server and a Network Function Virtualization (NFV) technology, where the VM refers to a complete computer system that has a complete hardware system function and is run in a completely isolated environment, such as a virtual device in cloud computing, and the present application is not limited specifically.
The first kernel 210 and the second kernel 220 are kernels in the processor of the device 200, where the second kernel 220 is a kernel obtained by fixing bugs in or upgrading functions of the first kernel 210. In other words, the first kernel 210 is the kernel to be updated and the second kernel 220 is the updated kernel, and the first kernel 210 is updated online by having the second kernel 220 take over its work. It is noted that, although the device 200 in fig. 4 includes two kernels, in a specific implementation, the number of kernels in the device 200 is not limited in this application.
The PCI devices 240 may be PCI devices connected to expansion slots of a PCI bus in the device 200, and for specific description, reference may be made to the PCI devices 130 in the foregoing embodiments of fig. 1 to fig. 3, which is not repeated herein. Fig. 4 illustrates an example that the number of the PCI devices 240 is N, and in a specific implementation, the number and types of the PCI devices 240 are not limited in this application. It should be noted that, referring to the embodiments in fig. 1 to fig. 3, the PCI devices 240 may be physical devices, or may be PFs or VFs in physical devices supporting the SR-IOV standard. The description of SR-IOV, PF, and VF may refer to the embodiment in fig. 2, and will not be repeated here.
The memory 230 may be a shared memory region that both the first kernel 210 and the second kernel 220 can access; for its specific implementation, reference may be made to the description of the memory 120 in the embodiments of fig. 1 to fig. 3, which is not repeated herein. The memory 230 may be configured to store the topology information obtained by scanning the PCI devices 240 after the first kernel 210 is started. The topology information describes the topology of the device system composed of the plurality of PCI devices 240, and may specifically be a linked-list data structure. For the description of the topology information, reference may be made to the foregoing embodiments of fig. 1 and fig. 2, and repeated description is omitted here.
In this embodiment of the application, the first kernel 210 is configured to scan the PCI devices 240 when it is started and to record the topology information of the PCI devices in a preset address of the memory 230. After the first kernel 210 receives a kernel jump command, the device 200 may start the second kernel 220 according to the kernel jump command, and the second kernel 220 is configured to obtain the topology information from the preset address of the memory 230 when it is started.
In a specific implementation, after the first kernel 210 receives the kernel jump command, the first kernel 210 may receive the code of the second kernel and load it into the memory 230, so that after the first kernel 210 is shut down, the processor may execute the code of the second kernel in the memory 230 to start the second kernel 220.
It should be understood that scanning the plurality of PCI devices 240 to obtain their topology information consumes a large amount of time in the device 200. With the device supporting kernel online update provided by the present application, after the first kernel 210 is shut down and the second kernel 220 is started, the second kernel 220 may obtain the topology information of the plurality of PCI devices from the memory 230, thereby avoiding the time-consuming rescan of the plurality of PCI devices, reducing the service interruption time during the kernel jump and update, and improving the user experience.
In an embodiment, the first kernel 210 may be further configured to instruct the plurality of PCI devices 240 to retain their respective device state information according to the kernel jump command. Specifically, each PCI device may store its own device state information in a local register, where the state information may include the state of a functional feature of the device, such as whether SR-IOV is on or off. For a detailed description of the state information, reference may be made to the embodiments of fig. 1 to fig. 3, which is not repeated here.
In an embodiment, the second kernel 220 is further configured to, after reading the topology information from the memory 230, reconnect to the multiple PCI devices 240 and read the device state information from each PCI device. Because the first kernel 210 does not shut down the multiple PCI devices 240 before it stops running, the multiple PCI devices 240 may inherit their device state according to the state information and do not need to be initialized again, so that the time required to reconnect one PCI device after the second kernel 220 is started is reduced from the second level to the millisecond level, further reducing the service interruption duration of the kernel jump and improving the user experience.
It can be understood that, during the kernel jump, the device 200 does not need to scan the multiple PCI devices again, because the topology information of the multiple PCI devices 240 can be acquired from the memory 230, which reduces the time required for the kernel jump. In addition, each device can shorten its initialization through state inheritance, which further reduces the time required for the kernel jump and improves the user experience.
In an embodiment, a virtual instance is set in the first kernel 210, where the virtual instance may be a container or a virtual machine in the embodiment of fig. 3, and one or any combination of the PCI devices 240 is passed through directly to the virtual instance. For the description of the virtual instance set in the first kernel 210, reference may be made to the embodiment in fig. 3, which is not repeated here.
To sum up, with the device supporting kernel online update provided by the application, the device can store the topology information of the plurality of PCI devices in the memory before the first kernel exits, and instruct each PCI device to store its device state information locally. When the second kernel is loaded and initialized, the topology information can therefore be obtained from the memory without rescanning the plurality of PCI devices, which reduces the time required to load and initialize the second kernel. The device state information can also be obtained from each PCI device, so that the PCI devices can inherit their state according to the device state information and do not need to be restarted and initialized, which further reduces the service interruption time of the kernel jump and improves the user experience.
The method for supporting online update of a kernel provided by the present application is described below, and the method may be applied to the server 100 shown in fig. 1 to fig. 3, and may also be applied to the device 200 shown in fig. 4.
Fig. 5 is a schematic flowchart illustrating steps of a method for supporting online kernel update provided by the present application, and as shown in fig. 5, the method for supporting online kernel update provided by the present application may include the following steps:
Step S310: the first kernel scans a plurality of PCI devices when it is started and records topology information of the PCI devices into a preset address of a memory, where the topology information includes information of the PCI devices connected to the server.
In a specific implementation, the topology information may be stored in the memory by the first kernel after the first kernel is started, or may be stored in the memory by the first kernel after the kernel jump command is received, which is not specifically limited in this application. It should be understood that, for the description of the memory and the topology information, reference may be made to the embodiments in fig. 1 to fig. 3, which is not repeated here.
In an embodiment, the topology information may be obtained by the first kernel after searching all devices on the bus, and may specifically be a linked-list data structure used for describing the topology of the device system composed of the multiple PCI devices, where the linked list may include information such as the numbers and positions of all devices.
In a specific implementation, a PCI bus can be expanded through a PCI bridge: one end of the PCI bridge is connected to a primary bus and the other end to a secondary bus, so that more PCI devices can be connected to the secondary bus, and so on. A plurality of PCI buses and PCI bridges can thus form a tree structure that connects a large number of PCI devices. Based on this, the topology information of the PCI devices may be obtained as follows: the first kernel may scan the primary bus using a depth-first search (DFS) algorithm and number each PCI device and PCI bridge found on it, number the secondary bus connected to each PCI bridge, then scan each secondary bus and number each PCI device and PCI bridge found on it, and so on, thereby obtaining the topology information.
For example, fig. 6 is a structural diagram of a PCI device system, and the first kernel may obtain the topology information as follows. Using the DFS algorithm, the first kernel scans and numbers the devices on the primary bus: the primary bus is numbered bus0, and the PCI devices and PCI bridges found on it are numbered, so that the device 1 is numbered D1, the bridge 1 is numbered bridge1, and the bridge 2 is numbered bridge2. The secondary bus connected to bridge1 is then numbered bus1, and the secondary bus connected to bridge2 is numbered bus2. Device scanning and numbering are then carried out on bus1 and bus2 respectively: the device 2 connected to bus1 is numbered D2, and the device 3 connected to bus2 is numbered D3. Finally, a linked-list data structure formed by the number of each bus and the numbers of the devices on it is obtained; for example, linked list 1 describes the PCI devices and PCI bridges on bus0, linked list 2 describes the PCI devices and PCI bridges on bus1, and so on, which gives the topology information of the PCI device system shown in fig. 6. It should be understood that fig. 6 is for illustration purposes and the present application is not limited in particular.
As can be seen from the above, the procedure by which the first kernel scans the plurality of PCI devices to obtain the topology information involves many steps and takes a long time, which is why the topology information is recorded in the memory for reuse by the second kernel.
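The scan of fig. 6 can be reproduced with a short depth-first traversal. The Python sketch below is a simplified illustration, not the enumeration code of any real kernel: the nested-list description `FIG6` and the function `scan` are hypothetical, and the sketch descends into a bridge's secondary bus as soon as the bridge is found, which for the system of fig. 6 yields the same numbering as described above.

```python
from typing import Dict, List, Union

# Hypothetical description of the hardware in fig. 6: each bus is a list of
# slots; a string is an endpoint device, a dict is a bridge with a secondary bus.
Slot = Union[str, dict]
FIG6: List[Slot] = [
    "device 1",
    {"bridge": "bridge1", "secondary": ["device 2"]},
    {"bridge": "bridge2", "secondary": ["device 3"]},
]


def scan(root: List[Slot]) -> Dict[str, List[str]]:
    """Depth-first scan that numbers buses and devices in discovery order."""
    topology: Dict[str, List[str]] = {}
    counters = {"bus": 0, "dev": 0}

    def visit(slots: List[Slot]) -> None:
        bus = f"bus{counters['bus']}"
        counters["bus"] += 1
        topology[bus] = []
        for slot in slots:
            if isinstance(slot, dict):
                topology[bus].append(slot["bridge"])
                visit(slot["secondary"])  # descend into the bus behind the bridge
            else:
                counters["dev"] += 1
                topology[bus].append(f"D{counters['dev']}")

    visit(root)
    return topology


topology = scan(FIG6)
for bus, members in topology.items():
    print(bus, members)
```

Each dictionary entry plays the role of one of the linked lists mentioned above: the entry for bus0 lists the devices and bridges on bus0, the entry for bus1 lists those on bus1, and so on.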
Step S320: the first kernel receives the kernel jump command and, according to the kernel jump command, instructs the PCI devices to retain their respective device state information.
In particular, the plurality of PCI devices may be instructed to store respective device state information in a register local to the device. The state information may include a state of a functional feature included in the device, such as the SR-IOV being on or off. For detailed description of the status information, reference may be made to the foregoing embodiments, and details are not repeated here.
Notably, if the PCI device is a VF of a physical device, the state information of the VF and the identification number of the VF, namely its bus, device, function (BDF) number, may be stored in registers of the physical device, where the BDF of the VF is used to identify the virtual instance to which the VF is passed through. It can be understood that, because the plurality of PCI devices are not shut down before the first kernel stops, the state information of the devices can be retained locally in the devices, and the second kernel, after being started, can instruct each PCI device to inherit its state according to the local state information. This avoids restarting and initializing the plurality of PCI devices, reduces the time required by the kernel jump, and improves the user experience.
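As an illustration of how a BDF can serve as a key for the retained VF state, the following Python sketch packs a BDF into the conventional 16-bit layout (8-bit bus number, 5-bit device number, 3-bit function number). The dictionary `pf_registers` merely stands in for the registers of the physical device; its layout and the `enabled` field are assumptions, since the application does not specify a register format.

```python
def pack_bdf(bus: int, device: int, function: int) -> int:
    """Pack bus (8 bits), device (5 bits), function (3 bits) into 16 bits."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("BDF field out of range")
    return (bus << 8) | (device << 3) | function


def unpack_bdf(bdf: int) -> tuple:
    """Split a 16-bit BDF back into (bus, device, function)."""
    return (bdf >> 8) & 0xFF, (bdf >> 3) & 0x1F, bdf & 0x07


def format_bdf(bdf: int) -> str:
    """Render a BDF in the usual bus:device.function notation."""
    bus, device, function = unpack_bdf(bdf)
    return f"{bus:02x}:{device:02x}.{function}"


# Hypothetical register area of the physical device: one record per VF,
# keyed by the VF's BDF, holding the state retained across the kernel jump.
pf_registers = {}
vf_bdf = pack_bdf(bus=0x3B, device=0x02, function=1)
pf_registers[vf_bdf] = {"enabled": True}

print(format_bdf(vf_bdf))
```

After the jump, the second kernel could look up the record by BDF to find out which VF the retained state belongs to.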
Step S330: the first kernel stops running and the second kernel starts.
In a specific implementation, after receiving the kernel jump command, the first kernel may receive the code of the second kernel and load it into the memory, so that after the first kernel is shut down, the processor may execute the code of the second kernel in the memory to start the second kernel. It should be understood that, because the second kernel does not yet exist before the first kernel exits, the first kernel cannot hand the code of the second kernel to the second kernel directly. The server may therefore establish a page table for storing start and end addresses: after the first kernel stores the code of the second kernel in the memory, the start and end addresses of that code in the memory may be stored in the page table, and the processor may obtain the start and end addresses from the page table, map them to the memory, and start and initialize the second kernel according to the startup information in the memory. After the second kernel is started, the memory space occupied by the code of the first kernel can be reclaimed, which improves memory utilization.
For example, as shown in fig. 7, the first kernel is the running kernel, and while running it may store the topology information of the scanned PCI devices in a preset address of the memory. The first kernel may then receive a kernel jump command, which may specifically be a kexec command, receive the code of the second kernel according to the kernel jump command, and load the code into the memory, where the code of the second kernel may be code obtained by fixing bugs in or upgrading functions of the code of the first kernel. The first kernel then stores the start address of the code of the second kernel in a pre-created page table. After the first kernel stops, the processor may obtain the start address of the code of the second kernel from the pre-created page table, fetch the code of the second kernel from that address, and execute it to run the second kernel; the memory space occupied by the code of the first kernel is then reclaimed, which improves memory utilization. After the second kernel is started, it can acquire the topology information from the preset address of the memory, which avoids scanning the plurality of PCI devices again, reduces the time spent on device scanning, shortens the time required by the kernel jump several-fold, shortens the service interruption caused by the kernel jump, and improves the user experience.
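The handoff of fig. 7 can be mimicked in user space to make the sequence concrete. In the Python sketch below, a `bytearray` stands in for physical memory and a dictionary named `handoff_table` stands in for the pre-created page table; both names, the region sizes, and the addresses are hypothetical, and the sketch does not model how a real kexec implementation transfers control.

```python
MEM_SIZE = 256
memory = bytearray(MEM_SIZE)   # stands in for physical memory
handoff_table = {}             # stands in for the pre-created page table

FIRST_KERNEL = (0, 64)         # hypothetical region holding the first kernel's code
memory[0:64] = b"\x01" * 64


def load_second_kernel(image: bytes, start: int) -> None:
    """Run by the first kernel: copy the new code in and record where it lives."""
    end = start + len(image)
    if start < FIRST_KERNEL[1] or end > MEM_SIZE:
        raise ValueError("image would overlap the running kernel or overflow memory")
    memory[start:end] = image
    handoff_table["second_kernel"] = (start, end)


def jump() -> bytes:
    """Run after the first kernel stops: fetch the new code, then reclaim the old."""
    start, end = handoff_table["second_kernel"]
    code = bytes(memory[start:end])
    lo, hi = FIRST_KERNEL
    memory[lo:hi] = bytes(hi - lo)   # zero-fill the reclaimed region
    return code


load_second_kernel(b"\x02" * 32, start=128)
running = jump()
print(len(running), handoff_table["second_kernel"])
```

The point of the table is that it outlives the first kernel: the only thing the code running after the jump needs to know in advance is where the table itself is.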
Step S340: the second kernel acquires the topology information from the preset address of the memory.
As can be understood, because the second kernel obtains the topology information from the memory, repeated scanning of the plurality of PCI devices is avoided, the time spent on device scanning is reduced, the time required for the kernel jump is shortened severalfold, the service interruption caused by the kernel jump is shortened, and the user experience is improved.
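As a hedged illustration of step S340, the sketch below serializes a topology record at a fixed offset and reads it back. The application does not specify a record layout, so the magic value, the field widths, the `PRESET_ADDR` offset and the fallback to a full scan when no valid record is found are all assumptions made for the example.

```python
import struct

# Hypothetical record layout at the preset address (not specified by the source):
#   header: 4-byte magic, 2-byte entry count
#   entry:  bus number, device number, function number, parent bridge bus (1 byte each)
MAGIC = b"PCIT"
HEADER = struct.Struct("<4sH")
ENTRY = struct.Struct("<BBBB")
PRESET_ADDR = 0x100  # assumed offset agreed on by both kernels


def save_topology(memory, devices):
    """First kernel: write the scanned topology at the preset address."""
    HEADER.pack_into(memory, PRESET_ADDR, MAGIC, len(devices))
    offset = PRESET_ADDR + HEADER.size
    for dev in devices:
        ENTRY.pack_into(memory, offset, *dev)
        offset += ENTRY.size


def load_topology(memory):
    """Second kernel: read the topology back, or return None if no valid record exists."""
    magic, count = HEADER.unpack_from(memory, PRESET_ADDR)
    if magic != MAGIC:
        return None  # the caller would fall back to a full bus scan
    offset = PRESET_ADDR + HEADER.size
    return [ENTRY.unpack_from(memory, offset + i * ENTRY.size) for i in range(count)]


memory = bytearray(1024)
scanned = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 2, 1, 0)]
save_topology(memory, scanned)
restored = load_topology(memory)
```

Reading a few bytes per device from memory replaces probing every bus and slot, which is where the time saving described above comes from.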
Step S350: the second kernel reads the device state information from the plurality of PCI devices to realize the state inheritance of the plurality of PCI devices.
In a specific implementation, the second kernel can determine, according to the topology information, the number of each PCI device, the number of the PCI bus on which the device is located, and the number of the PCI bridge, and then communicate with each PCI device, so that each PCI device inherits its state according to its locally retained state information. For example, the startup status of each function in the device is determined according to the state information; if some functions are already started, those functions do not need to be initialized again. As a result, after the second kernel is started, the state initialization time of a single PCI device is shortened from the order of seconds to the order of milliseconds.
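State inheritance as described for step S350 can be sketched as follows. This is a simplified model under assumptions: the `PciDevice` class, its `retained_state` dictionary and the function names `dma`, `queues` and `sriov` are hypothetical and only stand in for whatever state a real device retains locally.

```python
from dataclasses import dataclass, field


@dataclass
class PciDevice:
    """Hypothetical device model; retained_state maps function name -> already started."""
    bus: int
    device: int
    retained_state: dict = field(default_factory=dict)
    initialized_now: list = field(default_factory=list)

    def init_function(self, name):
        # Stand-in for the slow, per-function hardware initialization.
        self.initialized_now.append(name)
        self.retained_state[name] = True


def inherit_state(dev, required_functions):
    """Second kernel: initialize only the functions the device reports as not started."""
    for name in required_functions:
        if not dev.retained_state.get(name, False):
            dev.init_function(name)


nic = PciDevice(bus=1, device=0,
                retained_state={"dma": True, "queues": True, "sriov": False})
inherit_state(nic, ["dma", "queues", "sriov"])
print(nic.initialized_now)
```

In this run only `sriov` is initialized; the two functions the device reports as already started are left untouched.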
In some embodiments, if some devices cannot retain the state information locally, the plurality of PCI devices may instead be stopped before the first kernel exits. The second kernel then completes its own initialization according to the topology information and initializes the plurality of PCI devices again without performing state inheritance, which is not limited in this application.
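The choice between state inheritance and the fallback path can be summarized in a small decision function. This is an illustrative sketch only: the step names are labels invented for the example, and the rule that a single device without local state retention sends all devices down the fallback path follows the wording of the paragraph above.

```python
def plan_kernel_jump(devices):
    """Return the ordered steps of a kernel jump.

    'devices' maps a device label to True if that device can retain its own
    state locally. The step names are illustrative labels only.
    """
    if all(devices.values()):
        return ["instruct_devices_to_retain_state", "exit_first_kernel",
                "start_second_kernel", "load_topology", "inherit_device_state"]
    # At least one device cannot retain state: stop the devices and reinitialize.
    return ["stop_devices", "exit_first_kernel",
            "start_second_kernel", "load_topology", "reinitialize_devices"]


print(plan_kernel_jump({"nic": True, "legacy_card": False}))
```

Both paths still load the topology from memory, so the saving from skipping the device scan is kept even when state inheritance is not possible.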
In an embodiment, a virtual instance is provided in the first kernel, the virtual instance may be a container or a virtual machine as in the embodiment of fig. 3, and one or any combination of the plurality of PCI devices is passed through to the virtual instance. It should be understood that, for an example in which the first kernel is provided with a virtual instance, reference may be made to the embodiment in fig. 3, and details are not repeated here.
It can be understood that, by using the method provided by the present application to update the kernel of a server online, the time required for the jump from the first kernel to the second kernel can be significantly reduced, which improves the efficiency of updating and repairing the first kernel. Whether the kernel of a single server or a kernel in a cloud environment is updated, a restart of the server is avoided, a long service interruption caused by the kernel jump is avoided, and the user experience is improved.
In summary, the method for supporting kernel online update provided by the present application can store the topology information of a plurality of PCI devices in the memory and, before the first kernel exits, instruct each PCI device to retain its own device state information locally. In this way, when the second kernel is loaded and initialized, it can obtain the topology information from the memory without rescanning the plurality of PCI devices, which reduces the time required for loading and initializing the second kernel. The second kernel can also obtain the device state information from each PCI device, so that the PCI devices inherit their state according to the device state information and do not need to be restarted and reinitialized, which further reduces the service interruption time during the kernel jump and improves the user experience.
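The overall effect summarized above, namely that the PCI devices are scanned once and not again after the jump, can be checked with a small end-to-end simulation. This is a sketch under assumptions: the `Bus` and `Kernel` classes are hypothetical, and the preset address is reduced to a dictionary key in a `memory` object that survives the jump.

```python
class Bus:
    """Hypothetical PCI bus that counts how often it is fully scanned."""

    def __init__(self, devices):
        self.devices = devices
        self.scan_count = 0

    def scan(self):
        self.scan_count += 1
        return list(self.devices)


class Kernel:
    def __init__(self, name, bus, memory):
        self.name, self.bus, self.memory = name, bus, memory
        self.topology = None

    def boot(self):
        # Reuse topology left at the "preset address" (a dict key here) if present.
        self.topology = self.memory.get("pci_topology")
        if self.topology is None:
            self.topology = self.bus.scan()
            self.memory["pci_topology"] = self.topology

    def kernel_jump(self, new_name):
        # Memory contents survive the jump; only the running kernel changes.
        successor = Kernel(new_name, self.bus, self.memory)
        successor.boot()
        return successor


bus = Bus(["00:00.0", "01:00.0", "01:02.1"])
memory = {}
first = Kernel("v1", bus, memory)
first.boot()
second = first.kernel_jump("v2")
print(bus.scan_count, second.topology == first.topology)
```

The scan counter stays at one across the jump, and the second kernel ends up with the same topology the first kernel recorded.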
Fig. 8 is a schematic diagram of a computing device 800 according to the present application. Computing device 800 may be, among other things, server 100 or device 200 of fig. 1-7. As shown in fig. 8, computing device 800 includes: a processor 810, a communication interface 820, and a memory 830. The processor 810, the communication interface 820 and the memory 830 may be connected to each other through an internal bus 840, or may communicate through other means such as wireless transmission. In the embodiment of the present application, the bus connection is taken as an example, and the bus may be a PCI bus. The bus 840 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.
The processor 810 may be constituted by at least one general-purpose processor, such as a CPU, or a combination of a CPU and a hardware chip. The hardware chip may be an ASIC, a PLD, or a combination thereof. The aforementioned PLD may be a CPLD, an FPGA, a GAL, or any combination thereof. The processor 810 executes various types of digitally stored instructions, such as software or firmware programs stored in the memory 830, which enable the computing device 800 to provide a variety of services. The processor 810 may be the processor 110 in the embodiment of fig. 1.
The memory 830 is used for storing program code, which is executed under the control of the processor 810 to perform the processing steps of the server 100 or the device 200 in the above embodiments. The program code may include one or more software modules, which may be the software modules provided in the embodiment of fig. 4, such as a first kernel and a second kernel. The first kernel is configured to scan a plurality of PCI devices when started and to record topology information of the plurality of PCI devices in a preset address of a memory; the first kernel is further configured to receive a kernel jump command and to start the second kernel according to the kernel jump command. The second kernel is configured to acquire the topology information from the preset address of the memory when started. Specifically, the program code may be used to perform steps S310 to S350 and optional steps thereof in the embodiment in fig. 5, and may also be used to perform other steps performed by the server 100 or the device 200 described in the embodiments in fig. 1 to fig. 7, which are not described herein again.
The memory 830 may include volatile memory, such as RAM, DRAM, SRAM, SDRAM, DDR memory, or cache, and may also include a combination of the above types. The memory 830 may be the memory 120 in the foregoing embodiment of fig. 1, and details are not repeated here.
The communication interface 820 may be an internal interface (e.g., a PCI bus interface), a wired interface (e.g., an Ethernet interface), or a wireless interface (e.g., a cellular network interface or a wireless local area network interface) for communicating with other devices or modules.
It should be noted that this embodiment may be implemented by a general-purpose physical server, for example, an ARM server or an X86 server, or by a virtual machine that is built on a general-purpose physical server in combination with NFV technology. A virtual machine here refers to a complete computer system that is simulated by software, has the functions of a complete hardware system, and runs in a completely isolated environment. For example, this embodiment may be implemented on a cloud computing infrastructure; for the specific process, reference may be made to the embodiment in fig. 3, which is not repeated here.
It should be noted that fig. 8 is only one possible implementation of the embodiment of the present application, and in practical applications the computing device 800 may include more or fewer components, which is not limited herein. For content that is not shown or described in the embodiment of the present application, reference may be made to the related explanation in the foregoing embodiments of fig. 1 to 7, which is not described herein again.
It should be understood that the computing device shown in fig. 8 may also be a computer cluster formed by at least one physical server, and reference may be made to the embodiment in fig. 3 specifically, so that details are not described here again to avoid repetition.
Embodiments of the present application also provide a computer-readable storage medium in which instructions are stored; when the instructions are run on a processor, the method flows shown in fig. 1 to 7 are implemented.
Embodiments of the present application also provide a computer program product, and when the computer program product is run on a processor, the method flows shown in fig. 1-7 are implemented.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes at least one computer instruction. The procedures or functions according to the embodiments of the invention are wholly or partly generated when the computer program instructions are loaded or executed on a computer. The computer may be a general-purpose computer, a special-purpose computer, a network of computers, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any usable medium that can be accessed by a computer, or a data storage node, such as a server or a data center, that contains at least one set of usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., digital video disc (DVD)), or a semiconductor medium.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (20)

1. A server supporting kernel online update, comprising a processor, a memory and a plurality of Peripheral Component Interconnect (PCI) devices, wherein the memory records code of a first kernel,
the processor is configured to execute a code of the first kernel to start the first kernel, scan the PCI devices when the first kernel is started, and record topology information of the PCI devices in a preset address of the memory;
the processor is further configured to receive a kernel jump command in a process of executing the code of the first kernel, stop executing the code of the first kernel according to the kernel jump command, execute code of a second kernel to start the second kernel, and acquire the topology information from the preset address of the memory when the second kernel is started.
2. The server according to claim 1, wherein the code of the second kernel is obtained after bug fixing or function upgrading is performed on the code of the first kernel.
3. The server according to claim 1 or 2, wherein the processor is further configured to obtain the code of the second kernel according to the kernel jump command, and load the code of the second kernel to the memory.
4. The server according to any one of claims 1 to 3,
the processor is further configured to instruct the plurality of PCI devices to retain respective device state information before stopping execution of the code of the first kernel according to the kernel jump command.
5. The server according to claim 4,
the processor is further configured to read the device state information from the PCI devices after the code of the second kernel obtains the topology information from the preset address of the memory.
6. The server according to any one of claims 1 to 5, wherein a virtual instance is provided in the first kernel, and one or any combination of the plurality of PCI devices is/are passed through to the virtual instance.
7. The server of claim 6, wherein the virtual instance is a virtual machine or a container.
8. An apparatus for supporting online updating of a kernel, comprising:
a first kernel, configured to scan a plurality of Peripheral Component Interconnect (PCI) devices when started, and to record topology information of the PCI devices in a preset address of a memory;
the first kernel is further configured to receive a kernel jump command and to start a second kernel according to the kernel jump command;
the second kernel is configured to acquire the topology information from the preset address of the memory when started.
9. The apparatus of claim 8,
the second kernel is a kernel obtained after bug fixing or function upgrading is performed on the first kernel.
10. The apparatus according to claim 8 or 9,
the first kernel is further configured to instruct the PCI devices to retain their own device state information according to the kernel jump command.
11. The apparatus of claim 10,
the second kernel is further configured to read the device state information from the plurality of PCI devices after obtaining the topology information from the preset address of the memory.
12. The apparatus according to any one of claims 8 to 11, wherein a virtual instance is provided in the first kernel, and one or any combination of the plurality of PCI devices is passed through to the virtual instance.
13. The apparatus of claim 12, wherein the virtual instance is a virtual machine or a container.
14. A method for supporting online updating of a kernel, comprising:
scanning, by a first kernel when started, a plurality of Peripheral Component Interconnect (PCI) devices, and recording topology information of the PCI devices in a preset address of a memory;
receiving, by the first kernel, a kernel jump command, and starting a second kernel according to the kernel jump command;
acquiring, by the second kernel when started, the topology information from the preset address of the memory.
15. The method of claim 14, wherein
the second kernel is a kernel obtained after bug fixing or function upgrading is performed on the first kernel.
16. The method according to claim 14 or 15, wherein before the first kernel starts the second kernel according to the kernel jump command, the method further comprises:
the first kernel instructs the plurality of PCI devices to retain respective device state information.
17. The method of claim 16, wherein after the second kernel obtains the topology information from the preset address of the memory, the method further comprises:
reading the device state information from the plurality of PCI devices.
18. The method according to any one of claims 14 to 17, wherein a virtual instance is provided in the first kernel, and one or any combination of the plurality of PCI devices is passed through to the virtual instance.
19. The method of claim 18, wherein the virtual instance is a virtual machine or a container.
20. A computing device comprising a processor and a memory, the memory storing code, the processor executing the code to perform the operational steps of the method of any of claims 14 to 19.
CN202110592524.4A 2021-05-28 2021-05-28 Method, server and related equipment for supporting kernel online update Pending CN115408064A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110592524.4A CN115408064A (en) 2021-05-28 2021-05-28 Method, server and related equipment for supporting kernel online update

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110592524.4A CN115408064A (en) 2021-05-28 2021-05-28 Method, server and related equipment for supporting kernel online update

Publications (1)

Publication Number Publication Date
CN115408064A true CN115408064A (en) 2022-11-29

Family

ID=84155411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110592524.4A Pending CN115408064A (en) 2021-05-28 2021-05-28 Method, server and related equipment for supporting kernel online update

Country Status (1)

Country Link
CN (1) CN115408064A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115857995A (en) * 2023-02-08 2023-03-28 珠海星云智联科技有限公司 Method, medium and computing device for upgrading interconnection device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination