CN116166434A - Processor allocation method and system, device, storage medium and electronic equipment - Google Patents

Processor allocation method and system, device, storage medium and electronic equipment

Info

Publication number
CN116166434A
Authority
CN
China
Prior art keywords
host
hosts
target
processor
processors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310180080.2A
Other languages
Chinese (zh)
Inventor
胡安沙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202310180080.2A
Publication of CN116166434A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the application provides a processor allocation method, system, device, storage medium and electronic equipment, wherein the method comprises the following steps: determining N hosts included in a target server; receiving computing power resources requested by the N hosts, wherein the computing power resources are used for representing resources for processing data of the N hosts; and allocating, to the N hosts, M processors matching the computing power resources based on the numbers of the N hosts, wherein the M processors are used for processing data of the N hosts, the M processors are connected with the N hosts through a switch chip, and the switch chip is used for expanding the buses connected with the processors. By the method and the device, the problem of low processor utilization in the related art is solved, and the effect of improving processor utilization is achieved.

Description

Processor allocation method and system, device, storage medium and electronic equipment
Technical Field
The embodiment of the application relates to the field of computers, in particular to a processor distribution method, a processor distribution system, a processor distribution device, a storage medium and electronic equipment.
Background
At present, artificial intelligence is extending into various industries through the fusion of data, computing power, algorithms and scenarios, promoting and enabling their intelligent transformation. Powerful computing capability improves the processing of complex data such as images and speech, which in turn changes the traditional human-machine interaction mode, so that new interaction modes are rapidly adopted.
At this stage, the heterogeneous computing combination of a central processing unit (CPU) and a graphics processing unit (GPU) is still the first choice for artificial intelligence computing power. In practice, many enterprise artificial intelligence (AI) systems call GPUs directly in physical form, and GPUs have not achieved resource pooling the way computing, storage and network virtualization have in cloud scenarios. Therefore, GPU utilization is extremely low, resulting in limited elastic expansion capability and disproportionate input to output.
In addition, the roles of the CPU and the GPU differ greatly: one is a necessity, the other an accelerator. The CPU runs at all times, while the GPU is called only when needed, as a device attached to the computer. Therefore, the keys to efficiently utilizing GPU resources are calling on demand, releasing when finished, and not having to worry about where GPU resources fall short.
Most server GPU cards are currently mounted inside a server chassis and can only serve a single server. Most methods for realizing GPU resource pooling divide a single physical GPU into several virtual GPUs in a fixed proportion, for example 1/2 or 1/4 of a virtual GPU, with each virtual GPU having equal video memory, thereby pooling the computing power. For example, in 2021 NVIDIA provided MIG (Multi-Instance GPU) technology on part of the Ampere family of GPUs, which can split an A100 GPU into up to 7 instances.
Traditional server GPU cards are basically matched and integrated in the server: when the server starts, all GPUs in the chassis are powered on, and even when the computing demand is small, the idle GPUs in the chassis remain powered and running. Dynamic adjustment of GPU resources with actual demand cannot be realized, which adds unnecessary resource waste and power consumption.
Disclosure of Invention
The embodiment of the application provides a processor distribution method, a system, a device, a storage medium and electronic equipment, which are used for at least solving the problem of low utilization rate of a processor in the related technology.
According to one embodiment of the present application, there is provided a processor allocation method including: determining N hosts included in a target server, wherein each host corresponds to a host number, and N is a natural number greater than or equal to 1; receiving computing power resources requested by the N hosts, wherein the computing power resources are used for representing resources for processing data of the N hosts; and allocating, to the N hosts, M processors matching the computing power resources based on the numbers of the N hosts, wherein the M processors are used for processing data of the N hosts, the M processors are connected with the N hosts through a switch chip, the switch chip is used for expanding buses connected with the processors, and M is a natural number greater than or equal to 1.
According to another embodiment of the present application, there is provided a processor distribution system including: the system comprises a target server, wherein N hosts are arranged in the target server, each host corresponds to a host number, and N is a natural number which is greater than or equal to 1; and the switch chip is connected with the N hosts and M processors for expanding buses connected with the processors, wherein M is a natural number greater than or equal to 1, and the M processors are used for processing data of the N hosts.
In one exemplary embodiment, the switch chip includes: the complex programmable logic device CPLD is connected with a management controller BMC in each host through a transmission bus, and is used for receiving the computing power resources requested by the hosts through the BMC, and distributing M processors matched with the computing power resources to the N hosts based on the serial numbers of the N hosts, wherein the computing power resources are used for representing the resources for processing the data of the N hosts.
In one exemplary embodiment, the host includes: and the signal transceiver is connected with the CPLD and is used for transmitting a low-level signal of the host to the CPLD, wherein the low-level signal is used for indicating that the host is accessed to the target server.
In an exemplary embodiment, the above processor distribution system further includes: and the power supply chip is connected with the CPLD in the switch chip and the processor and used for controlling the power supply of the processor.
According to yet another embodiment of the present application, there is also provided a processor allocation apparatus including: a first determining module, configured to determine N hosts included in a target server, where each host corresponds to a host number, and N is a natural number greater than or equal to 1; a first receiving module, configured to receive the computing power resources requested by the N hosts, where the computing power resources are used for representing the resources for processing the data of the N hosts; and a first allocation module, configured to allocate, to the N hosts, M processors matching the computing power resources based on the numbers of the N hosts, where the M processors are configured to process data of the N hosts, the M processors and the N hosts are connected by a switch chip, the switch chip is configured to extend a bus connected to the processors, and M is a natural number greater than or equal to 1.
In an exemplary embodiment, the first determining module includes: a first determining unit configured to determine that the host is connected to the target server when a low-level signal is detected by a complex programmable logic device CPLD, where the CPLD is disposed in the switch chip and is connected to a signal transceiver in the host, and the signal transceiver is configured to transmit the low-level signal to the CPLD; and a second determining unit configured to determine the number of the hosts based on the number of the detected low-level signals, and determine N hosts, where one host corresponds to one low-level signal.
In an exemplary embodiment, the above apparatus further includes: the first processing module is used for determining the number of the hosts based on the number of the detected low-level signals, numbering each host through the CPLD after determining N hosts, and obtaining N host numbers; the first storage module is used for storing the N host numbers into a register, wherein the register is arranged in the CPLD; and the first sending module is used for sending each host number to a management controller BMC in the corresponding host through a transmission bus, wherein the transmission bus is connected with the host and the CPLD.
In an exemplary embodiment, the first receiving module includes: and the first receiving unit is used for receiving the computing power resources required by each host sent by the BMC in each host to obtain computing power resources of N hosts.
In an exemplary embodiment, the first allocation module includes: the first sending unit is used for sending a power-on instruction to a target processor to control the target processor to power on under the condition that the target computing power resource requested by the target host is larger than a first preset threshold, wherein the target host is any host in N hosts, and the target processor is a processor matched with the target computing power resource in M processors; and the first establishing unit is used for establishing connection between the target processor and the target host based on the target number of the target host so as to call the target processor to process the data sent by the target host.
In an exemplary embodiment, the above apparatus further includes: the second sending module is used for sending a power-down instruction to the target processor to control the power-down of the target processor under the condition that the target computing power resource requested by the target host is smaller than the first preset threshold after the target processor is connected with the target host based on the target number of the target host; and the first disconnection module is used for disconnecting the connection between the target processor and the target host based on the target number of the target host.
In an exemplary embodiment, the above apparatus further includes: and a third sending module, configured to send a reset signal to the target processor after the connection between the target processor and the target host is disconnected based on the target number of the target host, so as to reset the bus that is in extended connection between the target processor and the switch.
In an exemplary embodiment, each of the processors is correspondingly connected to a power chip, where the power chip is used to control power supply of the processor.
According to a further embodiment of the present application, there is also provided a computer readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the present application, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the embodiments of the application, M processors matching the computing power resources requested by the N hosts in the target server are allocated, and the computing power resources of the processors are dynamically managed, so that the accessed hosts can schedule processor computing power according to actual requirements. This can effectively improve the utilization of the processors and of their computing power resources, and reduce the overall power consumption of the system. Therefore, the problem of low processor utilization in the related art can be solved, achieving the effect of improving processor utilization.
Drawings
Fig. 1 is a hardware structure block diagram of a mobile terminal for a processor allocation method according to an embodiment of the present application;
FIG. 2 is a flow chart of a processor allocation method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a GPU PCIe link according to an embodiment of the present application;
FIG. 4 is a connection topology of a PCIe Switch chip according to an embodiment of the present application;
FIG. 5 is a schematic diagram of identifying multiple hosts according to an embodiment of the present application;
FIG. 6 is a schematic diagram of GPU power supply and reset according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a processor distribution system according to an embodiment of the present application;
fig. 8 is a block diagram of a processor distribution device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
First, the related art involved in the present application will be described:
BMC: baseboard Management Contro l l er for management of the server motherboard.
CPU: centra l Process i ng Un it, a central processing unit.
I2C: i nter-I ntegrated Ci rcu it, I2C, a bus structure.
CPLD: comp l ex Programmab l e Logi c Dev i ce, complex programmable logic devices.
GPU: graph i cs Process i ng Un it, a graphics processor.
CDFP:400Gbps Form-factor P l uggab l e, a pluggable transceiver module that supports 400Gbps rates.
BMC: baseboard Management Contro l l er for management of the server motherboard.
CPU: centra l Process i ng Un it, a central processing unit.
I2C: i nter-I ntegrated Ci rcu it, I2C, a bus structure.
CPLD: comp l ex Programmab l e Log i c Dev i ce, complex programmable logic devices.
GPU: graph i cs Process i ng Un it, a graphics processor.
CDFP:400Gbps Form-factor P l uggab l e, a pluggable transceiver module that supports 400Gbps rates.
PCI E: per i phera l Component I nterconnect Express, a high-speed serial computer expansion bus standard.
The method embodiments provided in the embodiments of the present application may be performed in a mobile terminal, a computer terminal or similar computing device. Taking the mobile terminal as an example, fig. 1 is a block diagram of a hardware structure of the mobile terminal according to an embodiment of the present application. As shown in fig. 1, a mobile terminal may include one or more (only one is shown in fig. 1) processors 102 (the processor 102 may include, but is not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA) and a memory 104 for storing data, wherein the mobile terminal may also include a transmission device 106 for communication functions and an input-output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and not limiting of the structure of the mobile terminal described above. For example, the mobile terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
The memory 104 may be used to store computer programs, such as software programs and modules of application software, such as computer programs corresponding to the processor distribution method in the embodiments of the present application, and the processor 102 executes the computer programs stored in the memory 104 to perform various functional applications and data processing, that is, implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the mobile terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the above network may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, abbreviated NIC) that can connect to other network devices via a base station to communicate with the Internet. In one example, the transmission device 106 may be a radio frequency (Radio Frequency, abbreviated RF) module for communicating wirelessly with the Internet.
In this embodiment, a processor allocation method is provided, and fig. 2 is a flowchart of the processor allocation method according to an embodiment of the present application, as shown in fig. 2, where the flowchart includes the following steps:
step S202, N hosts included in a target server are determined, wherein each host corresponds to a host number, and N is a natural number greater than or equal to 1;
step S204, receiving computing power resources requested by N hosts, wherein the computing power resources are used for representing resources for processing data of the N hosts;
in step S206, M processors matched with the computing resources are allocated to the N hosts based on the numbers of the N hosts, where the M processors are used to process data of the N hosts, the M processors are connected with the N hosts through a switch chip, the switch chip is used to expand a bus connected with the processors, and M is a natural number greater than or equal to 1.
In this embodiment, the values of N and M may be flexibly set based on the actual scenario, for example, the target server includes 2 hosts, and 2 or 3 processors are required to process data. The processor may be a GPU, or may be another processor, which is not limited herein.
In this embodiment, the computing power resource is the processing resource required by the data in the host; for example, when a video in the host is transmitted, two GPUs are required for image processing.
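As an illustration only, the flow of steps S202-S206 can be sketched in Python. All names below (Host, detect_hosts, receive_requests, allocate_processors) are hypothetical stand-ins, not identifiers from the embodiment; the sketch assumes one GPU per requested unit of computing power:

```python
from dataclasses import dataclass

@dataclass
class Host:
    number: int            # host number assigned on detection (S202)
    requested_units: int   # requested computing power, counted in GPUs

def detect_hosts(present_signals):
    """S202: each detected low-level (0) signal corresponds to one host."""
    return [Host(number=i + 1, requested_units=0)
            for i, level in enumerate(present_signals) if level == 0]

def receive_requests(hosts, requests):
    """S204: record the computing power resource requested by each host."""
    for host in hosts:
        host.requested_units = requests.get(host.number, 0)

def allocate_processors(hosts, free_gpus):
    """S206: assign matching processors to the N hosts by host number."""
    allocation = {}
    for host in sorted(hosts, key=lambda h: h.number):
        grant = free_gpus[:host.requested_units]
        del free_gpus[:host.requested_units]
        allocation[host.number] = grant
    return allocation

hosts = detect_hosts([0, 0, 1, 1])        # two low levels -> N = 2 hosts
receive_requests(hosts, {1: 2, 2: 1})     # host 1 wants 2 GPUs, host 2 wants 1
print(allocate_processors(hosts, ["GPU0", "GPU1", "GPU2", "GPU3"]))
# {1: ['GPU0', 'GPU1'], 2: ['GPU2']}
```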
In this embodiment, there may be a plurality of switch chips; for example, two PCIe Switch chips are provided in the processor box (GPU BOX), and a plurality of GPU card slots are provided to access a plurality of GPU cards. The host is connected with the GPU BOX through a CDFP connector and related cables; the CDFP connector is a 400Gbps pluggable I/O component suitable for high-speed applications such as data centers, high-performance computing, storage and networking equipment.
Alternatively, as shown in FIG. 3, multiple PCIe Switch chips may be provided in the GPU BOX. Host1 and Host2 each output one PCIe x16 high-speed signal connected to a PCIe Switch chip in the GPU BOX, and the downlink ports of each PCIe Switch chip are connected with 2 GPU cards. It should be noted that the number of attached GPU cards is determined by the number of ports supported by the PCIe Switch chip. When the number of PCIe lanes provided by the CPU is insufficient, the number of PCIe lanes in the system can be expanded through the PCIe Switch chip. Although PCIe Gen5 has increased the signal rate to 32GT/s, GPU cards, AI accelerator cards and new-generation network cards still require large data transmission bandwidth. Through the expansion of the PCIe Switch chip, the whole server can accommodate more function expansion cards.
Alternatively, the connection topology of the PCIe Switch chips is shown in FIG. 4, where the solid lines represent the PCIe connections of Host1 and the dotted lines represent the PCIe connections of Host2. As can be seen from FIG. 4, when the target server includes two hosts, any host PCIe x16 link can be connected with any GPU card through the PCIe Switch chips, so that each GPU can be accessed by any connected host. In the PCIe Switch chip, the H port represents the uplink connection to a Host, F represents the PCIe cascade ports between the two PCIe Switch chips, and the D ports represent downlink ports for connecting GPU cards.
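The all-to-all reachability of FIG. 4 can be checked with a small graph model. This is a minimal sketch under assumed names: the H/F/D port labels follow the figure, while the edge list and the reachable helper are illustrative assumptions rather than part of the embodiment:

```python
# Edges of the assumed topology graph: H = uplink port, F = cascade
# port between the two switch chips, D = downlink port to a GPU card.
edges = {
    "Host1": ["SW1.H"], "Host2": ["SW2.H"],
    "SW1.H": ["SW1.D0", "SW1.D1", "SW1.F"],
    "SW2.H": ["SW2.D0", "SW2.D1", "SW2.F"],
    "SW1.F": ["SW2.D0", "SW2.D1"],   # cascade reaches the peer's D ports
    "SW2.F": ["SW1.D0", "SW1.D1"],
    "SW1.D0": ["GPU0"], "SW1.D1": ["GPU1"],
    "SW2.D0": ["GPU2"], "SW2.D1": ["GPU3"],
}

def reachable(node, seen=None):
    """Depth-first walk returning every node reachable from `node`."""
    seen = set() if seen is None else seen
    for nxt in edges.get(node, []):
        if nxt not in seen:
            seen.add(nxt)
            reachable(nxt, seen)
    return seen

for host in ("Host1", "Host2"):
    gpus = sorted(n for n in reachable(host) if n.startswith("GPU"))
    print(host, "->", gpus)   # both hosts reach GPU0-GPU3, as in FIG. 4
```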
The above steps may be executed by a terminal or a server, a processor provided in the terminal or server, or a processor or processing device provided separately from the terminal or server, but are not limited thereto.
Through the above steps, M processors matching the computing power resources requested by the N hosts in the target server are allocated, and the computing power resources of the processors are dynamically managed, so that the accessed hosts can schedule processor computing power according to actual requirements. This can effectively improve the utilization of the processors and of their computing power resources, and reduce the overall power consumption of the system. Therefore, the problem of low processor utilization in the related art can be solved, achieving the effect of improving processor utilization.
In one exemplary embodiment, determining the N hosts included in the target server includes: when a low-level signal is detected by a complex programmable logic device (CPLD), determining that a host has accessed the target server, wherein the CPLD is arranged in the switch chip and is connected with a signal transceiver in the host, and the signal transceiver is used for transmitting the low-level signal to the CPLD; and determining the number of hosts based on the number of detected low-level signals, thereby determining the N hosts, wherein one host corresponds to one low-level signal. In this embodiment, the signal transceiver may be a CDFP. For example, as shown in FIG. 5, the target Server includes n hosts, numbered Host1 to Hostn, and signals between the Server and the GPU BOX are transmitted through CDFP1 to CDFPn. The number of hosts is identified mainly by the HOST_PRESENT signal: on each host the HOST_PRESENT signal is pulled down to ground and connected through the CDFP connector to the CPLD of the GPU BOX, active at low level. That is, when a host is connected, the CPLD of the GPU BOX detects the low-level signal caused by the pull-down grounding of HOST_PRESENT and judges the host to be in place; when the CPLD detects n HOST_PRESENT low-level signals, the current target server is considered to include n hosts. According to this embodiment, by identifying the number of currently accessed hosts, the sharing of GPU computing power resources can be accurately realized.
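A minimal sketch of this presence check follows, assuming a hypothetical read_present_lines stand-in for sampling the HOST_PRESENT inputs; on real hardware the detection happens in CPLD logic, not software:

```python
def read_present_lines():
    # Stand-in for sampling the HOST_PRESENT inputs behind the CDFP
    # connectors: 0 = pulled low to ground (host in place), 1 = no host.
    return [0, 0, 0, 1, 1]

def count_hosts(levels):
    """One host is judged in place for every low level detected."""
    return sum(1 for level in levels if level == 0)

levels = read_present_lines()
print(f"{count_hosts(levels)} hosts detected in the target server")  # 3
```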
In an exemplary embodiment, after determining the number of hosts based on the number of detected low-level signals and determining the N hosts, the method further comprises: numbering each host through the CPLD to obtain N host numbers; storing the N host numbers into a register, wherein the register is arranged in the CPLD; and sending each host number to the management controller BMC in the corresponding host via a transmission bus, where the transmission bus connects the host and the CPLD. In this embodiment, each host in the target server needs to know its currently assigned number, for example a relative number from 1 to n, and the host numbers are transmitted mainly through I2C signals. As shown in FIG. 5, after identifying the number of hosts in the current system, the CPLD of the GPU BOX numbers the hosts and stores the number information in the registers of the CPLD. For example, if the current system has 5 hosts, they are numbered 1-5, and the number information is transmitted via the I2C signal that each host connects to the CPLD through the CDFP connector. A host can read the register data of the CPLD through the I2C signal to acquire its current number information. Meanwhile, one I2C signal is reserved between the CPLD and the BMC on the GPU BOX for transmitting host number information to the BMC of the GPU BOX, which allows the GPU BOX to monitor the host numbers and to optimize the power-on strategy of the GPU cards under multiple hosts. According to this embodiment, by assigning host numbers, the computing power resources required by each host can be accurately allocated.
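The numbering step can be sketched as follows; the register map (assumed base address 0x10, one register per host) and the i2c_read helper are illustrative assumptions, not addresses or interfaces from the embodiment:

```python
class CpldRegisters:
    """Toy model of the CPLD register file holding host numbers."""
    BASE = 0x10                        # assumed base address

    def __init__(self):
        self.regs = {}

    def store_host_numbers(self, host_count):
        # Number the detected hosts 1..n and keep the numbers in registers.
        for i in range(host_count):
            self.regs[self.BASE + i] = i + 1

    def i2c_read(self, addr):
        # Stand-in for an I2C register read issued by a host over CDFP.
        return self.regs[addr]

cpld = CpldRegisters()
cpld.store_host_numbers(5)            # e.g. 5 hosts, numbered 1-5
print("Host 3 reads its number:", cpld.i2c_read(CpldRegisters.BASE + 2))  # 3
```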
In one exemplary embodiment, receiving N host requested computing power resources includes: and receiving the computing power resources required by each host sent by the BMC in each host, and obtaining the computing power resources of N hosts. In this embodiment, the processor may be accurately allocated to the host by determining the computational power resources required by each host.
In one exemplary embodiment, allocating, to the N hosts, M processors matching the computing power resources based on the numbers of the N hosts includes: when the target computing power resource requested by a target host is larger than a first preset threshold, sending a power-on instruction to a target processor to control the target processor to power on, wherein the target host is any host of the N hosts, and the target processor is a processor among the M processors that matches the target computing power resource; and establishing a connection between the target processor and the target host based on the target number of the target host, so as to call the target processor to process the data sent by the target host. In this embodiment, the first preset threshold may be set based on the actual usage scenario. For example, if portraits need to be identified in a month of stored video and two GPUs are required for the processing, the two target processors need to be controlled to power up; the two GPUs are then connected to the host according to the host's number to process the video data. According to this embodiment, allocating processors by computing power resources enables effective resource scheduling and increases the rationality of resource utilization.
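A sketch of this power-up path is shown below, assuming a threshold of zero (any positive request powers processors on) and a hypothetical GpuCard class; on real hardware the power-on instruction is issued through the CPLD to each GPU's power chip:

```python
FIRST_PRESET_THRESHOLD = 0   # assumed: any positive request triggers power-up

class GpuCard:
    def __init__(self, name):
        self.name, self.powered, self.owner = name, False, None

def power_up_and_connect(host_number, requested, pool):
    """Power on and attach `requested` idle GPUs to the given host."""
    if requested <= FIRST_PRESET_THRESHOLD:
        return []
    granted = [g for g in pool if not g.powered][:requested]
    for gpu in granted:
        gpu.powered = True        # power-on instruction to the target GPU
        gpu.owner = host_number   # connection established by host number
    return granted

pool = [GpuCard(f"GPU{i}") for i in range(4)]
granted = power_up_and_connect(host_number=1, requested=2, pool=pool)
print([g.name for g in granted])  # ['GPU0', 'GPU1']
```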
In one exemplary embodiment, after connecting the target processor and the target host based on the target number of the target host, the method further comprises: under the condition that the target computing power resource requested by the target host is smaller than a first preset threshold value, sending a power-down instruction to the target processor so as to control the target processor to power down; the connection between the target processor and the target host is disconnected based on the target number of the target host. In this embodiment, after the data processing of the host is completed, the processor needs to be released in time, and the processor is controlled to be powered down, so as to save energy consumption.
In one exemplary embodiment, after disconnecting the connection between the target processor and the target host based on the target number of the target host, the method further comprises: sending a reset signal to the target processor to reset the bus of the extended connection between the target processor and the switch. In this embodiment, to implement pooling of the GPUs, the power supply of each GPU card needs to be controlled separately, and a power chip is correspondingly connected to each processor, so that each GPU card can be powered down individually when idle and powered up in time when a connection is required. As shown in FIG. 6, one I2C signal is connected between each host in the Server and the CPLD of the GPU BOX, mainly to transmit the information that the current host needs to request or release GPU resources. This information is transmitted to the CPLD of the GPU BOX, which parses it and identifies which GPU card needs to be powered on or off, thereby realizing dynamic release and calling of the GPU cards and maximizing the utilization of their computing power resources. Meanwhile, the PCIe reset signal PERST of each GPU card is also driven from the CPLD; its source is consistent with the enable signal of the power chip and is obtained from the host through the I2C signal, and it is used to reset the PCIe link of the GPU card. By providing a separate power chip for each processor, this embodiment enables flexible calling and releasing of the processors.
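A matching sketch of the release path follows; the GpuCard class and the perst_reset helper are illustrative assumptions, with PERST on real hardware driven by the CPLD alongside the power-chip enable signal:

```python
class GpuCard:
    def __init__(self, name, owner):
        self.name, self.powered, self.owner = name, True, owner

def perst_reset(gpu):
    # Stand-in for the CPLD asserting PERST on the GPU's PCIe link.
    print(f"PERST asserted on {gpu.name}: PCIe link behind the switch reset")

def release(host_number, pool):
    """Power down, disconnect and reset every GPU held by the host."""
    for gpu in pool:
        if gpu.owner == host_number:
            gpu.powered = False   # power-down instruction via the power chip
            gpu.owner = None      # connection torn down by host number
            perst_reset(gpu)      # reset the extended PCIe connection

pool = [GpuCard("GPU0", 1), GpuCard("GPU1", 1), GpuCard("GPU2", 2)]
release(host_number=1, pool=pool)  # frees GPU0 and GPU1; GPU2 stays with host 2
```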
In this embodiment, a processor distribution system is provided, and fig. 7 is a schematic diagram of a processor distribution system according to an embodiment of the present application, as shown in fig. 7, where the system includes:
the system comprises a target server, wherein N hosts are arranged in the target server, each host corresponds to a host number, and N is a natural number which is greater than or equal to 1;
and the switch chip is connected with the N hosts and M processors for expanding buses connected with the processors, wherein M is a natural number greater than or equal to 1, and the M processors are used for processing data of the N hosts.
In this embodiment, the computing power resource is the processing resource required by the data in the host; for example, when a video in the host is transmitted, two GPUs are required for image processing.
In this embodiment, there may be a plurality of switch chips; for example, two PCIe Switch chips are provided in the processor box (GPU BOX), and a plurality of GPU card slots are provided to access a plurality of GPU cards. The host is connected with the GPU BOX through a CDFP connector and related cables; the CDFP connector is a 400Gbps pluggable I/O component suitable for high-speed applications such as data centers, high-performance computing, storage and networking equipment.
Alternatively, as shown in FIG. 3, multiple PCIe Switch chips may be provided in the GPU BOX. Host1 and Host2 each output one PCIe x16 high-speed signal connected to a PCIe Switch chip in the GPU BOX, and the downlink ports of each PCIe Switch chip are connected with 2 GPU cards. It should be noted that the number of attached GPU cards is determined by the number of ports supported by the PCIe Switch chip. When the number of PCIe lanes provided by the CPU is insufficient, the number of PCIe lanes in the system can be expanded through the PCIe Switch chip. Although PCIe Gen5 has increased the signal rate to 32GT/s, GPU cards, AI accelerator cards and new-generation network cards still require large data transmission bandwidth. Through the expansion of the PCIe Switch chip, the whole server can accommodate more function expansion cards.
Alternatively, the connection topology of the PCIe Switch chips is shown in FIG. 4, where the solid lines represent the PCIe connections of Host1 and the dotted lines represent the PCIe connections of Host2. As can be seen from FIG. 4, when the target server includes two hosts, any host PCIe x16 link can be connected with any GPU card through the PCIe Switch chips, so that each GPU can be accessed by any connected host. In the PCIe Switch chip, the H port represents the uplink connection to a Host, F represents the PCIe cascade ports between the two PCIe Switch chips, and the D ports represent downlink ports for connecting GPU cards.
In one exemplary embodiment, as shown in fig. 4, a switch chip includes: the complex programmable logic device CPLD is connected with a management controller BMC in each host through a transmission bus, is used for receiving computing power resources requested by the host through the BMC, and distributes M processors matched with the computing power resources for N hosts based on the serial numbers of the N hosts, wherein the computing power resources are used for representing resources for processing data of the N hosts.
In one exemplary embodiment, a host includes: and the signal transceiver is connected with the CPLD and is used for transmitting a low-level signal of the host to the CPLD, wherein the low-level signal is used for indicating that the host accesses the target server.
In one exemplary embodiment, the processor distribution system further comprises: and the power supply chip is connected with the CPLD in the switch chip and the processor and is used for controlling the power supply of the processor.
The invention is illustrated below with reference to specific examples:
the present embodiment is described taking control of a GPU as an example, and mainly includes the following:
1. Configure the PCIe link of the GPU. Taking GPU pooling with dual hosts as an example, as shown in FIG. 3, Host1 and Host2 are the hosts of two servers, connected to the GPU BOX through CDFP connectors 1 and 2 and related cables. Host1 and Host2 each output one PCIe x16 high-speed signal, connected to the PCIe Switch chips on the GPU BOX; each PCIe Switch downlink port is connected with 2 GPU cards, and in practical applications the number of connected GPU cards is determined by the number of ports supported by the PCIe Switch. When the number of PCIe lanes provided by the CPU is insufficient, the number of PCIe lanes in the system can be expanded through the PCIe Switch chip. Through this expansion, the whole server can accommodate more function expansion cards.
2. Set the connection topology of the PCIe links. As shown in FIG. 4, the PCIe connection relationships of Host1 and Host2 are indicated. Any PCIe x16 link from a host can be connected with any GPU card in the system through the PCIe Switch, achieving the goal that each GPU can be accessed by any connected host.
3. Provide a separate power chip for each GPU. As shown in FIG. 6, an independent power chip is set for each GPU, so that each GPU card can be powered down independently when idle or powered up in time when a connection is required.
In summary, in this embodiment, a high-speed connection path is established between each host and the pooled GPUs; and through identification of the number of hosts in the system, the power supply design and the reset design of the GPUs, independent power-on and power-off control of each GPU is realized. Through the pooling design of the GPUs, the utilization of the GPUs in the system and of their computing power resources can be effectively improved, part of the GPU resources can be released when computing demand is low, the overall power consumption of the system is reduced, and unnecessary power waste is avoided, thereby achieving the purpose of reducing cost.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method described in the embodiments of the present application.
The embodiment also provides a processor allocation device, which is used for implementing the foregoing embodiments and preferred embodiments, and is not described in detail. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 8 is a block diagram of a processor distribution device according to an embodiment of the present application, as shown in fig. 8, the device includes:
a first determining module 82, configured to determine N hosts included in a target server, where each host corresponds to a host number, and N is a natural number greater than or equal to 1;
a first receiving module 84, configured to receive computing power resources requested by the N hosts, where the computing power resources are resources for processing data of the N hosts;
a first allocation module 86, configured to allocate, to the N hosts, M processors matched with the computing power resources based on the numbers of the N hosts, where the M processors are configured to process data of the N hosts, the M processors and the N hosts are connected by a switch chip, the switch chip is configured to extend a bus connected to the processors, and the M is a natural number greater than or equal to 1.
In an exemplary embodiment, the first determining module 82 includes:
a first determining unit configured to determine that the host is connected to the target server when a low-level signal is detected by a complex programmable logic device CPLD, where the CPLD is disposed in the switch chip and is connected to a signal transceiver in the host, and the signal transceiver is configured to transmit the low-level signal to the CPLD;
and a second determining unit configured to determine the number of the hosts based on the number of the detected low-level signals, and determine N hosts, where one host corresponds to one low-level signal.
In an exemplary embodiment, the above apparatus further includes:
the first processing module is used for determining the number of the hosts based on the number of the detected low-level signals, numbering each host through the CPLD after determining N hosts, and obtaining N host numbers;
the first storage module is used for storing the N host numbers into a register, wherein the register is arranged in the CPLD;
and the first sending module is used for sending each host number to a management controller BMC in the corresponding host through a transmission bus, wherein the transmission bus is connected with the host and the CPLD.
In an exemplary embodiment, the first receiving module includes:
and the first receiving unit is used for receiving the computing power resources required by each host sent by the BMC in each host to obtain computing power resources of N hosts.
In an exemplary embodiment, the first allocation module includes:
the first sending unit is used for sending a power-on instruction to a target processor to control the target processor to power on under the condition that the target computing power resource requested by the target host is larger than a first preset threshold, wherein the target host is any host in N hosts, and the target processor is a processor matched with the target computing power resource in M processors;
and the first establishing unit is used for establishing connection between the target processor and the target host based on the target number of the target host so as to call the target processor to process the data sent by the target host.
In an exemplary embodiment, the above apparatus further includes:
the second sending module is used for sending a power-down instruction to the target processor to control the power-down of the target processor under the condition that the target computing power resource requested by the target host is smaller than the first preset threshold after the target processor is connected with the target host based on the target number of the target host;
And the first disconnection module is used for disconnecting the connection between the target processor and the target host based on the target number of the target host.
In an exemplary embodiment, the above apparatus further includes:
and a third sending module, configured to send a reset signal to the target processor after the connection between the target processor and the target host is disconnected based on the target number of the target host, so as to reset the bus that is in extended connection between the target processor and the switch.
In an exemplary embodiment, each of the processors is correspondingly connected to a power chip, where the power chip is used to control power supply of the processor.
It should be noted that each of the above modules may be implemented by software or hardware, and for the latter, it may be implemented by, but not limited to: the modules are all located in the same processor; alternatively, the above modules may be located in different processors in any combination.
Embodiments of the present application also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
In one exemplary embodiment, the computer readable storage medium may include, but is not limited to: a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing a computer program.
Embodiments of the present application also provide an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
In an exemplary embodiment, the electronic device may further include a transmission device connected to the processor, and an input/output device connected to the processor.
Specific examples in this embodiment may refer to the examples described in the foregoing embodiments and the exemplary implementation, and this embodiment is not described herein.
It will be appreciated by those skilled in the art that the modules or steps of the application described above may be implemented in a general purpose computing device, they may be concentrated on a single computing device, or distributed across a network of computing devices, they may be implemented in program code executable by computing devices, so that they may be stored in a storage device for execution by computing devices, and in some cases, the steps shown or described may be performed in a different order than that shown or described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps of them may be fabricated into a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principles of the present application should be included in the protection scope of the present application.

Claims (15)

1. A method of processor allocation, comprising:
determining N hosts included in a target server, wherein each host corresponds to a host number, and N is a natural number greater than or equal to 1;
receiving computing power resources requested by N hosts, wherein the computing power resources are used for representing resources for processing data of the N hosts;
and distributing M processors matched with the computing power resources for the N hosts based on the numbers of the N hosts, wherein the M processors are used for processing data of the N hosts, the M processors are connected with the N hosts through a switch chip, the switch chip is used for expanding buses connected with the processors, and the M is a natural number greater than or equal to 1.
2. The method of claim 1, wherein determining N hosts included in the target server comprises:
Determining that the host is accessed to the target server under the condition that a low-level signal is detected by a complex programmable logic device CPLD, wherein the CPLD is arranged in the switch chip and is connected with a signal transceiver in the host, and the signal transceiver is used for transmitting the low-level signal to the CPLD;
and determining the number of the hosts based on the detected number of the low-level signals, and determining N hosts, wherein one host corresponds to one low-level signal.
3. The method of claim 2, wherein the number of hosts is determined based on the number of low level signals detected, and wherein after determining N hosts, the method further comprises:
numbering each host through the CPLD to obtain N host numbers;
storing N host numbers into a register, wherein the register is arranged in the CPLD;
and sending each host number to a management controller BMC in the corresponding host through a transmission bus, wherein the transmission bus is connected with the host and the CPLD.
4. The method of claim 1, wherein receiving the computing power resources requested by the N hosts comprises:
And receiving the computing power resources required by each host sent by the BMC in each host, and obtaining computing power resources of N hosts.
5. The method of claim 1, wherein distributing M processors matched with the computing power resources for the N hosts based on the numbers of the N hosts comprises:
sending a power-on instruction to a target processor under the condition that a target computing power resource requested by a target host is larger than a first preset threshold value so as to control the target processor to be powered on, wherein the target host is any host in N hosts, and the target processor is a processor matched with the target computing power resource in M processors;
and establishing connection between the target processor and the target host based on the target number of the target host so as to call the target processor to process the data sent by the target host.
6. The method of claim 5, wherein after connecting the target processor and the target host based on the target number of the target host, the method further comprises:
sending a power-down instruction to a target processor under the condition that the target computing power resource requested by a target host is smaller than the first preset threshold value so as to control the target processor to power down;
And disconnecting the connection between the target processor and the target host based on the target number of the target host.
7. The method of claim 6, wherein after disconnecting the connection between the target processor and the target host based on the target number of the target host, the method further comprises:
and sending a reset signal to the target processor to reset a bus of the expansion connection between the target processor and the switch.
8. The method according to any one of claims 1-7, further comprising:
each processor is correspondingly connected with a power chip, wherein the power chip is used for controlling the power supply of the processor.
9. A processor distribution system, comprising:
the system comprises a target server, wherein N hosts are arranged in the target server, each host corresponds to a host number, and N is a natural number which is greater than or equal to 1;
and the switch chip is connected with the N hosts and M processors, and is used for expanding buses connected with the processors, wherein M is a natural number greater than or equal to 1, and the M processors are used for processing data of the N hosts.
10. The processor distribution system of claim 9, wherein the switch chip comprises:
the complex programmable logic device CPLD is connected with a management controller BMC in each host through a transmission bus, and is used for receiving computing power resources requested by the hosts through the BMC, and distributing M processors matched with the computing power resources to N hosts based on the serial numbers of the N hosts, wherein the computing power resources are used for representing resources for processing data of the N hosts.
11. The processor distribution system of claim 10, wherein the host comprises:
and the signal transceiver is connected with the CPLD and is used for transmitting a low-level signal of the host to the CPLD, wherein the low-level signal is used for indicating that the host accesses the target server.
12. The processor distribution system of claim 9, wherein the processor distribution system further comprises:
and the power supply chip is connected with the CPLD in the switch chip and the processor and used for controlling the power supply of the processor.
13. A processor allocation device, comprising:
The first determining module is used for determining N hosts included in the target server, wherein each host corresponds to a host number, and N is a natural number greater than or equal to 1;
the first receiving module is used for receiving computing power resources requested by N hosts, wherein the computing power resources are used for representing resources for processing data of the N hosts;
the first allocation module is used for allocating M processors matched with the computing power resources for N hosts based on the number of the N hosts, wherein the M processors are used for processing data of the N hosts, the M processors are connected with the N hosts through a switch chip, the switch chip is used for expanding buses connected with the processors, and the M is a natural number greater than or equal to 1.
14. A computer readable storage medium, characterized in that a computer program is stored in the computer readable storage medium, wherein the computer program, when being executed by a processor, implements the steps of the method according to any of the claims 1 to 8.
15. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.
CN202310180080.2A 2023-02-28 2023-02-28 Processor allocation method and system, device, storage medium and electronic equipment Pending CN116166434A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310180080.2A CN116166434A (en) 2023-02-28 2023-02-28 Processor allocation method and system, device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310180080.2A CN116166434A (en) 2023-02-28 2023-02-28 Processor allocation method and system, device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116166434A true CN116166434A (en) 2023-05-26

Family

ID=86414529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310180080.2A Pending CN116166434A (en) 2023-02-28 2023-02-28 Processor allocation method and system, device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116166434A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116501681A (en) * 2023-06-28 2023-07-28 苏州浪潮智能科技有限公司 CXL data transmission board card and method for controlling data transmission
CN116501681B (en) * 2023-06-28 2023-09-29 苏州浪潮智能科技有限公司 CXL data transmission board card and method for controlling data transmission
CN117472596A (en) * 2023-12-27 2024-01-30 苏州元脑智能科技有限公司 Distributed resource management method, device, system, equipment and storage medium
CN117472596B (en) * 2023-12-27 2024-03-22 苏州元脑智能科技有限公司 Distributed resource management method, device, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN116166434A (en) Processor allocation method and system, device, storage medium and electronic equipment
CN109766302B (en) Method and device for managing equipment
CN116501681B (en) CXL data transmission board card and method for controlling data transmission
CN116243995B (en) Communication method, communication device, computer readable storage medium, and electronic apparatus
CN107948097B (en) Bandwidth adjusting method and equipment
CN116302617B (en) Method for sharing memory, communication method, embedded system and electronic equipment
CN116627520B (en) System operation method of baseboard management controller and baseboard management controller
CN110704365A (en) Reconstruction device based on FPGA
CN114691286A (en) Server system, virtual machine creation method and device
CN116126742A (en) Memory access method, device, server and storage medium
CN117312229A (en) Data transmission device, data processing equipment, system, method and medium
CN116032746B (en) Information processing method and device of resource pool, storage medium and electronic device
US11687377B1 (en) High performance computer with a control board, modular compute boards and resource boards that can be allocated to the modular compute boards
CN111158905A (en) Method and device for adjusting resources
CN112148663A (en) Data exchange chip and server
CN110119111B (en) Communication method and device, storage medium, and electronic device
CN115766044A (en) Communication method based on user mode protocol stack and corresponding device
CN112069108A (en) Flexible server configuration system and method based on PCIE Switch
CN117056275B (en) Communication control method, device and server based on hardware partition system
CN114448963B (en) Method and system for sharing communication by peripheral under fusion control architecture
CN117041147B (en) Intelligent network card equipment, host equipment, method and system
WO2023080339A1 (en) Memory management device for virtual machine
CN112732627B (en) OCP device and server
CN115563039A (en) Method and device for processing data of multi-node server
CN117076356A (en) Instruction sending method and device, substrate management controller and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination