WO2023000673A1 - Hardware accelerator device management method and apparatus, electronic device, and storage medium - Google Patents

Hardware accelerator device management method and apparatus, electronic device, and storage medium

Info

Publication number
WO2023000673A1
Authority
WO
WIPO (PCT)
Prior art keywords
accelerator device
information
hardware accelerator
resource
hardware
Application number
PCT/CN2022/078281
Inventor
张百林
亓开元
苏志远
宋文平
Original Assignee
山东海量信息技术研究院
郑州云海信息技术有限公司
Application filed by 山东海量信息技术研究院 and 郑州云海信息技术有限公司
Publication of WO2023000673A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities

Definitions

  • the present application relates to the field of computer technology, and more specifically, to a hardware accelerator device management method and device, an electronic device, and a computer-readable storage medium.
  • accelerator devices such as GPUs (graphics processing units), FPGAs (field programmable gate arrays) and SmartNICs (smart network interface cards) have emerged.
  • GPU graphics processing unit
  • FPGA Field Programmable Gate Array
  • SmartNIC smart network card
  • hardware accelerator devices can derive multiple virtual devices from one physical accelerator device. For example, a GPU that supports virtualization can generally be divided into time slices of different specifications, which can be provided to multiple cloud hosts on the cloud platform at the same time, thereby improving the utilization of hardware accelerator devices and greatly increasing the computing power of the cloud platform.
  • Cyborg is an intelligent accelerator device management project that is very active in the OpenStack international open source community.
  • the currently implemented functions mainly include discovery, resource reporting and display of accelerator device resources such as GPUs, FPGAs and SSDs (solid state disks).
  • interaction between the Nova project and the Cyborg project is also supported.
  • however, accelerator devices cannot be reserved or protected on the cloud platform.
  • in addition, accelerator device resources must be specified explicitly, and accelerator devices cannot be customized or scheduled according to requirements.
  • the purpose of the present application is to provide a hardware accelerator device management method and device, an electronic device and a computer-readable storage medium, which realize the reservation of the accelerator device and improve the flexibility of the accelerator device.
  • the present application provides a hardware accelerator device management method, including:
  • the hardware accelerator device includes a physical accelerator device and/or a virtualization accelerator device;
  • the status information includes available status, in-use status, and maintenance status
  • the resource pooling information is used to indicate the resource pool to which the hardware accelerator device belongs
  • the basic information, status information and resource pooling information of the hardware accelerator device are displayed through the resource manager.
  • the resource pool has a device virtualization attribute
  • the device virtualization attribute is used to indicate whether the hardware accelerator devices in the resource pool support virtualization. If the device virtualization attribute is enabled, the resource pool contains physical accelerator devices and the corresponding virtualization accelerator devices.
  • assigning the hardware accelerator device to a corresponding resource pool includes:
  • assigning the hardware accelerator device to a corresponding resource pool includes:
  • the creation request includes a requested target accelerator device type
  • the target accelerator device type includes a target physical accelerator device type and/or a target virtualization accelerator device type
  • determining the target physical host conforming to the target accelerator type includes:
  • storing the basic information, state information and resource pooling information of the hardware accelerator device in the database and reporting them to the resource manager includes:
  • reporting, to the resource manager, the basic information, state information and resource pooling information of the hardware accelerator devices whose state information is the available state or the in-use state;
  • displaying the basic information, status information and resource pooling information of the hardware accelerator device through the resource manager includes:
  • displaying, through the resource manager, the basic information, state information and resource pooling information of the hardware accelerator devices whose state information is the available state or the in-use state.
  • a hardware accelerator device management apparatus, including:
  • An allocation module configured to create a resource pool in the cloud platform and allocate a hardware accelerator device in the cloud platform to the corresponding resource pool, wherein the hardware accelerator device includes a physical accelerator device and/or a virtualization accelerator device;
  • An acquisition module configured to acquire basic information, status information and resource pooling information of hardware accelerator devices in the cloud platform, wherein the status information includes an available status, an in-use status and a maintenance status, and the resource pooling information is used to indicate the resource pool to which the hardware accelerator device belongs;
  • a reporting module configured to store the basic information, status information and resource pooling information of the hardware accelerator device in a database and report to the resource manager;
  • a display module configured to display the basic information, status information and resource pooling information of the hardware accelerator device through the resource manager.
  • an electronic device including:
  • the processor is configured to implement the steps of the above hardware accelerator device management method when executing the computer program.
  • the present application provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above-mentioned hardware accelerator device management method are implemented.
  • a hardware accelerator device management method includes: creating a resource pool in the cloud platform, and assigning the hardware accelerator devices in the cloud platform to the corresponding resource pool, wherein the hardware accelerator devices include physical accelerator devices and/or virtualized accelerator devices; obtaining basic information, status information and resource pooling information of the hardware accelerator devices in the cloud platform, wherein the status information includes an available state, an in-use state and a maintenance state, and the resource pooling information is used to indicate the resource pool to which a hardware accelerator device belongs; storing the basic information, status information and resource pooling information of the hardware accelerator devices in a database and reporting them to the resource manager; and displaying the basic information, status information and resource pooling information of the hardware accelerator devices through the resource manager.
  • the hardware accelerator device management method provided by the present application performs resource pool management on hardware accelerator devices and maintains the status information of each individual hardware accelerator, including the available status, the in-use status and the maintenance status. Setting a hardware accelerator device to the maintenance state keeps usable hardware resources reserved for a user's special services at all times, effectively improving the flexibility, maintainability and operability of accelerator devices.
  • the application also discloses a hardware accelerator device management device, an electronic device, and a computer-readable storage medium, which can also achieve the above-mentioned technical effects.
  • FIG. 1 is a flowchart of a method for managing a hardware accelerator device according to an exemplary embodiment
  • Figure 2 is a schematic diagram of the Cyborg devices database table before and after modification
  • Fig. 3 is an architecture diagram of a Cyborg accelerator device management module according to an exemplary embodiment
  • Fig. 4 is a flow chart showing another hardware accelerator device management method according to an exemplary embodiment
  • Fig. 5 is an architecture diagram of the interaction between Nova and Cyborg according to an exemplary embodiment
  • Fig. 6 is a structural diagram of a hardware accelerator device management apparatus according to an exemplary embodiment
  • Fig. 7 is a structural diagram of an electronic device according to an exemplary embodiment.
  • the embodiment of the present application discloses a hardware accelerator device management method, which realizes the reservation of the accelerator device and improves the flexibility of the accelerator device.
  • as shown in FIG. 1, a hardware accelerator device management method according to an exemplary embodiment includes:
  • S101 Create a resource pool in the cloud platform, and allocate a hardware accelerator device in the cloud platform to the corresponding resource pool; wherein, the hardware accelerator device includes a physical accelerator device and/or a virtualization accelerator device;
  • This embodiment can be applied to the OpenStack cloud platform. After the deployment of the cloud platform is completed, the user needs to configure the hardware acceleration device resource pool on the server carrying the accelerator device.
  • the hardware accelerator devices in the cloud platform can include physical accelerator devices and virtualization accelerator devices. If a hardware accelerator device is assigned to a resource pool, it cannot be displayed independently.
  • enabled_gpu_types gpu-device-driver
  • pool_name <resource pool name>
  • the resource pool name pool_name is optional. If it is set, the two virtual GPU accelerator devices vgpu-device-1 and vgpu-device-2 are allocated to that resource pool for management.
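To make the optional `pool_name` setting concrete, here is a minimal sketch that reads such a fragment with Python's standard `configparser`. The `[gpu_devices]` section name, the `key = value` syntax and the value `pool-gpu-a` are assumptions for illustration; only the option names `enabled_gpu_types` and `pool_name` come from the text, and this is not Cyborg's documented configuration schema.

```python
import configparser
from typing import Optional, Tuple

# Hypothetical layout: the [gpu_devices] section name and "key = value" syntax
# are assumptions; only the option names come from the text above.
SAMPLE = """
[gpu_devices]
enabled_gpu_types = gpu-device-driver
pool_name = pool-gpu-a
"""


def read_pool_settings(text: str) -> Tuple[str, Optional[str]]:
    """Return (driver, pool_name); pool_name is None when it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["gpu_devices"]
    driver = section.get("enabled_gpu_types")
    # pool_name is optional: when absent, the devices are not put in a named pool.
    pool_name = section.get("pool_name", fallback=None)
    return driver, pool_name


print(read_pool_settings(SAMPLE))  # ('gpu-device-driver', 'pool-gpu-a')
```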
  • allocating the hardware accelerator device to a corresponding resource pool includes: allocating hardware accelerators of the same type to a same resource pool.
  • hardware accelerator devices of the same type, such as GPU devices from Intel and NVIDIA, can be centrally managed in the same resource pool.
  • allocating the hardware accelerator device to a corresponding resource pool includes: allocating hardware accelerators configured on the same physical host to the same resource pool.
  • different types of hardware accelerators configured on the same physical host can be managed through the same resource pool, such as Inspur NVMe SSD devices and Intel GPU devices.
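The two allocation policies described above (group by device type, or group by physical host) can be sketched as simple grouping functions. The `Device` record and the `pool-<type>` / `pool-<host>` naming scheme are hypothetical and only illustrate the grouping; they are not part of the described method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    uuid: str
    dev_type: str  # e.g. "GPU", "FPGA", "SSD"
    vendor: str
    host: str      # physical host the device is installed on


def pool_by_type(devices):
    """Same-type policy: one pool per device type, regardless of vendor or host."""
    pools = defaultdict(list)
    for dev in devices:
        pools[f"pool-{dev.dev_type.lower()}"].append(dev.uuid)
    return dict(pools)


def pool_by_host(devices):
    """Same-host policy: one pool per physical host, mixing device types."""
    pools = defaultdict(list)
    for dev in devices:
        pools[f"pool-{dev.host}"].append(dev.uuid)
    return dict(pools)


devices = [
    Device("d1", "GPU", "Intel", "node-1"),
    Device("d2", "GPU", "NVIDIA", "node-2"),
    Device("d3", "SSD", "Inspur", "node-1"),
]
print(pool_by_type(devices))  # {'pool-gpu': ['d1', 'd2'], 'pool-ssd': ['d3']}
print(pool_by_host(devices))  # {'pool-node-1': ['d1', 'd3'], 'pool-node-2': ['d2']}
```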
  • the resource pool has a device virtualization attribute, which is used to indicate whether the hardware accelerator devices in the resource pool support virtualization. If the device virtualization attribute is enabled, the resource pool includes physical accelerator devices and the corresponding virtualization accelerator devices.
  • the resource pool has a device virtualization attribute, which is used to indicate whether the hardware accelerator devices in the resource pool support virtualization. If the device virtualization attribute is enabled, all devices in the resource pool support virtualization; for example, the resource pool may include virtualized GPUs and virtualized SmartNICs.
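A minimal sketch of how a pool could enforce the device virtualization attribute. The rule that every member must support virtualization when the attribute is enabled comes from the text. The extra rule that a virtual device may only join a pool that already holds its physical parent is an assumption, added to model "physical accelerator devices and the corresponding virtualization accelerator devices". All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PoolDevice:
    uuid: str
    supports_virtualization: bool
    parent_uuid: Optional[str] = None  # physical parent; set only for virtual devices


@dataclass
class ResourcePool:
    name: str
    virtualization_enabled: bool
    devices: List[PoolDevice] = field(default_factory=list)

    def add(self, dev: PoolDevice) -> None:
        # With the attribute enabled, every member must support virtualization.
        if self.virtualization_enabled and not dev.supports_virtualization:
            raise ValueError(f"{dev.uuid} does not support virtualization")
        # Assumed rule: a virtual device joins only a virtualization-enabled
        # pool that already holds its physical parent device.
        if dev.parent_uuid is not None:
            if not self.virtualization_enabled:
                raise ValueError("virtual devices need a virtualization-enabled pool")
            if dev.parent_uuid not in {d.uuid for d in self.devices}:
                raise ValueError(f"parent {dev.parent_uuid} is not in {self.name}")
        self.devices.append(dev)


pool = ResourcePool("pool-vgpu", virtualization_enabled=True)
pool.add(PoolDevice("gpu-0", supports_virtualization=True))
pool.add(PoolDevice("vgpu-device-1", True, parent_uuid="gpu-0"))
pool.add(PoolDevice("vgpu-device-2", True, parent_uuid="gpu-0"))
print([d.uuid for d in pool.devices])  # ['gpu-0', 'vgpu-device-1', 'vgpu-device-2']
```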
  • S102 Obtain basic information, status information and resource pooling information of hardware accelerator devices in the cloud platform, wherein the status information includes an available status, an in-use status and a maintenance status, and the resource pooling information is used to indicate the resource pool to which the hardware accelerator device belongs;
  • fields for resource pooling information (pool_name) and status information (status) are newly added to the Cyborg devices database table. The resource pooling information indicates the resource pool to which the hardware accelerator device belongs, and the status information indicates the status of the hardware accelerator device: the available status (available), the in-use status (in-use) or the maintenance status (maintaining).
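The two new columns can be illustrated with an in-memory SQLite table. This is a simplified stand-in: the real Cyborg devices table has different columns. Using `'available'` as the default status is an assumption. The three allowed status strings come from the text and are validated in Python through an `Enum`.

```python
import sqlite3
from enum import Enum


class DeviceStatus(str, Enum):
    AVAILABLE = "available"
    IN_USE = "in-use"
    MAINTAINING = "maintaining"


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the Cyborg devices table; the real columns differ.
conn.execute("CREATE TABLE devices (uuid TEXT PRIMARY KEY, type TEXT, vendor TEXT)")
# The two new fields: resource pooling information and device status.
conn.execute("ALTER TABLE devices ADD COLUMN pool_name TEXT")
conn.execute("ALTER TABLE devices ADD COLUMN status TEXT NOT NULL DEFAULT 'available'")


def set_status(conn: sqlite3.Connection, uuid: str, status: str) -> None:
    """Validate the status against the three allowed values, then store it."""
    value = DeviceStatus(status).value  # raises ValueError for unknown values
    conn.execute("UPDATE devices SET status = ? WHERE uuid = ?", (value, uuid))


conn.execute("INSERT INTO devices (uuid, type, vendor) VALUES ('d1', 'GPU', 'Intel')")
print(conn.execute("SELECT uuid, pool_name, status FROM devices").fetchone())
# ('d1', None, 'available')
set_status(conn, "d1", "maintaining")
```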
  • the database table of Cyborg devices before and after modification is shown in Figure 2, the database table before modification is on the left, and the database table after modification is on the right.
  • Cyborg maintains the device status and the device resource pool information in the database table through the cyborg-conductor service. Cyborg is deployed on the compute nodes (Compute): cyborg-agent collects accelerator device information through each accelerator device driver, including the device manufacturer, UUID, device attributes, device name and device status, and stores it in the database table through cyborg-conductor. A new cyborg-api interface is added, through which the status information and resource pooling information of hardware accelerator devices can be set. When resource pool information is set through this API (Application Programming Interface), the API checks whether the selected hardware accelerator devices share the same attributes and, if so, marks those attributes on the resource pool, so that accurate device resource pool information is provided to the resource manager (Placement).
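The attribute check performed when resource pool information is set can be sketched as follows. This is a hypothetical handler, not the actual cyborg-api code. The attribute keys `type` and `vendor`, and the dictionary shape of the result, are assumptions.

```python
from typing import Dict, List, Sequence


def assign_pool(devices: List[Dict[str, str]], pool_name: str,
                keys: Sequence[str] = ("type", "vendor")) -> Dict[str, object]:
    """Hypothetical handler for a 'set resource pool' request.

    Writes pool_name onto each selected device and, for every key on which all
    selected devices agree, records that value as an attribute of the pool.
    """
    if not devices:
        raise ValueError("no devices selected")
    for dev in devices:
        dev["pool_name"] = pool_name
    attributes = {}
    for key in keys:
        values = {dev.get(key) for dev in devices}
        # Mark the attribute only if every selected device has the same value.
        if len(values) == 1 and None not in values:
            attributes[key] = values.pop()
    return {"pool_name": pool_name, "attributes": attributes}


selected = [
    {"uuid": "d1", "type": "GPU", "vendor": "Intel"},
    {"uuid": "d2", "type": "GPU", "vendor": "NVIDIA"},
]
print(assign_pool(selected, "pool-gpu"))
# {'pool_name': 'pool-gpu', 'attributes': {'type': 'GPU'}}
```

Because the vendors differ, only the shared device type is recorded as a pool attribute.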
  • API Application Programming Interface
  • S103 Store the basic information, status information and resource pooling information of the hardware accelerator device in a database and report to the resource manager;
  • the basic information, state information and resource pooling information of the hardware accelerator device are stored in a database table through cyborg-conductor, and the status information and resource pooling information of the hardware accelerator device are synchronously reported to the Placement resource manager through cyborg-api. Device resource usage can also be reported to the Placement resource manager periodically through scheduled tasks, so that accurate resource information is available when Nova schedules resources to create cloud hosts.
  • this step includes: storing the basic information, state information and resource pooling information of the hardware accelerator device in a database; and reporting, to the resource manager, the basic information, status information and resource pooling information of the devices whose status information is the available state or the in-use state.
  • if a hardware accelerator device is set to the maintenance state through cyborg-api, the device is in maintenance mode and cannot be accessed.
  • when the cyborg scheduled task reports resources to the Placement resource manager, a device in maintenance is excluded from the report, which reduces the impact of this device on computing resources. Cloud platform administrators can query the list of hardware accelerator devices under maintenance through cyborg-api and update the status of the corresponding accelerator devices.
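The reporting rule above can be sketched as a filter: available and in-use devices go into the report, and maintenance devices are withheld and listed separately for administrators. The report entries are plain dictionaries for illustration only. They are not the Placement API's actual inventory payload.

```python
from typing import Dict, List, Tuple

REPORTABLE = {"available", "in-use"}


def build_report(devices: List[Dict[str, str]]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split devices into the inventory to report and the maintenance list.

    Devices in the 'maintaining' state are left out of the report, so the
    scheduler never sees them and they stay reserved.
    """
    report, maintaining = [], []
    for dev in devices:
        if dev["status"] in REPORTABLE:
            report.append({k: dev[k] for k in ("uuid", "pool_name", "status")})
        elif dev["status"] == "maintaining":
            maintaining.append(dev["uuid"])
    return report, maintaining


inventory = [
    {"uuid": "d1", "pool_name": "pool-gpu", "status": "available"},
    {"uuid": "d2", "pool_name": "pool-gpu", "status": "in-use"},
    {"uuid": "d3", "pool_name": "pool-gpu", "status": "maintaining"},
]
report, maintaining = build_report(inventory)
print([r["uuid"] for r in report], maintaining)  # ['d1', 'd2'] ['d3']
```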
  • S104 Display basic information, status information, and resource pooling information of the hardware accelerator device through the resource manager.
  • in this step, the basic information, status information and resource pooling information of the hardware accelerator device are displayed through the resource manager, providing effective scheduling information for cloud host creation.
  • this step may include: displaying, through the resource manager, the basic information, status information and resource pooling information of the hardware accelerator devices whose status information is the available state or the in-use state. Because devices in the maintenance state are not reported to the resource manager, they cannot be scheduled when accelerators are allocated to a new cloud host, which effectively protects or reserves the target accelerator device.
  • the hardware accelerator device management method performs resource pool management on the hardware accelerator device, and maintains status information of a single hardware accelerator, including available status, in-use status, and maintenance status. Setting the hardware accelerator device to the maintenance state can reserve effective hardware device resources for the user's special business at all times, effectively improving the flexibility, maintainability and operability of the accelerator device.
  • This example will introduce the creation process of the cloud host, specifically:
  • as shown in FIG. 4, another hardware accelerator device management method according to an exemplary embodiment includes:
  • S201 Receive a creation request of a cloud host; wherein, the creation request includes a requested target accelerator device type, and the target accelerator device type includes a target physical accelerator device type and/or a target virtualization accelerator device type;
  • the creation request of the cloud host may include cloud host name, description, resource specification, network card information, user, project and other information, and the requested target accelerator device type is set in the resource specification.
  • the interactive architecture of Nova and Cyborg is shown in Figure 5.
  • Nova can request a single physical accelerator device, multiple physical accelerator devices, a single virtualization accelerator device, and multiple virtualization accelerator devices, that is, the target accelerator device type includes the target physical accelerator Device type and/or target virtualization accelerator device type.
  • Nova can also directly request all physical accelerator devices in a certain resource pool.
  • for a single physical accelerator device, the format of the creation request is: {"name": "cloud host name", "description": "cloud host description information", "flavor": {"device_profile_name1": "DP_GPU"}, "network": "network_id", "project_id": "project ID", "user_id": "user ID"}.
  • for multiple physical accelerator devices (a GPU and an FPGA), the format of the creation request is: {"name": "cloud host name", "description": "cloud host description information", "flavor": {"device_profile_gpu": "DP_GPU", "device_profile_fpga": "DP_FPGA"}}.
  • for multiple virtualization accelerator devices (a vGPU and a vFPGA), the format of the creation request is: {"name": "cloud host name", "description": "cloud host description information", "flavor": {"device_profile_gpu": "DP_vGPU", "device_profile_fpga": "DP_vFPGA"}}.
  • for a mix of virtualization and physical accelerator devices (a vGPU and a physical FPGA), the format of the creation request is: {"name": "cloud host name", "description": "cloud host description information", "flavor": {"device_profile_gpu": "DP_vGPU", "device_profile_fpga": "DP_FPGA"}}.
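A minimal sketch that serializes requests in the format shown above. It mirrors this document's example bodies, not Nova's actual REST schema. Treating the `DP_v` prefix as the marker of a virtualization device profile is an assumption inferred from the example profile names.

```python
import json
from typing import Dict, Optional


def build_create_request(name: str, description: str, device_profiles: Dict[str, str],
                         network: Optional[str] = None,
                         project_id: Optional[str] = None,
                         user_id: Optional[str] = None) -> str:
    """Serialize a creation request in the format shown above."""
    body = {"name": name, "description": description, "flavor": dict(device_profiles)}
    # network/project/user appear only in the first example, so they are optional.
    for key, value in (("network", network), ("project_id", project_id), ("user_id", user_id)):
        if value is not None:
            body[key] = value
    return json.dumps(body)


def is_virtual_profile(profile: str) -> bool:
    # Assumed naming convention, inferred from the examples: "DP_v..." = virtual.
    return profile.startswith("DP_v")


req = build_create_request("vm-1", "mixed request",
                           {"device_profile_gpu": "DP_vGPU", "device_profile_fpga": "DP_FPGA"})
flavor = json.loads(req)["flavor"]
print({k: is_virtual_profile(v) for k, v in flavor.items()})
# {'device_profile_gpu': True, 'device_profile_fpga': False}
```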
  • S202 Determine a target physical host conforming to the target accelerator type, and acquire performance parameters of the target physical host through information displayed by the resource manager;
  • S203 Determine the optimal target physical host by using a preset scheduling algorithm based on the performance parameters of the target physical host;
  • S204 Start the cloud host on the optimal target physical host to complete the creation of the cloud host.
  • the Nova API receives the cloud host creation request, and nova-conductor analyzes the request according to the resources provided by Placement.
  • the target physical host equipped with GPU and FPGA devices.
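Selecting target physical hosts that are equipped with the requested device types can be sketched as a count of available devices per host and type. Only devices in the available state are counted, which matches the rule that in-use and maintenance devices cannot be allocated. The record shape and the type-to-count request format are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def candidate_hosts(devices: List[Dict[str, str]], requested: Dict[str, int]) -> List[str]:
    """Return hosts with enough *available* devices of every requested type.

    `requested` maps a device type to a count, e.g. {"GPU": 1, "FPGA": 1}.
    """
    free = Counter()
    for dev in devices:
        if dev["status"] == "available":
            free[(dev["host"], dev["type"])] += 1
    hosts = sorted({dev["host"] for dev in devices})
    return [h for h in hosts if all(free[(h, t)] >= n for t, n in requested.items())]


devices = [
    {"host": "node-1", "type": "GPU", "status": "available"},
    {"host": "node-1", "type": "FPGA", "status": "available"},
    {"host": "node-2", "type": "GPU", "status": "available"},
    {"host": "node-2", "type": "FPGA", "status": "maintaining"},
    {"host": "node-3", "type": "GPU", "status": "in-use"},
]
print(candidate_hosts(devices, {"GPU": 1, "FPGA": 1}))  # ['node-1']
```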
  • nova-scheduler assigns a weight to each target physical host through the accelerator intelligent scheduling algorithm, combined with performance parameters such as the number of CPU cores, memory size and hard disk size of the cloud host, sorts the hosts by weight, and takes the target physical host with the highest weight as the optimal target physical host for the requested cloud host.
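The text does not give the scheduling formula, so the sketch below substitutes a common technique: a weighted sum of max-normalized host metrics, sorted best-first. The weights and metric names are hypothetical. With the sample values, node-1 scores 0.60 and node-2 scores 0.88.

```python
from typing import Dict, List

# Hypothetical weights: the text does not give the scheduling formula.
WEIGHTS = {"cpu_cores": 0.5, "memory_gb": 0.3, "disk_gb": 0.2}


def rank_hosts(hosts: Dict[str, Dict[str, float]]) -> List[str]:
    """Rank candidate hosts best-first by a weighted, max-normalised score."""
    # Normalise each metric by its maximum so units do not dominate;
    # "or 1" avoids dividing by zero when every host reports 0.
    maxima = {k: max(m[k] for m in hosts.values()) or 1 for k in WEIGHTS}

    def score(metrics: Dict[str, float]) -> float:
        return sum(w * metrics[k] / maxima[k] for k, w in WEIGHTS.items())

    return sorted(hosts, key=lambda name: score(hosts[name]), reverse=True)


candidates = {
    "node-1": {"cpu_cores": 16, "memory_gb": 64, "disk_gb": 500},
    "node-2": {"cpu_cores": 32, "memory_gb": 128, "disk_gb": 200},
}
ranking = rank_hosts(candidates)
print(ranking[0])  # node-2
```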
  • after the optimal target physical host is determined, the binding information between the cloud host and the target accelerator device configured on the optimal target physical host is established by calling cyborg-api, the status information of the target accelerator device in the database table is set to the in-use state (in-use), and the updated device usage is reported to the Placement resource manager. The cloud host is then started on the optimal target physical host to complete its creation.
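The binding step can be sketched as an all-or-nothing reservation. Available devices of the requested type on the chosen host are marked in-use and tagged with the instance, and their UUIDs are returned for the usage report. The `instance_id` field is a hypothetical binding record. It is not Cyborg's actual accelerator-request model.

```python
from typing import Dict, List


def bind_devices(devices: List[Dict[str, str]], host: str, instance_id: str,
                 dev_type: str, count: int) -> List[str]:
    """Bind `count` available devices of `dev_type` on `host` to an instance.

    Nothing is modified unless enough devices are free; on success the chosen
    devices are marked 'in-use' and their UUIDs are returned so the caller can
    report the new usage to the resource manager.
    """
    chosen = [d for d in devices
              if d["host"] == host and d["type"] == dev_type
              and d["status"] == "available"][:count]
    if len(chosen) < count:
        raise RuntimeError(f"only {len(chosen)} free {dev_type} device(s) on {host}")
    for dev in chosen:
        dev["status"] = "in-use"
        dev["instance_id"] = instance_id  # hypothetical binding field
    return [d["uuid"] for d in chosen]


devices = [
    {"uuid": "g1", "host": "node-2", "type": "GPU", "status": "available"},
    {"uuid": "g2", "host": "node-2", "type": "GPU", "status": "maintaining"},
]
print(bind_devices(devices, "node-2", "vm-1", "GPU", 1))  # ['g1']
```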
  • the target resource pool includes both GPU devices and FPGA devices.
  • the requested accelerator devices are automatically selected when the cloud host is created, which ensures flexibility when the cloud platform creates cloud hosts with accelerator devices and allows accelerator device resources to be planned on demand.
  • the following is an introduction to an apparatus for managing a hardware accelerator device provided by an embodiment of the present application.
  • the apparatus for managing a hardware accelerator device described below and the method for managing a hardware accelerator device described above may refer to each other.
  • as shown in FIG. 6, a hardware accelerator device management apparatus according to an exemplary embodiment includes:
  • the allocation module 601, configured to create a resource pool in the cloud platform and allocate a hardware accelerator device in the cloud platform to the corresponding resource pool, wherein the hardware accelerator device includes a physical accelerator device and/or a virtualization accelerator device;
  • An acquisition module 602, configured to acquire basic information, status information and resource pooling information of hardware accelerator devices in the cloud platform, wherein the status information includes an available state, an in-use state and a maintenance state, and the resource pooling information is used to indicate the resource pool to which the hardware accelerator device belongs;
  • a reporting module 603, configured to store the basic information, status information and resource pooling information of the hardware accelerator device in a database and report to the resource manager;
  • the display module 604 is configured to display the basic information, status information and resource pooling information of the hardware accelerator device through the resource manager.
  • the hardware accelerator device management device provided in the embodiment of the present application performs resource pool management on hardware accelerator devices, and maintains status information of a single hardware accelerator, including available status, in-use status, and maintenance status. Setting the hardware accelerator device to the maintenance state can reserve effective hardware device resources for the user's special business at all times, effectively improving the flexibility, maintainability and operability of the accelerator device.
  • the resource pool has a device virtualization attribute, which is used to indicate whether the hardware accelerator devices in the resource pool support virtualization. If the device virtualization attribute is enabled, the resource pool includes physical accelerator devices and the corresponding virtualization accelerator devices.
  • the allocation module 601 is specifically a module that creates a resource pool in the cloud platform and allocates hardware accelerators of the same type to the same resource pool.
  • the allocation module 601 is specifically a module that creates a resource pool in the cloud platform and allocates hardware accelerators configured on the same physical host to the same resource pool.
  • a receiving module configured to receive a creation request of a cloud host; wherein, the creation request includes a requested target accelerator device type, and the target accelerator device type includes a target physical accelerator device type and/or a target virtualization accelerator device type;
  • a first determining module configured to determine a target physical host conforming to the target accelerator type, and acquire performance parameters of the target physical host through information displayed by the resource manager;
  • the second determination module is configured to determine the best target physical host by using a preset scheduling algorithm based on the performance parameters of the target physical host;
  • the starting module is used to start the cloud host on the optimal target physical host, so as to complete the creation operation of the cloud host.
  • the first determining module is specifically a module that determines the target resource pool conforming to the target accelerator type, and determines the physical host to which the hardware accelerators contained in the target resource pool belong as the target physical host.
  • the reporting module 603 is specifically a module that stores the basic information, status information and resource pooling information of the hardware accelerator devices in a database, and reports to the resource manager the basic information, status information and resource pooling information of the hardware accelerator devices whose status information is the available state or the in-use state;
  • the display module 604 is specifically a module that displays, through the resource manager, the basic information, status information and resource pooling information of the hardware accelerator devices whose status information is the available state or the in-use state.
  • FIG. 7 is a structural diagram of an electronic device according to an exemplary embodiment. As shown in FIG. 7, the electronic device includes:
  • a communication interface 1, which can exchange information with other devices such as network devices;
  • a processor 2, connected to the communication interface 1 to exchange information with other devices, and configured to execute, when running a computer program, the hardware accelerator device management method provided by one or more of the above technical solutions. The computer program is stored in a memory 3.
  • bus system 4 is used to realize connection and communication between these components.
  • the bus system 4 also includes a power bus, a control bus and a status signal bus.
  • the various buses are labeled as bus system 4 in FIG. 7 .
  • the memory 3 in the embodiment of the present application is used to store various types of data to support the operation of the electronic device. Examples of such data include: any computer program used to operate on an electronic device.
  • the memory 3 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memories.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), ferroelectric random access memory (FRAM), flash memory, magnetic surface memory, an optical disc, or compact disc read-only memory (CD-ROM); the magnetic surface memory can be disk storage or tape storage.
  • the volatile memory may be random access memory (RAM, Random Access Memory), which is used as an external cache.
  • RAM random access memory
  • many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM) and direct Rambus random access memory (DRRAM).
  • the memory 3 described in the embodiment of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
  • Processor 2 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 2 or instructions in the form of software.
  • the aforementioned processor 2 may be a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the processor 2 may implement or execute various methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in the storage medium, and the storage medium is located in the memory 3, and the processor 2 reads the program in the memory 3, and completes the steps of the foregoing method in combination with its hardware.
  • the embodiment of the present application also provides a storage medium, that is, a computer storage medium, specifically a computer-readable storage medium, for example, including a memory 3 storing a computer program, the above-mentioned computer program can be executed by the processor 2, To complete the steps described in the aforementioned method.
  • the computer-readable storage medium can be memories such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disk, or CD-ROM.
  • the above-mentioned integrated units of the present application are realized in the form of software function modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.
  • the technical solution of the embodiment of the present application is essentially or the part that contributes to the prior art can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for Make an electronic device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: various media capable of storing program codes such as removable storage devices, ROM, RAM, magnetic disks or optical disks.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

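The present application discloses a hardware accelerator device management method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: creating resource pools in a cloud platform and allocating the hardware accelerator devices in the cloud platform to the corresponding resource pools; acquiring basic information, status information, and resource pooling information of the hardware accelerator devices in the cloud platform, where the status information includes an available state, an in-use state, and a maintenance state, and the resource pooling information indicates the resource pool to which a hardware accelerator device belongs; storing the basic information, status information, and resource pooling information of the hardware accelerator devices in a database and reporting them to a resource manager; and displaying the basic information, status information, and resource pooling information of the hardware accelerator devices through the resource manager. The hardware accelerator device management method provided by the present application improves the flexibility, maintainability, and operability of accelerator devices.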

Description

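Hardware Accelerator Device Management Method and Apparatus, Electronic Device, and Storage Medium
This application claims priority to Chinese patent application No. 202110825190.0, filed with the China Patent Office on July 21, 2021 and entitled "Hardware Accelerator Device Management Method and Apparatus, Electronic Device, and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and more specifically to a hardware accelerator device management method and apparatus, an electronic device, and a computer-readable storage medium.
Background
In an era dominated by cloud computing, artificial intelligence, and 5G, accelerator devices such as GPUs (graphics processing units), FPGAs (field programmable gate arrays), and SmartNICs (smart network interface cards) have emerged. Current cloud platforms manage intelligent accelerator devices only at a basic level: the core technique is the PCI passthrough capability of the virtualization platform, which binds a physical device of the host directly to a cloud host. With the rapid development of hardware and the continuous enhancement of virtualized hardware acceleration, a single physical accelerator device can now be split into multiple virtual devices. For example, a GPU that supports virtualization can generally be divided into time slices of different sizes as needed and offered simultaneously to multiple cloud hosts on the cloud platform, which increases the utilization of the hardware accelerator device and greatly increases the computing capacity of the cloud platform.
Cyborg is a very active intelligent accelerator device management project in the OpenStack international open-source community. Its current functionality mainly covers the discovery, reporting, and display of accelerator resources such as GPUs, FPGAs, and SSDs (solid state disks), and it implements the interaction between the Nova project and the Cyborg project. In the related art, accelerator devices on the cloud platform cannot be reserved or protected. In addition, when a cloud host is created through Nova, the accelerator device resources must be specified explicitly, and accelerator devices cannot be scheduled according to custom requirements.
Therefore, how to reserve accelerator devices and improve their flexibility is a technical problem that those skilled in the art need to solve.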
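Summary
The purpose of the present application is to provide a hardware accelerator device management method and apparatus, an electronic device, and a computer-readable storage medium that enable accelerator devices to be reserved and improve their flexibility.
To achieve the above purpose, the present application provides a hardware accelerator device management method, including:
creating resource pools in a cloud platform, and allocating the hardware accelerator devices in the cloud platform to the corresponding resource pools, where the hardware accelerator devices include physical accelerator devices and/or virtualized accelerator devices;
acquiring basic information, status information, and resource pooling information of the hardware accelerator devices in the cloud platform, where the status information includes an available state, an in-use state, and a maintenance state, and the resource pooling information indicates the resource pool to which a hardware accelerator device belongs;
storing the basic information, status information, and resource pooling information of the hardware accelerator devices in a database and reporting them to a resource manager; and
displaying the basic information, status information, and resource pooling information of the hardware accelerator devices through the resource manager.
The resource pool has a device virtualization attribute that indicates whether the hardware accelerator devices in the resource pool support virtualization; if the device virtualization attribute is enabled, the resource pool contains physical accelerator devices and the corresponding virtualized accelerator devices.
Allocating the hardware accelerator devices to the corresponding resource pools includes:
allocating hardware accelerators of the same type to the same resource pool.
Alternatively, allocating the hardware accelerator devices to the corresponding resource pools includes:
allocating the hardware accelerators configured on the same physical host to the same resource pool.
The method further includes:
receiving a creation request for a cloud host, where the creation request includes the requested target accelerator device type, and the target accelerator device type includes a target physical accelerator device type and/or a target virtualized accelerator device type;
determining target physical hosts that match the target accelerator type, and obtaining performance parameters of the target physical hosts from the information displayed by the resource manager;
determining an optimal target physical host based on the performance parameters of the target physical hosts by using a preset scheduling algorithm; and
launching the cloud host on the optimal target physical host to complete the creation of the cloud host.
Determining the target physical hosts that match the target accelerator type includes:
determining a target resource pool that matches the target accelerator type, and determining the target physical hosts to which the hardware accelerators contained in the target resource pool belong.
Storing the basic information, status information, and resource pooling information of the hardware accelerator devices in a database and reporting them to a resource manager includes:
storing the basic information, status information, and resource pooling information of the hardware accelerator devices in the database; and
reporting, to the resource manager, the basic information, status information, and resource pooling information of the hardware accelerator devices whose status information is the available state or the in-use state.
Correspondingly, displaying the basic information, status information, and resource pooling information of the hardware accelerator devices through the resource manager includes:
displaying, through the resource manager, the basic information, status information, and resource pooling information of the hardware accelerator devices whose status information is the available state or the in-use state.
To achieve the above purpose, the present application provides a hardware accelerator device management apparatus, including:
an allocation module, configured to create resource pools in a cloud platform and allocate the hardware accelerator devices in the cloud platform to the corresponding resource pools, where the hardware accelerator devices include physical accelerator devices and/or virtualized accelerator devices;
an acquisition module, configured to acquire basic information, status information, and resource pooling information of the hardware accelerator devices in the cloud platform, where the status information includes an available state, an in-use state, and a maintenance state, and the resource pooling information indicates the resource pool to which a hardware accelerator device belongs;
a reporting module, configured to store the basic information, status information, and resource pooling information of the hardware accelerator devices in a database and report them to a resource manager; and
a display module, configured to display the basic information, status information, and resource pooling information of the hardware accelerator devices through the resource manager.
To achieve the above purpose, the present application provides an electronic device, including:
a memory, configured to store a computer program; and
a processor, configured to implement the steps of the above hardware accelerator device management method when executing the computer program.
To achieve the above purpose, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above hardware accelerator device management method.
As can be seen from the above solutions, the hardware accelerator device management method provided by the present application includes: creating resource pools in a cloud platform, and allocating the hardware accelerator devices in the cloud platform to the corresponding resource pools, where the hardware accelerator devices include physical accelerator devices and/or virtualized accelerator devices; acquiring basic information, status information, and resource pooling information of the hardware accelerator devices in the cloud platform, where the status information includes an available state, an in-use state, and a maintenance state, and the resource pooling information indicates the resource pool to which a hardware accelerator device belongs; storing the basic information, status information, and resource pooling information of the hardware accelerator devices in a database and reporting them to a resource manager; and displaying the basic information, status information, and resource pooling information of the hardware accelerator devices through the resource manager.
The hardware accelerator device management method provided by the present application manages hardware accelerator devices as resource pools and maintains the status of each individual hardware accelerator, namely the available, in-use, and maintenance states. Setting a hardware accelerator device to the maintenance state reserves usable hardware resources for a user's special workloads at any time, which improves the flexibility, maintainability, and operability of accelerator devices. The present application also discloses a hardware accelerator device management apparatus, an electronic device, and a computer-readable storage medium, which achieve the same technical effects.
It should be understood that the foregoing general description and the following detailed description are merely exemplary and do not limit the present application.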
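Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort. The drawings provide a further understanding of the present disclosure and constitute a part of the specification; together with the following detailed description, they serve to explain the present disclosure but do not limit it. In the drawings:
FIG. 1 is a flowchart of a hardware accelerator device management method according to an exemplary embodiment;
FIG. 2 is a schematic diagram of the Cyborg devices database table before and after modification;
FIG. 3 is an architecture diagram of a Cyborg accelerator device management module according to an exemplary embodiment;
FIG. 4 is a flowchart of another hardware accelerator device management method according to an exemplary embodiment;
FIG. 5 is an architecture diagram of the interaction between Nova and Cyborg according to an exemplary embodiment;
FIG. 6 is a structural diagram of a hardware accelerator device management apparatus according to an exemplary embodiment;
FIG. 7 is a structural diagram of an electronic device according to an exemplary embodiment.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application. In addition, in the embodiments of the present application, terms such as "first" and "second" are used to distinguish similar objects and do not necessarily describe a specific order or sequence.
An embodiment of the present application discloses a hardware accelerator device management method that enables accelerator devices to be reserved and improves their flexibility.
Referring to FIG. 1, a flowchart of a hardware accelerator device management method according to an exemplary embodiment, the method includes: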
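S101: Create resource pools in the cloud platform, and allocate the hardware accelerator devices in the cloud platform to the corresponding resource pools, where the hardware accelerator devices include physical accelerator devices and/or virtualized accelerator devices;
This embodiment can be applied to an OpenStack cloud platform. After the cloud platform is deployed, the user needs to configure hardware accelerator resource pooling on the servers that carry accelerator devices. The hardware accelerator devices in the cloud platform may include physical accelerator devices and virtualized accelerator devices. Once a hardware accelerator device has been assigned to a resource pool, it is no longer displayed as an independent device.
Taking GPU and virtual GPU devices as an example, a physical GPU device is initialized as follows: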
[devices]
enabled_gpu_types=gpu-device-driver
A single virtualized device is initialized as follows:
[devices]
enabled_vgpu_types=vgpu-device-1
Multiple virtualized devices are initialized as follows:
[devices]
enabled_vgpu_types=vgpu-device-1,vgpu-device-2
pool_name=<resource pool name>
[vgpu_vgpu-device-1]
device_addresses=0000:58:00.0,0000:76:00.0
[vgpu_vgpu-device-2]
device_addresses=0000:89:00.0
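The resource pool name pool_name is optional. If it is set, the two virtual GPU accelerator device types vgpu-device-1 and vgpu-device-2 are allocated to that resource pool for management.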
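A minimal sketch of how the multi-device configuration above might be read, using Python's standard configparser module. It assumes that each enabled vGPU type has a section named vgpu_<type> holding its device_addresses (the naming Nova uses for its own vGPU options); the pool name pool-a and the helper names are hypothetical, and this is not Cyborg's actual configuration loader.

```python
import configparser
from typing import Dict, List, Optional

# Sample mirroring the "multiple virtualized devices" snippet; pool-a is a placeholder.
SAMPLE = """
[devices]
enabled_vgpu_types = vgpu-device-1,vgpu-device-2
pool_name = pool-a

[vgpu_vgpu-device-1]
device_addresses = 0000:58:00.0,0000:76:00.0

[vgpu_vgpu-device-2]
device_addresses = 0000:89:00.0
"""


def _split(value: str) -> List[str]:
    """Split a comma-separated option value, dropping blanks."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_vgpu_layout(text: str) -> Dict[str, object]:
    """Return the optional pool name and the PCI addresses of each vGPU type."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    devices = cfg["devices"]
    types: Dict[str, List[str]] = {}
    for vgpu_type in _split(devices.get("enabled_vgpu_types", "")):
        # Assumed convention: one [vgpu_<type>] section per enabled type.
        raw = cfg.get(f"vgpu_{vgpu_type}", "device_addresses", fallback="")
        types[vgpu_type] = _split(raw)
    # pool_name is optional; None means the devices are not pooled.
    pool: Optional[str] = devices.get("pool_name")
    return {"pool": pool, "types": types}


if __name__ == "__main__":
    print(load_vgpu_layout(SAMPLE))
```

If pool_name is omitted, the sketch still returns the per-type addresses but with a pool of None, matching the note that the pool name is optional.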
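As one feasible implementation, allocating the hardware accelerator devices to the corresponding resource pools includes allocating hardware accelerators of the same type to the same resource pool. In a specific implementation, hardware accelerator devices of the same type, for example GPU devices from Intel and NVIDIA, can be placed in one resource pool for centralized management.
As another feasible implementation, allocating the hardware accelerator devices to the corresponding resource pools includes allocating the hardware accelerators configured on the same physical host to the same resource pool. In a specific implementation, hardware accelerators of different types configured on the same physical host, for example an Inspur NVMe SSD device and an Intel GPU device, can be managed through one resource pool.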
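The two pooling strategies can be contrasted with a small sketch. The Accelerator record, the pool-naming scheme (pool-<type> and pool-<host>), and the sample devices are all hypothetical; only the two grouping rules come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Accelerator:
    uuid: str
    dev_type: str  # e.g. "GPU", "FPGA", "SSD"
    vendor: str
    host: str      # physical host the device is installed in


def pool_by_type(devices: Iterable[Accelerator]) -> Dict[str, List[str]]:
    """Strategy 1: devices of the same type share one pool, regardless of vendor."""
    pools: Dict[str, List[str]] = defaultdict(list)
    for dev in devices:
        pools[f"pool-{dev.dev_type.lower()}"].append(dev.uuid)
    return dict(pools)


def pool_by_host(devices: Iterable[Accelerator]) -> Dict[str, List[str]]:
    """Strategy 2: all devices on the same physical host share one pool."""
    pools: Dict[str, List[str]] = defaultdict(list)
    for dev in devices:
        pools[f"pool-{dev.host}"].append(dev.uuid)
    return dict(pools)


DEVICES = [
    Accelerator("gpu-1", "GPU", "Intel", "node-1"),
    Accelerator("gpu-2", "GPU", "NVIDIA", "node-2"),
    Accelerator("ssd-1", "SSD", "Inspur", "node-1"),
]

if __name__ == "__main__":
    print(pool_by_type(DEVICES))
    print(pool_by_host(DEVICES))
```

Grouping by type puts the Intel and NVIDIA GPUs together; grouping by host puts the GPU and the SSD on node-1 together, mirroring the two examples in the text.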
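Further, as an optional implementation, the resource pool has a device virtualization attribute indicating whether the hardware accelerator devices in the resource pool support virtualization; if the attribute is enabled, the resource pool contains physical accelerator devices and the corresponding virtualized accelerator devices. In a specific implementation, when the device virtualization attribute is enabled, all devices in the resource pool support virtualization; for example, the resource pool may contain virtualized GPUs and virtualized smart NICs (SmartNICs).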
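One way to read the device virtualization attribute is as a consistency rule on pool membership. Under that assumption, the sketch below checks that a virtualization-enabled pool holds only physical devices that support virtualization, plus virtual devices whose parent physical device is in the same pool. The class and field names are hypothetical, and pools with the attribute disabled are deliberately not checked, since the text states no rule for them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    uuid: str
    parent: Optional[str] = None          # UUID of the physical device, for virtual devices
    supports_virtualization: bool = False


@dataclass
class ResourcePool:
    name: str
    virtualization: bool = False          # the pool's device virtualization attribute
    devices: List[Device] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if a virtualization-enabled pool is inconsistent."""
        if not self.virtualization:
            return
        physical = {d.uuid: d for d in self.devices if d.parent is None}
        for dev in physical.values():
            if not dev.supports_virtualization:
                raise ValueError(f"{self.name}: {dev.uuid} does not support virtualization")
        for dev in self.devices:
            if dev.parent is not None and dev.parent not in physical:
                raise ValueError(f"{self.name}: parent of {dev.uuid} is not in the pool")
```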
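S102: Acquire basic information, status information, and resource pooling information of the hardware accelerator devices in the cloud platform, where the status information includes an available state, an in-use state, and a maintenance state, and the resource pooling information indicates the resource pool to which a hardware accelerator device belongs;
In this embodiment, two fields are added to the Cyborg devices database table: resource pooling information (pool_name) and status information (status). The resource pooling information indicates the resource pool to which a hardware accelerator device belongs, and the status information indicates the state the device is in: available, in-use, or maintaining. FIG. 2 shows the Cyborg devices database table before (left) and after (right) the modification.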
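To make the schema change concrete, here is a hypothetical, simplified devices table in SQLite with the two added columns; a CHECK constraint restricts status to the three states named above. Apart from pool_name and status, the columns are illustrative and do not reproduce Cyborg's real table.

```python
import sqlite3

# Simplified, hypothetical stand-in for the Cyborg devices table: only the
# pool_name and status columns come from the description.
SCHEMA = """
CREATE TABLE devices (
    id        INTEGER PRIMARY KEY,
    uuid      TEXT NOT NULL UNIQUE,
    type      TEXT NOT NULL,
    vendor    TEXT NOT NULL,
    hostname  TEXT NOT NULL,
    pool_name TEXT,  -- NULL means the device is not in any resource pool
    status    TEXT NOT NULL DEFAULT 'available'
              CHECK (status IN ('available', 'in-use', 'maintaining'))
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the extended devices table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    db = open_db()
    db.execute("INSERT INTO devices (uuid, type, vendor, hostname) "
               "VALUES ('dev-1', 'GPU', 'NVIDIA', 'node-1')")
    print(db.execute("SELECT pool_name, status FROM devices").fetchone())  # (None, 'available')
```

Any attempt to write a status outside the three allowed values is rejected by SQLite with an IntegrityError.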
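In a specific implementation, the architecture of the Cyborg accelerator device management module is shown in FIG. 3. Cyborg is deployed on the compute nodes (Compute) and maintains the device status and resource pooling information in the database table through the cyborg-conductor service. cyborg-agent collects accelerator device information through the individual accelerator device drivers, including the vendor, UUID, device attributes, device name, and device status, and stores this information in the database table via cyborg-conductor. A new cyborg-api interface is added; through this API (application programming interface), the status information and resource pooling information of a hardware accelerator device can be set. When resource pooling information is set through this API, the API checks whether the selected hardware accelerator devices have the same attributes; if they do, the attributes of the resource pool are marked accordingly, so that the resource manager (Placement) receives accurate resource pool information.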
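The new cyborg-api operations can be sketched as two handlers over an in-memory device map. The function names set_status and set_pool, and the choice to compare the full attribute dictionary, are assumptions; the sketch only illustrates the "same attributes, then mark the pool" check described above, and it validates all selected devices before changing any record.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

VALID_STATUSES = {"available", "in-use", "maintaining"}


@dataclass
class DeviceRecord:
    uuid: str
    attributes: Dict[str, str]            # e.g. {"type": "GPU", "vendor": "NVIDIA"}
    pool_name: Optional[str] = None
    status: str = "available"


def set_status(db: Dict[str, DeviceRecord], uuid: str, status: str) -> None:
    """Hypothetical handler for the new 'set device status' API."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    db[uuid].status = status


def set_pool(db: Dict[str, DeviceRecord], uuids: List[str], pool_name: str) -> Dict[str, str]:
    """Hypothetical handler for the new 'set resource pool' API.

    Pools the devices only if they all share the same attributes, and returns
    those attributes so they can be marked on the pool for Placement.
    """
    if not uuids:
        raise ValueError("no devices selected")
    records = [db[u] for u in uuids]
    shared = records[0].attributes
    if any(r.attributes != shared for r in records[1:]):
        raise ValueError("selected devices do not share the same attributes")
    for record in records:
        record.pool_name = pool_name
    return dict(shared)
```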
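S103: Store the basic information, status information, and resource pooling information of the hardware accelerator devices in a database and report them to the resource manager;
In a specific implementation, cyborg-conductor stores the basic information, status information, and resource pooling information of the hardware accelerator devices in the database table, and cyborg-api synchronously reports the status information and resource pooling information to the Placement resource manager. In addition, a periodic task can report device resource usage to Placement at regular intervals, so that accurate resource information is available when Nova schedules resources while creating a cloud host.
As an optional implementation, this step includes: storing the basic information, status information, and resource pooling information of the hardware accelerator devices in the database; and reporting, to the resource manager, the basic information, status information, and resource pooling information of the hardware accelerator devices whose status is available or in-use.
In a specific implementation, if a hardware accelerator device is set to the maintenance state through cyborg-api, the device is in maintenance mode and cannot be accessed. When the Cyborg periodic task reports resources to Placement, it excludes this device from the reported resources, which limits the device's impact on computing resources. The cloud platform administrator can query the list of hardware accelerator devices in the maintenance state through cyborg-api and update their status.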
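The reporting rule can be sketched as a pure function: only available and in-use devices go into what the periodic task would send to Placement, while maintaining devices are listed separately for the administrator. The actual Placement API call and the scheduling of the periodic task are left out; the dictionary layout and function names are hypothetical.

```python
from typing import Dict, List

REPORTABLE = {"available", "in-use"}


def build_placement_report(devices: List[Dict]) -> List[Dict]:
    """Keep only devices Placement should see; 'maintaining' devices are withheld."""
    return [
        {"uuid": d["uuid"], "pool_name": d.get("pool_name"), "status": d["status"]}
        for d in devices
        if d["status"] in REPORTABLE
    ]


def devices_in_maintenance(devices: List[Dict]) -> List[str]:
    """UUIDs an administrator would review before returning devices to service."""
    return [d["uuid"] for d in devices if d["status"] == "maintaining"]


if __name__ == "__main__":
    inventory = [
        {"uuid": "gpu-1", "pool_name": "pool-gpu", "status": "available"},
        {"uuid": "gpu-2", "pool_name": "pool-gpu", "status": "maintaining"},
        {"uuid": "fpga-1", "pool_name": None, "status": "in-use"},
    ]
    print(build_placement_report(inventory))
    print(devices_in_maintenance(inventory))
```

Because the maintaining device never appears in the report, a scheduler that relies only on Placement's view cannot select it, which is the reservation effect described above.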
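S104: Display the basic information, status information, and resource pooling information of the hardware accelerator devices through the resource manager.
In this step, the resource manager displays the basic information, status information, and resource pooling information of the hardware accelerator devices, which provides effective scheduling information for creating cloud hosts. As an optional implementation, this step may include displaying, through the resource manager, the basic information, status information, and resource pooling information of only those hardware accelerator devices whose status is available or in-use. Because devices in the maintenance state are not reported to the resource manager, they cannot be scheduled when a cloud host is created, which effectively protects or reserves the target accelerator devices.
The hardware accelerator device management method provided by this embodiment of the present application manages hardware accelerator devices as resource pools and maintains the status of each individual hardware accelerator, namely the available, in-use, and maintenance states. Setting a hardware accelerator device to the maintenance state reserves usable hardware resources for a user's special workloads at any time, which improves the flexibility, maintainability, and operability of accelerator devices.
This embodiment describes the creation process of a cloud host, specifically:
Referring to FIG. 4, a flowchart of another hardware accelerator device management method according to an exemplary embodiment, the method includes:
S201: Receive a creation request for a cloud host, where the creation request includes the requested target accelerator device type, and the target accelerator device type includes a target physical accelerator device type and/or a target virtualized accelerator device type;
In this embodiment, the creation request for a cloud host may include information such as the cloud host name, description, resource flavor, NIC information, user, and project; the requested target accelerator device type is set in the resource flavor. The interaction architecture between Nova and Cyborg is shown in FIG. 5. Nova can request a single physical accelerator device, multiple physical accelerator devices, a single virtualized accelerator device, or multiple virtualized accelerator devices; that is, the target accelerator device type includes a target physical accelerator device type and/or a target virtualized accelerator device type. Nova can also directly request all physical accelerator devices in a given resource pool.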
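Taking a request to create a cloud host bound to one GPU device as an example, the creation request has the following format: {"name": "<cloud host name>", "description": "<cloud host description>", "flavor": {"device_profile_name1": "DP_GPU"}, "network": "network_id", "project_id": "<project ID>", "user_id": "<user ID>"}.
If a GPU device and an FPGA device are requested, the creation request has the following format: {"name": "<cloud host name>", "description": "<cloud host description>", "flavor": {"device_profile_gpu": "DP_GPU", "device_profile_fpga": "DP_FPGA"}}.
If a virtual GPU device and a virtual FPGA device are requested, the creation request has the following format: {"name": "<cloud host name>", "description": "<cloud host description>", "flavor": {"device_profile_gpu": "DP_vGPU", "device_profile_fpga": "DP_vFPGA"}}.
If a virtual GPU device and an FPGA device are requested, the creation request has the following format: {"name": "<cloud host name>", "description": "<cloud host description>", "flavor": {"device_profile_gpu": "DP_vGPU", "device_profile_fpga": "DP_FPGA"}}.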
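Since these request formats are JSON, a short sketch can show how the requested device profiles might be separated into physical and virtualized ones. The "DP_v" prefix rule is inferred from the example names and is only an assumption. Note that upstream Nova attaches a Cyborg device profile to a flavor through the accel:device_profile extra spec; the layout used here simply follows the examples above.

```python
import json
from typing import Dict, List


def requested_profiles(request_body: str) -> Dict[str, List[str]]:
    """Split the device profiles in a create request into physical and virtual ones.

    Assumption: virtual profiles are named with a "DP_v" prefix, as in the
    examples (DP_vGPU, DP_vFPGA); this naming rule is hypothetical.
    """
    request = json.loads(request_body)
    profiles = list(request.get("flavor", {}).values())
    return {
        "physical": [p for p in profiles if not p.startswith("DP_v")],
        "virtual": [p for p in profiles if p.startswith("DP_v")],
    }


body = json.dumps({
    "name": "vm-1",
    "description": "vGPU + FPGA host",
    "flavor": {"device_profile_gpu": "DP_vGPU", "device_profile_fpga": "DP_FPGA"},
})
print(requested_profiles(body))  # {'physical': ['DP_FPGA'], 'virtual': ['DP_vGPU']}
```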
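S202: Determine the target physical hosts that match the target accelerator type, and obtain the performance parameters of the target physical hosts from the information displayed by the resource manager;
S203: Determine the optimal target physical host based on the performance parameters of the target physical hosts by using a preset scheduling algorithm;
S204: Launch the cloud host on the optimal target physical host to complete the creation of the cloud host.
In this embodiment, Nova API receives the creation request for the cloud host, and nova-conductor parses the request against the resources provided by Placement. It first filters out the physical hosts that meet the conditions; for example, if a GPU device and an FPGA device are requested, the target physical hosts are those equipped with both a GPU and an FPGA. nova-scheduler then uses an intelligent accelerator scheduling algorithm to assign a weight to each target physical host based on performance parameters such as the number of CPU cores, memory size, and disk size for the cloud host, sorts the hosts by weight, and takes the host with the highest weight as the optimal target physical host for the requested cloud host. By calling cyborg-api, it creates the binding between the cloud host and the target accelerator device configured on the optimal target physical host, sets the status of that device in the database table to in-use, and reports this to the Placement resource manager to update device usage. The cloud host is then launched on the optimal target physical host, completing the creation operation.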
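The filter-then-weigh flow in the paragraph above can be sketched as follows. The description does not give the intelligent accelerator scheduling algorithm, so the equal weights and the scaling of free vCPUs, RAM, and disk by their maxima are hypothetical placeholders; the sketch only shows the overall shape: keep hosts that offer every requested accelerator type and fit the request, weigh them, and take the highest.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Host:
    name: str
    accelerators: Set[str]   # accelerator types Placement reports as available here
    free_vcpus: int
    free_ram_mb: int
    free_disk_gb: int


# Hypothetical weights; the description does not give the scheduling formula.
WEIGHTS = {"free_vcpus": 1.0, "free_ram_mb": 1.0, "free_disk_gb": 1.0}


def pick_host(hosts: List[Host], wanted: Set[str],
              vcpus: int, ram_mb: int, disk_gb: int) -> Optional[Host]:
    # 1. Filter: the host must offer every requested accelerator type and fit the request.
    candidates = [
        h for h in hosts
        if wanted <= h.accelerators
        and h.free_vcpus >= vcpus and h.free_ram_mb >= ram_mb and h.free_disk_gb >= disk_gb
    ]
    if not candidates:
        return None
    # 2. Weigh: scale each metric by its maximum among candidates, then sum.
    peaks = {attr: max(getattr(h, attr) for h in candidates) for attr in WEIGHTS}

    def weight(h: Host) -> float:
        return sum(w * getattr(h, attr) / peaks[attr]
                   for attr, w in WEIGHTS.items() if peaks[attr] > 0)

    # 3. The highest-weighted host plays the role of the optimal target physical host.
    return max(candidates, key=weight)
```

After a host is chosen, the real flow would bind the device through cyborg-api and mark it in-use; that step is outside this sketch.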
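When hardware accelerators of different types configured on the same physical host are managed through one resource pool, the target resource pool that matches the target accelerator type is determined first, and then the target physical hosts to which the hardware accelerators in the target resource pool belong are determined. In the above example, the target resource pool contains both GPU devices and FPGA devices.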
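For per-host pools, the pool-first lookup can be sketched like this: pick the pools whose device types cover the request, then collect the physical hosts behind them. The data layout (a pool name mapped to (device type, host) pairs) and the function name are hypothetical simplifications.

```python
from typing import Dict, List, Set, Tuple

# pool name -> list of (device type, physical host) pairs
Pools = Dict[str, List[Tuple[str, str]]]


def hosts_for_request(pools: Pools, wanted: Set[str]) -> Tuple[List[str], List[str]]:
    """Return the target pools covering every wanted type, and the hosts behind them."""
    target_pools = [
        name for name, devices in pools.items()
        if wanted <= {dev_type for dev_type, _ in devices}
    ]
    hosts = sorted({host for name in target_pools for _, host in pools[name]})
    return target_pools, hosts


POOLS: Pools = {
    "pool-node-1": [("GPU", "node-1"), ("FPGA", "node-1")],
    "pool-node-2": [("GPU", "node-2")],
}
```

For a GPU plus FPGA request, only pool-node-1 qualifies, so node-1 is the only target physical host passed on to the weighing step.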
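It can be seen that this embodiment automatically selects the requested accelerator devices when creating a cloud host, which ensures flexibility when the cloud platform creates cloud hosts with accelerator devices and allows accelerator resources to be planned on demand.
A hardware accelerator device management apparatus provided by an embodiment of the present application is described below; the apparatus described below and the method described above may be cross-referenced.
Referring to FIG. 6, a structural diagram of a hardware accelerator device management apparatus according to an exemplary embodiment, the apparatus includes:
an allocation module 601, configured to create resource pools in a cloud platform and allocate the hardware accelerator devices in the cloud platform to the corresponding resource pools, where the hardware accelerator devices include physical accelerator devices and/or virtualized accelerator devices;
an acquisition module 602, configured to acquire basic information, status information, and resource pooling information of the hardware accelerator devices in the cloud platform, where the status information includes an available state, an in-use state, and a maintenance state, and the resource pooling information indicates the resource pool to which a hardware accelerator device belongs;
a reporting module 603, configured to store the basic information, status information, and resource pooling information of the hardware accelerator devices in a database and report them to a resource manager; and
a display module 604, configured to display the basic information, status information, and resource pooling information of the hardware accelerator devices through the resource manager.
The hardware accelerator device management apparatus provided by this embodiment of the present application manages hardware accelerator devices as resource pools and maintains the status of each individual hardware accelerator, namely the available, in-use, and maintenance states. Setting a hardware accelerator device to the maintenance state reserves usable hardware resources for a user's special workloads at any time, which improves the flexibility, maintainability, and operability of accelerator devices.
On the basis of the above embodiment, as an optional implementation, the resource pool has a device virtualization attribute indicating whether the hardware accelerator devices in the resource pool support virtualization; if the device virtualization attribute is enabled, the resource pool contains physical accelerator devices and the corresponding virtualized accelerator devices.
On the basis of the above embodiment, as an optional implementation, the allocation module 601 is specifically a module that creates resource pools in the cloud platform and allocates hardware accelerators of the same type to the same resource pool.
On the basis of the above embodiment, as an optional implementation, the allocation module 601 is specifically a module that creates resource pools in the cloud platform and allocates the hardware accelerators configured on the same physical host to the same resource pool.
On the basis of the above embodiment, as an optional implementation, the apparatus further includes:
a receiving module, configured to receive a creation request for a cloud host, where the creation request includes the requested target accelerator device type, and the target accelerator device type includes a target physical accelerator device type and/or a target virtualized accelerator device type;
a first determining module, configured to determine target physical hosts that match the target accelerator type, and obtain performance parameters of the target physical hosts from the information displayed by the resource manager;
a second determining module, configured to determine an optimal target physical host based on the performance parameters of the target physical hosts by using a preset scheduling algorithm; and
a launching module, configured to launch the cloud host on the optimal target physical host to complete the creation of the cloud host.
On the basis of the above embodiment, as an optional implementation, the first determining module is specifically a module that determines a target resource pool that matches the target accelerator type, determines the target physical hosts to which the hardware accelerators in the target resource pool belong, and obtains performance parameters of the target physical hosts from the information displayed by the resource manager.
On the basis of the above embodiment, as an optional implementation, the reporting module 603 is specifically a module that stores the basic information, status information, and resource pooling information of the hardware accelerator devices in the database, and reports, to the resource manager, the basic information, status information, and resource pooling information of the hardware accelerator devices whose status is available or in-use;
correspondingly, the display module 604 is specifically a module that displays, through the resource manager, the basic information, status information, and resource pooling information of the hardware accelerator devices whose status is available or in-use.
For the apparatus in the above embodiment, the specific manner in which each module performs operations has been described in detail in the method embodiments and is not repeated here.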
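Based on the hardware implementation of the above program modules, and in order to implement the method of the embodiments of the present application, an embodiment of the present application further provides an electronic device. FIG. 7 is a structural diagram of an electronic device according to an exemplary embodiment. As shown in FIG. 7, the electronic device includes:
a communication interface 1, capable of exchanging information with other devices such as network devices; and
a processor 2, connected to the communication interface 1 to exchange information with other devices, and configured to execute, when running a computer program, the hardware accelerator device management method provided by one or more of the above technical solutions. The computer program is stored in a memory 3.
In practice, the components of the electronic device are coupled together through a bus system 4. It can be understood that the bus system 4 is used to implement connection and communication between these components. In addition to a data bus, the bus system 4 also includes a power bus, a control bus, and a status signal bus. For clarity, however, all buses are labeled as the bus system 4 in FIG. 7.
The memory 3 in the embodiments of the present application is used to store various types of data to support the operation of the electronic device. Examples of such data include any computer program that runs on the electronic device.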
It can be understood that the memory 3 may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferroelectric random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM). The memory 3 described in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
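The methods disclosed in the above embodiments of the present application may be applied to or implemented by the processor 2. The processor 2 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 2 or by instructions in the form of software. The processor 2 may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may reside in a storage medium located in the memory 3; the processor 2 reads the program in the memory 3 and completes the steps of the foregoing method in combination with its hardware.
When executing the program, the processor 2 implements the corresponding processes of the methods of the embodiments of the present application; for brevity, they are not repeated here.
In an exemplary embodiment, an embodiment of the present application further provides a storage medium, namely a computer storage medium, specifically a computer-readable storage medium, for example the memory 3 storing a computer program, which can be executed by the processor 2 to complete the steps of the foregoing method. The computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM.
A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments can be implemented by hardware instructed by a program. The program can be stored in a computer-readable storage medium, and when executed, it performs the steps of the above method embodiments. The aforementioned storage media include various media that can store program code, such as removable storage devices, ROM, RAM, magnetic disks, and optical discs.
Alternatively, if the above integrated units of the present application are implemented as software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage media include various media that can store program code, such as removable storage devices, ROM, RAM, magnetic disks, and optical discs.
The above are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.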

Claims (10)

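  1. A hardware accelerator device management method, comprising:
    creating resource pools in a cloud platform, and allocating hardware accelerator devices in the cloud platform to the corresponding resource pools, wherein the hardware accelerator devices comprise physical accelerator devices and/or virtualized accelerator devices;
    acquiring basic information, status information, and resource pooling information of the hardware accelerator devices, wherein the status information comprises an available state, an in-use state, and a maintenance state, and the resource pooling information indicates the resource pool to which a hardware accelerator device belongs;
    storing the basic information, status information, and resource pooling information of the hardware accelerator devices in a database and reporting them to a resource manager; and
    displaying the basic information, status information, and resource pooling information of the hardware accelerator devices through the resource manager.
  2. The hardware accelerator device management method according to claim 1, wherein the resource pool has a device virtualization attribute indicating whether the hardware accelerator devices in the resource pool support virtualization, and if the device virtualization attribute is enabled, the resource pool contains physical accelerator devices and the corresponding virtualized accelerator devices.
  3. The hardware accelerator device management method according to claim 1, wherein allocating the hardware accelerator devices to the corresponding resource pools comprises:
    allocating hardware accelerators of the same type to the same resource pool.
  4. The hardware accelerator device management method according to claim 1, wherein allocating the hardware accelerator devices to the corresponding resource pools comprises:
    allocating the hardware accelerators configured on the same physical host to the same resource pool.
  5. The hardware accelerator device management method according to claim 4, further comprising:
    receiving a creation request for a cloud host, wherein the creation request comprises a requested target accelerator device type, and the target accelerator device type comprises a target physical accelerator device type and/or a target virtualized accelerator device type;
    determining target physical hosts that match the target accelerator type, and obtaining performance parameters of the target physical hosts from the information displayed by the resource manager;
    determining an optimal target physical host based on the performance parameters of the target physical hosts by using a preset scheduling algorithm; and
    launching the cloud host on the optimal target physical host to complete the creation of the cloud host.
  6. The hardware accelerator device management method according to claim 5, wherein determining the target physical hosts that match the target accelerator type comprises:
    determining a target resource pool that matches the target accelerator type, and determining the target physical hosts to which the hardware accelerators contained in the target resource pool belong.
  7. The hardware accelerator device management method according to claim 1, wherein storing the basic information, status information, and resource pooling information of the hardware accelerator devices in a database and reporting them to a resource manager comprises:
    storing the basic information, status information, and resource pooling information of the hardware accelerator devices in the database; and
    reporting, to the resource manager, the basic information, status information, and resource pooling information of the hardware accelerator devices whose status information is the available state or the in-use state;
    correspondingly, displaying the basic information, status information, and resource pooling information of the hardware accelerator devices through the resource manager comprises:
    displaying, through the resource manager, the basic information, status information, and resource pooling information of the hardware accelerator devices whose status information is the available state or the in-use state.
  8. A hardware accelerator device management apparatus, comprising:
    an allocation module, configured to create resource pools in a cloud platform and allocate hardware accelerator devices in the cloud platform to the corresponding resource pools, wherein the hardware accelerator devices comprise physical accelerator devices and/or virtualized accelerator devices;
    an acquisition module, configured to acquire basic information, status information, and resource pooling information of the hardware accelerator devices in the cloud platform, wherein the status information comprises an available state, an in-use state, and a maintenance state, and the resource pooling information indicates the resource pool to which a hardware accelerator device belongs;
    a reporting module, configured to store the basic information, status information, and resource pooling information of the hardware accelerator devices in a database and report them to a resource manager; and
    a display module, configured to display the basic information, status information, and resource pooling information of the hardware accelerator devices through the resource manager.
  9. An electronic device, comprising:
    a memory, configured to store a computer program; and
    a processor, configured to implement the steps of the hardware accelerator device management method according to any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the hardware accelerator device management method according to any one of claims 1 to 7.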
PCT/CN2022/078281 2021-07-21 2022-02-28 Hardware accelerator device management method and apparatus, electronic device, and storage medium WO2023000673A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110825190.0A CN113674131A (zh) 2021-07-21 2021-07-21 Hardware accelerator device management method and apparatus, electronic device, and storage medium
CN202110825190.0 2021-07-21

Publications (1)

Publication Number Publication Date
WO2023000673A1 true WO2023000673A1 (zh) 2023-01-26

Family

ID=78539758

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078281 WO2023000673A1 (zh) 2021-07-21 2022-02-28 Hardware accelerator device management method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN113674131A (zh)
WO (1) WO2023000673A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117234741A (zh) * 2023-11-14 2023-12-15 苏州元脑智能科技有限公司 Resource management and scheduling method and apparatus, electronic device, and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
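CN113674131A (zh) * 2021-07-21 2021-11-19 山东海量信息技术研究院 Hardware accelerator device management method and apparatus, electronic device, and storage medium
CN117560691A (zh) * 2022-08-05 2024-02-13 中国移动通信有限公司研究院 Information transmission method and apparatus, cloud platform, network element, and storage medium
CN117389841B (zh) * 2023-12-07 2024-04-19 合芯科技(苏州)有限公司 Accelerator resource monitoring method and apparatus, cluster device, and storage medium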

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100228695A1 (en) * 2009-03-06 2010-09-09 Boris Kaplan Computer system in which a received signal-reaction of the computer system of artificial intelligence of a cyborg or an android, an association of the computer system of artificial intelligence of a cyborg or an android, a thought of the computer system of artificial intelligence of a cyborg or an android are substantiated and the working method of this computer system of artificial intelligence of a cyborg or an android
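CN110062924A (zh) * 2016-12-12 2019-07-26 亚马逊科技公司 Capacity reservation for virtualized graphics processing
CN113674131A (zh) * 2021-07-21 2021-11-19 山东海量信息技术研究院 Hardware accelerator device management method and apparatus, electronic device, and storage medium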

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
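CN103309748B (zh) * 2013-06-19 2015-04-29 上海交通大学 Host system and scheduling method for adaptive scheduling of GPU virtual resources in cloud gaming
CN104010028B (zh) * 2014-05-04 2017-11-07 华南理工大学 Performance-weighted dynamic virtual resource management strategy method for a cloud platform
CN105159753B (zh) * 2015-09-25 2018-09-28 华为技术有限公司 Accelerator virtualization method and apparatus, and centralized resource manager
CN112925634A (zh) * 2019-12-06 2021-06-08 中国电信股份有限公司 Heterogeneous resource scheduling method and system
CN111736915B (zh) * 2020-06-05 2022-07-05 浪潮电子信息产业股份有限公司 Method, apparatus, device, and medium for managing hardware acceleration devices of cloud host instances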

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Wonderful Start, Dry Goods are Coming" Based on Functional Enhancement and Optimization of Cyborg Heterogeneous Acceletration Equipment", 19 February 2021 (2021-02-19), XP093026403, Retrieved from the Internet <URL:https://baijiahao.baidu.com/s?id=1692115800438103582&wfr=spider&for=pc> [retrieved on 20230223] *
BRINZHANG_YY: "OpenStack Hardware Management Accelerator: Cyborg", CSDN BLOG, CN, CN, pages 1 - 4, XP009543129, Retrieved from the Internet <URL:https://blog.csdn.net/bai0324lin/article/details/106983683> [retrieved on 20230315] *
CLOUDBOXER PROFESSIONAL SOFTWARE SERVICE PROVIDER: "Introduction to the Openstack Cyborg", ZHIHU, pages 1 - 4, XP009543119, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/346415190> [retrieved on 20230315] *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
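CN117234741A (zh) * 2023-11-14 2023-12-15 苏州元脑智能科技有限公司 Resource management and scheduling method and apparatus, electronic device, and storage medium
CN117234741B (zh) * 2023-11-14 2024-02-20 苏州元脑智能科技有限公司 Resource management and scheduling method and apparatus, electronic device, and storage medium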

Also Published As

Publication number Publication date
CN113674131A (zh) 2021-11-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22844846

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE