CN115167985A - Virtualized computing power providing method and system - Google Patents


Info

Publication number
CN115167985A
Authority
CN
China
Prior art keywords
computing power
virtual
equipment
management
request
Legal status
Pending
Application number
CN202210886266.5A
Other languages
Chinese (zh)
Inventor
李继平
王一静
Current Assignee
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd
Priority to CN202210886266.5A
Publication of CN115167985A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Abstract

The application discloses a virtualized computing power providing method and system. The method comprises: acquiring a cluster management and control request, the request including a deployment request for a virtual computing power device; acquiring the virtual computing power device from the kernel space of a computing node according to the deployment request, and generating a service program corresponding to the virtual computing power device in the user space of the computing node; mounting the virtual computing power device to a user container; scheduling, through the virtual computing power device and the service program, physical resources in a resource set for a target application provided by the user container, the physical resources being mapped to the virtual computing power device; and acquiring the running result of the target application through the virtual computing power device and the service program, and sending the running result to the user container. This solves the prior-art problem of poor applicability caused by differing software and hardware and the lack of a unified protocol specifying the software stack, and achieves the technical effect of adapting to multiple scenarios.

Description

Virtualized computing power providing method and system
Technical Field
The application relates to the field of internet technology application, in particular to a virtualized computing power providing method and system.
Background
As an emerging field, artificial intelligence has developed rapidly in recent years, and a large number of new algorithms and applications have emerged in both academia and industry. The three major elements of artificial intelligence are algorithms, computing power and data, and computing power is the ultimate vehicle for implementing algorithms. Fig. 1 is a schematic diagram of the hierarchical relationship between an AI application program (algorithm) and hardware in the prior art. Each layer above the hardware is implemented differently depending on the underlying hardware, and no unified protocol currently specifies the software stack from top to bottom. As a result, when using and executing an algorithm, a user must adapt to many different combinations of software and hardware to obtain results that meet the data processing requirement.
The problem of poor applicability in the prior art, caused by differing software and hardware and the lack of a unified protocol specifying the software stack, has not yet been effectively solved.
Disclosure of Invention
The embodiments of the application provide a virtualized computing power providing method and system, so as to at least solve the prior-art problem of poor applicability caused by differing software and hardware and the lack of a unified protocol specifying the software stack.
According to one aspect of the present application, a virtualized computing power providing method is provided, including: acquiring a cluster management and control request, the request including a deployment request for a virtual computing power device; acquiring the virtual computing power device from the kernel space of a computing node according to the deployment request, and generating a service program corresponding to the virtual computing power device in the user space of the computing node; mounting the virtual computing power device to a user container; scheduling, through the virtual computing power device and the service program, physical resources in a resource set for a target application provided by the user container, the physical resources being mapped to the virtual computing power device; and acquiring the running result of the target application through the virtual computing power device and the service program, and sending the running result to the user container.
Optionally, the method further includes: creating the virtual computing power device in the kernel space according to the deployment request.
Optionally, the virtual computing power device provides a unified data interface, and scheduling, through the virtual computing power device and the service program, physical resources in the resource set for the target application provided by the user container includes: acquiring the target application provided by the user container through the data interface of the virtual computing power device, the target application including user data and computing instructions for the user data; sending the target application from the virtual computing power device to the service program through shared memory; and sending the target application, through the service program, to the physical resource corresponding to the virtual computing power device.
Further, optionally, the physical resource is a heterogeneous or homogeneous physical resource in the resource set, and the resource set is connected to the service program through a connection pool, where the protocols in the connection pool include a local transport protocol and a network transport protocol.
Optionally, the local transport protocol includes PCIe (Peripheral Component Interconnect Express); the network transport protocol includes at least one of RDMA and TCP.
Optionally, the virtual computing power device provides a unified management and control interface, and the method further includes: acquiring a management and control request provided by the user container through the management and control interface of the virtual computing power device, the request being at least one of a computing power query request, a computing power configuration request, a transmission channel attribute configuration request, a computing power migration request, a virtual computing power device state query request, and a virtual computing power device state configuration request; and processing the management and control request through the service program.
Further, optionally, when the management and control request is a computing power configuration request, processing it through the service program includes: sending the computing power configuration request to the physical resource management node through the service program, so that the physical resource management node establishes a mapping relationship between the virtual computing power device and the physical resource.
Optionally, the cluster management and control request further includes at least one of: a query of the running state of the virtual computing power device, and a deletion of the virtual computing power device; and the cluster management and control request is obtained from the cluster management and control center through a device plugin.
Optionally, the cluster management and control request is managed by a device management and control center, and the method further includes: detecting and storing the state of the service program and/or the device management and control center, and, when that state is abnormal, performing system recovery according to the stored state of the service program and/or the device management and control center.
Optionally, the virtual computing power device is simulated as a block device, a character device, or a network device.
According to another aspect of the present application, a virtualized computing power providing system is provided, including: a device management and control center, configured to acquire a cluster management and control request, the request including a deployment request for a virtual computing power device; the device management and control center is further configured to acquire the virtual computing power device from the kernel space of the computing node according to the deployment request, generate a service program corresponding to the virtual computing power device in the user space of the computing node, and mount the virtual computing power device to a user container; the virtual computing power device and the service program are configured to schedule physical resources in the resource set for the target application provided by the user container, the physical resources being mapped to the virtual computing power device, and to acquire the running result of the target application and send it to the user container.
Optionally, the physical resource is a heterogeneous or homogeneous physical resource provided by a resource set, and the resource set is connected to the service program through a connection pool, where the protocols in the connection pool include a local transport protocol and a network transport protocol; the local transport protocol includes PCIe (Peripheral Component Interconnect Express); the network transport protocol includes at least one of RDMA and TCP.
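The five steps above can be traced end to end in a minimal Python sketch. Kernel space, user space, the user container and the resource set are all modeled as plain in-memory objects, and the names `ComputeNode`, `VirtualDevice` and `provide_computing_power` are hypothetical; the sketch only illustrates the order of operations, not the patent's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class VirtualDevice:
    name: str
    mapped_resource: Optional[str] = None  # physical resource bound to this device


@dataclass
class ComputeNode:
    kernel_devices: dict = field(default_factory=dict)  # stand-in for kernel space
    services: dict = field(default_factory=dict)        # stand-in for user-space service programs


def provide_computing_power(node: ComputeNode, cluster_request: dict,
                            resource_set: list, container: dict,
                            target_app: Callable[[str], str]) -> str:
    # Step 1: the cluster management and control request carries a deployment request.
    name = cluster_request["deploy"]
    # Step 2: obtain the virtual device from "kernel space" and start its service program.
    device = node.kernel_devices.setdefault(name, VirtualDevice(name))
    node.services[name] = f"service-{name}"
    # Step 3: mount the virtual device into the user container.
    container.setdefault("devices", []).append(name)
    # Step 4: schedule a physical resource and record the device -> resource mapping.
    device.mapped_resource = resource_set.pop(0)
    # Step 5: run the target application on the mapped resource; hand the result back.
    container["result"] = target_app(device.mapped_resource)
    return container["result"]
```

Each step is one line here; in a real system step 2 would involve a kernel driver and a separate process, and step 4 would go through the physical resource management node.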
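A minimal sketch of that request path, assuming a hypothetical fixed header layout (opcode, reserved word, payload length) placed in front of the user data. A `bytearray` stands in for the shared memory between the virtual computing power device and the service program, and `mapping` stands in for the device-to-physical-resource mapping; none of these names come from the source.

```python
import struct

# Hypothetical header: u32 opcode (computing instruction), u32 reserved, u64 payload length.
HEADER = struct.Struct("<IIQ")


def write_request(shm: bytearray, opcode: int, payload: bytes) -> int:
    """Virtual device side: place header + user data into the shared region."""
    end = HEADER.size + len(payload)
    if end > len(shm):
        raise ValueError("request does not fit in the shared region")
    HEADER.pack_into(shm, 0, opcode, 0, len(payload))
    shm[HEADER.size:end] = payload
    return end  # number of bytes used


def read_request(shm: bytearray) -> tuple:
    """Service program side: parse the header and copy the user data out."""
    opcode, _reserved, length = HEADER.unpack_from(shm, 0)
    return opcode, bytes(shm[HEADER.size:HEADER.size + length])


def forward(shm: bytearray, mapping: dict, device: str):
    """Service program: hand the request to the physical resource mapped to `device`."""
    opcode, data = read_request(shm)
    return mapping[device](opcode, data)
```

A fixed-size header lets the consumer locate the payload without any shared pointers, which is why it is a common choice for memory shared across the kernel/user boundary.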
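The six management and control request types can be modeled as an enumeration that the service program dispatches on. This is a hypothetical sketch: the state fields (`tflops`, `channel`, `backend`, `device_state`) are illustrative placeholders, and a real service program would forward configuration to the physical resource side rather than only updating local state.

```python
from enum import Enum, auto


class MgmtRequest(Enum):
    QUERY_POWER = auto()             # computing power query
    CONFIGURE_POWER = auto()         # computing power configuration
    CONFIGURE_CHANNEL = auto()       # transmission channel attribute configuration
    MIGRATE_POWER = auto()           # computing power migration
    QUERY_DEVICE_STATE = auto()      # virtual device state query
    CONFIGURE_DEVICE_STATE = auto()  # virtual device state configuration


class ServiceProgram:
    """Hypothetical user-space service program handling management requests."""

    def __init__(self):
        self.state = {"tflops": 0.0, "channel": {}, "backend": None, "device_state": "idle"}

    def handle(self, kind: MgmtRequest, arg=None):
        # Queries return a value; configuration requests update state and return None.
        if kind is MgmtRequest.QUERY_POWER:
            return self.state["tflops"]
        if kind is MgmtRequest.QUERY_DEVICE_STATE:
            return self.state["device_state"]
        if kind is MgmtRequest.CONFIGURE_POWER:
            self.state["tflops"] = float(arg)
        elif kind is MgmtRequest.CONFIGURE_CHANNEL:
            self.state["channel"].update(arg)   # e.g. {"protocol": "RDMA"}
        elif kind is MgmtRequest.MIGRATE_POWER:
            self.state["backend"] = arg         # id of the new physical resource
        elif kind is MgmtRequest.CONFIGURE_DEVICE_STATE:
            self.state["device_state"] = arg
        else:
            raise ValueError(f"unknown request: {kind}")
        return None
```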
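One way the physical resource management node might turn a computing power configuration request into a mapping is sketched below. It assumes computing power is accounted in whole TFLOPS and that one virtual device may be backed by slices of several physical resources; both assumptions, the greedy fill order and the class name are illustrative, since the source does not specify an allocation policy.

```python
class ResourceManagementNode:
    """Hypothetical physical resource management node keeping device -> resource mappings."""

    def __init__(self, free_tflops: dict):
        self.free = dict(free_tflops)  # physical resource id -> unallocated TFLOPS
        self.mapping = {}              # virtual device -> [(resource id, TFLOPS), ...]

    def configure(self, device: str, tflops: int) -> list:
        """Greedily reserve `tflops` for `device`; commit only if the full amount fits."""
        remaining, plan = tflops, []
        for rid, avail in self.free.items():
            if remaining <= 0:
                break
            take = min(avail, remaining)
            if take > 0:
                plan.append((rid, take))
                remaining -= take
        if remaining > 0:
            # Nothing has been modified yet, so a failed request leaves the pool intact.
            raise RuntimeError(f"insufficient computing power for {device}")
        for rid, take in plan:
            self.free[rid] -= take
        self.mapping.setdefault(device, []).extend(plan)
        return plan

    def release(self, device: str) -> None:
        """Return every slice mapped to `device` to the free pool."""
        for rid, take in self.mapping.pop(device, []):
            self.free[rid] += take
```

Planning before committing keeps the request all-or-nothing, which matters when several configuration requests arrive for a nearly full pool.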
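A hypothetical handler for the three kinds of cluster management and control request (deploy, query running state, delete), as the device management and control center might receive them from the device plugin. The request dictionary format and the `vxpuN` naming are assumptions for illustration.

```python
class DeviceManagementCenter:
    """Hypothetical registry of virtual computing power devices on one node."""

    def __init__(self):
        self.devices = {}  # device name -> running state
        self._next_id = 0

    def handle(self, request: dict):
        op = request["op"]
        if op == "deploy":
            name = f"vxpu{self._next_id}"
            self._next_id += 1
            self.devices[name] = "running"
            return name
        if op == "query":
            return self.devices.get(request["device"], "absent")
        if op == "delete":
            # True if the device existed and was removed.
            return self.devices.pop(request["device"], None) is not None
        raise ValueError(f"unsupported cluster request: {op}")
```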
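The detect, store and recover loop can be sketched with a JSON state file. Writing to a temporary file and then calling `os.replace` makes each save atomic, so a crash mid-write does not leave a truncated file for recovery to read. The file format and the function names are assumptions, not taken from the source.

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Persist state atomically: write a temp file, then rename it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def recover(path: str, default: dict) -> dict:
    """Load the last saved state, falling back to `default` if none is usable."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return default


def restart_if_abnormal(is_healthy, path: str, start):
    """If the monitored component is abnormal, restart it from the stored state."""
    if is_healthy():
        return None
    return start(recover(path, {}))
```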
Optionally, the system further includes: a detection center, configured to detect the state of the service program and/or the device management and control center.
Further, optionally, the system further includes: a device management file system, configured to store the state of the service program and/or the device management and control center, so that system recovery can be performed according to the stored state when the state of the service program and/or the device management and control center is abnormal.
In the embodiments of the present application, a cluster management and control request including a deployment request for a virtual computing power device is acquired; the virtual computing power device is acquired from the kernel space of a computing node according to the deployment request, and a corresponding service program is generated in the user space of the computing node; the virtual computing power device is mounted to a user container; physical resources in a resource set, mapped to the virtual computing power device, are scheduled for the target application provided by the user container through the virtual computing power device and the service program; and the running result of the target application is acquired through the virtual computing power device and the service program and sent to the user container. This solves the prior-art problem of poor applicability caused by differing software and hardware and the lack of a unified protocol specifying the software stack, and achieves the technical effect of adapting to multiple scenarios.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, are included to provide a further understanding of the application; the exemplary embodiments of the application and their description are intended to explain the application and do not limit it. In the drawings:
FIG. 1 is a schematic diagram of a virtualized computing power providing system according to the first embodiment of the present application;
FIG. 2 is a schematic diagram of the interaction between underlying resources and a virtual computing power device in a virtualized computing power providing system according to the first embodiment of the present application;
FIG. 3 is a schematic diagram of the system framework of a virtual computing power device in a virtualized computing power providing system according to the first embodiment of the present application;
FIG. 4 is a flowchart of a virtualized computing power providing method according to the second embodiment of the present application;
FIG. 5 is a schematic diagram of creating a virtual device and starting a service thread/process in a virtualized computing power providing method according to the second embodiment of the present application;
FIG. 6 is a schematic diagram of the computing power configuration and call flow based on a virtual device in a virtualized computing power providing method according to the second embodiment of the present application;
FIG. 7 is a schematic diagram of the device management and allocation flow in a virtualized computing power providing method according to the second embodiment of the present application.
Detailed Description
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
The technical terms related to the embodiments of the present application are:
vXPU: virtual XPU device, where X stands for different kinds of computing chips (e.g. G for GPU, N for NPU, T for TPU) and is used as a generic name;
Computing power: the computing capability of a chip, generally expressed in OPS (operations per second); in the AI field, FLOPS (floating-point operations per second) is used to express the computing power of a chip;
CN: Computing Node, generally located at the front end of a large server system;
HN: Heterogeneous Node, a resource node generally located at the back end of a large server system;
MN: Management Node;
K8S: Kubernetes (abbreviated K8S below), a portable, extensible, open-source platform for managing containerized workloads and services that facilitates both declarative configuration and automation;
Container: a lightweight package of application code together with its dependencies, such as specific versions of the programming language runtime and the libraries required to run a software service. Containers are implemented either as normal containers or as secure containers. The key difference is that each secure container (more precisely, each pod) runs in its own micro virtual machine with an independent operating system kernel, providing virtualization-layer security isolation. Because cloud container instances share multiple groups of clusters, container security isolation matters more than when a user owns a private Kubernetes cluster. With secure containers, the kernel, computing resources, storage and network are isolated between the containers of different tenants, protecting each user's resources and data from being preempted or stolen by other users. The present application is concerned with normal containers;
POD: the smallest scheduling unit in K8S, consisting of one or more containers;
IO: Input/Output;
ENTRY: the entry point of a container, which can be an executable such as a script or an app;
Block device: one of the three basic device types in the Linux system (alongside character devices and network devices);
RDMA: Remote Direct Memory Access, developed to reduce server-side data processing latency in network transmission. It transfers data directly from the memory of one computer to that of another without involving either operating system, enabling high-throughput, low-latency network communication, and is particularly suitable for large-scale parallel computer clusters.
Example 1
According to one aspect of the present application, a system using a virtual computing power device is provided. FIG. 1 is a schematic diagram of a system using a virtual computing power device according to the first embodiment of the present application. As shown in FIG. 1, the virtualized computing power providing system provided by this embodiment includes: a management and control end 12, a scheduling end 14, a service end 16 and a resource set 18, wherein
the management and control end 12 is configured to receive a user request, parse it, and convert the request content into the corresponding resource forms, which include virtual resources and back-end entity resources; the service end 16 is configured to provide virtual resources, create virtual devices according to the user request, and mount the virtual devices to a user container, where the user container is the space in which the user runs computing tasks; the scheduling end 14 is configured to allocate back-end entity resources from the resource set; and the resource set 18 is configured to provide back-end entity resources and data transmission services.
Specifically, as shown in FIG. 1, the management and control end 12 is described taking Kubernetes (hereinafter K8S) as an example. In FIG. 1, the management and control end 12 is labeled K8S, the scheduling end 14 is labeled MN, the service end 16 is labeled CN, and the resource set 18 is labeled HN, wherein
K8S is responsible for receiving user requests, parsing the request content and converting it into the resource forms actually required, which include virtual resources and back-end entity resources;
the CN provides virtual resources and loads them, in the form of Linux system devices, into the data processing tasks/containers on the CN side; the user container is the space in which the user runs computing tasks, including compilation, inference and training; the Linux system device types in this embodiment include at least character devices, block devices and network devices;
the MN is responsible for allocating back-end entity resources from the HN;
the HN provides basic services such as physical resources and data transmission; in this embodiment, the resource set may be a hardware resource set, embodied as the HN.
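The division of responsibilities can be illustrated by the parsing step performed by K8S: one user request yields a virtual resource description for the CN and a back-end entity resource description for the MN to fulfil from the HN. The request keys, both dataclasses and the per-device TFLOPS unit are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualResource:
    """Part served by the CN: exposed to the container as Linux devices."""
    device_type: str  # "block", "char" or "net"
    count: int


@dataclass(frozen=True)
class BackendResource:
    """Part served by the MN: allocated from the HN resource pool."""
    chip: str         # e.g. "GPU" or "NPU"
    tflops: int       # total computing power across all requested devices


def parse_user_request(req: dict) -> tuple:
    """Split one user request into its virtual and back-end entity resource forms."""
    count = int(req.get("vxpu", 1))
    virtual = VirtualResource(req.get("device_type", "block"), count)
    backend = BackendResource(req["chip"], int(req["tflops"]) * count)
    return virtual, backend
```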
The system using the virtual computing power device provided by this embodiment focuses on how the virtual device part on the CN side is implemented and deployed, specifically as follows:
the device management and control center is configured to acquire a cluster management and control request, the request including a deployment request for the virtual computing power device; the device management and control center is further configured to acquire the virtual computing power device from the kernel space of the computing node according to the deployment request, generate a service program corresponding to the virtual computing power device in the user space of the computing node, and mount the virtual computing power device to a user container; the virtual computing power device and the service program are configured to schedule physical resources in the resource set for the target application provided by the user container, the physical resources being mapped to the virtual computing power device, and to acquire the running result of the target application and send it to the user container.
Optionally, the physical resource is a heterogeneous physical resource or a homogeneous physical resource provided by a resource set, and the resource set is connected to the service program through a connection pool, where a protocol in the connection pool includes: a local transport protocol and a network transport protocol; the local transmission protocol comprises: PCIe; the network transport protocol includes at least one of: RDMA, TCP.
Specifically, after the virtual resource and the back-end entity resource have been requested, a mapping relationship is established between them, and control and data interaction between the virtual resource and the physical device are carried out over RDMA, TCP or PCIe. FIG. 2 is a schematic diagram of the interaction between underlying resources and virtual computing power devices in the virtualized computing power providing system according to an embodiment of the present application. As shown in FIG. 2, resources are pooled to improve utilization: the underlying resources are managed uniformly over a network or other transmission means and are scheduled and allocated by one or more control centers. Current network protocols include TCP (Transmission Control Protocol) and RDMA (Remote Direct Memory Access), whereas local computing devices are usually connected through a PCIe (Peripheral Component Interconnect Express) topology. That is, in this embodiment, a connection to the computing end is established through preset communication services, which include network communication protocols (TCP and RDMA) and a device interface communication protocol (PCIe).
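A sketch of a connection pool that reuses one connection per (host, protocol) pair. The selection rule (PCIe when the resource is on the same host, RDMA when both ends support it, TCP otherwise) is an assumption for illustration; the source lists the protocols but does not say how one is chosen. Connections are represented by placeholder strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    rdma_capable: bool


def choose_protocol(local: Endpoint, remote: Endpoint) -> str:
    """Assumed policy: local bus first, then RDMA, then TCP as the fallback."""
    if local.host == remote.host:
        return "PCIe"
    if local.rdma_capable and remote.rdma_capable:
        return "RDMA"
    return "TCP"


class ConnectionPool:
    """Caches one connection per (remote host, protocol)."""

    def __init__(self):
        self._conns = {}

    def get(self, local: Endpoint, remote: Endpoint):
        key = (remote.host, choose_protocol(local, remote))
        if key not in self._conns:
            # Placeholder for opening a real PCIe / RDMA / TCP channel.
            self._conns[key] = f"{key[1]}://{remote.host}"
        return self._conns[key]
```

Pooling avoids paying connection setup cost (significant for RDMA queue pairs) on every request to the same back-end node.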
Optionally, the virtualized computing power providing system provided in this embodiment further includes: a detection center, configured to detect the state of the service program and/or the device management and control center.
Further, optionally, the virtualized computing power providing system provided in this embodiment further includes: a device management file system, configured to store the state of the service program and/or the device management and control center, so that system recovery can be performed according to the stored state when the state of the service program and/or the device management and control center is abnormal.
Specifically, FIG. 3 is a schematic diagram of the system framework of a virtual computing power device in the virtualized computing power providing system according to an embodiment of the present application, and shows the internal architecture of the service end 16 while it implements the virtual computing power device. The management and control end 12 receives, through the K8S management and control node service, the resource forms converted from the user request, and forwards them through the Device Plugin to the virtual device management and control and service container, which is the container runtime environment in this embodiment. On the management and control plane:
the container entry is responsible for starting the device management and control center and the external detection;
the external detection monitors the device management and control center as a whole; when an exception occurs, it reports to the cluster detection center and handles the exception according to its level, for example by restarting the device management and control center;
the device management and control center interfaces with the Device Plugin and processes plugin requests, such as creating new virtual devices, querying device running states and deleting devices; it is also responsible for starting the internal detection;
the internal detection monitors the back-end service programs (processes or threads) of all devices; when an exception occurs, it reports to the external detection and handles the exception according to its level, for example by restarting the device back-end service;
the device management file stores the running states of the devices and services, so that when an exception such as a core dump occurs, the program can recover the system from the device management file.
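The two-level detection described above can be sketched as follows. The severity levels and the rule "restart locally on a minor fault, report upward on a major one" are assumptions; the source says only that each detector handles exceptions according to their level.

```python
from enum import IntEnum


class Severity(IntEnum):
    OK = 0
    MINOR = 1  # handled locally by restarting the component
    MAJOR = 2  # reported one level up


def internal_detection(services: dict) -> list:
    """Check device back-end services; restart minor faults, return major ones."""
    escalate = []
    for name, severity in list(services.items()):
        if severity == Severity.MINOR:
            services[name] = Severity.OK  # stands in for restarting the back-end service
        elif severity == Severity.MAJOR:
            escalate.append(name)
    return escalate


def external_detection(center_healthy: bool, services: dict, cluster_reports: list) -> str:
    """Check the device management and control center, then run internal detection."""
    if not center_healthy:
        cluster_reports.append("device management center down")
        return "restart-center"  # stands in for restarting the center
    failed = internal_detection(services)
    if failed:
        cluster_reports.extend(f"service {name} failed" for name in failed)
        return "reported"
    return "ok"
```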
It should be noted that the virtual device management and control and service container in this embodiment is implemented based on SPDK (Storage Performance Development Kit).
As shown in FIG. 3, on the IO plane of the device, the service end 16 further includes a device back-end service. To implement the virtual device function, the virtual device consists of two parts: the kernel driver, which receives user requests and data and acts as the producer; and the device back-end service, which processes user requests, forwards data and so on in user mode and acts as the consumer.
The kernel driver creates the virtual device and exchanges and processes data and commands through a shared memory area. In this embodiment the virtual computing power device is described as a block device by way of example; it may also be simulated as a character device, or as a device of a standard form (such as a virtio device).
Specifically, the device back-end service includes: a management and control handler, which processes requests from the device such as computing power queries and configuration; and a data service program, which is responsible for data transmission between the data processing task and the communication service.
The kernel driver includes: data and queue management, which exchanges content with the device back-end service through memory mapping; and management and control commands, which mainly receive requests such as computing power configuration and queries from the data processing task side and send them to the back-end service program via netlink.
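The producer/consumer split over a memory-mapped region is typically built as a single-producer, single-consumer ring. The Python class below only models the index logic: the kernel driver would call `produce` and the device back-end service `consume`. A real shared-memory ring also needs memory barriers or atomic index updates, which this sketch does not show; the class name and slot count are hypothetical.

```python
class SharedRing:
    """SPSC ring over a fixed number of slots (stand-in for an mmap'ed region)."""

    def __init__(self, slots: int):
        self.buf = [None] * slots
        self.head = 0  # next slot to read; advanced only by the consumer
        self.tail = 0  # next slot to write; advanced only by the producer

    def produce(self, item) -> bool:
        """Producer (kernel driver): enqueue a request; False if the ring is full."""
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:  # one slot stays empty to tell "full" from "empty"
            return False
        self.buf[self.tail] = item
        self.tail = nxt
        return True

    def consume(self):
        """Consumer (back-end service): dequeue a request; None if the ring is empty."""
        if self.head == self.tail:
            return None
        item = self.buf[self.head]
        self.buf[self.head] = None
        self.head = (self.head + 1) % len(self.buf)
        return item
```

Because each index is written by only one side, the two parties never contend on the same variable, which is what makes the design suitable for a kernel/user shared region.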
In addition, fig. 3 also includes a user container, which includes a computing power configuration/query function, an inference/training function, and a virtual instance device. Through interaction with the kernel driver, the virtual device created by the kernel driver is mounted into the corresponding user container. Because the device is exposed by mounting, the hardware transport network is hidden from the user container, so the user container only needs to handle computing power configuration and data transfer and does not need to handle network configuration. The user container also does not need privileged mode to access the device.
Example 2
According to one aspect of the present application, a virtualized computing power providing method is provided. Fig. 4 is a schematic flowchart of the virtualized computing power providing method according to the second embodiment of the present application. As shown in fig. 4, the method includes:
Step S402: obtain a cluster management and control request, where the cluster management and control request includes a deployment request for a virtual computing power device;
Step S402 is applied to the virtualized computing power providing system of embodiment 1. On the server side, a cluster management and control request is received through K8S; the request may be an application instance request that includes at least an AI application (algorithm). After receiving the request, K8S parses it to obtain the corresponding virtual resources and back-end entity resources, and the server then executes a service thread/process/program that creates the virtual device based on the request parsed by K8S, that is, step S404.
Step S404: according to the deployment request, acquire the virtual computing power device from the kernel space of the computing node, and generate a service program corresponding to the virtual computing power device in the user space of the computing node;
Optionally, the virtualized computing power providing method provided in this embodiment of the present application further includes: creating the virtual computing power device in the kernel space according to the deployment request.
Specifically, in the system shown in fig. 1 of embodiment 1, the K8S management and control node service receives the request from the management and control end, converts the deployment request into a resource form, and forwards it through the Device Plugin (device plug-in) to the virtual device management and control and service container. That container parses the deployment request and creates the virtual device: the kernel driver exposes it as a computing power device and exchanges and processes data and commands through the shared memory area, thereby creating the virtual computing power device.
Fig. 5 is a schematic diagram of creating a virtual device and starting a service thread/process in the virtualized computing power providing method according to the second embodiment of the present application. As shown in fig. 5, creating the virtual computing power device in kernel space and starting its service according to the deployment request may proceed as follows:
S1, start the device management and control center program (SPDK app);
S2, determine whether the request is a user request to create a virtual device;
S3, if so, send a request to the kernel-space driver of the computing node to create the virtual device and its vring;
S4, register the Admin management and control poller with the SPDK reactor;
S5, register the IO processing poller with the SPDK reactor;
S6, update the device list.
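Steps S2 to S6 can be modelled as below. This C sketch stands in for an SPDK application with a toy reactor that runs registered pollers in a loop. In a real SPDK application the registration would go through SPDK's own poller API (for example `spdk_poller_register()`); every name here (`reactor`, `handle_create_request`, `kernel_create_vdev`, and so on) is hypothetical, and the kernel-driver call in S3 is a stub.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_POLLERS 16
#define MAX_DEVICES 8

typedef int (*poller_fn)(void *arg);   /* returns a work indicator (unused here) */

struct poller { poller_fn fn; void *arg; };

/* Toy stand-in for an SPDK reactor: a list of pollers run round-robin. */
struct reactor {
    struct poller pollers[MAX_POLLERS];
    int npollers;
};

struct vdev {
    char name[32];
    int admin_polls;   /* how many times the admin poller ran */
    int io_polls;      /* how many times the IO poller ran */
};

struct dev_list { struct vdev devs[MAX_DEVICES]; int count; };

/* Placeholder pollers: real ones would drain the admin / IO queues. */
static int admin_poll(void *arg) { ((struct vdev *)arg)->admin_polls++; return 0; }
static int io_poll(void *arg)    { ((struct vdev *)arg)->io_polls++;    return 0; }

static void reactor_register(struct reactor *r, poller_fn fn, void *arg) {
    r->pollers[r->npollers++] = (struct poller){ fn, arg };
}

/* Hypothetical stub for S3: ask the kernel driver to create the device
 * and its vring. Always succeeds in this sketch. */
static int kernel_create_vdev(const char *name) { (void)name; return 0; }

/* S2-S6: handle one "create virtual device" request. */
int handle_create_request(struct reactor *r, struct dev_list *l, const char *name) {
    assert(r != NULL && l != NULL && name != NULL);
    if (l->count == MAX_DEVICES || r->npollers + 2 > MAX_POLLERS)
        return -1;                                 /* no room for another device */
    if (kernel_create_vdev(name) != 0)             /* S3 */
        return -1;
    struct vdev *d = &l->devs[l->count];
    memset(d, 0, sizeof *d);
    snprintf(d->name, sizeof d->name, "%s", name);
    reactor_register(r, admin_poll, d);            /* S4 */
    reactor_register(r, io_poll, d);               /* S5 */
    l->count++;                                    /* S6 */
    return 0;
}

/* One pass of the reactor loop: call every registered poller once. */
void reactor_run_once(struct reactor *r) {
    for (int i = 0; i < r->npollers; i++)
        r->pollers[i].fn(r->pollers[i].arg);
}
```

Polling instead of interrupts is SPDK's usual design: each poller is a cheap non-blocking function that the reactor calls repeatedly, so admin and IO handling for many virtual devices can share one thread.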
Step S406: mount the virtual computing power device to a user container;
Specifically, as shown in fig. 3 of embodiment 1, on the CN (computing node) side, after the kernel driver creates a virtual device, a corresponding virtual instance device is mapped into the user container, so that the virtual computing power device is mounted to the user container through the interaction between the virtual device and the virtual instance device.
Step S408: through the virtual computing power device and the service program, schedule physical resources in a resource set for the target application provided by the user container, where the physical resources and the virtual computing power device form a mapping relationship;
Specifically, for the user container mounted in step S406, the corresponding physical resources are scheduled for the target application through the virtual computing power device and the service program.
Step S410: obtain the running result of the target application through the virtual computing power device and the service program, and send the running result to the user container.
Specifically, after the target application finishes, the running result is returned to the user container through the interaction among the virtual computing power device, the service program, and the user container.
Fig. 6 is a schematic diagram of the device management and control and allocation flow in the method using a virtual computing power device according to the second embodiment of the present application. In combination with steps S402 to S410, the virtualized computing power providing method of the second embodiment may proceed as follows:
S1, K8S requests resources;
S2, the Device Plugin sends a request to the virtual device management and control process;
S3, the virtual device management and control process creates a virtual device;
S4, the virtual device management and control process creates a service thread/process for the virtual device;
S5, the virtual device management and control process returns the result to the Device Plugin/K8S;
S6, K8S mounts the virtual device to the user container;
S7, K8S mounts the virtual device to the user container;
S8, the user container uses the virtual device.
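For context, a Kubernetes device plugin answers the kubelet's `Allocate` call with device specifications (host path, container path, permissions), which the kubelet then exposes inside the container. The C sketch below models steps S2 to S7 in a single process under that assumption. The structure mirrors the shape of such a device specification, but the function names, the `/dev/vxpuN` host naming scheme, and the direct call chain are hypothetical, and no gRPC is involved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_MOUNTS 4

/* Shaped like a device-plugin device spec: where the device lives on the
 * host, where it appears in the container, and its access permissions. */
struct device_spec {
    char host_path[64];
    char container_path[64];
    char permissions[4];
};

struct container {
    const char *name;
    struct device_spec mounts[MAX_MOUNTS];
    int nmounts;
};

static int next_vdev_id;   /* hypothetical per-node device counter */

/* S3-S5: the virtual device management process creates the device and its
 * service thread (not modelled here) and reports the resulting device node. */
static struct device_spec mgmt_create_vdev(void) {
    struct device_spec s;
    snprintf(s.host_path, sizeof s.host_path, "/dev/vxpu%d", next_vdev_id++);
    snprintf(s.container_path, sizeof s.container_path, "/dev/vxpu");
    snprintf(s.permissions, sizeof s.permissions, "rw");
    return s;
}

/* S2: the device plugin forwards the orchestrator's allocation request. */
struct device_spec plugin_allocate(void) { return mgmt_create_vdev(); }

/* S6-S7: the orchestrator exposes the device node inside the container. */
int mount_into_container(struct container *c, const struct device_spec *s) {
    assert(c != NULL && s != NULL);
    if (c->nmounts == MAX_MOUNTS)
        return -1;
    c->mounts[c->nmounts++] = *s;
    return 0;
}
```

Every container sees the same in-container path (`/dev/vxpu`) regardless of which host-side instance it received, which is what lets the user program open a fixed device name.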
Optionally, the virtual computing power device provides a unified data interface. Scheduling, through the virtual computing power device and the service program, physical resources in the resource set for the target application provided by the user container includes: acquiring the target application provided by the user container through the data interface provided by the virtual computing power device, where the target application includes user data and computing instructions for the user data; sending the target application from the virtual computing power device to the service program through shared memory; and sending, by the service program, the target application to the physical resource corresponding to the virtual computing power device.
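The three-step data path (data interface, then shared memory, then the service program forwarding to the mapped physical resource) can be sketched as follows. Everything here is an assumption for illustration: a target application is modelled as a small array of integers plus an instruction (`OP_SUM` or `OP_MAX`), the shared memory is a static buffer, and the "physical resource" bound to a virtual device is just a CPU function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHM_BYTES 256
#define MAX_VDEV 4
#define MAX_ITEMS 8

enum op { OP_SUM = 1, OP_MAX = 2 };   /* hypothetical compute instructions */

/* A target application: user data plus the instruction to apply to it. */
struct target_app {
    enum op instr;
    int count;
    int data[MAX_ITEMS];
};

_Static_assert(sizeof(struct target_app) <= SHM_BYTES, "request must fit in shared memory");

/* A physical resource is modelled as a function that executes a request. */
typedef int (*phys_exec_fn)(const struct target_app *);

/* Stand-in physical resource: sums for OP_SUM, takes the maximum otherwise. */
static int cpu_exec(const struct target_app *t) {
    int acc = t->data[0];
    for (int i = 1; i < t->count; i++)
        acc = (t->instr == OP_SUM) ? acc + t->data[i]
                                   : (t->data[i] > acc ? t->data[i] : acc);
    return acc;
}

static unsigned char shm[SHM_BYTES];          /* shared memory area */
static phys_exec_fn vdev_map[MAX_VDEV];       /* virtual device -> physical resource */

void vdev_bind(int vdev_id, phys_exec_fn fn) {
    assert(vdev_id >= 0 && vdev_id < MAX_VDEV);
    vdev_map[vdev_id] = fn;
}

/* Steps 1-2: the data interface validates the request and copies it into
 * shared memory. */
int vdev_submit(const struct target_app *t) {
    if (t->count < 1 || t->count > MAX_ITEMS)
        return -1;
    memcpy(shm, t, sizeof *t);
    return 0;
}

/* Step 3: the service program reads shared memory and forwards the request
 * to the physical resource mapped to this virtual device. */
int service_dispatch(int vdev_id, int *result) {
    if (vdev_id < 0 || vdev_id >= MAX_VDEV || vdev_map[vdev_id] == NULL)
        return -1;
    struct target_app t;
    memcpy(&t, shm, sizeof t);
    *result = vdev_map[vdev_id](&t);
    return 0;
}
```

Binding a different function in `vdev_bind` is the analogue of remapping the virtual device to another physical resource; the user-side call (`vdev_submit`) does not change.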
The virtualized computing power providing method provided in this embodiment can be applied to cloud application scenarios. A computing power demand instance may be an instance serving any of various scenarios and requirements. For example, when a user sends a data query request from a user terminal, the query is executed in the cloud and the result is returned to the user terminal; this entire data query process may constitute a computing power demand instance.
Specifically, as shown in fig. 3 of embodiment 1, sending the computing power demand instance provided by the user container to the physical resource corresponding to the virtual computing power device through the virtual computing power device and the service program, that is, scheduling physical resources in the resource set for the target application provided by the user container, includes: acquiring the target application provided by the user container through the data interface provided by the virtual computing power device, where the target application includes user data and computing instructions for the user data; sending the target application from the virtual computing power device to the service program through shared memory; and sending, by the service program, the target application to the physical resource corresponding to the virtual computing power device.
Further, optionally, the physical resources in the resource set are heterogeneous or homogeneous physical resources, and the resource set is connected to the service program through a connection pool, where the protocols in the connection pool include local transport protocols and network transport protocols.
Optionally, the local transport protocol includes PCIe; the network transport protocol includes at least one of RDMA and TCP.
Specifically, after the virtual resources and back-end entity resources have been allocated, a mapping relationship is established between them, and control of, and data exchange between, the virtual resources and the physical devices are carried out over RDMA, TCP, or PCIe. As shown in fig. 2 of embodiment 1, the purpose of pooling in this embodiment is to improve resource utilization: the underlying resources are managed uniformly over a network or other transport, and one or more management centers schedule and allocate them. Current network protocols include TCP (Transmission Control Protocol) and RDMA (Remote Direct Memory Access), whereas local computing devices are usually connected through a PCIe (Peripheral Component Interconnect Express) topology. That is, in this embodiment a preset communication service is connected to the computing side, where the preset communication service includes a network communication protocol, which may be TCP or RDMA, and a device interface communication protocol, which may be PCIe.
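A minimal sketch of how a connection pool might choose among the transports named above: PCIe for devices on the local topology, RDMA for remote resources that support it, and TCP otherwise. The selection policy, the structures, and the reuse rule are hypothetical; the application only states which protocols the pool contains.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum transport { TR_PCIE, TR_RDMA, TR_TCP };

struct phys_resource {
    const char *id;
    bool local;          /* attached to this node's PCIe topology */
    bool rdma_capable;   /* remote node reachable over RDMA */
};

struct conn { const char *resource_id; enum transport tr; bool in_use; };

#define POOL_SIZE 8
struct conn_pool { struct conn conns[POOL_SIZE]; int n; };

/* Hypothetical policy: local devices over PCIe; remote devices over RDMA
 * when available, otherwise TCP. */
enum transport pick_transport(const struct phys_resource *r) {
    if (r->local)
        return TR_PCIE;
    return r->rdma_capable ? TR_RDMA : TR_TCP;
}

/* Reuse an idle connection to the same resource if one exists,
 * otherwise open a new one; NULL if the pool is exhausted. */
struct conn *pool_acquire(struct conn_pool *p, const struct phys_resource *r) {
    assert(p != NULL && r != NULL);
    for (int i = 0; i < p->n; i++) {
        struct conn *c = &p->conns[i];
        if (!c->in_use && strcmp(c->resource_id, r->id) == 0) {
            c->in_use = true;
            return c;
        }
    }
    if (p->n == POOL_SIZE)
        return NULL;
    struct conn *c = &p->conns[p->n++];
    *c = (struct conn){ r->id, pick_transport(r), true };
    return c;
}

void pool_release(struct conn *c) { c->in_use = false; }
```

Keeping the transport choice inside the pool means the service program talks to local and remote resources through the same acquire/release calls.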
Optionally, the virtual computing power device provides a unified management and control interface, and the virtualized computing power providing method of this embodiment further includes: acquiring, through the management and control interface of the virtual computing power device, a management and control request provided by the user container, where the request includes at least one of the following: a computing power query request, a computing power configuration request, a transmission channel attribute configuration request, a computing power migration request, a virtual computing power device state query request, and a virtual computing power device state configuration request; and processing the management and control request through the service program.
Further, optionally, when the management and control request is a computing power configuration request, processing it through the service program includes: sending the computing power configuration request to the physical resource management node through the service program, so that the physical resource management node establishes a mapping relationship between the virtual computing power device and the physical resources.
Specifically, fig. 7 is a schematic diagram of the virtual-device-based computing power configuration and invocation flow in the virtualized computing power providing method according to the second embodiment of the present application. As shown in fig. 7, the flow may be:
S1, the user starts the program;
S2, the user prepares computing resources, data, and so on;
S3, the user calls the computing power operation interface;
S4, through the computing power operation interface, the computing power attributes are configured via the management and control interface of the virtual device;
S5, through the computing power operation interface, the transmission channel attributes are configured via the management and control interface of the virtual device;
S6, through the computing power operation interface, data is transferred via the data path interface of the virtual device;
S7, through the computing power operation interface, the running result is obtained via the data path interface of the virtual device.
Optionally, the cluster management and control request further includes at least one of the following: a request to query the running state of the virtual computing power device and a request to delete the virtual computing power device; and the cluster management and control request is acquired from the cluster management and control center through the device plug-in.
Optionally, the cluster management and control request is managed by a device management and control center, and the virtualized computing power providing method of this embodiment further includes: detecting and saving the state of the service program and/or the device management and control center, and, when that state becomes abnormal, performing system recovery based on the saved state.
Optionally, the virtual computing power device is emulated in the form of a block device, a character device, or a network device.
This embodiment uses a block device as the example. Alternatively, the virtual computing power device may be emulated as a character device, or as a device in a standard form (for example, a virtio device), with data and commands exchanged and processed through the shared memory area.
In addition, in the virtualized computing power providing method of this embodiment, the virtual computing power device interfaces are defined as follows:
Virtual computing power operation interfaces
Computing power operations (pseudocode):
1. Get the number of computing power devices
vxpu_get_xpu_count(fd, &count);
2. Get a computing power attribute
vxpu_get_feature(fd, vxpu_attr, id);
3. Set a computing power attribute
vxpu_set_feature(fd, vxpu_attr, id);
Management and control operations:
1. Initialize a vxpu device
vxpu_init(vxpu_name);
2. Release a vxpu device
vxpu_deinit(vxpu_name);
3. Allocate memory for a vxpu
buf = vxpu_malloc(size);
4. Free memory allocated by vxpu_malloc
vxpu_free(buf);
Read/write operations (synchronous):
1. Open a vxpu device
fd = vxpu_open("/dev/vxpu", flags, mode);
2. Read data from a vxpu device
vxpu_read(fd, buf, count, offset, direct);
3. Write data to a vxpu device
vxpu_write(fd, buf, count, offset, direct);
4. vxpu management and control operation
vxpu_admin(fd, opcode, flag, buf, size, direction, opaque);
5. Close a vxpu device
vxpu_close(fd);
Read/write operations (asynchronous):
1. Set up an asynchronous context
vxpu_io_setup(unsigned nr_events, aio_context_t *ctx_idp);
2. Submit requests
vxpu_io_submit(aio_context_t ctx_id, long nr, struct iocb **iocbpp);
3. Get results
vxpu_io_getevents(aio_context_t ctx_id, long min_nr, long nr,
                  struct io_event *events, struct timespec *timeout);
4. Destroy the context
vxpu_io_destroy(aio_context_t ctx_id);
5. Cancel a request
vxpu_io_cancel(aio_context_t ctx_id, struct iocb *iocb,
               struct io_event *result);
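The synchronous calls above resemble POSIX `open`/`pread`/`pwrite`/`close`, and the asynchronous calls take the same parameter lists as the Linux native AIO system calls `io_setup`, `io_submit`, `io_getevents`, `io_destroy`, and `io_cancel`. Since the application does not give argument types or return values, the following is only an in-process mock of the synchronous subset under assumed pread/pwrite-like semantics (byte count returned, -1 on error). The `mock_` prefix marks these functions as hypothetical stand-ins, not the vxpu library itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MOCK_DEV_BYTES 4096   /* assumed size of a device's data window */
#define MOCK_MAX_FDS 4

struct mock_dev {
    bool open;
    unsigned char window[MOCK_DEV_BYTES];
};
static struct mock_dev devs[MOCK_MAX_FDS];

/* Assumed semantics: returns a small non-negative handle, or -1. */
int mock_vxpu_open(const char *path, int flags, int mode) {
    (void)flags; (void)mode;                 /* accepted but ignored here */
    assert(path != NULL);
    if (strcmp(path, "/dev/vxpu") != 0)
        return -1;
    for (int fd = 0; fd < MOCK_MAX_FDS; fd++) {
        if (!devs[fd].open) {
            memset(&devs[fd], 0, sizeof devs[fd]);
            devs[fd].open = true;
            return fd;
        }
    }
    return -1;
}

/* Valid open handle and [offset, offset + count) inside the window. */
static bool in_range(int fd, size_t count, size_t offset) {
    return fd >= 0 && fd < MOCK_MAX_FDS && devs[fd].open &&
           offset <= MOCK_DEV_BYTES && count <= MOCK_DEV_BYTES - offset;
}

/* 'direct' is accepted but ignored (a real device might skip a staging copy). */
long mock_vxpu_write(int fd, const void *buf, size_t count, size_t offset, bool direct) {
    (void)direct;
    if (!in_range(fd, count, offset))
        return -1;
    memcpy(devs[fd].window + offset, buf, count);
    return (long)count;
}

long mock_vxpu_read(int fd, void *buf, size_t count, size_t offset, bool direct) {
    (void)direct;
    if (!in_range(fd, count, offset))
        return -1;
    memcpy(buf, devs[fd].window + offset, count);
    return (long)count;
}

int mock_vxpu_close(int fd) {
    if (fd < 0 || fd >= MOCK_MAX_FDS || !devs[fd].open)
        return -1;
    devs[fd].open = false;
    return 0;
}
```

A user program following fig. 7 would open the device, write its input at some offset, and read the result back from the data path, which is what the mock lets one exercise without the real back end.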
The virtual computing power device implemented by the method of this embodiment naturally fits cloud-native scenarios: the virtual device service runs as a container and interacts with cluster management and control centers such as K8S through a device plug-in, and virtual devices can be created and queried at any time and used on demand. In theory, a single node can support thousands of instances: taking a 32 MB shared-memory read/write area per virtual device as an example, a system with 512 GB of memory can theoretically provide 512 GB / 2 / 64 MB = 4096 instances. Mounting the device hides the hardware transport network, so the user container only needs to handle computing power configuration and data transfer, not network configuration, and the user container does not need privileged mode to access the device. The back-end service can connect either to local physical computing resources or to network transport services for pooling, so it adapts to multiple scenarios.
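The capacity figure can be reproduced with a short calculation. This sketch assumes, as the formula suggests but the text does not state, that dividing by 2 reserves half of system memory for shared-memory areas and that the 64 MB per instance is a 32 MB read area plus a 32 MB write area; binary units (GiB, MiB) are also assumed, and `max_instances` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Theoretical instance count: 1/reserve_divisor of system memory is set
 * aside for shared-memory areas, and each instance needs a read area plus
 * a write area of window_bytes each. Integer division drops any remainder. */
uint64_t max_instances(uint64_t mem_bytes, unsigned reserve_divisor,
                       uint64_t window_bytes) {
    assert(reserve_divisor > 0 && window_bytes > 0);
    uint64_t per_instance = 2 * window_bytes;   /* read + write areas */
    return mem_bytes / reserve_divisor / per_instance;
}
```

With 512 GiB of memory, a divisor of 2, and 32 MiB areas, this gives 256 GiB / 64 MiB = 4096, matching the text; doubling the area size halves the count.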
In this embodiment of the application, a cluster management and control request is obtained, where the request includes a deployment request for a virtual computing power device; according to the deployment request, the virtual computing power device is acquired from the kernel space of a computing node, and a corresponding service program is generated in the user space of the computing node; the virtual computing power device is mounted to a user container; physical resources in a resource set are scheduled for the target application provided by the user container through the virtual computing power device and the service program, where the physical resources and the virtual computing power device form a mapping relationship; and the running result of the target application is obtained through the virtual computing power device and the service program and sent to the user container. This solves the prior-art problem of poor applicability caused by differences between software and hardware and the lack of a unified protocol standard in the software stack, and achieves the technical effect of adapting to multiple scenarios.
Example 3
According to another aspect of the present application, there is also provided an electronic device comprising a memory and a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of embodiment 2 described above.
The embodiment of the application can provide an electronic device, which can be any one electronic device in an electronic device group. Alternatively, in this embodiment, the electronic device may be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, the electronic device may include: one or more processors, and a memory.
The memory may be configured to store software programs and modules, such as the program instructions/modules corresponding to the virtualized computing power providing method and apparatus in the embodiments of the present application. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, it implements the virtualized computing power providing method. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor may call, through the transmission module, the information and application programs stored in the memory to perform the following steps: acquiring a cluster management and control request, where the request includes a deployment request for a virtual computing power device; acquiring the virtual computing power device from the kernel space of a computing node according to the deployment request, and generating a service program corresponding to the virtual computing power device in the user space of the computing node; mounting the virtual computing power device to a user container; scheduling physical resources in a resource set for the target application provided by the user container through the virtual computing power device and the service program, where the physical resources and the virtual computing power device form a mapping relationship; and acquiring the running result of the target application through the virtual computing power device and the service program, and sending the running result to the user container.
Optionally, the processor may further execute program code for the following step: creating the virtual computing power device in the kernel space according to the deployment request.
Optionally, the processor may further execute program code for the following steps: the virtual computing power device provides a unified data interface, and scheduling physical resources in the resource set for the target application provided by the user container includes: acquiring the target application provided by the user container through the data interface provided by the virtual computing power device, where the target application includes user data and computing instructions for the user data; sending the target application from the virtual computing power device to the service program through shared memory; and sending, by the service program, the target application to the physical resource corresponding to the virtual computing power device.
Further, optionally, the processor may further execute program code for the following steps: the physical resources in the resource set are heterogeneous or homogeneous physical resources, and the resource set is connected to the service program through a connection pool, where the protocols in the connection pool include a local transport protocol and a network transport protocol.
Optionally, the processor may further execute the program code of the following steps: the local transmission protocol comprises: PCIe; the network transport protocol includes at least one of: RDMA, TCP.
Optionally, the processor may further execute program code for the following steps: the virtual computing power device provides a unified management and control interface; acquiring, through the management and control interface of the virtual computing power device, a management and control request provided by the user container, where the request includes at least one of the following: a computing power query request, a computing power configuration request, a transmission channel attribute configuration request, a computing power migration request, a virtual computing power device state query request, and a virtual computing power device state configuration request; and processing the management and control request through the service program.
Further, optionally, the processor may further execute the program code of the following steps: in a case where the management control request is a computing power configuration request, processing the computing power configuration request by the service program includes: and sending the computing power configuration request to the physical resource management node through the service program so that the physical resource management node establishes a mapping relation for the virtual computing power equipment and the physical resource.
Optionally, the processor may further execute program code for the following steps: the cluster management and control request further includes at least one of the following: a request to query the running state of the virtual computing power device and a request to delete the virtual computing power device; and the cluster management and control request is acquired from the cluster management and control center through the device plug-in.
Optionally, the processor may further execute the program code of the following steps: the cluster management and control request is managed by an equipment management and control center, and the method further comprises the following steps: and detecting and storing the state of the service program and/or the equipment management and control center, and performing system recovery according to the stored state of the service program and/or the equipment management and control center under the condition that the state of the service program and/or the equipment management and control center is abnormal.
Optionally, the processor may further execute program code for the following step: the virtual computing power device is emulated in the form of a block device, a character device, or a network device.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described in detail in a certain embodiment.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be implemented in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk, which can store program codes.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims (14)

1. A virtualized computing power providing method, comprising:
acquiring a cluster management and control request, wherein the cluster management and control request comprises a deployment request of a virtual computing power device;
acquiring the virtual computing power device from a kernel space of a computing node according to the deployment request, and generating a service program corresponding to the virtual computing power device in a user space of the computing node;
mounting the virtual computing power device to a user container;
scheduling physical resources for a target application provided by the user container in a resource set through the virtual computing power device and the service program, wherein the physical resources and the virtual computing power device form a mapping relation;
and acquiring the running result of the target application through the virtual computing power device and the service program, and sending the running result to the user container.
2. The virtualized computing power providing method of claim 1, wherein the method further comprises:
creating the virtual computing power device in the kernel space according to the deployment request.
3. The virtualized computing power providing method according to claim 1, wherein the virtual computing power device is provided with a unified data interface; wherein scheduling, by the virtual computing power device and the service program, physical resources for a target application provided by the user container in a resource set comprises:
acquiring a target application provided by the user container through the data interface provided by the virtual computing power device, wherein the target application comprises user data and computing instructions of the user data;
sending the target application from the virtual computing power device to the service program through a shared memory;
and sending the target application to a physical resource corresponding to the virtual computing power equipment through the service program.
4. The virtualized computing power providing method according to claim 3, wherein the physical resources in the resource set are heterogeneous or homogeneous physical resources, and the resource set and the service program are connected through a connection pool, wherein the protocols in the connection pool comprise: local transport protocols and network transport protocols.
5. The virtualized computing power providing method of claim 4, wherein the local transport protocol comprises PCIe, and the network transport protocol includes at least one of: RDMA, TCP.
6. The virtualized computing power providing method according to claim 1, wherein the virtual computing power device is provided with a unified management and control interface; the method further comprises the following steps:
obtaining, through the management and control interface of the virtual computing power device, a management and control request provided by the user container, wherein the management and control request includes at least one of: a computing power query request, a computing power configuration request, a transmission channel attribute configuration request, a computing power migration request, a virtual computing power device state query request, and a virtual computing power device state configuration request;
and processing the management and control request through the service program.
7. The virtualized computing power providing method according to claim 6, wherein, in a case where the management and control request is the computing power configuration request, processing the computing power configuration request through the service program comprises:
sending the computing power configuration request to a physical resource management node through the service program, so that the physical resource management node establishes a mapping relation between the virtual computing power device and the physical resource.
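Claim 7 leaves the allocation strategy of the physical resource management node open. The Python sketch below assumes a simple first-fit policy over hypothetical "compute units" per resource and records the resulting mapping from each virtual device to its (resource, units) allocations; the unit model, the class and the method names are illustrative only.

```python
from __future__ import annotations


class PhysicalResourceManager:
    """Hypothetical physical resource management node keeping the virtual-to-physical mapping."""

    def __init__(self, capacity: dict[str, int]) -> None:
        self.free = dict(capacity)  # physical resource id -> free compute units
        self.mapping: dict[str, list[tuple[str, int]]] = {}  # virtual device id -> allocations

    def release(self, vdev_id: str) -> None:
        """Return every unit held by a virtual device to the free pool."""
        for rid, units in self.mapping.pop(vdev_id, []):
            self.free[rid] += units

    def configure(self, vdev_id: str, units: int) -> list[tuple[str, int]]:
        """Handle a computing power configuration request with first-fit allocation."""
        held = sum(u for _, u in self.mapping.get(vdev_id, []))
        if units > sum(self.free.values()) + held:
            raise RuntimeError("insufficient physical computing power")
        self.release(vdev_id)  # reconfiguration replaces any previous mapping
        allocation, remaining = [], units
        for rid in sorted(self.free):
            take = min(self.free[rid], remaining)
            if take:
                self.free[rid] -= take
                allocation.append((rid, take))
                remaining -= take
            if remaining == 0:
                break
        self.mapping[vdev_id] = allocation
        return allocation
```

Checking capacity before releasing the old allocation means a rejected reconfiguration leaves the existing mapping intact.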
8. The virtualized computing power providing method according to claim 1, wherein the cluster management and control request further comprises at least one of: a request to query the running state of the virtual computing power device and a request to delete the virtual computing power device; and the cluster management and control request is acquired from a cluster management and control center through a device plug-in.
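The three cluster-level actions named in claims 1 and 8 (deploy, query running state, delete) and the device plug-in that relays them can be sketched as follows. The string encoding of `action`, the returned state strings and both class names are hypothetical; the claims do not define the request format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRequest:
    action: str     # hypothetical encoding: "deploy", "query" or "delete"
    device_id: str


class DeviceControlCenter:
    """Hypothetical device management and control center tracking virtual devices."""

    def __init__(self) -> None:
        self.devices: dict = {}  # device id -> running state

    def handle(self, req: ClusterRequest) -> str:
        if req.action == "deploy":
            if req.device_id in self.devices:
                raise ValueError(f"{req.device_id} is already deployed")
            self.devices[req.device_id] = "running"
            return "running"
        if req.action == "query":
            return self.devices.get(req.device_id, "absent")
        if req.action == "delete":
            self.devices.pop(req.device_id, None)
            return "deleted"
        raise ValueError(f"unsupported cluster request: {req.action}")


class DevicePlugin:
    """Hypothetical device plug-in relaying requests from the cluster management and control center."""

    def __init__(self, center: DeviceControlCenter) -> None:
        self.center = center

    def forward(self, req: ClusterRequest) -> str:
        return self.center.handle(req)
```

Keeping the plug-in as a thin relay separates the cluster-facing interface from the node-local logic in the control center.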
9. The virtualized computing power providing method according to any one of claims 1 to 8, wherein the cluster management and control request is handled by a device management and control center, and the method further comprises:
detecting and storing the state of the service program and/or the device management and control center, and, in a case where the state of the service program and/or the device management and control center is abnormal, performing system recovery according to the stored state.
10. The virtualized computing power providing method according to any one of claims 1 to 8, wherein the virtual computing power device is emulated in the form of a block device, a character device, or a network device.
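On a Linux compute node, the three device forms in claim 10 surface differently: block and character devices appear as nodes whose file type can be read with `stat`, while network devices appear as interfaces listed under `/sys/class/net`. The sketch below shows how a user container might tell which form it was given; the function name and the path-or-interface-name input convention are assumptions for illustration.

```python
import os
import stat


def classify_device(path_or_iface: str) -> str:
    """Return 'block', 'character', 'network' or 'unknown' for a device path or interface name."""
    # Network interfaces have no /dev node on Linux; they are listed under /sys/class/net.
    net_dir = "/sys/class/net"
    if os.path.isdir(net_dir) and path_or_iface in os.listdir(net_dir):
        return "network"
    try:
        mode = os.stat(path_or_iface).st_mode
    except FileNotFoundError:
        return "unknown"
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "character"
    return "unknown"
```

For instance, `/dev/null` is a character device, so `classify_device("/dev/null")` returns `"character"`.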
11. A virtualized computing power providing system, comprising:
a device management and control center, used for acquiring a cluster management and control request, wherein the cluster management and control request comprises a deployment request for a virtual computing power device;
the device management and control center is further used for acquiring, according to the deployment request, the virtual computing power device from the kernel space of a computing node, and generating, in the user space of the computing node, a service program corresponding to the virtual computing power device;
the device management and control center is further used for mounting the virtual computing power device to a user container;
the virtual computing power device and the service program are used for scheduling, in a resource set, physical resources for a target application provided by the user container, wherein the physical resources and the virtual computing power device have a mapping relation;
the virtual computing power device and the service program are further used for obtaining the running result of the target application and sending the running result to the user container.
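The deployment sequence of claim 11 can be walked through in a short Python simulation. The control center acquires a device from kernel space, starts a user-space service program bound to one resource from the resource set, mounts the device into the container, and later returns the running result. `KernelSpace`, the `/dev/vcpN` node names, the one-resource-per-device mapping and the toy instructions are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class UserContainer:
    name: str
    mounted_devices: list = field(default_factory=list)


class KernelSpace:
    """Stand-in for the kernel-space module exposing virtual computing power device nodes."""

    def __init__(self, count: int) -> None:
        self.available = [f"/dev/vcp{i}" for i in range(count)]  # hypothetical node names

    def acquire(self) -> str:
        if not self.available:
            raise RuntimeError("no virtual computing power device available")
        return self.available.pop(0)


class ServiceProgram:
    """User-space service program bound to one virtual device and its mapped resource."""

    def __init__(self, device: str, resource: str) -> None:
        self.device = device
        self.resource = resource

    def run(self, user_data: list, instruction: str) -> tuple:
        ops = {"sum": sum, "max": max}  # toy instructions, for illustration only
        return self.resource, ops[instruction](user_data)


class DeviceControlCenter:
    def __init__(self, kernel: KernelSpace, resource_set: list) -> None:
        self.kernel = kernel
        self.free_resources = list(resource_set)
        self.services: dict = {}  # device node -> service program

    def deploy(self, container: UserContainer) -> str:
        """Handle a deployment request: acquire device, start service program, mount device."""
        if not self.free_resources:
            raise RuntimeError("resource set exhausted")
        device = self.kernel.acquire()
        self.services[device] = ServiceProgram(device, self.free_resources.pop(0))
        container.mounted_devices.append(device)
        return device
```

A container that receives `/dev/vcp0` sends work through that node, and the bound service program returns the result together with the resource that produced it.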
12. The virtualized computing power providing system according to claim 11, wherein the physical resource is a heterogeneous physical resource or a homogeneous physical resource provided by a resource set, the resource set is connected to the service program through a connection pool, and the protocols in the connection pool comprise: a local transport protocol and a network transport protocol; the local transport protocol comprises PCIe; and the network transport protocol comprises at least one of RDMA and TCP.
13. The virtualized computing power providing system of claim 11, wherein the system further comprises:
a detection center, used for detecting the state of the service program and/or the device management and control center.
14. The virtualized computing power providing system of claim 13, wherein the system further comprises:
a device management file system, used for storing the state of the service program and/or the device management and control center, so that whether the state of the service program and/or the device management and control center is abnormal can be determined according to the stored state, and system recovery of the service program and/or the device management and control center can be performed according to the stored state.
CN202210886266.5A 2022-07-26 2022-07-26 Virtualized computing power providing method and system Pending CN115167985A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210886266.5A CN115167985A (en) 2022-07-26 2022-07-26 Virtualized computing power providing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210886266.5A CN115167985A (en) 2022-07-26 2022-07-26 Virtualized computing power providing method and system

Publications (1)

Publication Number Publication Date
CN115167985A true CN115167985A (en) 2022-10-11

Family

ID=83496474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210886266.5A Pending CN115167985A (en) 2022-07-26 2022-07-26 Virtualized computing power providing method and system

Country Status (1)

Country Link
CN (1) CN115167985A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116610270A (en) * 2023-07-21 2023-08-18 湖南马栏山视频先进技术研究院有限公司 Video processing calculation and separation method and video calculation and separation system
CN116610270B (en) * 2023-07-21 2023-10-03 湖南马栏山视频先进技术研究院有限公司 Video processing calculation and separation method and video calculation and separation system

Similar Documents

Publication Publication Date Title
US11010681B2 (en) Distributed computing system, and data transmission method and apparatus in distributed computing system
US10908926B2 (en) Plug-in management wrappers
US11716264B2 (en) In situ triggered function as a service within a service mesh
CN109117252B (en) Method and system for task processing based on container and container cluster management system
CN103780655A (en) Message transmission interface task and resource scheduling system and method
US10795646B2 (en) Methods and systems that generate proxy objects that provide an interface to third-party executables
US11321090B2 (en) Serializing and/or deserializing programs with serializable state
US20200167713A1 (en) Business processing method, apparatus, device and system using the same, and readable storage medium of the same
CN112256414A (en) Method and system for connecting multiple computing storage engines
CN110245029A (en) Data processing method, device, storage medium and server
CN103677983A (en) Scheduling method and device of application
CN115167985A (en) Virtualized computing power providing method and system
CN113407353B (en) Method and device for using graphics processor resources and electronic equipment
WO2022109932A1 (en) Multi-task submission system based on slurm computing platform
WO2019117767A1 (en) Method, function manager and arrangement for handling function calls
CN110266787B (en) Hybrid cloud management system and method and computer equipment
CN113793246B (en) Method and device for using graphics processor resources and electronic equipment
CN115964128A (en) Heterogeneous GPU resource management and scheduling method and system
US20220405104A1 (en) Cross platform and platform agnostic accelerator remoting service
Campos et al. The chance for Ada to support distribution and real-time in embedded systems
CN113037812A (en) Data packet scheduling method and device, electronic equipment, medium and intelligent network card
US11537425B2 (en) Methods for application deployment across multiple computing domains and devices thereof
CN116629382B (en) Method, device and system for docking HPC cluster by machine learning platform based on Kubernetes
CN117056029B (en) Resource processing method, system, device, storage medium and electronic equipment
CN109901826B (en) Data processing method and device for Java program and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination