CN116594761A - Method and apparatus for managing resources of a computing device - Google Patents

Method and apparatus for managing resources of a computing device

Info

Publication number
CN116594761A
Authority
CN
China
Prior art keywords: application, computing, resources, resource manager, computing device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310124985.8A
Other languages
Chinese (zh)
Inventor
Oscar P. Pinto
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 17/941,002 (published as US 2023/0259404 A1)
Application filed by Samsung Electronics Co Ltd
Publication of CN116594761A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44: Arrangements for executing specific programs
    • G06F9/445: Program loading or initiating
    • G06F9/44594: Unloading
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources to service a request
    • G06F9/5011: Allocation of resources to service a request, the resources being hardware resources other than CPUs, servers and terminals
    • G06F9/5022: Mechanisms to release resources


Abstract

A method and apparatus for managing resources of a computing device are provided. The method may include: allocating a resource of the computing device to an application using a programming interface; tracking the resource using a resource manager; and determining an operation of the application using the resource manager. The method may further include modifying, by the resource manager, a state of at least a portion of the resource based on determining the operation of the application. The operation of the application may include a modification of the execution of the application, and the modification may be based on an execution state (e.g., an effective execution state) of the application. The method may further include, based on determining the operation of the application, transferring execution of the application to a mechanism for controlling the application, and/or executing a mechanism for monitoring the operation of the application.

Description

Method and apparatus for managing resources of a computing device
The present application claims priority to and the benefit of U.S. provisional patent application Ser. No. 63/309,511, filed on February 11, 2022, U.S. provisional patent application Ser. No. 63/355,089, filed on June 23, 2022, U.S. provisional patent application Ser. No. 63/346,817, filed on May 27, 2022, and U.S. patent application Ser. No. 17/941,002, filed on September 8, 2022, all of which are incorporated herein by reference.
Technical Field
The present disclosure relates generally to computing devices and, more particularly, to systems, methods, and apparatus for managing resources of computing devices.
Background
A data processing system may provide one or more storage resources to enable applications to store input data, intermediate data, output data, and the like. For example, an application may access one or more local and/or remote storage devices that may be located at a host, a storage server, a storage node, or the like. Applications such as data mapping, graphics processing, and machine learning may involve the use of increasing amounts of storage.
The above information disclosed in this Background section is provided only to enhance understanding of the background of the disclosed principles, and therefore may contain information that does not constitute prior art.
Disclosure of Invention
A method may include: allocating resources of a computing device to an application using a programming interface; tracking the resources using a resource manager; and determining an operation of the application using the resource manager. The method may further include modifying, by the resource manager, a state of at least a portion of the resources based on determining the operation of the application. The operation of the application may include a modification of the execution of the application. The modification may be based on an execution state of the application. The execution state may include an effective execution state. The method may further include, based on determining the operation of the application, transferring execution of the application to a mechanism for controlling the application. The method may further include, based on determining the operation of the application, executing a mechanism for monitoring the operation of the application. The method may further include cleaning up, by the resource manager, the resources based on determining the operation of the application. The method may further include modifying, by the resource manager, a status of a request from the application based on determining the operation of the application. The request may be a queued request. The request may be a pending request. The resource may be one of a compute engine, a computing execution environment, a computing device function, or a memory. The application may be one of a program, a virtual machine, a hypervisor, a container, or a container platform. The tracking may be performed at least in part by a host. The tracking may be performed at least in part by the computing device. The resource may be a first resource, the computing device may be a first computing device, and the method may further include: allocating, using the programming interface, a second resource of a second computing device to the application; and tracking, using the resource manager, the second resource. The method may further include modifying, by the resource manager, a state of at least a portion of the second resource based on determining the operation of the application.
An apparatus may include at least one processor configured to: allocate resources of a computing device to an application using a programming interface; track the resources using a resource manager; and determine an operation of the application using the resource manager. The at least one processor may be configured to modify, using the resource manager, a state of at least a portion of the resources based on the operation of the application. The at least one processor may be configured to clean up the resources, using the resource manager, based on the operation of the application. The at least one processor may be configured to modify, using the resource manager, a status of a request from the application based on the operation of the application. The apparatus may be a host. The apparatus may be a computing device.
An apparatus may include: a computing resource; and at least one processor configured to: provide the computing resource to an application using a programming interface; track the computing resource using a resource manager; and determine an operation of the application using the resource manager. The at least one processor may be configured to allocate the computing resource to the application. The at least one processor may be configured to modify, using the resource manager, a state of at least a portion of the computing resource based on the operation of the application. The at least one processor may be configured to clean up the computing resource, using the resource manager, based on the operation of the application. The at least one processor may be configured to modify, using the resource manager, a status of a request from the application based on the operation of the application. The at least one processor may be configured to run at least a portion of the resource manager.
Drawings
The figures are not necessarily drawn to scale, and elements of similar structure or function may generally be represented by like reference numerals or portions thereof throughout the figures for illustrative purposes. The figures are only intended to facilitate the description of the various embodiments described herein; they do not depict every aspect of the teachings disclosed herein and do not limit the scope of the claims. To prevent the figures from becoming obscured, not all components, connections, etc. may be shown, and not all components may have reference numerals. However, the arrangement of components may be readily apparent from the figures. The accompanying figures illustrate example embodiments of the present disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 illustrates an embodiment of a computing device solution according to example embodiments of the disclosure.
FIG. 2 illustrates an embodiment of an architecture for a computing device according to example embodiments of the disclosure.
FIG. 3 illustrates an embodiment of a resource management scheme for a computing device according to example embodiments of the disclosure.
FIG. 4 illustrates some example implementation details of an embodiment of a resource management scheme for a computing device according to example embodiments of the disclosure.
FIG. 5 illustrates an embodiment of a computing resource manager according to example embodiments of the disclosure.
FIG. 6 illustrates an example embodiment of a host device according to example embodiments of the disclosure.
FIG. 7 illustrates an example embodiment of a computing device that may be used to provide a user with access to one or more computing resources through a programming interface, according to example embodiments of the disclosure.
FIG. 8 illustrates an embodiment of a method for managing computing resources of a computing device, according to example embodiments of the disclosure.
Detailed Description
A Computing Device (CD) may implement one or more functions that may perform operations on data. A host may offload processing tasks to the computing device by invoking a function implemented by the device. The computing device may execute the function, for example, using one or more computing resources, and may operate on data stored at the device and/or data received from the host or another device.
A computing device may include one or more resources that may be allocated to and/or used by an application (such as a program, a virtual machine (VM), a container, etc.). Examples of resources may include memory, storage, computing resources, computing functions, and the like. Resources may be allocated, for example, using an Application Programming Interface (API). However, once a resource has been allocated to and/or used by an application, one or more conditions may arise that prevent the resource from being used effectively.
For example, if an application exits unconditionally (which may also be referred to as a crash), the resources that were allocated to the application may become unavailable to other applications. Furthermore, even if an application exits conditionally (which may be referred to as a normal or controlled exit), the application may fail to release the resources allocated to it before (or during) the exit. Thus, the resources may become unusable by other applications. As another example, an application may exit (e.g., unconditionally) while a request (e.g., a command) that the application initiated to a computing device is queued and/or outstanding. A computing device holding resources allocated to the terminated application may not know that the application has exited and may continue to process the request, wasting resources.
A resource management scheme according to example embodiments of the disclosure may track one or more computing device resources allocated to one or more applications. Depending on implementation details, this may enable the resource management scheme to release one or more of the allocated resources when they are no longer used (e.g., when an application exits, a container stops, a VM shuts down, etc.). Tracking one or more allocated resources may also enable the resource management scheme to implement one or more security features, such as scrubbing released device memory to protect confidential information of the application to which the memory was allocated.
Tracking one or more computing device resources allocated to one or more applications may also enable a resource management scheme according to example embodiments of the disclosure to cancel one or more queued requests and/or complete one or more outstanding requests if the requesting application terminates. For example, if an application submits a command to a submission queue and the application exits before the computing device to which the command is directed completes processing the request, the resource management scheme may cancel the request (e.g., if the computing device has not yet begun processing it) and/or complete the request (e.g., with an error status) by placing a corresponding completion in a completion queue (e.g., if the computing device has already begun processing it).
In some embodiments, a resource management scheme according to example embodiments of the disclosure may implement trap mechanisms and/or debug hooks. For example, in some embodiments, a trap mechanism may gain control over the operation of an application based on an event such as an exit of the application (e.g., conditional and/or unconditional), a stop of a container, a shutdown of a VM, and so forth. As another example, in some embodiments, a debug hook may trigger execution of profiling code through which the resource management scheme may understand resource flows and/or determine remedial actions.
In some embodiments, a resource management scheme in accordance with the disclosed example embodiments may implement group policies, e.g., to enable one or more policies (e.g., for cleaning up freed memory) to be applied across one or more applications (such as programs, containers, VMs, etc.). In some embodiments, a resource management scheme according to the disclosed example embodiments may record one or more actions (e.g., freeing resources, cleaning memory, triggering traps or debug hooks, etc.), errors, etc. to a system log, a user application, etc. In some embodiments, a resource management scheme according to the disclosed example embodiments may operate across any number of computing devices having any number and/or type of computing device resources.
The present disclosure encompasses numerous inventive principles relating to managing resources of a computing device. The principles disclosed herein may have independent utility and may be embodied individually, and not every embodiment may utilize every principle. Moreover, the principles may be implemented in various combinations, some of which may synergistically amplify the benefits of the individual principles. For example, some embodiments may implement multiple complementary features (such as freeing resources, scrubbing freed memory, and canceling queued requests and/or completing outstanding requests of an exiting application) based on tracking one or more computing resources allocated to one or more applications.
For purposes of illustration, some embodiments may be described in the context of a computational storage (CS) architecture, programming model, and/or computational storage API provided by the Storage Networking Industry Association (SNIA), and/or storage protocols such as Nonvolatile Memory Express (NVMe), NVMe over Fabrics (NVMe-oF), and Compute Express Link (CXL). However, the principles are not limited to use with the SNIA computational storage architecture, programming model, and/or API, the NVMe, NVMe-oF, or CXL protocols, or any other implementation details disclosed herein, and may be applied to any computing scheme, system, method, apparatus, device, etc.
FIG. 1 illustrates an embodiment of a computing device solution according to example embodiments of the disclosure. The embodiment shown in FIG. 1 may include one or more hosts 101-1 (e.g., Host 1), ..., 101-N (e.g., Host N) (which may be referred to individually or collectively as 101) and one or more computing devices 102 connected by a communication fabric 103. A host 101 may include one or more device drivers (e.g., computing device drivers) 115. The device driver 115 may enable the host 101 to interact with a respective computing device 102. In some embodiments, an API 116 may provide an interface (e.g., an abstraction interface) that enables the host 101 to access one or more computing resources of the computing device 102, as described below. For example, the API 116 may provide one or more mechanisms for discovering, configuring, and/or allocating computing resources of the computing device 102.
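By way of illustration, the following C sketch shows how a host-side flow built on such a programming interface might discover a computing device and allocate a compute engine and functional data memory. The cs_* names, types, and stub bodies are hypothetical placeholders (loosely inspired by, but not taken from, any published computational storage API):

    /* Hypothetical host-side discovery/allocation flow. The cs_* names,
     * types, and stub bodies are illustrative placeholders, not a real
     * API; error handling is reduced to simple return codes. */
    #include <stdio.h>
    #include <stddef.h>

    typedef int cs_handle_t;

    /* Stubs standing in for an API library backed by a device driver. */
    static int cs_enumerate_devices(cs_handle_t *devs, int max) {
        (void)max;
        devs[0] = 1;               /* pretend one computing device was found */
        return 1;                  /* number of devices discovered           */
    }
    static int cs_alloc_engine(cs_handle_t dev, cs_handle_t *ce) {
        (void)dev; *ce = 10; return 0;
    }
    static int cs_alloc_fdm(cs_handle_t dev, size_t bytes, cs_handle_t *mem) {
        (void)dev; (void)bytes; *mem = 20; return 0;
    }

    int main(void) {
        cs_handle_t devs[4], engine, fdm;

        int n = cs_enumerate_devices(devs, 4);       /* discovery          */
        if (n < 1) return 1;
        if (cs_alloc_engine(devs[0], &engine))       /* allocate a CE      */
            return 1;
        if (cs_alloc_fdm(devs[0], 1 << 20, &fdm))    /* allocate 1 MiB FDM */
            return 1;
        printf("engine handle %d, FDM handle %d\n", engine, fdm);
        return 0;
    }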
Computing device 102 may include device storage 104, device memory 105, computing resources 106, device controller 107, input and/or output (I/O or IO) interfaces 108, and/or management interfaces 109.
The computing resources 106 may include one or more Computing Engines (CEs) 110, which may provide (e.g., run) one or more computing execution environments (CEEs) 111, which may in turn execute (e.g., run) one or more Computing Device Functions (CDFs) 112. The computing resources 106 may also include a resource repository 113, which may include one or more computing device functions 112 and/or computing execution environments 111 that are not yet assigned. The computing resources 106 may also include functional data memory (FDM) 114.
Examples of the one or more compute engines 110 may include a Central Processing Unit (CPU) such as a Complex Instruction Set Computer (CISC) processor (e.g., an x86 processor) and/or a Reduced Instruction Set Computer (RISC) processor (e.g., an ARM processor), a Graphics Processor (GPU), a Field-Programmable Gate Array (FPGA), an Application-Specific Integrated Circuit (ASIC), a Neural Processor (NPU), a Tensor Processor (TPU), a Data Processor (DPU), and the like, or any combination thereof.
Examples of the one or more computing execution environments 111 may include an operating system (e.g., Linux), a sandbox and/or virtual machine within an operating system (e.g., an extended Berkeley Packet Filter (eBPF) environment), a container platform (e.g., a container engine), a bitstream environment (e.g., for an FPGA), and the like, or any combination thereof.
Examples of computing device functions 112 may include any type of accelerator function such as compression and/or decompression, database filtering, encryption and/or decryption, erasure coding, regular expressions (regex), scatter-gather, hash computation, cyclic redundancy check (CRC), data deduplication, redundant array of independent drives (RAID), and the like, or any combination thereof. In some embodiments, the computing device functions 112 may be provided with the computing device 102, downloaded by the host 101, or the like, or any combination thereof. For example, in some embodiments, one or more of the computing device functions 112 may be loaded into the computing device 102 when it is manufactured, shipped, installed, updated, and/or upgraded (e.g., by a firmware update and/or upgrade), and the like. In some embodiments, an executable computing device function 112 that may be downloaded may be referred to as a program.
The embodiment shown in FIG. 1 may enable a host 101 to offload processing operations to a computing device 102. For example, in some embodiments, an application 117 running on the host 101 may use the API 116 to request one or more computing resources, such as a compute engine 110, a computing execution environment 111 to run on the compute engine 110, and a computing device function 112 to run in that environment. The application 117 may also request an amount of functional data memory 114 for use by the computing device function 112.
If the requested resources are available, the API 116 may allocate them to the application 117. For example, the API 116 may allocate an entire physical compute engine 110 to the application 117. Alternatively or additionally, the API 116 may allocate a time-shared portion of a physical compute engine 110 (e.g., a VM running on the compute engine 110) to the application 117. As another example, the API 116 may allocate a portion of the functional data memory 114 (indicated as allocated FDM 126) to an application for use by the allocated compute engine 110 and/or computing execution environment 111.
In some embodiments, the resource repository 113 may include reference copies of one or more computing execution environments 111 and/or one or more computing device functions 112. To assign a computing execution environment 111 or computing device function 112 to the application 117, the API 116 may instantiate the reference copy (e.g., create a working copy of it) and load the working copy into the assigned compute engine 110 and/or computing execution environment 111.
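The instantiation step can be pictured as copying the repository's reference image so that the reference copy stays pristine and can be instantiated again for other applications. The following C sketch is a hypothetical illustration; the structure and function names are assumptions:

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical image of a computing device function (CDF) held as a
     * reference copy in the resource repository. */
    struct cdf_image {
        const void *code;   /* reference copy in the repository    */
        size_t      len;    /* size of the function image in bytes */
    };

    /* Create a working copy of a reference CDF; the caller would then
     * load the copy into the assigned compute engine and/or execution
     * environment. The repository copy remains untouched so it can be
     * instantiated again for other applications. */
    static void *instantiate_cdf(const struct cdf_image *ref) {
        void *work = malloc(ref->len);
        if (work != NULL)
            memcpy(work, ref->code, ref->len);
        return work;
    }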
In some embodiments, the functional data memory 114 may be implemented with memory that is separate from the device memory 105. Alternatively or additionally, the functional data memory 114 may be implemented at least in part with the device memory 105. To the extent that the functional data memory 114 is implemented with the device memory 105, the functional data memory 114 may include a data structure (e.g., a mapping table) that enables the API 116, applications, the assigned compute engine 110, the assigned computing execution environment 111, the assigned computing device function 112, etc. to determine which portion of the device memory 105 has been allocated to the application 117.
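One possible layout for an entry of such a mapping table is sketched below in C; the field names are assumptions for illustration only:

    #include <stdint.h>

    /* Hypothetical entry of an FDM mapping table for the case where the
     * functional data memory is carved out of device memory 105. */
    struct fdm_map_entry {
        uint64_t app_id;        /* application the region is allocated to */
        uint64_t dev_mem_off;   /* starting offset within device memory   */
        uint64_t length;        /* size of the region in bytes            */
        uint8_t  in_use;        /* nonzero while the allocation is live   */
    };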
The device memory 105 and/or the functional data memory 114 may be implemented with volatile memory such as Dynamic Random Access Memory (DRAM) and/or Static Random Access Memory (SRAM), non-volatile memory (including flash memory), persistent memory (such as cross-gridded non-volatile memory), memory with bulk resistance change, Phase-Change Memory (PCM), and the like, or any combination thereof.
One or more hosts 101 may be implemented with any component or combination of components that may utilize computing resources 106 of computing device 102. For example, host 101 may be implemented with a server (such as a computing server, a storage server, a web server, a cloud server, etc.), a node (such as a storage node), a computer (such as a workstation), a personal computer, a tablet, a smart phone, etc., or multiples and/or combinations thereof.
The one or more computing devices 102 may be implemented with one or more of any type of device, such as an accelerator device, a storage device (e.g., a computing storage device), a network device (e.g., a Network Interface Card (NIC)), a memory expansion and/or buffer device, a Graphics Processor (GPU), a Neural Processor (NPU), a Tensor Processor (TPU), etc., or multiples and/or combinations thereof. In some embodiments, the compute storage device may be implemented as a Compute Storage Drive (CSD), a Compute Storage Processor (CSP), and/or a Compute Storage Array (CSA).
The device controller 107 may be implemented with any type of controller that may be suitable for the type of computing device 102. For example, if the computing device 102 is implemented as an SSD, the device controller 107 may be implemented as a storage device controller that may include a Flash Translation Layer (FTL).
The management interface 109 may include any type of functionality for discovering, monitoring, configuring, and/or updating the computing device 102. For example, in embodiments where computing device 102 communicates using NVMe protocols, management interface 109 may implement NVMe management interface (NVMe-MI) protocols.
The communication fabric 103 may be implemented with one or more interconnects, one or more networks, a network of networks (e.g., the Internet), and the like, or any combination thereof, using any type of interface and/or protocol. For example, the communication fabric 103 may be implemented with Peripheral Component Interconnect Express (PCIe), NVMe, NVMe over Fabrics (NVMe-oF), Ethernet, Transmission Control Protocol/Internet Protocol (TCP/IP), Direct Memory Access (DMA), remote DMA (RDMA), RDMA over Converged Ethernet (RoCE), Fibre Channel, InfiniBand, Serial ATA (SATA), Small Computer System Interface (SCSI), Serial Attached SCSI (SAS), iWARP, Compute Express Link (CXL) and/or its coherence protocols (such as CXL.mem, CXL.cache, CXL.io, etc.), Gen-Z, Open Coherent Accelerator Processor Interface (OpenCAPI), Cache Coherent Interconnect for Accelerators (CCIX), Advanced eXtensible Interface (AXI), any generation of wireless network (including 2G, 3G, 4G, 5G, 6G, etc.), any generation of Wi-Fi, Bluetooth, Near-Field Communication (NFC), and the like, or any combination thereof. In some embodiments, the communication fabric 103 may include one or more switches, hubs, nodes, routers, and the like.
For example, in embodiments where computing device 102 may be implemented as a storage device, I/O interface 108 may implement a storage protocol (such as NVMe) that may enable host 101 and computing device 102 to exchange commands, data, etc. through communication fabric 103.
Fig. 2 illustrates an embodiment of an architecture for a computing device in accordance with a disclosed example embodiment. Although not limited to any particular use, the architecture shown in fig. 2 may be used, for example, with the computing device approach and/or components shown in fig. 1. In some aspects, one or more of the elements shown in fig. 2 may be similar to corresponding elements in fig. 1, and may be indicated by reference numerals ending with the same numerals.
Referring to FIG. 2, the API architecture may be implemented using an Operating System (OS) 218 running on a host 201, and the host 201 may communicate with the computing device 202 using a communication fabric 203. The operating system 218 may include a kernel space 219 and a user space 220.
The API library 221 and one or more applications 222-1, 222-2, and/or 222-3 (which may be referred to individually or collectively as 222) may run in the user space 220. Examples of the one or more applications 222 may include a storage application, a cloud computing application, a data analysis application, and the like. In some embodiments, an application adapter 223 may run in the user space 220 and translate input and/or output between applications 222 and/or between one or more applications 222 and the API library 221.
The device driver 215 may run in the kernel space 219 and may provide a software interface that may enable the OS 218, applications 222, API library 221, etc. to access one or more hardware functions of the computing device 202. Thus, in some embodiments, the device driver 215 may partially or fully manage the computing device 202 for the OS 218.
In some embodiments, plug-ins 225 may run in the user space 220 and enable the API library 221 and/or applications 222 to communicate with the computing device 202 and/or the device driver 215. For example, in some embodiments, a plug-in 225 may be implemented with device-specific code that processes requests from the application 222 and/or API library 221 by mapping (e.g., forwarding) the requests to the device driver 215. Thus, in some embodiments, the API library 221 may use different plug-ins (e.g., an FPGA plug-in, an NVMe plug-in, etc.) to interface to different device drivers for different types of computing devices and/or interface technologies. Depending on implementation details, a plug-in 225 may be implemented in relatively simple code that can be readily created by a computing device vendor (e.g., a manufacturer, supplier, etc.) to communicate with the computing device and operate within the framework of the API library 221.
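Such a plug-in may amount to little more than a small dispatch table that maps generic library requests onto a specific device driver. The following C sketch is a hypothetical illustration of what that table might look like:

    #include <stddef.h>

    /* Hypothetical device-specific plug-in interface: the API library
     * calls through this table, and each plug-in (e.g., an NVMe or FPGA
     * plug-in) forwards the calls to its device driver. */
    struct cs_plugin_ops {
        const char *name;                                    /* e.g., "nvme"  */
        int (*probe)(const char *dev_path);                  /* attach device */
        int (*alloc_resource)(int dev, int kind, void **h);  /* map to driver */
        int (*free_resource)(int dev, void *h);
        int (*submit_request)(int dev, const void *req, size_t len);
    };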
Although the embodiment shown in FIG. 2 is illustrated with certain elements in kernel space 219 and user space 220, in other embodiments any of the elements may be implemented in different types of OS spaces. For example, in some embodiments, some or all of the API library 221 and/or plug-ins 225 may run partially or completely in the kernel space 219. Furthermore, although the embodiment shown in FIG. 2 is illustrated with only one host 201 and one computing device 202, any number of additional hosts 201 and/or computing devices 202 may be connected through the communication fabric 203, and any of the hosts 201 may use the API library 221 to access any of the computing devices 202.
In some embodiments, the API library 221 may provide an interface (e.g., an abstract interface) that may implement one or more mechanisms to discover, configure, allocate, utilize, etc., the resources (e.g., computing resources) 206 of the computing device 202 to enable one or more applications 222 to offload processing operations to the computing device 202. Thus, in some embodiments, the API architecture shown in fig. 2 may be used to enable the application 117 shown in fig. 1 to access the computing resources 106 of the computing device 102 shown in fig. 1.
Referring again to fig. 2, in some embodiments, one or more applications 222 may connect to the computing device 202 through an API library 221 connectable to the device driver 215. Different applications 222 may use the computing device 202 in different ways (e.g., for different use cases) to offload computing tasks to the computing resources 206 of the computing device 202. Depending on implementation details, this may improve performance, for example, by providing faster processing, lower latency, etc. In some embodiments, the API library 221 may provide a transparent mechanism that may present applications 222 with the same or similar interfaces to the computing device 202, for example, even when communications between the applications 222 and the computing device 202 cross network connection boundaries.
However, in some embodiments, an application 222 that has been allocated one or more of the computing resources 206 may behave in a manner that prevents efficient use of the resources 206. For example, if the application 222 exits unconditionally (e.g., crashes), the application 222 may not release the resources 206 that have been allocated to it. Thus, depending on implementation details, one or more of the resources 206 (e.g., the functional data memory 214, one or more compute engines 210, etc.) that were allocated to the application 222 may become unavailable to other applications. (In some embodiments, this may be referred to as holding up resources.)
Furthermore, in some embodiments, if an application 222 crashes (e.g., when computing device function 212 is running in computing execution environment 211 on computing engine 210), it may leave computing device function 212, computing execution environment 211, and/or computing engine 210 in an indeterminate state and thus unusable by other applications.
As another example, established programming practice for an application 222 may include releasing resources that have been allocated to the application prior to termination (or as part of an exit process). However, in some embodiments, the API library 221 or an operator of the host 201 may be unable to enforce specific programming practices on the application 222. Thus, even if the application 222 exits conditionally (e.g., performs a normal or controlled exit), the application 222 may not release one or more resources allocated to it before (or during) the exit. Thus, after the application 222 exits, the resources allocated to it may become unavailable to other applications.
In some embodiments, memory resources may be particularly susceptible to the potential problems described above. Memory allocated to an application 222 may be marked as in use by the API library 221 (e.g., indicated as allocated FDM). If the application 222 does not release the memory before or during an exit (whether conditional or unconditional), the allocated memory may become unusable by other applications, creating one or more memory holes. Ultimately, this may block access to enough (e.g., most or all) of the functional data memory 214 and/or the device memory 205 that the computing device 202 may become unusable.
Additional potential inefficiencies may result when an application 222 exits (e.g., conditionally or unconditionally) while a request (e.g., an NVMe command) is queued (e.g., waiting for processing in a submission queue) and/or outstanding (e.g., currently being processed). In such a case, the computing resources 206 may not be aware that the application 222 has exited, and may thus begin and/or continue processing the request, wasting resources.
Because resources may remain consumed by one or more applications to which they were allocated even when the resources are no longer in use, any of these potential problem situations may result in a denial of service for the computing device 202. In some embodiments, recovering the unavailable resources may involve a software reset, a system reset (e.g., a full system reset), or the like, of the computing device 202. However, resetting the computing device 202 may disrupt one or more other applications 222, hosts 201, computing devices 202, and so on. Further, the API library 221 and/or applicable standards (e.g., SNIA), protocols (e.g., NVMe), etc. may not provide a mechanism for resetting the computing device 202. While some standards and/or protocols may provide one or more mechanisms that enable APIs (e.g., API libraries) and/or computing devices to discover, allocate, configure, and/or manage resources, they may not provide mechanisms to manage (e.g., release) resources based on how the resources are used by applications after they are allocated.
In some embodiments, one or more of these potential problem situations may increase the difficulty of debugging an application 222 (e.g., during a proof-of-concept (PoC) bring-up). Further, in some embodiments, one or more of these potential problem situations may cause data (e.g., sensitive or confidential data) from an application 222 to remain in allocated functional data memory, queues, or the like. Depending on implementation details, this may present a security risk.
FIG. 3 illustrates an embodiment of a resource management scheme for a computing device according to example embodiments of the disclosure. The embodiment shown in FIG. 3 may include one or more computing devices 302, one or more applications 327, a programming interface 316, and/or a computing resource manager 328. In some embodiments, the computing resource manager 328 may be at least partially included in the programming interface 316 (e.g., as part of an API library). However, in some other embodiments, the computing resource manager 328 may be separate from the programming interface 316. For example, in some embodiments, the computing resource manager 328 may be at least partially included in the one or more computing devices 302. The one or more computing devices 302 may communicate with the programming interface 316 and/or the one or more applications 327 through a communication fabric 303.
In some embodiments, the computing resource manager 328 may track one or more resources 306 of one or more computing devices 302 (e.g., after the one or more resources 306 of the one or more computing devices 302 have been allocated to the application 327). Depending on implementation details, this may enable the computing resource manager 328 to determine the use of one or more resources 306 by the application 327. In some embodiments, based on the use of one or more resources 306 by the application 327, the computing resource manager 328 may perform one or more actions to manage the computing resources 306.
Additionally or alternatively, the computing resource manager 328 may track one or more resources of one or more hosts on which the one or more applications 327 may run (e.g., after those resources have been allocated to the application 327), and may perform one or more actions to manage those host resources based on the application's use of them.
For example, in some embodiments, the computing resource manager 328 may determine that an application may have terminated (e.g., may have conditionally or unconditionally exited, may have become frozen, or otherwise at least partially unresponsive and/or out of service) without releasing one or more resources that have been allocated to the application (e.g., program, container, VM, etc.), thereby rendering the one or more resources unavailable. Based on this type of condition, the computing resource manager 328 may release (e.g., deallocate) at least some of the one or more unavailable resources so that they may be used by additional applications.
As another example, the computing resource manager 328 may determine that an application has terminated while one or more requests from the application are queued and/or outstanding. Based on this type of condition, the computing resource manager 328 may cancel one or more queued requests and/or complete one or more outstanding requests. For example, if the application 327 has submitted a command to a submission queue (e.g., an NVMe submission queue) and the application terminates before the computing device 302 to which the command is directed begins processing the request (e.g., the request is still present in the submission queue), the computing resource manager 328 may cancel the request, e.g., by removing it from the queue and/or notifying the application. As another example, if the application terminates after the computing device 302 begins processing the request but before the computing device 302 has completed it (e.g., the request has been read from the submission queue but the corresponding completion has not yet been placed in a completion queue), the computing resource manager 328 may complete the request (e.g., with an error status) by placing the corresponding completion in the completion queue.
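The per-request decision described above might look like the following C sketch. The queue layout, request states, and status code are hypothetical; in an NVMe-based implementation, the submission/completion queues and status fields would instead follow the NVMe specification:

    #include <stdint.h>

    enum req_state { REQ_QUEUED, REQ_OUTSTANDING, REQ_CLEARED };

    struct request {
        uint64_t       app_id;   /* submitting application */
        enum req_state state;
    };

    /* Hypothetical hook that places a completion entry with the given
     * status into the completion queue. */
    extern void post_completion(struct request *req, int status);

    /* Cancel what has not started; complete, with an error status, what
     * is already being processed. */
    static void clear_requests_of(struct request *sq, int n, uint64_t dead_app) {
        for (int i = 0; i < n; i++) {
            if (sq[i].app_id != dead_app)
                continue;
            if (sq[i].state == REQ_QUEUED) {
                sq[i].state = REQ_CLEARED;        /* cancel: never started */
            } else if (sq[i].state == REQ_OUTSTANDING) {
                post_completion(&sq[i], -1);      /* complete with error   */
                sq[i].state = REQ_CLEARED;
            }
        }
    }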
As another example, based on determining that an application may have terminated without releasing allocated resources, clearing queued requests, clearing outstanding requests, etc., the computing resource manager 328 may execute a policy (e.g., a group policy) that may clean up one or more memory resources associated with the application. For example, the computing resource manager 328 may clean (e.g., fill with predetermined data values) any device memory and/or content of functional data memory that may have been allocated to the terminated application, as well as any queues, buffers, etc. that may contain information for the terminated application.
In some embodiments, the computing resource manager 328 may track one or more anomalies associated with an application, for example, by implementing a trap mechanism to gain control of the application when the application fails. Depending on implementation details, this may enable exception handlers associated with traps to free up resources that have been allocated to an application.
In some embodiments, the computing resource manager 328 may implement debug hooks to track one or more resources that have been allocated to an application. For example, the computing resource manager 328 may load profiling code (e.g., when an application is loaded) that may aid in understanding and/or managing the application's use of resources based on capturing one or more code execution points.
In some embodiments, any feature implemented by the computing resource manager 328 may be implemented independently of any other feature. Thus, the computing resource manager 328 may implement resource tracking without traps and/or debug hooking mechanisms, and vice versa. However, some embodiments may combine one or more of the possible features of the computing resource manager 328 to achieve synergistic results.
Fig. 4 illustrates some example implementation details of an embodiment of a resource management scheme for a computing device according to example embodiments of the disclosure. The embodiment shown in fig. 4 may be used, for example, to implement the embodiment shown in fig. 3. In some aspects, the embodiment shown in fig. 4 may include some elements that may be similar to corresponding elements in fig. 1 and 2 and may be indicated by reference numerals ending with the same numerals. The example implementation details described with respect to fig. 4 are for illustration purposes, and some embodiments may not include all or any of the example implementation details shown in fig. 4.
The embodiment shown in fig. 4 may include one or more hosts 401 and one or more computing devices 402 with computing resources 406. One or more hosts 401 and one or more computing devices 402 may communicate using a communication fabric 403. Host 401 may include an operating system 418 having a kernel space 419 and/or a user space 420.
The programming interface library 421 may run in the user space 420 along with one or more types of applications and/or accompanying support components (e.g., application adapters). For example, one or more programs 422-1, 422-2, … …, 422-N (which may be referred to individually or collectively as 422) may interface to the programming interface library 421 directly and/or through an application adapter (e.g., program adapter) 423. As another example, one or more VMs 429-1, 429-2, … …, 429-N (which may be referred to individually or collectively as 429) may be connected to the programming interface library 421 directly and/or through the hypervisor 430. As another example, one or more containers 431-1, 431-2, … …, 431-N (which may be referred to individually or collectively as 431) may be connected to the programming interface library 421 directly and/or through a container platform (e.g., container engine) 432.
The first portion 428a of the computing resource manager may be at least partially included in the user space 420, for example, at least partially as part of the programming interface library 421 as shown in fig. 4. The second portion 428b of the computing resource manager may be at least partially included in the computing device 402 as shown in fig. 4. The first portion 428a of the computing resource manager and the second portion 428b of the computing resource manager may be referred to, individually or collectively, as the computing resource manager 428. In some embodiments, any portion of computing resource manager 428 may be implemented (e.g., run) in any suitable location (e.g., anywhere in kernel space 419 (e.g., as part of a device driver, service, etc.), anywhere in user space 420 (e.g., as part of a library, application adapter, etc.), and/or anywhere in computing device 402 or other devices communicating through communication fabric 403.
In some embodiments, the computing resource manager 428 may track and/or manage the computing resources 406 of one or more computing devices 402 at different levels. For example, in some embodiments, the computing resources 406 may be tracked at the level of the respective programs 422, e.g., to prevent one application from rendering the resources 406 unusable by another application, to prevent confidential data of one application from being accessed by another application, and so forth.
As another example, in some embodiments, computing resources 406 may be tracked and/or managed at the level of the VM. This may be useful, for example, where one or more applications running on VM 429 may need to access data from one or more other applications running on VM 429. Thus, when the VM 429 is closed, the computing resource manager 428 may free up resources 406 that may have been allocated to the VM 429 and/or one or more applications running on the VM 429. In some embodiments, when the VM 429 is closed, the computing resource manager 428 may clear (e.g., cancel and/or complete) one or more requests from one or more applications running on the VM 429 that may be queued and/or pending.
As another example, in some embodiments, computing resources 406 may be tracked and/or managed at the level of container 431 and/or container platform (e.g., container engine) 432. This may be useful, for example, where one or more applications running in container 431 or container platform 432 may need to access data from one or more other applications running in container 431 and/or container platform 432. Thus, when container 431 and/or container platform 432 stop, computing resource manager 428 may release resources 406 that may have been allocated to container 431, container platform 432, and/or one or more applications running in container 431 and/or container platform 432. In some embodiments, when container 431 or container platform 432 stops, computing resource manager 428 may clear (e.g., cancel and/or complete) one or more requests from one or more applications running in container 431 and/or container platform 432 that may be queued and/or pending.
In some embodiments, the computing resources 406 may be tracked and/or managed at any hierarchical combination. For example, in some embodiments, the computing resources 406 of one or more applications running in the first VM 429-1 may be tracked and/or managed at the application level (e.g., individually), while the computing resources 406 of one or more applications running in the second VM 429-2 may be tracked and/or managed at the VM level (e.g., collectively).
FIG. 5 illustrates an embodiment of a computing resource manager according to example embodiments of the disclosure. The embodiment shown in FIG. 5 may be used, for example, to implement the computing resource manager 428 described with respect to FIG. 4, and may be described with reference to some components of the scheme shown in FIG. 4. The computing resource manager 528 illustrated in FIG. 5 is not limited to any particular implementation details. For purposes of illustration, however, the computing resource manager 528 may include any number of the following types of logic implementing any number of the following features.
(1) In some embodiments, the computing resource manager 528 may include tracking logic 533 to track device and/or host computing resources 406 allocated to one or more applications. For example, in some embodiments, in response to a resource request from an application, the programming interface library 421 may return the requested resource to the application along with an allocation handle that identifies the resource. The handle may include details of the resource (such as a device handle, a memory segment handle, etc.). In some embodiments, the programming interface library 421 may maintain a list or other data structure to track resources (e.g., any computing resources) 406 across one or more computing devices 402. For example, when one or more resources are allocated to an application, the programming interface library 421 may add the handles of the resources to the list, and when one or more resources are released (e.g., by the application), the programming interface library 421 may remove the corresponding handles from the list.
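A minimal version of such a tracking list, assuming simple opaque handles, might look like the following C sketch; all names are illustrative:

    #include <stdlib.h>
    #include <stdint.h>

    struct tracked {
        uint64_t        app_id;   /* owning application     */
        void           *handle;   /* opaque resource handle */
        struct tracked *next;
    };

    static struct tracked *g_list;

    extern void release_resource(void *handle);   /* hypothetical driver call */

    /* Record a handle when a resource is allocated. */
    static void track_alloc(uint64_t app, void *handle) {
        struct tracked *t = malloc(sizeof *t);
        if (!t) return;                           /* illustration only */
        t->app_id = app; t->handle = handle;
        t->next = g_list; g_list = t;
    }

    /* Drop a handle when the application releases the resource itself. */
    static void track_free(void *handle) {
        for (struct tracked **p = &g_list; *p; p = &(*p)->next)
            if ((*p)->handle == handle) {
                struct tracked *t = *p; *p = t->next; free(t); return;
            }
    }

    /* Reclaim whatever is still on the list when an application dies. */
    static void reclaim_app(uint64_t app) {
        struct tracked **p = &g_list;
        while (*p) {
            if ((*p)->app_id == app) {
                struct tracked *t = *p;
                release_resource(t->handle);
                *p = t->next;
                free(t);
            } else {
                p = &(*p)->next;
            }
        }
    }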
Such a list of tracked resources may be maintained, for example, by an existing API library (e.g., a SNIA computational storage API) and may be used and/or adapted by a computing resource management scheme according to example embodiments of the disclosure to track computing resources 406 that the computing resource manager 528 may release when a program or other application terminates (conditionally or unconditionally) without releasing the computing resources 406 allocated to it. Thus, in some embodiments, and depending on implementation details, a computing resource management scheme according to example embodiments of the disclosure may be integrated into an existing API in a cooperative manner.
In some embodiments of the tracking logic 533, computing resources 406 may be tracked using any number of the following techniques. (a) An allocated device memory range may be tracked and/or represented by an offset (e.g., an offset of a starting address or other location) and an amount of memory (e.g., a number of allocated bytes). The memory range may be derived, for example, from one or more memory devices that the computing device 402 may expose. (b) Computing resources such as a compute engine 410 (e.g., a CPU, FPGA, GPU, ASIC, DPU, etc.) may be tracked based on a managed state (e.g., activated, deactivated, powered down, etc.). (c) Memory resources (e.g., private memory resources in host memory, device memory, functional data memory, etc.) may be tagged, for example, by one or more modules in the path of the programming interface library 421 (e.g., a path including a plug-in), which may store additional context information, because such resources may create a memory hole if not released. (d) Computing Device Functions (CDFs) 412 may be tracked, for example, based on the source of the function (e.g., whether the function is built in or downloaded to the computing device 402). In some embodiments, the handle of a computing device function 412 may include information about the computing device function 412 (e.g., its source, state, queued I/Os, outstanding I/Os, configuration, errors, etc.).
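Items (a) through (d) above might be captured in a single tagged record per tracked resource, along the lines of the following hypothetical C sketch:

    #include <stdint.h>

    enum res_kind     { RES_DEV_MEM, RES_ENGINE, RES_CDF };
    enum engine_state { ENG_ACTIVATED, ENG_DEACTIVATED, ENG_POWERED_DOWN };
    enum cdf_source   { CDF_BUILT_IN, CDF_DOWNLOADED };

    struct resource_record {
        enum res_kind kind;
        union {
            struct { uint64_t offset, bytes; }  mem;     /* (a), (c): range  */
            struct { enum engine_state state; } engine;  /* (b): CE state    */
            struct {                                     /* (d): CDF details */
                enum cdf_source src;
                int queued_ios, outstanding_ios, last_error;
            } cdf;
        } u;
    };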
In some embodiments of the tracking logic 533, tracking of computing resources 406 may be performed on a per-computing-device-handle basis. Multiple handles may provide individual tracking details (e.g., per resource), where an opaque handle may map back to the actual resource. Tracking one or more (e.g., all) of the computing resources may provide details that enable the computing resource manager 528 to release resources allocated to a program or other application that terminates (e.g., conditionally or unconditionally) without releasing all of its allocated resources.
(2) In some embodiments, the computing resource manager 528 may include exception logic 534, which may implement one or more trap mechanisms to track application exceptions, for example, upon an unclean (e.g., unconditional) exit. In some embodiments, the exception logic 534 may install an exception handler (e.g., during initialization of the programming interface library 421). The exception handler may gain control of an application when the application fails. In some embodiments, installing the exception handler may include subscribing to a signaled exception handler of the operating system.
The type of signaled exception handler may depend on the type of operating system. For example, for some operating systems, one or more exception handlers may be installed that enable the exception logic 534 to gain control of an application under one or more different definitions of a crash. In some embodiments, some exception handlers, once installed, may transfer control while the state of the application is still available, prior to the application terminating. In such embodiments, this exception window prior to termination may enable the exception logic 534 to release computing resources 406 that were allocated to the application but not released by it.
In some embodiments, the programming interface library 421 may be implemented, at least in part, as a module to which a program 422 and/or other application may be linked. Thus, the programming interface library 421 may be loaded before the application, and because, for example, traps installed by the exception logic 534 may be implemented as system traps rather than application traps, the traps implemented by the programming interface library 421 (e.g., its exception handlers) may be invoked before any traps loaded by the program 422 and/or other applications.
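On a POSIX host, one minimal way to realize such a system trap is a signal handler installed during library initialization, as sketched below in C. The release_all_tracked_resources() call is a hypothetical hook into the tracking logic, and a production handler would need to confine itself to async-signal-safe operations:

    #include <signal.h>

    extern void release_all_tracked_resources(void);  /* hypothetical hook */

    static void crash_handler(int sig) {
        /* Exception window: reclaim device resources before the process
         * dies. A real handler must restrict itself to async-signal-safe
         * operations. */
        release_all_tracked_resources();
        signal(sig, SIG_DFL);   /* restore the default disposition ... */
        raise(sig);             /* ... and let the process terminate   */
    }

    void install_traps(void) {  /* e.g., called during library init */
        struct sigaction sa = {0};
        sa.sa_handler = crash_handler;
        sigaction(SIGSEGV, &sa, NULL);   /* unconditional exits (crashes) */
        sigaction(SIGABRT, &sa, NULL);
        sigaction(SIGBUS,  &sa, NULL);
    }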
(3) In some embodiments, the computing resource manager 528 may include policy logic 535 (e.g., group policy logic), which may scrub some or all of the memory resources that the computing resource manager 528 frees. Depending on implementation details, this may facilitate cleaning up (e.g., for security purposes) the contents of host memory, device memory 405, functional data memory 414, etc. that were allocated to an application and later released by the computing resource manager 528. For example, if an application terminates (e.g., crashes and exits unconditionally), the application's data may remain in one or more memory resources that were allocated to the application but not deleted, overwritten, or otherwise cleaned up prior to termination, and not released by the application. If an unauthorized application or other user gains access to those memory resources, the data remaining in them may represent a security risk. In some embodiments, the policy logic 535 may implement a policy that scrubs one or more memory resources of a terminated application, for example, before the computing resource manager 528 returns them to a memory pool in the computing resources 406. In some embodiments, the scrubbing may involve filling the freed memory with a repeating data pattern (e.g., all zeros). In some embodiments, the policy logic 535 may instead implement a policy that scrubs memory at allocation time, e.g., after an application requests a memory resource but before the memory resource is returned to the application.
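The scrub step itself can be as simple as overwriting the region before it is returned to the pool, as in the following C sketch; return_to_pool() is a hypothetical pool interface:

    #include <string.h>
    #include <stddef.h>

    extern void return_to_pool(void *region, size_t bytes);  /* hypothetical */

    /* Overwrite a released region with a repeating pattern (all zeros
     * here) so the next owner cannot read the previous owner's data. */
    void scrub_and_release(void *region, size_t bytes) {
        memset(region, 0, bytes);        /* repeating data pattern          */
        return_to_pool(region, bytes);   /* only now is the memory reusable */
    }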
(4) In some embodiments, the computing resource manager 528 may include debug logic 536, which may implement one or more debug hooks. For example, the debug logic 536 may load one or more pieces of profiling code that may be used to observe and/or understand the flow of application code and/or computing resources, based on one or more traps (e.g., debug hooks) being reached at various points in the application code. In contrast to an exception trap, which may notify the computing resource manager 528 of a fault, a debug hook may notify the debug logic 536 of a trap occurring in a specific portion of code.
Depending on implementation details, this may facilitate the computing resource manager 528 tracking and/or releasing resources that may have been allocated to an application. For example, if an application is executing a three-tier pipelined data processing algorithm, and the third tier's data becomes corrupted (e.g., due to a crash), the debug hooks may execute profiling code that enables the application and/or debug logic 536 to determine that the data processing of all three tiers may be reversed (unwound) to eliminate the corrupted data. Depending on implementation details, this may help the computing resource manager 528 track one or more computing resources 406 of one or more computing devices 402 that may have been allocated to the application.
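For purposes of illustration, a debug-hook interface along the following lines (all names hypothetical) might let profiling code observe each pipeline tier so that a crash can later be unwound:

/* Hypothetical debug-hook registration: profiling code runs when
 * execution reaches a named point in the application's pipeline. */
typedef void (*crm_hook_fn)(const char *stage, void *state);

extern void crm_set_debug_hook(const char *stage, crm_hook_fn fn);
extern void crm_log_stage_checkpoint(const char *stage, void *state);

static void profile_stage(const char *stage, void *state)
{
    /* Record which tier completed and what it allocated, so a crash in
     * tier 3 can be unwound back through tiers 2 and 1. */
    crm_log_stage_checkpoint(stage, state);
}

void install_pipeline_hooks(void)
{
    crm_set_debug_hook("tier1_done", profile_stage);
    crm_set_debug_hook("tier2_done", profile_stage);
    crm_set_debug_hook("tier3_done", profile_stage);
}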
(5) In some embodiments, the computing resource manager 528 may include request clearing logic 537 that may determine that an application has terminated while one or more requests from the application to one or more computing devices 402 are queued and/or outstanding. For example, if the application terminates while a request is still in the submission queue, the request clearing logic 537 may cancel the request. As another example, if an application terminates while a request is being processed by one or more computing resources 406 of one or more computing devices 402, the request clearing logic 537 may complete the request (e.g., with an error status) by placing a corresponding completion in a completion queue. In embodiments in which the submission and/or completion queues are implemented with NVMe queues, the NVMe subsystem may automatically place an error in the completion queue in response to the request being canceled by the request clearing logic 537.
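For purposes of illustration, a sketch of such request clearing is shown below, with hypothetical queue helpers standing in for an NVMe-style submission/completion path:

/* Hypothetical request states and queue helpers; in an NVMe-based design
 * the subsystem itself may post the error completion. */
enum req_state { REQ_QUEUED, REQ_EXECUTING };

struct request {
    enum req_state state;
    struct request *next;
};

extern void sq_remove(struct request *r);                 /* assumed */
extern void cq_post_error(struct request *r, int status); /* assumed */

/* Walk the terminated application's outstanding requests: cancel what is
 * still queued, and complete with an error what is already executing. */
void crm_clear_requests(struct request *head)
{
    for (struct request *r = head; r != NULL; r = r->next) {
        if (r->state == REQ_QUEUED)
            sq_remove(r);          /* never started; drop from submit queue */
        else
            cq_post_error(r, -1);  /* in flight; post an error completion  */
    }
}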
In any of the embodiments disclosed herein, any computing resources (e.g., 106, 206, 306, and/or 406) can be implemented using one or more namespaces (e.g., NVMe namespaces). For example, in some embodiments, one or more (e.g., each) of the computing engines (e.g., 110, 210, and/or 410) may be configured to operate with or as a respective computing namespace. According to implementation details, the use of namespaces may facilitate implementation of a computing resource manager in a virtualized environment in accordance with disclosed example embodiments. For example, one or more (e.g., each) of the VMs 429 shown in fig. 4 may be configured to use a different namespace.
As mentioned above, any of the computing resource managers 328, 428, and/or 528 may be implemented at one or more hosts, at one or more computing devices, or any combination thereof. For example, in some embodiments, the computing resource manager may be implemented entirely or almost entirely at the host. In such embodiments, any or all of the logic described with respect to fig. 5 may be implemented by a computing resource manager (e.g., 328, 428, and/or 528) that may be at least partially included within a programming interface and/or programming interface library (e.g., 316 and/or 421) located between one or more applications (e.g., 327, 422, 429, 430, 431, and/or 432) and one or more computing devices (e.g., 102, 202, 302, and/or 402). In such embodiments, the computing resource manager may manage some or all of the computing resources (e.g., 106, 206, 306, and/or 406), at least in part by setting debug hook and/or trap mechanisms for the application, for example, to detect when an application (e.g., conditionally or unconditionally) terminates, becomes frozen, or becomes unresponsive and/or ceases to function. The computing resource manager may also track some or all of the allocations of computing resources by some or all of the applications and/or computing devices, and, for example, when an application terminates, release resources that may have been allocated to the application and/or clear requests submitted by the application.
However, in some embodiments, one or more portions of the functionality of the computing resource manager (e.g., portion 428b shown in fig. 4) may be implemented in one or more computing devices 402. For example, in some embodiments, a portion 428b of the computing resource manager implemented at the computing device 402 may at least partially perform discovery, configuration, allocation, tracking, release, etc. of some or all of the computing resources 406 of the computing device 402 for one or more applications (such as the program 422, VM 429, and/or container 431). In some embodiments, this may be described as offloading additional processing from the host 401 to the computing device 402. Depending on implementation details, the computing device 402 may be able to implement one or more features of the computing resource manager 428 more efficiently than the host 401. In some embodiments, the computing device 402 may include at least one processor, and the at least one processor may, at least in part, cause the computing resource manager 428 to operate.
In embodiments in which the computing resource manager 428 is implemented at least in part at the computing device 402, the portion 428b of the computing resource manager may perform one or more operations to support the computing resource management performed at the computing device 402. However, depending on implementation details, the portion 428b of the computing resource manager implemented at the computing device 402 may not be aware of the operating system environment at the host 401, multi-tenancy at the host 401, the context of applications that may be connected to the computing device 402 through the programming interface library 421, and so on. Furthermore, even if a portion 428b of the computing resource manager is implemented at the computing device 402, one or more trap mechanisms and/or debug hooks may still be implemented at the host 401. Nonetheless, in some embodiments, any number of these features may be offloaded to the portion 428b of the computing resource manager implemented at the computing device 402. For example, in some embodiments, the computing device 402 may be provided with a host and/or host operating system context, an application context (e.g., a program context), and so on. Thus, for example, the programming interface library 421 may provide, to the computing resource manager 428b at the computing device 402, the context of the application together with instructions and/or requests to allocate one or more resources 406 of the computing device 402 to the application. In some embodiments, the context may include one or more elements that may run in or utilize a computing execution environment 411, such as one or more computing device functions 412, one or more memory resources (e.g., device memory 405, allocated FDM 426, etc.), and the like.
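As a sketch, the context handed to the device-side portion might be packaged as a small record and sent over an administrative command; the structure, fields, and opcode below are assumptions, not a defined wire format.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical context record a host library might hand to the
 * device-side portion of the resource manager. */
struct crm_app_context {
    uint64_t host_app_id;   /* process/VM identity on the host   */
    uint32_t tenant_id;     /* multi-tenancy tag                 */
    uint32_t namespace_id;  /* compute namespace the app may use */
    uint64_t fdm_bytes;     /* function data memory requested    */
};

extern int device_admin_send(uint32_t opcode, const void *buf, size_t len);

#define CRM_OP_REGISTER_CONTEXT 0x01u  /* hypothetical opcode */

int crm_register_app(const struct crm_app_context *ctx)
{
    /* Ship the context down so allocations can be attributed to, and
     * later reclaimed from, this application even if the host process
     * dies. */
    return device_admin_send(CRM_OP_REGISTER_CONTEXT, ctx, sizeof *ctx);
}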
Table 1 illustrates an example embodiment of pseudocode that may be used to track one or more computing resources in accordance with the disclosed example embodiments. The embodiment shown in Table 1 may be used, for example, to implement any of the computing resource managers disclosed herein. For example, the pseudocode shown in Table 1 may be invoked by a computing resource manager to add one or more resources to a tracking list when the resources are allocated to an application (such as a program).
Table 2 illustrates an example embodiment of pseudocode that may be used to track and/or release one or more computing resources in accordance with the disclosed example embodiments. The embodiment shown in Table 2 may be used, for example, to implement any of the computing resource managers disclosed herein. For example, the pseudocode shown in Table 2 may be invoked by a computing resource manager to release and/or remove one or more resources from the tracking list, e.g., when an application (such as a program) terminates (e.g., crashes). In some embodiments, the pseudocode may also be invoked when an application exits normally. For example, an operating system hook may be installed to capture the exit, which in turn may invoke the unwind operation.
For purposes of illustration, the pseudocode shown in Tables 1 and 2 may be described in the context of a computational storage API provided by SNIA, where CSx may refer to a computational storage device (e.g., a Computational Storage Processor (CSP), a Computational Storage Drive (CSD), and/or a Computational Storage Array (CSA)), CSF may refer to a computational storage function, CSE may refer to a computational storage engine, and CSEE may refer to a computational storage execution environment. However, the inventive principles may be applied to any other type of resource management scheme for any type of computing device.
TABLE 1
Pseudo code for tracking CS device resources
Tracking()
{
    For each OpenCSx request
        Create a CSx tracking list
        Add the CSx handle to the head of the list
    For each open/allocate request
        Create a handle and attach it to the CSx tracking list
        Track parent/child relationships
        If memory resource
            Track the resource in a memory sub-list
        If compute resource
            Track the resource in a compute sub-list
        If managed state
            Track the state in a state sub-list
        If CSF resource
            Track the resource in a CSF sub-list
    For each release-resource request
        Look up the handle in the tracking list
        Free the resource and remove the handle from the list
        Release the handle
}
TABLE 2
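A minimal release-side sketch in the style of Table 1, consistent with the description above, is shown here; the specific steps are assumptions rather than a reproduction of the original listing.

Unwind()
{
    For each CSx handle in the tracking list
        For each handle attached to the CSx tracking list
            Release child handles before their parents
            If memory resource
                Scrub per policy, free the resource, and remove it from the memory sub-list
            If compute resource
                Release the resource and remove it from the compute sub-list
            If managed state
                Discard the state and remove it from the state sub-list
            If CSF resource
                Release the resource and remove it from the CSF sub-list
        Remove the CSx handle from the list and close the CSx
}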
In some embodiments, a computing resource management scheme according to the disclosed example embodiments may transparently track computing device resources, for example, within a programming interface (e.g., API) library. Depending on implementation details, such an approach may be implemented without additional programming interfaces. Depending on implementation details, a computing resource management scheme according to the disclosed example embodiments may enable existing functionality, and/or functionality that may be developed in the future, to scale to run on a computing device that may include one or more computing resources (such as one or more computing engines). In some embodiments, a computing resource management scheme according to the disclosed example embodiments may assist a host and/or application in scheduling jobs.
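As an illustration of such transparent tracking, an API library might wrap its allocation entry point so that each allocation is recorded without changing the application-facing interface. The names below are hypothetical and only loosely modeled on csAllocMem-style calls; they are not the SNIA API itself.

#include <stddef.h>

extern void *device_alloc_impl(size_t len);          /* assumed back end        */
extern void  tracking_list_add(void *p, size_t len); /* Table 1's tracking list */

/* The wrapper keeps the allocation call's public shape, so existing
 * applications link against it unchanged. */
void *crm_alloc(size_t len)
{
    void *mem = device_alloc_impl(len);
    if (mem != NULL)
        tracking_list_add(mem, len);  /* recorded for unwind at exit or crash */
    return mem;
}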
Fig. 6 illustrates an example embodiment of a host device according to an example embodiment of the disclosure. The host device shown in fig. 6 may be used, for example, to implement any of the hosts disclosed herein. The host device 600 shown in fig. 6 may include a processor 602, a system memory 606, host logic 608, and/or a communication interface 610, and the processor 602 may include a memory controller 604. Any or all of the components shown in fig. 6 may communicate over one or more system buses 612. In some embodiments, one or more of the components shown in fig. 6 may be implemented using other components. For example, in some embodiments, host logic 608 may be implemented by processor 602 executing instructions stored in system memory 606 or other memory. In some embodiments, host logic 608 may implement any host functionality disclosed herein (including, for example, any functionality of a computing resource manager).
FIG. 7 illustrates an example embodiment of a computing device that may be used to provide access to one or more computing resources to a user through a programming interface, in accordance with the disclosed example embodiments. The embodiment 700 shown in fig. 7 may be used, for example, to implement any of the computing devices disclosed herein. Computing device 700 may include a device controller 702, one or more computing resources 708, command logic 716, device functional circuitry 706, and a communication interface 710. The components shown in fig. 7 may communicate via one or more device buses 712.
The device function circuitry 706 may include any hardware for implementing the primary functions of the computing device 700. For example, if the computing device 700 is implemented as a storage device, the device functional circuitry 706 may include a storage medium (such as one or more flash memory devices, FTLs, etc.). As another example, if computing device 700 is implemented as a Network Interface Card (NIC), device functional circuitry 706 may include one or more modems, network interfaces, physical layers (PHYs), medium access control layers (MACs), and the like. As another example, if the computing device 700 is implemented as an accelerator, the device functional circuitry 706 may include one or more accelerator circuits, memory circuits, and the like.
Any of the functionality described herein, including any host functionality, device functionality, and the like (e.g., the computing resource managers 328, 428, and/or 528), as well as any of the functionality described with respect to the embodiments shown in figs. 1-8, may be implemented in hardware, software, firmware, or any combination thereof, including, for example, hardware and/or software combinational logic, sequential logic, timers, counters, registers, state machines, volatile memory (such as DRAM and/or SRAM), non-volatile memory (including flash memory, persistent memory such as cross-gridded non-volatile memory, memory with changes in bulk resistance, PCM, and the like), and/or any combination thereof, as well as complex programmable logic devices (CPLDs), FPGAs, ASICs, and CPUs (including CISC processors such as x86 processors and/or RISC processors such as ARM processors), GPUs, NPUs, TPUs, and the like, executing instructions stored in any type of memory.
FIG. 8 illustrates an embodiment of a method for computing resource management of a computing device, in accordance with the disclosed example embodiments. The method may begin at operation 802. At operation 804, the method may allocate resources of the computing device to an application using a programming interface. For example, the application may include a program, a VM and/or a program running on a VM, a hypervisor, a container and/or a program running in a container, a container platform, and so on. At operation 806, the method may track the resources using a resource manager. For example, the resources may include a computing engine, a computing execution environment, a computing device function, memory, and the like. At operation 808, the method may determine an operation of the application using the resource manager. For example, the operation may include a termination (such as a conditional or unconditional exit of the application), a VM shutdown, a container shutdown, and the like. As another example, the operation of the application may include a modification to the execution of the application, the modification may be based on an execution state of the application, and the execution state may include an active execution state. In some embodiments, based on determining the operation of the application, the method may transfer execution of the application to a mechanism for controlling the application, execute a mechanism for monitoring the operation of the application, modify, by the resource manager, a state of at least a portion of the resources, clean up the resources by the resource manager, modify, by the resource manager, a state of a request from the application, or any combination thereof, as sketched after this paragraph. The method may end at operation 810.
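A condensed sketch of this flow is shown below; every function here is hypothetical glue corresponding to operations 804-808, not the patent's API.

struct app;
struct device;
struct resource;

extern struct resource *api_allocate(struct device *d, struct app *a);
extern void rm_track(struct resource *r);
extern int  rm_app_terminated(struct app *a);
extern void rm_clear_requests(struct app *a);
extern void rm_scrub(struct resource *r);
extern void rm_release(struct resource *r);

void manage(struct device *d, struct app *a)
{
    struct resource *r = api_allocate(d, a);  /* operation 804: allocate */
    rm_track(r);                              /* operation 806: track    */

    if (rm_app_terminated(a)) {               /* operation 808: determine */
        rm_clear_requests(a);  /* clear queued/outstanding requests */
        rm_scrub(r);           /* scrub memory per policy           */
        rm_release(r);         /* return resources to the pool      */
    }
}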
The embodiment shown in fig. 8, as well as all other embodiments described herein, comprises example operations and/or example components. In some embodiments, some operations and/or components may be omitted, and/or other operations and/or components may be included. Furthermore, in some embodiments, the temporal and/or spatial order of the operations and/or components may be changed. Although some components and/or operations may be shown as separate components, in some embodiments, some components and/or operations shown separately may be integrated into a single component and/or operation, and/or some components and/or operations shown as a single component and/or operation may be implemented with multiple components and/or operations.
The embodiments disclosed above have been described in the context of various implementation details, but the principles of the present disclosure are not limited to these or any other specific details. For example, some functions have been described as being implemented by a particular component, but in other embodiments, functions may be distributed among different systems and components in different locations and with various user interfaces. Particular embodiments have been described as having particular processes, operations, etc., but these terms also include embodiments in which a particular process, operation, etc., may be implemented with multiple processes, operations, etc., or in which multiple processes, operations, etc., may be integrated into a single process, operation, etc. References to a component or element may only refer to portions of the component or element. For example, a reference to a block may refer to an entire block or one or more sub-blocks. References to a component or element may refer to one or more of the component or element, and references to multiple components or elements may refer to a single component or element. For example, a reference to a resource may refer to one or more resources, and a reference to a resource may refer to a single resource. Unless otherwise clear from the context, terms such as "first" and "second" are used in the present disclosure and claims for the purpose of distinguishing between elements that they modify and may not indicate any spatial or temporal order. In some embodiments, reference to an element may refer to at least a portion of the element, e.g., "based on" may refer to "based at least in part on" and the like. The reference to a first element may not indicate the presence of a second element. The principles disclosed herein have independent utility and may be implemented separately and not every embodiment may utilize every principle. However, the principles may also be implemented in various combinations, some of which may amplify the benefits of the respective principles in a synergistic manner. The various details and embodiments described above may be combined to produce additional embodiments in accordance with the inventive principles of this patent disclosure.
Since the inventive principles of this patent disclosure may be modified in arrangement and detail without departing from the inventive concepts, such changes and modifications are considered to be within the scope of the appended claims.

Claims (20)

1. A method for managing resources of a computing device, comprising:
allocating resources of the computing device to the application using the programming interface;
tracking the resource using a resource manager; and
an operation of the application is determined using the resource manager.
2. The method of claim 1, further comprising: based on determining the operation of the application, the state of at least a portion of the resource is modified by a resource manager.
3. The method of claim 2, wherein the operation of the application comprises a modification to the execution of the application.
4. A method according to claim 3, wherein the modification of the execution of the application is based on the execution state of the application.
5. The method of claim 4, wherein the execution state comprises an active execution state.
6. The method of claim 1, further comprising: based on determining the operation of the application, execution of the application is transferred to a mechanism for controlling the application.
7. The method of claim 1, further comprising: based on determining the operation of the application, a mechanism for monitoring the operation of the application is performed.
8. The method of claim 1, further comprising: the resource is cleaned by a resource manager based on determining the operation of the application.
9. The method of claim 1, further comprising: based on determining the operation of the application, the status of the request from the application is modified by the resource manager.
10. The method of claim 9, wherein the request comprises a queued request.
11. The method of any of claims 1-10, wherein the resource comprises one of a computing engine, a computing execution environment, a computing device function, and a memory.
12. The method of any of claims 1 to 10, wherein the application comprises one of a program, a virtual machine, a hypervisor, a container, and a container platform.
13. The method of claim 1, wherein the step of tracking is performed at least in part by a computing device.
14. An apparatus for managing resources of a computing device, comprising:
at least one processor configured to:
resources of the computing device are allocated to the application using the programming interface,
tracking the resource using a resource manager, and
an operation of the application is determined using the resource manager.
15. The apparatus of claim 14, wherein the at least one processor is configured to: based on the operation of the application, a resource manager is used to modify the state of at least a portion of the resource.
16. The apparatus of claim 14, wherein the at least one processor is configured to: based on the operation of the application, a resource manager is used to modify the state of the request from the application.
17. An apparatus for managing computing resources, comprising:
computing resources; and
at least one processor configured to:
the computing resources are provided to the application using a programming interface,
tracking computing resources using a resource manager, and
a resource manager is used to determine the operation of the application.
18. The apparatus of claim 17, wherein the at least one processor is configured to: computing resources are allocated to the applications.
19. The apparatus of claim 17, wherein the at least one processor is configured to: based on the operation of the application, a state of at least a portion of the computing resources is modified using the resource manager.
20. The apparatus of claim 17, wherein the at least one processor is configured to: the resource manager is caused to operate at least in part.
CN202310124985.8A 2022-02-11 2023-02-07 Method and apparatus for managing resources of a computing device Pending CN116594761A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US63/309,511 2022-02-11
US63/346,817 2022-05-27
US63/355,089 2022-06-23
US17/941,002 US20230259404A1 (en) 2022-02-11 2022-09-08 Systems, methods, and apparatus for managing resources for computational devices
US17/941,002 2022-09-08

Publications (1)

Publication Number Publication Date
CN116594761A true CN116594761A (en) 2023-08-15

Family

ID=87599643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310124985.8A Pending CN116594761A (en) 2022-02-11 2023-02-07 Method and apparatus for managing resources of a computing device

Country Status (1)

Country Link
CN (1) CN116594761A (en)

Similar Documents

Publication Publication Date Title
US20200371700A1 (en) Coordinated allocation of external memory
US11126420B2 (en) Component firmware update from baseboard management controller
US11586514B2 (en) High reliability fault tolerant computer architecture
US9684545B2 (en) Distributed and continuous computing in a fabric environment
US8230077B2 (en) Hypervisor-based facility for communicating between a hardware management console and a logical partition
EP2577450B1 (en) Virtual machine migration techniques
US7484029B2 (en) Method, apparatus, and computer usable program code for migrating virtual adapters from source physical adapters to destination physical adapters
Burtsev et al. Fido: Fast Inter-Virtual-Machine Communication for Enterprise Appliances.
US10241722B1 (en) Proactive scheduling of background operations for solid state drives
US10754774B2 (en) Buffer manager
JP2020526843A (en) Methods for dirty page tracking and full memory mirroring redundancy in fault-tolerant servers
US10318393B2 (en) Hyperconverged infrastructure supporting storage and compute capabilities
EP4227811A1 (en) Systems, methods, and apparatus for managing resources for computational devices
US7536694B2 (en) Exception handling in a multiprocessor system
US20070300051A1 (en) Out of band asset management
US9354967B1 (en) I/O operation-level error-handling
CN116594761A (en) Method and apparatus for managing resources of a computing device
US11561856B2 (en) Erasure coding of replicated data blocks
US8336055B2 (en) Determining the status of virtual storage in the first memory within the first operating system and reserving resources for use by augmenting operating system
JP2018181305A (en) Local disks erasing mechanism for pooled physical resources
CN114115703A (en) Bare metal server online migration method and system
EP4227790B1 (en) Systems, methods, and apparatus for copy destination atomicity in devices
US11914512B2 (en) Writeback overhead reduction for workloads

Legal Events

Date Code Title Description
PB01 Publication