US20230244605A1 - Data set and node cache-based scheduling method and device - Google Patents

Data set and node cache-based scheduling method and device Download PDF

Info

Publication number
US20230244605A1
US20230244605A1 (application US18/024,732)
Authority
US
United States
Prior art keywords
data set
node
cache
host node
training task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US18/024,732
Other versions
US11698863B1 (en)
Inventor
Dekui Wang
Pei Chen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Wave Intelligent Technology Co Ltd
Original Assignee
Suzhou Wave Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Wave Intelligent Technology Co Ltd filed Critical Suzhou Wave Intelligent Technology Co Ltd
Assigned to INSPUR SUZHOU INTELLIGENT TECHNOLOGY CO., LTD. reassignment INSPUR SUZHOU INTELLIGENT TECHNOLOGY CO., LTD. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CHEN, PEI, Wang, Dekui
Application granted granted Critical
Publication of US11698863B1 publication Critical patent/US11698863B1/en
Publication of US20230244605A1 publication Critical patent/US20230244605A1/en
Legal status: Active

Classifications

    • G06F 9/5083 — Allocation of resources: techniques for rebalancing the load in a distributed system
    • G06F 9/5044 — Allocation of resources to service a request, the resource being a machine, considering hardware capabilities
    • G06F 9/5033 — Allocation of resources to service a request, the resource being a machine, considering data affinity
    • G06F 12/0842 — Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
    • G06F 12/0871 — Allocation or management of cache space
    • G06F 12/0891 — Caches using clearing, invalidating or resetting means
    • G06F 2209/501 — Indexing scheme relating to resource allocation: performance criteria
    • G06N 20/00 — Machine learning
    • H04L 67/1044 — Peer-to-peer [P2P] networks: group management mechanisms
    • H04L 67/1001 — Protocols for accessing one among a plurality of replicated servers
    • H04L 67/5682 — Storing data temporarily at an intermediate stage (caching): policies or rules for updating, deleting or replacing the stored data

Definitions

  • FIG. 1 shows a schematic block diagram of an embodiment of a data set and node cache-based scheduling method according to the present disclosure.
  • FIG. 2 shows a flow chart of an embodiment of a data set and node cache-based scheduling method according to the present disclosure.
  • FIG. 3 shows a schematic block diagram of an embodiment of a data set and node cache-based scheduling device according to the present disclosure.
  • FIG. 1 shows a schematic block diagram of an embodiment of a data set and node cache-based scheduling method according to the present disclosure.
  • the method includes at least the following steps:
  • FIG. 2 shows a flow chart of an embodiment of a data set and host node (which may be simply referred to as a node) cache-based scheduling method according to the present disclosure. As shown in FIG. 2, the present disclosure relates to a scheduler extension mechanism based on Kubernetes, which uses a self-developed node agent to report the state of all the data sets of a node to a scheduler; at the same time, the scheduler queries the operation condition of the training tasks of the AI resource management platform. According to factors such as the remaining storage of each node, the size of each data set cache, the number of times each data set cache has been used, and the data set cache cleaning strategy, the nodes of the cluster are scored, and the score is combined with the other scheduling strategies of Kubernetes to select the optimal node for operating the training task.
  • step S 100: the storage resource information of each node is collected, including a total storage space node_i.DiskTotalSize, a free storage space node_i.DiskFreeSize, and an information list of the data set caches on the node (a unique identifier of each data set dataSet_j.Id, a size of the data set dataSet_j.Size, and a number of times that the data set has been used in the last month dataSet_j.UseNumber).
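The storage report collected by the node agent can be modeled as a small data structure. The sketch below is illustrative only; the field names mirror the variables in this paragraph (node_i.DiskTotalSize, node_i.DiskFreeSize, dataSet_j.Id/Size/UseNumber), while the concrete types and the helper method are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataSetCache:
    # mirrors dataSet_j.Id, dataSet_j.Size and dataSet_j.UseNumber
    data_set_id: str
    size: int        # bytes
    use_number: int  # times used in the last month

@dataclass
class NodeStorageInfo:
    # mirrors node_i.DiskTotalSize and node_i.DiskFreeSize
    node_name: str
    disk_total_size: int
    disk_free_size: int
    caches: List[DataSetCache] = field(default_factory=list)

    def unused_cache_size(self) -> int:
        """Total size of cached data sets not used in the last month."""
        return sum(c.size for c in self.caches if c.use_number == 0)
```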
  • step S 200: the user submits a training task on the resource management platform. The operation information of the training task includes the information of the data set used, namely a name of the data set, a unique identifier dataSet_task.Id of the data set used by the task, and a size dataSet_task.Size of the data set used by the task, as well as other basic resource information (CPU, memory, graphics processing unit (GPU), etc.) for operating the training task. After receiving the resource request of the training task, the scheduler firstly uses a Kubernetes default algorithm to screen out nodes with sufficient CPU, memory and GPU cards.
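The default screening step described above can be sketched as a simple predicate filter. This is a minimal illustration of the idea, not the actual Kubernetes scheduler code; the dictionary keys are hypothetical names for the reported resources.

```python
def screen_nodes(nodes, req_cpu, req_mem, req_gpu):
    """Keep only the nodes whose free CPU, memory and GPU count
    satisfy the resource request of the training task."""
    return [
        n for n in nodes
        if n["free_cpu"] >= req_cpu
        and n["free_mem"] >= req_mem
        and n["free_gpu"] >= req_gpu
    ]
```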
  • step S 300: when the free space node_i.DiskFreeSize of all the nodes in the cluster does not satisfy the space dataSet_task.Size required by the data set, then with regard to a node node_i, when the total size of the data set caches which are no longer used by the node is greater than or equal to the size of the data set cache used by the training task, i.e., Σ_{j ∈ unused} dataSet_j.Size ≥ dataSet_task.Size, the node is taken as an alternate node; for the node node_i, the data set caches needing to be deleted are selected, a model for the data set caches is built, and each host node is scored according to the model.
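The alternate-node condition above can be sketched as follows, assuming each node reports its cache list as in step S 100; the dictionary keys are hypothetical names for the reported fields.

```python
def is_alternate_node(node, required_size):
    """A node qualifies as an alternate node when deleting the data set
    caches it no longer uses would free at least the space the task needs."""
    unused = sum(c["size"] for c in node["caches"] if c["use_number"] == 0)
    return unused >= required_size
```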
  • step S 400 selecting, from among all the host nodes, a host node to be executed that is used to execute the training task according to the scoring results.
  • step S 500: when the resource scheduling module selects an appropriate node, a list of the data set caches needing to be deleted is notified to the agent, and the agent deletes these data set caches.
  • the deletion operation deletes the corresponding node files, and downloading a data set is performed from a remote end using a Hypertext Transfer Protocol (HTTP) service.
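The agent-side deletion and download described above might look roughly like this; the directory layout (one subdirectory per data set identifier) and the download URL scheme are assumptions, not details given in the text.

```python
import os
import shutil
import urllib.request

def delete_caches(cache_dir, data_set_ids):
    """Delete the node-local cache directory of each listed data set."""
    for ds_id in data_set_ids:
        path = os.path.join(cache_dir, ds_id)
        if os.path.isdir(path):
            shutil.rmtree(path)

def download_data_set(url, dest_path):
    """Fetch a data set from the remote HTTP service to a local file."""
    urllib.request.urlretrieve(url, dest_path)
```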
  • the method further includes:
  • it is determined whether each host node includes the data set cache required for the training task; when a host node already has the data set required for the training task (i.e., the host node has the data set cache), that host node is preferentially used to operate the training task, thereby avoiding downloading the data set again.
  • otherwise, the node with the largest residual storage space node_i.DiskFreeSize is selected to download the data set, and the training task is operated.
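The preference order described in the two bullets above (a node that already caches the data set first, otherwise the node with the most free space) can be sketched as below; the dictionary keys are hypothetical.

```python
def choose_node(nodes, data_set_id):
    """Prefer a node that already caches the required data set; otherwise
    fall back to the node with the largest remaining disk space."""
    cached = [
        n for n in nodes
        if any(c["id"] == data_set_id for c in n["caches"])
    ]
    if cached:
        return cached[0]
    return max(nodes, key=lambda n: n["disk_free_size"])
```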
  • obtaining operation information of a training task in response to receiving the training task, and screening host nodes that satisfy a space required by the training task according to the operation information and the storage resource information further includes:
  • after receiving the resource request of the training task, the scheduler firstly uses a Kubernetes default algorithm to screen out nodes with sufficient CPU, memory, and GPU cards; based on the scheduling strategy, the node with the largest residual space node_i.DiskFreeSize is then selected to download the data set, and the training task is operated.
  • the obtaining storage resource information of each host node further includes:
  • a Kubernetes cluster is deployed, and a self-developed agent is deployed at each host node for collecting the storage resource information of the node, including a total storage space node_i.DiskTotalSize, a free storage space node_i.DiskFreeSize, and an information list of the data set caches on the node (the unique identifier of each data set dataSet_j.Id, the size of the data set dataSet_j.Size, and the number of times that the data set has been used in the last month dataSet_j.UseNumber).
  • the method further includes:
  • when the storage resource information changes, it needs to be reported to a resource scheduling module in real time, and the resource scheduling module performs node selection and the data set cache deletion strategy based on this information.
  • the scoring each host node according to storage resource information in response to no host node satisfying a space required by a training task further includes:
  • when the free space node_i.DiskFreeSize of all the nodes in the cluster does not satisfy the space dataSet_task.Size required by the data set, then with regard to a node node_i, when the total size of the data set caches which are no longer used by the node is greater than or equal to the size of the data set cache used by the training task, i.e., Σ_{j ∈ unused} dataSet_j.Size ≥ dataSet_task.Size, the node is taken as an alternate node; for the node node_i, the data set caches needing to be deleted are selected, and a model for the data set caches is built.
  • the scoring each host node according to the storage resource information in response to no host node satisfying a space required by a training task further includes:
  • the data set caches on the node are sorted in ascending order of the number of times they have been used, and the first M data sets with the minimum use numbers are selected, where M satisfies the following condition: node_i.DiskFreeSize + Σ_{j=1}^{M} dataSet_j.Size ≥ dataSet_task.Size.
  • a node scoring standard taking the data set caches to be deleted on the node as a factor is established, and a node with a larger score is selected first.
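The selection of the first M least-used caches can be sketched as below. Note that the text above does not reproduce the exact scoring formula, so score_node is only an illustrative placeholder that prefers nodes which sacrifice fewer and less-used caches.

```python
def caches_to_delete(node, required_size):
    """Sort the node's caches by use count (ascending) and take the first M
    whose removal, together with the existing free space, frees enough room."""
    freed = node["disk_free_size"]
    selected = []
    for c in sorted(node["caches"], key=lambda c: c["use_number"]):
        if freed >= required_size:
            break
        selected.append(c)
        freed += c["size"]
    return selected if freed >= required_size else None

def score_node(node, required_size):
    """Illustrative placeholder score: nodes that must sacrifice fewer and
    less-used caches score higher; infeasible nodes score zero."""
    victims = caches_to_delete(node, required_size)
    if victims is None:
        return 0.0
    penalty = sum(c["use_number"] + 1 for c in victims)
    return 1.0 / (1.0 + penalty)
```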
  • FIG. 3 shows a schematic block diagram of an embodiment of a data set and node cache-based scheduling device according to the present disclosure, as shown in FIG. 3 , the device 101 includes:
  • a storage resource information obtaining module 11 configured to obtain storage resource information of each host node
  • a host node screening module 12 configured to obtain operation information of the training task in response to receiving the training task, and screen host nodes that satisfy the space required by the training task according to the operation information and the storage resource information;
  • a host node scoring module 13 configured to score each host node according to the storage resource information in response to no host node satisfying the space required by the training task;
  • a host node selection module 14 configured to select, from among all the host nodes, a host node to be executed that is used to execute the training task according to the scoring results;
  • a training task execution module 15 configured to obtain and delete an obsolete data set cache in the host node to be executed, and execute the training task in the host node to be executed.
  • the device 101 further includes:
  • a cache determination module (not shown) configured to determine whether each host node includes a data set cache required for the training task, and select a host node that executes the training task from the host nodes including the data set cache in response to determining that there is the host node including the data set cache.
  • the host node screening module 12 is further configured to:
  • select a host node that executes the training task from a plurality of pending host nodes based on the scheduling strategy, in response to screening out, from the host nodes, a plurality of pending host nodes that satisfy the space required by the training task.


Abstract

Disclosed is a data set and node cache-based scheduling method, which includes: obtaining storage resource information of each host node; in response to receiving a training task, obtaining operation information of the training task, and according to the operation information and the storage resource information, screening host nodes that satisfy a space required by the training task; in response to no host node satisfying the space required by the training task, scoring each host node according to the storage resource information; according to scoring results, selecting, from among all of the host nodes, a host node to be executed that is used to execute the training task; and obtaining and deleting an obsolete data set cache in the host node to be executed, and executing the training task in the host node to be executed.

Description

  • The present disclosure claims the priority of the Chinese patent application filed on Sep. 4, 2020 before the China National Intellectual Property Administration (CNIPA), with the application number of 202010923074.8 and the title of “DATA SET AND NODE CACHE-BASED SCHEDULING METHOD AND DEVICE”, which is incorporated herein in its entirety by reference.
  • FIELD
  • The present disclosure relates to the technical field of computers, and in particular to a data set and node cache-based scheduling method and device.
  • BACKGROUND
  • In an artificial intelligence (AI) scenario, training tasks have a strong dependence on data sets. For example, the quality of a data set will affect the accuracy of a model, and the speed at which training scripts load data sets will affect the training speed of the model. The data sets used for AI training are usually open-source data sets, such as the image network (ImageNet) data set or the handwritten digit recognition (MNIST) data set, or industry-related data sets, such as those for medical treatment or transportation. An AI resource management platform usually cannot evaluate the quality of a data set; this has to be ensured by the algorithm personnel instead. Generally, when initiating a training task, the algorithm personnel need to manually download these data to a node to start the training task; on an AI resource management platform, however, manual downloading of data sets is usually optimized into automatic downloading, so that when a training task is started, the platform automatically downloads the data set required by the training task. The AI resource management platform provides a variety of data sets for the algorithm personnel, and these data sets are cached on the computing nodes according to the requirements of the training tasks; however, due to the limited storage resources of the nodes, the following problems arise:
  • problem 1: when scheduling resources, if many tasks using large data sets are scheduled to the same node, the storage resources of that node will become insufficient, and a situation may arise where the node's storage resources are scarce while its central processing unit (CPU) and memory are largely idle;
  • problem 2: all the computing nodes of the cluster may simultaneously cache a large number of data sets which are no longer used, resulting in insufficient storage resources on the nodes, so that when scheduling resources, no suitable node may be found for caching new data sets.
  • SUMMARY
  • In view of this, it is an object of embodiments of the present disclosure to provide a data set and node cache-based scheduling strategy, which may achieve the effect of load balancing of storage resources of cluster nodes on the premise of using local data set cache to satisfy training tasks.
  • In view of the above, in an aspect, the present disclosure provides a data set and node cache-based scheduling method, and the method includes:
  • obtaining storage resource information of each host node;
  • obtaining operation information of a training task in response to receiving the training task, and screening host nodes that satisfy the space required by the training task according to the operation information and the storage resource information;
  • scoring each host node according to the storage resource information in response to no host node satisfying the space required by the training task;
  • selecting from among all the host nodes a host node to be executed that is used to execute the training task according to scoring results; and
  • obtaining and deleting an obsolete data set cache in the host node to be executed, and executing the training task in the host node to be executed.
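Under hypothetical field names, the five steps above can be condensed into a sketch like the following; the fallback score used here (free space plus space reclaimable from unused caches) is an illustrative stand-in for the patent's scoring model, not the claimed formula.

```python
def schedule(task, nodes):
    """Condensed sketch of the five steps: screen nodes by free space; if
    none fits, score every node and free obsolete caches on the winner."""
    fitting = [n for n in nodes if n["disk_free_size"] >= task["size"]]
    if fitting:
        return max(fitting, key=lambda n: n["disk_free_size"]), []

    def score(n):
        # free space plus space reclaimable from caches unused last month
        reclaimable = sum(c["size"] for c in n["caches"] if c["use_number"] == 0)
        return n["disk_free_size"] + reclaimable

    chosen = max(nodes, key=score)
    obsolete = [c for c in chosen["caches"] if c["use_number"] == 0]
    return chosen, obsolete
```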
  • In some embodiments of the data set and node cache-based scheduling method of the present disclosure, the method further includes:
  • determining whether each host node includes a data set cache required for a training task;
  • selecting a host node that executes a training task from the host nodes including the data set cache in response to determining that there is the host node including the data set cache.
  • In some embodiments of the data set and node cache-based scheduling method of the present disclosure, the obtaining operation information of a training task in response to receiving the training task, and the screening host nodes that satisfy a space required by the training task according to the operation information and the storage resource information further includes:
  • selecting a host node that executes the training task from a plurality of pending host nodes based on the scheduling strategy in response to screening out the plurality of pending host nodes that satisfy the space required for the training task from the host nodes.
  • In some embodiments of the data set and node cache-based scheduling method of the present disclosure, the obtaining storage resource information of each host node further includes:
  • deploying a Kubernetes (K8s, a container cluster management system) cluster within a cluster, and obtaining the storage resource information of a host node based on the Kubernetes cluster.
  • In some embodiments of the data set and node cache-based scheduling method of the present disclosure, the method further includes:
  • monitoring whether storage resource information in a host node changes;
  • reporting changed storage resource information in real time in response to monitoring that the storage resource information in the host node changes.
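Report-on-change monitoring of this kind can be sketched as a small wrapper that forwards a snapshot only when it differs from the previously reported one; the send callback below stands in for whatever transport the agent uses to reach the resource scheduling module.

```python
import copy

class StorageReporter:
    """Forwards a node's storage snapshot to the scheduler only when it
    differs from the previously reported one (report-on-change)."""

    def __init__(self, send):
        self._send = send  # callback standing in for the report transport
        self._last = None

    def observe(self, snapshot):
        if snapshot != self._last:
            self._last = copy.deepcopy(snapshot)
            self._send(snapshot)
```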
  • In some embodiments of the data set and node cache-based scheduling method of the present disclosure, the scoring each host node according to storage resource information in response to no host node satisfying the space required by the training task further includes:
  • obtaining a usage frequency of all data set caches in each host node, obtaining the obsolete data set cache in all data set caches according to the usage frequency, and scoring the host node according to the obsolete data set cache.
  • In some embodiments of the data set and node cache-based scheduling method of the present disclosure, the scoring each host node according to storage resource information in response to no host node satisfying the space required by the training task further includes:
  • determining a size of each data set cache in each host node, taking the data set cache with the size less than a preset size threshold value as the obsolete data set cache, and scoring the host node according to the obsolete data set cache.
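The two embodiments above identify the obsolete data set cache either by usage frequency or by a preset size threshold; both predicates are easy to sketch (field names are hypothetical):

```python
def obsolete_by_usage(caches, min_uses=1):
    """Frequency rule: caches used fewer than min_uses times last month."""
    return [c for c in caches if c["use_number"] < min_uses]

def obsolete_by_size(caches, size_threshold):
    """Size rule: caches smaller than the preset size threshold."""
    return [c for c in caches if c["size"] < size_threshold]
```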
  • In another aspect, the embodiments of the present disclosure also provide a data set and node cache-based scheduling device, including:
  • a storage resource information obtaining module configured to obtain storage resource information of each host node;
  • a host node screening module configured to obtain operation information of a training task in response to receiving the training task, and screen host nodes that satisfy a space required by the training task according to the operation information and the storage resource information;
  • a host node scoring module configured to score each host node according to the storage resource information in response to no host nodes satisfying the space required by the training task;
  • a host node selection module configured to select from among all the host nodes a host node to be executed that is used to execute the training task according to scoring results; and
  • a training task execution module configured to obtain and delete an obsolete data set cache in the host node to be executed, and execute the training task in the host node to be executed.
  • In some embodiments of the data set and node cache-based scheduling device of the present disclosure, the device further includes:
  • a cache determination module configured to determine whether each host node includes a data set cache required for a training task, and select a host node that executes a training task from the host nodes including the data set cache in response to determining that there is the host node including the data set cache.
  • In some embodiments of the data set and node cache-based scheduling device of the present disclosure, the host node screening module is further configured to:
  • select a host node that executes the training task from a plurality of pending host nodes based on the scheduling strategy in response to screening out the plurality of pending host nodes that satisfy the space required for the training task from the host nodes.
  • The present disclosure has at least the following advantageous technical effects: the present disclosure provides a scheduling strategy for selecting a node in a cluster environment based on the size of a node's storage and the size of the data set required by a training task. According to the present disclosure, an AI training task may be operated on a host node that already holds the required data set or on a host node with sufficient node storage space; at the same time, when the remaining space of all nodes in the cluster is insufficient, a node data set cache deletion strategy is defined, so that the training task may still be operated on a host node whose storage space is temporarily insufficient. This node selection strategy may effectively reduce the time spent downloading data sets and the time spent waiting for available nodes, thereby improving the competitiveness of the AI management platform.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • In order to explain the embodiments of the present disclosure or the technical solutions in the prior art more clearly, a brief description will be given below with reference to the accompanying drawings which are used in the description of the embodiments or the prior art, and it is obvious that the drawings in the description below are merely some embodiments of the present disclosure, and it would have been obvious for a person skilled in the art to obtain other drawings according to these drawings without involving any inventive effort.
  • FIG. 1 shows a schematic block diagram of an embodiment of a data set and node cache-based scheduling method according to the present disclosure.
  • FIG. 2 shows a flow chart of an embodiment of a data set and node cache-based scheduling method according to the present disclosure.
  • FIG. 3 shows a schematic block diagram of an embodiment of a data set and node cache-based scheduling device according to the present disclosure.
  • DETAILED DESCRIPTION
  • In order that the objects, aspects, and advantages of the present disclosure will become more fully apparent, embodiments of the present disclosure will be described hereinafter with reference to the accompanying drawings.
  • It should be noted that all the expressions using “first” and “second” in the embodiments of the present disclosure are intended to distinguish two entities or parameters that share the same name but are not identical. “First” and “second” are merely for convenience of expression and should not be construed as limiting the embodiments of the present disclosure, and the subsequent embodiments will not describe this one by one.
  • Based on the above object, in a first aspect of an embodiment of the present disclosure, an embodiment of a data set and node cache-based scheduling method is proposed. FIG. 1 shows a schematic block diagram of an embodiment of a data set and node cache-based scheduling method according to the present disclosure. In the embodiment shown in FIG. 1 , the method includes at least the following steps:
  • S100, obtaining storage resource information of each host node;
  • S200, obtaining operation information of a training task in response to receiving the training task, and screening host nodes that satisfy a space required by the training task according to the operation information and the storage resource information;
  • S300, scoring each host node according to the storage resource information in response to no host node satisfying the space required by the training task;
  • S400, selecting, from among all the host nodes, a host node to be executed that is used to execute the training task according to scoring results; and
  • S500, obtaining and deleting an obsolete data set cache in the host node to be executed, and executing the training task in the host node to be executed.
  • In some embodiments of the present disclosure, FIG. 2 shows a flow chart of an embodiment of a data set and host node (which may be simply referred to as a node) cache-based scheduling method according to the present disclosure. As shown in FIG. 2, the present disclosure relates to a scheduler extension mechanism based on Kubernetes, which uses a self-developed node agent to report the state of all the data sets of a node to a scheduler, while the scheduler queries the operation condition of a training task of an AI resource management platform. The nodes of the cluster are scored according to the remaining storage of the node, the size of the data set cache, the number of times the data set cache has been used, the data set cache cleaning strategy, and other factors, and the score is combined with other scheduling strategies of Kubernetes to select the optimal node for operating the training task.
  • In some embodiments of the present disclosure, according to step S100, the storage resource information of the node is collected, including a storage space nodeiDiskTotalSize, a storage free space nodeiDiskFreeSize, and an information list of the data set node cache (a unique identifier of a data set dataSetjId, a size of the data set dataSetjSize, and a number of times that the data set has been used in the last month dataSetjUseNumber). According to step S200, the user submits a training task on the resource management platform; the operation information of the training task includes the used data set information (a name of the data set, a unique identifier dataSettaskId of the data set used by the task, and a size dataSettaskSize of the data set used by the task) and other basic resource information (CPU, memory, graphics processing unit (GPU), etc.) for operating the training task. After receiving the resource request of the training task, the scheduler first uses a Kubernetes default algorithm to screen out nodes with sufficient CPU, memory, and GPU cards. According to step S300, when the free space nodeiDiskFreeSize of every node in the cluster fails to satisfy the space dataSettaskSize required by the data set, a node nodei is considered when the total size of the data set node caches no longer used by the node is greater than or equal to the size of the data set cache used by the training task, i.e.,
  • $$\sum_{j=1}^{N} \mathrm{dataSet}_j\mathrm{Size} \ge \mathrm{dataSet}_{task}\mathrm{Size},$$
  • the node is taken as an alternate node. For the node nodei, the data set node caches needing to be deleted are selected, a model for the data set node caches is built, and each host node is scored according to the model. According to step S400, a host node to be executed that is used to execute the training task is selected from among all the host nodes according to the scoring results. According to step S500, when the resource scheduling module selects an appropriate node, a list of the data set caches needing to be deleted is sent to the agent, and the agent deletes those caches. Since a deletion operation only removes local node files, whereas downloading a data set fetches it from a remote end over a Hyper Text Transfer Protocol (HTTP) service, deleting a data set cache is much faster than downloading one. As a result, the download of the data set may begin immediately after the training task is scheduled to the node.
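  • The screening of steps S200/S300 described above can be sketched as follows. This is an illustrative sketch, not the platform's actual code: the dictionary fields (disk_free_size, caches, use_number) are assumptions, as is treating a cache with a zero recent-use count as "no longer used".

```python
def screen_nodes(nodes, dataset_task_size):
    """Split nodes into those whose free space already fits the task's
    data set and alternate nodes whose unused caches could be freed."""
    fitting, alternates = [], []
    for node in nodes:
        if node["disk_free_size"] >= dataset_task_size:
            fitting.append(node)
        else:
            # Total size of caches not used recently: the reclaimable space.
            reclaimable = sum(
                c["size"] for c in node["caches"] if c["use_number"] == 0
            )
            if reclaimable >= dataset_task_size:
                alternates.append(node)
    return fitting, alternates
```

  • A node in `alternates` corresponds to the "alternate node" above: it cannot fit the data set now, but deleting its unused caches would free enough space.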
  • In some embodiments of the data set and node cache-based scheduling method according to the present disclosure, the method further includes:
  • determining whether each host node includes a data set cache required for the training task;
  • selecting a host node that executes the training task from the host nodes including the data set cache in response to determining that there is the host node including the data set cache.
  • In some embodiments of the present disclosure, for the host nodes in the cluster, when a host node already has the data set required for the training task (i.e., the host node has the data set cache), that host node is preferentially used to operate the training task, thereby avoiding downloading the data set again. When no node in the cluster has the node cache of the data set, the node with the largest node residual space nodeiDiskFreeSize is selected to download the data set, and the training task is operated.
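  • The cache-first preference described above can be sketched as a small selection function; the node fields cached_dataset_ids and disk_free_size are illustrative, not a real API:

```python
def pick_node(nodes, task_dataset_id):
    """Prefer a node that already caches the task's data set; otherwise
    fall back to the node with the largest free space to download it."""
    cached = [n for n in nodes if task_dataset_id in n["cached_dataset_ids"]]
    if cached:
        return cached[0]
    return max(nodes, key=lambda n: n["disk_free_size"])
```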
  • According to some embodiments of the data set and node cache-based scheduling method of the present disclosure, obtaining operation information of a training task in response to receiving the training task, and screening host nodes that satisfy a space required by the training task according to the operation information and the storage resource information further includes:
  • selecting a host node that executes the training task from a plurality of pending host nodes based on the scheduling strategy in response to screening out the plurality of pending host nodes that satisfy the space required for the training task from the host nodes.
  • In some embodiments of the present disclosure, after receiving the resource request of the training task, the scheduler first uses a Kubernetes default algorithm to screen out nodes with sufficient CPU, memory, and GPU cards. Based on the scheduling strategy, the node with the largest residual space nodeiDiskFreeSize is selected to download the data set, and the training task is operated.
  • According to some embodiments of the data set and node cache-based scheduling method of the present disclosure, the obtaining storage resource information of each host node further includes:
  • deploying a Kubernetes cluster within the cluster, and obtaining the storage resource information of the host node based on the Kubernetes cluster.
  • In some embodiments of the present disclosure, a Kubernetes cluster is deployed within the cluster, and a self-developed agent is deployed at each host node for collecting the storage resource information of the node, including a storage space nodeiDiskTotalSize, a storage free space nodeiDiskFreeSize, and an information list of the data set node cache (a unique identifier of a data set dataSetjId, a size of the data set dataSetjSize, and a number of times that the data set has been used in the last month dataSetjUseNumber).
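  • A minimal sketch of what such a node agent might collect, using only the Python standard library; the cache directory layout (one subdirectory per cached data set) is an assumption for the example:

```python
import os
import shutil

def collect_storage_info(cache_dir):
    """Gather what a node agent might report: total and free disk space
    (nodeDiskTotalSize / nodeDiskFreeSize) plus a listing of cached
    data sets. Assumes one subdirectory per cached data set."""
    usage = shutil.disk_usage(cache_dir)
    caches = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        # Directory size = sum of the sizes of all files under it.
        size = sum(
            os.path.getsize(os.path.join(root, f))
            for root, _, files in os.walk(path)
            for f in files
        )
        caches.append({"id": name, "size": size})
    return {"total": usage.total, "free": usage.free, "caches": caches}
```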
  • In some embodiments of the data set and node cache-based scheduling method according to the present disclosure, the method further includes:
  • monitoring whether the storage resource information in the host node changes;
  • reporting the changed storage resource information in real time in response to monitoring that the storage resource information in the host node changes.
  • In some embodiments of the present disclosure, when the storage resource information changes, it needs to be reported to a resource scheduling module in real time, and the resource scheduling module performs node selection and applies the data set node cache deletion strategy based on this information.
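  • The report-on-change behavior can be sketched as a polling loop; poll_state and report are placeholders for the agent's collection and reporting calls, not real APIs:

```python
import time

def watch_and_report(poll_state, report, interval_s=5.0, max_polls=None):
    """Poll the node's storage state and call report() only when the
    state differs from the last reported one."""
    last = None
    polls = 0
    while max_polls is None or polls < max_polls:
        state = poll_state()
        if state != last:
            report(state)  # push changed info to the resource scheduling module
            last = state
        polls += 1
        time.sleep(interval_s)
```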
  • According to some embodiments of the data set and node cache-based scheduling method of the present disclosure, the scoring each host node according to storage resource information in response to no host node satisfying a space required by a training task further includes:
  • obtaining a usage frequency of all data set caches in each host node, obtaining the obsolete data set cache in all data set caches according to the usage frequency, and scoring the host node according to the obsolete data set cache.
  • In some embodiments of the present disclosure, when the free space nodeiDiskFreeSize of every node in the cluster fails to satisfy the space dataSettaskSize required by the data set, a node nodei is considered when the total size of the data set node caches no longer used by the node is greater than or equal to the size of the data set cache used by the training task, i.e.,
  • $$\sum_{j=1}^{N} \mathrm{dataSet}_j\mathrm{Size} \ge \mathrm{dataSet}_{task}\mathrm{Size},$$
  • the node is taken as an alternate node; for the node nodei, the data set node caches needing to be deleted are selected, and a model for the node data set caches is built.
  • When the data set cache dataSetj of the node has been used more times in the last month, the data set is more likely to be used by other training tasks in a future period of time. To avoid downloading the data set again when a new training task needs to use it, this scheduling rule avoids deleting such a cache as much as possible, and the weight value for selecting the data set to be deleted is defined as follows:
  • $$\left(1 - \frac{\mathrm{dataSet}_j\mathrm{UseNumber}}{\sum_{k=1}^{N} \mathrm{dataSet}_k\mathrm{UseNumber}}\right) \times 10$$
  • According to some embodiments of the data set and node cache-based scheduling method of the present disclosure, the scoring each host node according to the storage resource information in response to no host node satisfying a space required by a training task further includes:
  • determining the size of each data set cache in each host node, taking the data set cache with the size less than a preset size threshold value as the obsolete data set cache, and scoring the host node according to the obsolete data set cache.
  • In some embodiments of the present disclosure, when a data set node cache is larger, in order to reduce the time spent deleting data set node caches, a large data set cache is avoided as a deletion candidate as much as possible, i.e., the possibility of deleting a large data set cache is low, and the weight value for selecting the data set to be deleted is defined as follows:
  • $$\left(1 - \frac{\mathrm{dataSet}_j\mathrm{Size}}{\sum_{k=1}^{N} \mathrm{dataSet}_k\mathrm{Size}}\right) \times 10$$
  • In some embodiments of the present disclosure, for the data set caches of a node, scores are computed and sorted according to the following formula:
  • $$\mathrm{dataSet}_j\mathrm{Score} = \left(1 - \frac{\mathrm{dataSet}_j\mathrm{UseNumber}}{\sum_{k=1}^{N} \mathrm{dataSet}_k\mathrm{UseNumber}}\right) \times 10 + \left(1 - \frac{\mathrm{dataSet}_j\mathrm{Size}}{\sum_{k=1}^{N} \mathrm{dataSet}_k\mathrm{Size}}\right) \times 10$$
  • the first M data set caches with the smallest usage counts are selected, where M satisfies the following condition:
  • $$\sum_{j=1}^{M} \mathrm{dataSet}_j\mathrm{Size} \ge \mathrm{dataSet}_{task}\mathrm{Size}, \quad M \in [1, N]$$
  • a node scoring standard is established that takes the data set caches to be deleted in the node as a factor, and nodes with larger scores are selected first, as below:
  • $$\mathrm{data}_i\mathrm{Score} = \frac{\sum_{j=1}^{M} \mathrm{dataSet}_j\mathrm{Score}}{M}$$
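  • The deletion-weight and node-scoring formulas above can be sketched in code. This is a hedged reading of the model: each cache gets a score that is high when the cache is rarely used and small; the M highest-scoring caches whose combined sizes cover the task's data set are chosen for deletion, and the node score is their mean. Field names (use_number, size) are illustrative, not from any real API.

```python
def dataset_score(cache, caches):
    """dataSetScore: usage-based weight plus size-based weight."""
    total_use = sum(c["use_number"] for c in caches)
    total_size = sum(c["size"] for c in caches)
    use_w = (1 - cache["use_number"] / total_use) * 10 if total_use else 10
    size_w = (1 - cache["size"] / total_size) * 10 if total_size else 10
    return use_w + size_w

def node_score(caches, task_size):
    """Return (mean score of the M caches to delete, those caches), or
    None when even deleting every cache cannot free enough space."""
    ranked = sorted(caches, key=lambda c: dataset_score(c, caches),
                    reverse=True)  # most deletable caches first
    chosen, covered = [], 0
    for c in ranked:
        chosen.append(c)
        covered += c["size"]
        if covered >= task_size:
            mean = sum(dataset_score(c, caches) for c in chosen) / len(chosen)
            return mean, chosen
    return None
```

  • Under this reading, a node with a higher score only has to delete rarely used, small caches to make room, so the scheduler prefers it.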
  • In another aspect of an embodiment of the present disclosure, an embodiment of a data set and node cache-based scheduling device is proposed. FIG. 3 shows a schematic block diagram of an embodiment of a data set and node cache-based scheduling device according to the present disclosure. As shown in FIG. 3, the device 101 includes:
  • a storage resource information obtaining module 11 configured to obtain storage resource information of each host node;
  • a host node screening module 12 configured to obtain operation information of the training task in response to receiving the training task, and screen host nodes that satisfy the space required by the training task according to the operation information and the storage resource information;
  • a host node scoring module 13 configured to score each host node according to the storage resource information in response to no host node satisfying the space required by the training task;
  • a host node selection module 14 configured to select, from among all the host nodes, a host node to be executed that is used to execute the training task according to the scoring results; and
  • a training task execution module 15 configured to obtain and delete an obsolete data set cache in the host node to be executed, and execute the training task in the host node to be executed.
  • According to some embodiments of the data set and node cache-based scheduling device of the present disclosure, the device 101 further includes:
  • a cache determination module (not shown) configured to determine whether each host node includes a data set cache required for the training task, and select a host node that executes the training task from the host nodes including the data set cache in response to determining that there is the host node including the data set cache.
  • According to some embodiments of the data set and node cache-based scheduling device of the present disclosure, the host node screening module 12 is further configured to:
  • select a host node that executes the training task from a plurality of pending host nodes based on the scheduling strategy in response to screening out the plurality of pending host nodes that satisfy the space required for the training task from the host nodes.
  • As such, a person skilled in the art will appreciate that all embodiments, features, and advantages set forth above with respect to the data set and node cache-based scheduling method according to the present disclosure apply equally to the device according to the present disclosure. For the sake of brevity of the present disclosure, this description is not repeated here.
  • It should be noted that a person skilled in the art would understand that the implementation of all or part of the flows in the methods of the above-mentioned embodiments may be performed by a computer program instructing relevant hardware, and a program of a data set and node cache-based scheduling method may be stored in a computer-readable storage medium; when executed, the program may include the flows of the embodiments of the methods as described above. The storage medium of the program may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), etc. Embodiments of the computer program described above may achieve the same or similar effects as any of the corresponding method embodiments described above.
  • A person skilled in the art would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the application and design constraints imposed on the overall system. A person skilled in the art may implement the described functionality in varying ways for each application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments.
  • It will be understood that, as used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that “and/or” as used herein is meant to include all possible combinations of one or more of the associated listed items.
  • The above-described embodiments of the present disclosure have been disclosed for the purpose of illustration only, and are not intended to represent the advantages and disadvantages of the embodiments.
  • A person skilled in the art will appreciate that the above discussion of any embodiments is intended to be exemplary only, and is not intended to suggest that the scope of the disclosed embodiments (including the claims) is limited to these examples; combinations of features in the above embodiments or in different embodiments are also possible within the framework of embodiments of the present disclosure, and many other variations of different aspects of the embodiments of the present disclosure as described above are not provided in detail for the sake of clarity. Accordingly, it is intended that the present disclosure cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims (20)

1. A data set and node cache-based scheduling method, comprising:
obtaining storage resource information of each host node;
obtaining operation information of a training task in response to receiving the training task, and screening host nodes that satisfy a space required by the training task according to the operation information and the storage resource information;
scoring each host node according to the storage resource information in response to no host node satisfying the space required by the training task;
selecting, from among all the host nodes, a host node to be executed that is used to execute the training task according to scoring results; and
obtaining and deleting an obsolete data set cache in the host node to be executed, and executing the training task in the host node to be executed.
2. The data set and node cache-based scheduling method according to claim 1, further comprising:
determining whether each host node includes a data set cache required for the training task; and
selecting a host node that executes the training task from the host nodes including the data set cache in response to determining that there is the host node including the data set cache.
3. The data set and node cache-based scheduling method according to claim 1, wherein the obtaining operation information of a training task in response to receiving the training task, and screening host nodes that satisfy a space required by the training task according to the operation information and the storage resource information further comprises:
selecting a host node that executes the training task from a plurality of pending host nodes based on the scheduling strategy in response to screening out the plurality of pending host nodes that satisfy the space required for the training task from the host nodes.
4. The data set and node cache-based scheduling method according to claim 1, wherein the obtaining storage resource information of each host node further comprises:
deploying a Kubernetes cluster within a cluster, and obtaining the storage resource information of the host node based on the Kubernetes cluster.
5. The data set and node cache-based scheduling method according to claim 1, comprising:
monitoring whether the storage resource information in the host node changes;
reporting changed storage resource information in real time in response to monitoring that the storage resource information in the host node changes.
6. The data set and node cache-based scheduling method according to claim 1, wherein the scoring each host node according to the storage resource information in response to no host node satisfying the space required by the training task further comprises:
obtaining a usage frequency of all data set caches in each host node, obtaining the obsolete data set cache in all data set caches according to the usage frequency, and scoring the host node according to the obsolete data set cache.
7. The data set and node cache-based scheduling method according to claim 1, wherein the scoring each host node according to the storage resource information in response to no host node satisfying the space required by the training task further comprises:
determining a size of each data set cache in each host node, taking the data set cache with the size less than a preset size threshold value as the obsolete data set cache, and scoring the host node according to the obsolete data set cache.
8. A data set and node cache-based scheduling device, comprising:
a storage resource information obtaining module configured to obtain storage resource information of each host node;
a host node screening module configured to obtain operation information of a training task in response to receiving the training task, and screen host nodes that satisfy a space required by the training task according to the operation information and the storage resource information;
a host node scoring module configured to score each host node according to the storage resource information in response to no host node satisfying the space required by the training task;
a host node selection module configured to select, from among all the host nodes, a host node to be executed that is used to execute the training task according to scoring results; and
a training task execution module configured to obtain and delete an obsolete data set cache in the host node to be executed, and execute the training task in the host node to be executed.
9. The data set and node cache-based scheduling device according to claim 8, wherein the device further comprises:
a cache determination module configured to determine whether each host node includes a data set cache required for a training task, and select a host node that executes a training task from the host nodes including the data set cache in response to determining that there is the host node including the data set cache.
10. The data set and node cache-based scheduling device according to claim 8, wherein the host node screening module is further configured to:
select a host node that executes the training task from a plurality of pending host nodes based on the scheduling strategy in response to screening out the plurality of pending host nodes that satisfy the space required for the training task from the host nodes.
11. The data set and node cache-based scheduling method according to claim 1, wherein the storage resource information comprises a storage space, a storage free space and an information list of a data set node cache; and
the information list of the data set node cache comprises a unique identifier of a data set, a size of the data set and a number of times that the data set is used in the last one month.
12. The data set and node cache-based scheduling method according to claim 1, wherein the operation information of the training task comprises used data set information and basic resource information for operating the training task;
the used data set information comprises a name of a data set, a unique identifier of the data set used by the training task and a size of the data set used by the training task; and
the basic resource information comprises a central processing unit, a memory and a graphics processing unit.
13. The data set and node cache-based scheduling method according to claim 1, wherein the scoring each host node according to the storage resource information in response to no host node satisfying the space required by the training task further comprises:
regarding a host node as an alternate node when a size of a data set node cache which is no longer used by the host node is greater than or equal to a size of a data set cache used by the training task;
selecting data set node caches to be deleted in the alternate node, and building a model for the data set node caches; and
scoring each host node according to the model.
14. The data set and node cache-based scheduling method according to claim 2, further comprising:
in response to determining that there is no host node including the data set cache, selecting a host node with a largest node residual space to download a data set, and operating the training task.
15. The data set and node cache-based scheduling device according to claim 8, wherein the storage resource information obtaining module is further configured to:
deploy a Kubernetes cluster within a cluster, and obtain the storage resource information of the host node based on the Kubernetes cluster.
16. The data set and node cache-based scheduling device according to claim 8, wherein the device further comprises:
a storage resource information monitoring module configured to monitor whether the storage resource information in the host node changes, and report changed storage resource information in real time in response to monitoring that the storage resource information in the host node changes.
17. The data set and node cache-based scheduling device according to claim 8, wherein the host node scoring module is further configured to:
obtain a usage frequency of all data set caches in each host node, obtain the obsolete data set cache in all data set caches according to the usage frequency, and score the host node according to the obsolete data set cache.
18. The data set and node cache-based scheduling device according to claim 8, wherein the host node scoring module is further configured to:
determine a size of each data set cache in each host node, take the data set cache with the size less than a preset size threshold value as the obsolete data set cache, and score the host node according to the obsolete data set cache.
19. The data set and node cache-based scheduling device according to claim 8, wherein the host node scoring module is further configured to:
regard a host node as an alternate node when a size of a data set node cache which is no longer used by the host node is greater than or equal to a size of a data set cache used by the training task;
select data set node caches to be deleted in the alternate node, and build a model for the data set node caches; and
score each host node according to the model.
20. The data set and node cache-based scheduling device according to claim 9, wherein the cache determination module is further configured to:
in response to determining that there is no host node including the data set cache, select a host node with a largest node residual space to download a data set, and operate the training task.
US18/024,732 2020-09-04 2021-07-30 Data set and node cache-based scheduling method and device Active US11698863B1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN202010923074.8A CN112202837B (en) 2020-09-04 2020-09-04 Scheduling method and device based on data set and node cache
CN202010923074.8 2020-09-04
PCT/CN2021/109626 WO2022048365A1 (en) 2020-09-04 2021-07-30 Data set and node cache-based scheduling method and device

Publications (2)

Publication Number Publication Date
US11698863B1 US11698863B1 (en) 2023-07-11
US20230244605A1 true US20230244605A1 (en) 2023-08-03

Family

ID=74006276

Family Applications (1)

Application Number Title Priority Date Filing Date
US18/024,732 Active US11698863B1 (en) 2020-09-04 2021-07-30 Data set and node cache-based scheduling method and device

Country Status (5)

Country Link
US (1) US11698863B1 (en)
EP (1) EP4203437A4 (en)
KR (1) KR20230093420A (en)
CN (1) CN112202837B (en)
WO (1) WO2022048365A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112202837B (en) 2020-09-04 2022-05-17 苏州浪潮智能科技有限公司 Scheduling method and device based on data set and node cache
CN112905325B (en) * 2021-02-10 2023-01-10 山东英信计算机技术有限公司 Method, system and medium for distributed data cache accelerated training
CN112925640A (en) * 2021-02-10 2021-06-08 杭州幻方人工智能基础研究有限公司 Cluster training node distribution method and electronic equipment
CN113094183B (en) * 2021-06-09 2021-09-17 苏州浪潮智能科技有限公司 Training task creating method, device, system and medium of AI (Artificial Intelligence) training platform
CN116339968A (en) * 2021-12-24 2023-06-27 华为云计算技术有限公司 Computing resource and cache resource scheduling method, device and system
CN115904673B (en) * 2023-03-09 2023-06-27 华南师范大学 Cloud computing resource concurrent scheduling method, device, system, equipment and medium

Citations (2)

Publication number Priority date Publication date Assignee Title
US20190132392A1 (en) * 2017-10-28 2019-05-02 TuSimple Storage architecture for heterogeneous multimedia data
US20200092392A1 (en) * 2018-09-19 2020-03-19 International Business Machines Corporation Data caching and data-aware placement to accelerate machine learning applications

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US11954565B2 (en) * 2018-07-06 2024-04-09 Qliktech International Ab Automated machine learning system
CN110502487B (en) * 2019-08-09 2022-11-22 苏州浪潮智能科技有限公司 Cache management method and device
CN110795217B (en) * 2019-09-27 2022-07-15 广东浪潮大数据研究有限公司 Task allocation method and system based on resource management platform
CN111158852A (en) * 2019-12-14 2020-05-15 苏州浪潮智能科技有限公司 Training resource dynamic allocation method, system, terminal and storage medium
CN111444019B (en) * 2020-03-31 2024-01-26 中国科学院自动化研究所 Cloud collaborative deep learning model distributed training method and system
CN112202837B (en) 2020-09-04 2022-05-17 苏州浪潮智能科技有限公司 Scheduling method and device based on data set and node cache


Also Published As

Publication number Publication date
CN112202837A (en) 2021-01-08
EP4203437A1 (en) 2023-06-28
EP4203437A4 (en) 2023-09-20
WO2022048365A1 (en) 2022-03-10
CN112202837B (en) 2022-05-17
KR20230093420A (en) 2023-06-27
US11698863B1 (en) 2023-07-11

Similar Documents

Publication Publication Date Title
US20230244605A1 (en) Data set and node cache-based scheduling method and device
JP7192103B2 (en) DATA PROCESSING METHOD AND APPARATUS, AND COMPUTING NODE
US9785468B2 (en) Finding resource bottlenecks with low-frequency sampled data
US8621109B2 (en) Adaptable management in sync engines
DE112010003610T5 (en) Prefilling a cache on thread migration
CN110221901A (en) Container asset creation method, apparatus, equipment and computer readable storage medium
CN110292775A (en) Obtain the method and device of variance data
CN104202373A (en) Method and system for migrating mobile cloud computing
US11487571B2 (en) Method and system for efficient utilization of resources in containers
US20230153100A1 (en) Method and apparatus for managing model file in inference application
CN111767145A (en) Container scheduling system, method, device and equipment
WO2020211363A1 (en) Method and apparatus for improving efficiency of program loading, computer device and storage medium
CN114356714A (en) Resource integration monitoring and scheduling device based on Kubernetes intelligent board card cluster
CN111143033B (en) Operation execution method and device based on scalable operation system
CN113127179A (en) Resource scheduling method and device, electronic equipment and computer readable medium
CN108121514B (en) Meta information updating method and device, computing equipment and computer storage medium
CN115686825A (en) Resource management method, device, server and storage medium
CN113110804B (en) Duplicate picture deleting method, device, equipment and storage medium
CN105843735B (en) A kind of consumption method and device of terminal memory
CN111897959A (en) Method, apparatus, device and storage medium for reasoning within dynamic legal events
CN109739649A (en) Method for managing resource, device, equipment and computer readable storage medium
CN116089248B (en) Write I/O burst distribution prediction method, device, equipment and storage medium
CN115840770B (en) Local cache data processing method and related equipment based on distributed environment
US11494697B2 (en) Method of selecting a machine learning model for performance prediction based on versioning information
CN113407192A (en) Model deployment method and device

Legal Events

Date Code Title Description
AS Assignment

Owner name: INSPUR SUZHOU INTELLIGENT TECHNOLOGY CO., LTD., CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:WANG, DEKUI;CHEN, PEI;REEL/FRAME:062880/0983

Effective date: 20230114

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE