CN109117265A - Method, apparatus, device, and storage medium for scheduling jobs in a cluster - Google Patents
Method, apparatus, device, and storage medium for scheduling jobs in a cluster
- Publication number: CN109117265A (Application CN201810761530.6A)
- Authority
- CN
- China
- Prior art keywords
- node
- pod
- cluster
- resource
- condition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
Abstract
The invention discloses a method, apparatus, device, and storage medium for scheduling jobs in a cluster. The method includes: obtaining Pod data corresponding to a job; selecting one or more target nodes from the cluster according to node scheduling conditions in the Pod data and the node state of each node in the cluster; and deploying a Pod on each target node according to the Pod data and running the job's processes in the deployed Pods. Because this technical solution deploys job processes in containers, different jobs run in independent containers, so the jobs of different applications on the cluster do not interfere with each other yet can still communicate where interaction is needed. This makes effective use of cluster resources and enables stable job scheduling for projects, such as deep learning, that require pipelined cooperation among multiple applications.
Description
[Technical Field]
The present invention relates to the field of job scheduling, and in particular to a method, apparatus, device, and storage medium for scheduling jobs in a cluster.
[Background]
Building a cluster from multiple physical machines and deploying services on it is the conventional means by which Internet enterprises realize their projects. How to rationally schedule the various jobs served on a cluster has therefore always been a problem that engineers study continuously.
Taking a deep learning project as an example, engineers often wish to run all parts of the project on the same infrastructure platform, while the sample data for deep learning often comes from the products of business lines; that is, multiple types of jobs need to be scheduled together in the same cluster. However, existing scheduling frameworks cannot achieve this well.
[Summary]
In view of this, the present invention provides a method, apparatus, device, and storage medium for scheduling jobs in a cluster, to solve the problem of scheduling jobs of different applications or services in the same cluster.
The specific technical solution is as follows:
A method for scheduling jobs in a cluster, comprising:
obtaining Pod (container pod) data corresponding to a job;
selecting one or more target nodes from the cluster according to node scheduling conditions in the Pod data and the node state of each node in the cluster;
deploying a Pod on each target node according to the Pod data, and running the job's processes in the deployed Pods.
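A minimal sketch of the three claimed steps, assuming Pod data carries its node scheduling conditions as simple key-value labels; the data shapes and function names here are illustrative, not the patent's actual structures:

```python
def select_target_nodes(pod_data, nodes):
    """Select nodes whose state satisfies every scheduling condition in the Pod data."""
    conditions = pod_data["node_scheduling_conditions"]
    return [
        node["name"]
        for node in nodes
        if all(node["state"].get(k) == v for k, v in conditions.items())
    ]

def schedule_job(pod_data, nodes):
    """The method's steps: select target nodes, then deploy one Pod per target node."""
    targets = select_target_nodes(pod_data, nodes)
    # Deployment is represented abstractly here: one Pod record per target node.
    return [{"node": name, "pod": pod_data["job"]} for name in targets]

pod_data = {"job": "log-collector",
            "node_scheduling_conditions": {"cpu": "intel", "zone": "A"}}
nodes = [
    {"name": "n1", "state": {"cpu": "intel", "zone": "A"}},
    {"name": "n2", "state": {"cpu": "amd", "zone": "A"}},
    {"name": "n3", "state": {"cpu": "intel", "zone": "A"}},
]
print(schedule_job(pod_data, nodes))
```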
Optionally, the node scheduling conditions include hard scheduling conditions and/or soft scheduling conditions;
selecting one or more target nodes from the cluster according to the node scheduling conditions in the Pod data and the node state of each node in the cluster includes:
if the node state of a node satisfies the hard scheduling conditions, the node is a target node;
and/or
scoring each node according to its node state and the soft scheduling conditions, and selecting one or more target nodes according to the scores.
Optionally, the hard scheduling conditions include hardware information of the node and/or region information of the node.
Optionally, the node shape of the node scheduling condition according in the Pod data and each node in the cluster
State, one or more destination nodes are selected from the cluster includes:
The multiple nodes being located on the same available area are selected as destination node;
It is described that dispose Pod respectively according to the Pod data on each destination node include: on each destination node according to institute
It states Pod data and disposes a Pod respectively, to form multiple examples of the operation.
Optionally, the node scheduling conditions further include a lower limit and an upper limit on the number of instances; selecting multiple nodes located in the same availability zone as target nodes further includes:
when the number of selected nodes is greater than or equal to the lower limit on the number of instances, taking the smaller of the number of selected nodes and the upper limit on the number of instances as the number of target nodes;
when the number of selected nodes is less than the lower limit on the number of instances, terminating this scheduling attempt.
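The instance-count rule above reduces to a small piece of arithmetic; a sketch, with hypothetical names:

```python
def decide_target_count(selected, lower, upper):
    """Return the number of target nodes, or None to terminate this scheduling attempt."""
    if selected < lower:
        return None  # too few nodes selected: terminate this job scheduling
    return min(selected, upper)
```

For example, with bounds 80-100 and 99 selected nodes, the job is scheduled on all 99; with 120 selected nodes, only 100 are used; with 50, scheduling is abandoned.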
Optionally, the soft scheduling conditions include job affinity conditions, and the node state includes the job corresponding to each Pod deployed on the node.
Optionally, the job corresponds to one or more of the following applications and/or services:
a deep learning system, a Web service, a log collector, a distributed queue service, a log connector.
Optionally, the job is a deep learning training job, and running the job's processes in the deployed Pods includes:
running a parameter server process and a trainer process in each deployed Pod; the trainer process obtains deep learning tasks from a deep learning meta-information management node, trains a local deep learning model to obtain gradients, sends the gradients to the parameter server process, and obtains updated parameters from the parameter server process;
the parameter server process saves a training snapshot to distributed storage at predetermined intervals, so that when the Pod or a process in the Pod is restarted, training resumes from the snapshot;
the parameter server process and/or the trainer process stores the deep learning training model to distributed storage.
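A toy sketch of one parameter-server round as described above; the gradient rule, the in-memory snapshot list standing in for distributed storage, and all names are illustrative, not the patent's actual components:

```python
class ParameterServer:
    def __init__(self, params):
        self.params = dict(params)
        self.snapshots = []          # stand-in for distributed storage

    def apply_gradients(self, grads, lr=0.1):
        # Update parameters with the trainer's gradients, return the new values.
        for k, g in grads.items():
            self.params[k] -= lr * g
        return dict(self.params)

    def save_snapshot(self):
        self.snapshots.append(dict(self.params))

    def restore_latest(self):
        # Resume from the most recent snapshot after a Pod or process restart.
        self.params = dict(self.snapshots[-1])

def trainer_step(ps, local_model):
    # Trainer: compute gradients from the local model (toy rule), push them
    # to the parameter server, pull the updated parameters back.
    grads = {k: 2.0 * v for k, v in local_model.items()}
    return ps.apply_gradients(grads)
```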
Optionally, the node scheduling conditions include a resource request lower limit and a resource request upper limit corresponding to each computing resource;
selecting one or more target nodes from the cluster according to the node scheduling conditions in the Pod data and the node state of each node in the cluster includes:
calculating, for each computing resource, the schedulable resource upper limit and schedulable resource lower limit of each node according to the Pods already deployed on it;
when the resource request lower limit of every computing resource in the node scheduling conditions is less than or equal to the schedulable resource lower limit of the corresponding computing resource, selecting one or more target nodes from the cluster.
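Assuming each node's schedulable limits are its capacity minus the request limits of already-deployed Pods (an interpretation of the claim, not a quoted formula), the check might look like:

```python
def schedulable_limits(node_capacity, deployed_pods):
    """Per resource: subtracting deployed lower limits leaves the larger
    ("upper") schedulable margin; subtracting deployed upper limits leaves
    the smaller ("lower") guaranteed margin."""
    limits = {}
    for res, cap in node_capacity.items():
        used_upper = sum(p["requests"][res]["upper"] for p in deployed_pods)
        used_lower = sum(p["requests"][res]["lower"] for p in deployed_pods)
        limits[res] = {"upper": cap - used_lower, "lower": cap - used_upper}
    return limits

def node_fits(node_capacity, deployed_pods, request_lower):
    """The node qualifies when every requested lower limit fits within the
    corresponding schedulable lower limit."""
    lim = schedulable_limits(node_capacity, deployed_pods)
    return all(request_lower[res] <= lim[res]["lower"] for res in request_lower)
```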
Optionally, the node scheduling conditions further include a job priority;
selecting one or more target nodes from the cluster according to the node scheduling conditions in the Pod data and the node state of each node in the cluster further includes:
when a resource request lower limit in the node scheduling conditions is greater than the schedulable resource lower limit, killing or throttling deployed Pods according to job priority, or terminating this scheduling attempt.
Optionally, killing or throttling a Pod includes:
when the node scheduling condition corresponds to a compressible resource, throttling the Pod;
when the node scheduling condition corresponds to an incompressible resource, killing the Pod.
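In container schedulers, CPU is typically treated as compressible (its use can be throttled without killing the process) while memory is incompressible; under that assumption, which is mine rather than the patent's explicit classification, the rule above is a two-way branch:

```python
# Assumed classification for illustration: CPU compressible, memory not.
COMPRESSIBLE = {"cpu"}

def reclaim(pod, resource):
    """Throttle the Pod for compressible resources, kill it for incompressible ones."""
    pod["state"] = "throttled" if resource in COMPRESSIBLE else "killed"
    return pod["state"]
```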
Optionally, the method further includes:
allocating computing resources to the Pods deployed on each node according to the node scheduling conditions; wherein, if the sum of the resource request upper limits for a compressible resource of all Pods deployed on a node is less than the node's limit for that compressible resource, the unallocated portion of the compressible resource is distributed proportionally among the Pods deployed on the node.
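Taking each Pod's request upper limit as the proportionality key (the claim says "proportionally" without naming the key, so that choice is an assumption), the surplus split can be sketched as:

```python
def distribute_surplus(node_limit, pods):
    """Split the unallocated compressible resource among Pods in proportion
    to their request upper limits."""
    total_upper = sum(p["upper"] for p in pods)
    surplus = node_limit - total_upper
    if surplus <= 0:
        return {p["name"]: p["upper"] for p in pods}  # nothing left to distribute
    return {p["name"]: p["upper"] + surplus * p["upper"] / total_upper
            for p in pods}
```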
Optionally, the method further includes:
calculating a memory usage score for each job process, and killing a job process when its calculated memory usage score reaches a preset value corresponding to that process.
Optionally, the method further includes:
obtaining the CPU utilization of each Pod deployed from the same Pod data, and calculating an adjusted Pod count according to the arithmetic mean of the CPU utilizations and the node scheduling conditions in the Pod data.
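The patent does not give the adjustment formula; a common proportional rule, scaling the replica count by mean CPU utilization relative to a target utilization, would look like the following sketch (the target would come from the node scheduling conditions):

```python
import math

def adjusted_pod_count(current, cpu_utils, target_util):
    """Scale the replica count by the mean CPU utilization of the existing
    replicas relative to a target utilization; round up, keep at least one."""
    mean = sum(cpu_utils) / len(cpu_utils)
    return max(1, math.ceil(current * mean / target_util))
```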
Optionally, the method further includes:
monitoring whether there are Pods in the cluster that have not been successfully scheduled, and if so, further determining whether there are nodes available for scale-out;
if so, starting at least some of the nodes available for scale-out, and scheduling the unscheduled Pods onto the newly started nodes.
Optionally, the method further includes:
judging, according to the node state of each node, whether a node satisfies scale-in conditions, and if so, shutting down the node; if Pods are deployed on the node, rescheduling those Pods onto other nodes in the cluster.
Optionally, the scale-in conditions include one or more of the following:
the computing resource utilization of the node is below a preset value;
the Pods deployed on the node can be scheduled onto other nodes in the cluster;
the Pods deployed on the node are confirmed as able to drift (be evicted) according to a PodDisruptionBudget controller;
the node has no local storage.
Optionally, scale-out and/or scale-in of the cluster is performed according to one or more of the following strategies:
selecting nodes randomly;
selecting nodes according to the number of deployed Pods;
selecting nodes according to computing resource utilization;
selecting nodes according to the usage price of the physical machines;
suspending scale-out and/or scale-in when a preset number and/or preset proportion of nodes in the cluster are abnormal.
An apparatus for scheduling jobs in a cluster, characterized in that the apparatus comprises:
a Pod data obtaining unit, configured to obtain Pod (container pod) data corresponding to a job;
a scheduling unit, configured to select one or more target nodes from the cluster according to node scheduling conditions in the Pod data and the node state of each node in the cluster;
a Pod deployment unit, configured to deploy a Pod on each target node according to the Pod data and run the job's processes in the deployed Pods.
Optionally, the node scheduling conditions include hard scheduling conditions and/or soft scheduling conditions;
the scheduling unit is configured to treat a node as a target node if its node state satisfies the hard scheduling conditions; and/or to score each node according to its node state and the soft scheduling conditions and select one or more target nodes according to the scores.
Optionally, the hard scheduling conditions include hardware information of the node and/or region information of the node.
Optionally, the scheduling unit is configured to select multiple nodes located in the same availability zone as target nodes;
the Pod deployment unit is configured to deploy one Pod on each target node according to the Pod data, so as to form multiple instances of the job.
Optionally, the node scheduling conditions further include a lower limit and an upper limit on the number of instances;
the scheduling unit is configured to: when the number of selected nodes is greater than or equal to the lower limit on the number of instances, take the smaller of the number of selected nodes and the upper limit on the number of instances as the number of target nodes; when the number of selected nodes is less than the lower limit on the number of instances, terminate this scheduling attempt.
Optionally, the soft scheduling conditions include job affinity conditions, and the node state includes the job corresponding to each Pod deployed on the node.
Optionally, the job corresponds to one or more of the following applications and/or services:
a deep learning system, a Web service, a log collector, a distributed queue service, a log connector.
Optionally, the job is a deep learning training job;
the deployment unit is configured to run a parameter server process and a trainer process in each deployed Pod; the trainer process obtains deep learning tasks from a deep learning meta-information management node, trains a local deep learning model to obtain gradients, sends the gradients to the parameter server process, and obtains updated parameters from the parameter server process; the parameter server process saves a training snapshot to distributed storage at predetermined intervals, so that when the Pod or a process in the Pod is restarted, training resumes from the snapshot; the parameter server process and/or the trainer process stores the deep learning training model to distributed storage.
Optionally, the node scheduling conditions include a resource request lower limit and a resource request upper limit corresponding to each computing resource;
the scheduling unit is configured to calculate, for each computing resource, the schedulable resource upper and lower limits of each node according to the Pods already deployed on it; and, when the resource request lower limit of every computing resource in the node scheduling conditions is less than or equal to the schedulable resource lower limit of the corresponding computing resource, select one or more target nodes from the cluster.
Optionally, the node scheduling conditions further include a job priority;
the scheduling unit is configured to kill or throttle deployed Pods according to job priority, or terminate this scheduling attempt, when a resource request lower limit in the node scheduling conditions is greater than the schedulable resource lower limit.
Optionally, the scheduling unit is configured to throttle the Pod when the node scheduling condition corresponds to a compressible resource, and to kill the Pod when the node scheduling condition corresponds to an incompressible resource.
Optionally, the scheduling unit is further configured to allocate computing resources to the Pods deployed on each node according to the node scheduling conditions; wherein, if the sum of the resource request upper limits for a compressible resource of all Pods deployed on a node is less than the node's limit for that compressible resource, the unallocated portion of the compressible resource is distributed proportionally among the Pods deployed on the node.
Optionally, the scheduling unit is further configured to calculate a memory usage score for each job process, and kill a job process when its calculated memory usage score reaches a preset value corresponding to that process.
Optionally, the scheduling unit is configured to obtain the CPU utilization of each Pod deployed from the same Pod data, and calculate an adjusted Pod count according to the arithmetic mean of the CPU utilizations and the node scheduling conditions in the Pod data.
Optionally, the scheduling unit is further configured to monitor whether there are Pods in the cluster that have not been successfully scheduled, and if so, further determine whether there are nodes available for scale-out; if so, start at least some of the nodes available for scale-out and schedule the unscheduled Pods onto the newly started nodes.
Optionally, the scheduling unit is configured to judge, according to the node state of each node, whether a node satisfies scale-in conditions, and if so, shut down the node; if Pods are deployed on the node, the deployed Pods are rescheduled onto other nodes in the cluster.
Optionally, the scale-in conditions include one or more of the following: the computing resource utilization of the node is below a preset value; the Pods deployed on the node can be scheduled onto other nodes in the cluster; the Pods deployed on the node are confirmed as able to drift according to a PodDisruptionBudget controller; the node has no local storage.
Optionally, the scheduling unit is configured to perform scale-out and/or scale-in of the cluster according to one or more of the following strategies: selecting nodes randomly; selecting nodes according to the number of deployed Pods; selecting nodes according to computing resource utilization; selecting nodes according to the usage price of the physical machines; and suspending scale-out and/or scale-in when a preset number and/or preset proportion of nodes in the cluster are abnormal.
A computer device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor implements the method described above when executing the program.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described above.
It can be seen from the above that, with the solution of the present invention, after Pod (container pod) data corresponding to a job is obtained, several target nodes are selected from the cluster according to the node scheduling conditions in the Pod data and the node state of each node in the cluster, a Pod is then deployed on each target node according to the Pod data, and the job's processes are run in the deployed Pods. This technical solution deploys job processes in containers, with different jobs in independent containers, so that the jobs of different applications on the cluster do not interfere with each other yet can still communicate where interaction is needed. This makes effective use of cluster resources, and stable job scheduling can be achieved for projects, such as deep learning, that require pipelined cooperation among multiple applications.
[Brief Description of the Drawings]
Fig. 1 shows a flow diagram of a method for scheduling jobs in a cluster according to an embodiment of the present invention.
Fig. 2 shows a structural diagram of an apparatus for scheduling jobs in a cluster according to an embodiment of the present invention.
Fig. 3 shows a schematic diagram of a deep learning system architecture according to an embodiment of the present invention.
Fig. 4 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention.
[Detailed Description]
To make the technical solution of the present invention clearer, the solution is further described below with reference to the drawings and embodiments.
Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 shows a flow diagram of a method for scheduling jobs in a cluster according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S110: obtain Pod (container pod) data corresponding to a job.
A container provides an isolated running environment, similar in this respect to a virtual machine. However, embodiments of the present invention deploy job processes using containerization rather than virtual machines, because containers are more lightweight and their efficiency and utilization are significantly higher than those of virtual machines.
A Pod (container pod) is a set of one or more containers; generally it includes a root container plus a container for each job process. In embodiments of the present invention, one Pod may correspond to one or more instances of a job, but in general a single Pod is not used to realize multiple instances.
Pod data may be stored in etcd, a key-value store used for configuration sharing and service discovery. The Pod data of new jobs and of killed Pods can be stored in etcd and fetched when the corresponding job is scheduled. Specifically, Pod data can be generated from the job request submitted by a user.
Step S120: select one or more target nodes from the cluster according to the node scheduling conditions in the Pod data and the node state of each node in the cluster.
There can be many kinds of node scheduling conditions, and a common way to describe them is labels (Label); node state can likewise be described with labels. A label is a key-value pair, where both the key and the value are specified by the user. Labels can be attached to various resource objects, a resource object can define any number of labels, and resource objects can be queried and filtered through a LabelSelector (label selector).
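A label selector in this sense is simple exact key-value matching; a sketch (the helper names are illustrative, not the patent's API):

```python
def label_selector_match(selector, labels):
    """A selector matches when every requested key carries the requested value."""
    return all(labels.get(k) == v for k, v in selector.items())

def filter_objects(selector, objects):
    """Query and filter resource objects by their labels."""
    return [o["name"] for o in objects
            if label_selector_match(selector, o["labels"])]
```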
Step S130: deploy a Pod on each target node according to the Pod data, and run the job's processes in the deployed Pods.
As can be seen, in the method shown in Fig. 1, after Pod (container pod) data corresponding to a job is obtained, several target nodes are selected from the cluster according to the node scheduling conditions in the Pod data and the node state of each node in the cluster, a Pod is then deployed on each target node according to the Pod data, and the job's processes are run in the deployed Pods. This technical solution deploys job processes in containers, with different jobs in independent containers, so that the jobs of different applications on the cluster do not interfere with each other yet can communicate where interaction is needed. This makes effective use of cluster resources, and stable job scheduling can be achieved for projects, such as deep learning, that require pipelined cooperation among multiple applications.
In one embodiment of the present invention, in the above method, the node scheduling conditions include hard scheduling conditions and/or soft scheduling conditions. Selecting one or more target nodes from the cluster according to the node scheduling conditions in the Pod data and the node state of each node in the cluster includes: if the node state of a node satisfies the hard scheduling conditions, the node is a target node; and/or, scoring each node according to its node state and the soft scheduling conditions, and selecting one or more target nodes according to the scores.
As can be seen, hard scheduling conditions are the stricter requirements. For example, in one embodiment of the present invention, the hard scheduling conditions include hardware information of the node and/or region information of the node.
In one example, a user wishes that a certain job be deployed only on nodes whose CPU is an Intel model; this is clearly hardware information of a node. In another example, a user wishes that a certain job be deployed on nodes in region A; this is region information of a node. To summarize: when every item in a job's hard scheduling conditions is contained among the items of a node's state, that node is a selected target node. Concretely, both the hard scheduling conditions and the node state can be marked as labels, and the selection can be realized with a NodeSelector (node selector).
Hard scheduling conditions help to filter out available nodes quickly, but they can also easily leave no node available. Therefore, for jobs whose demands are less strict, soft scheduling conditions can be set: each node is scored according to its node state and the soft scheduling conditions, and one or more target nodes are selected according to the scores. Since matching is no longer strict, the node with the highest affinity to the job, rather than only a perfectly affine node, can be preferentially selected.
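A minimal scoring scheme, assuming one point per satisfied soft condition (the patent leaves the scoring function unspecified, so the weighting is an illustrative choice):

```python
def score_node(node_labels, soft_conditions):
    """One point for each soft condition the node's labels satisfy."""
    return sum(1 for k, v in soft_conditions.items() if node_labels.get(k) == v)

def pick_best_nodes(nodes, soft_conditions, count=1):
    """Rank nodes by score and select the top `count` as target nodes."""
    ranked = sorted(nodes,
                    key=lambda n: score_node(n["labels"], soft_conditions),
                    reverse=True)
    return [n["name"] for n in ranked[:count]]
```

Note that a node satisfying only some conditions still gets a positive score, so scheduling succeeds even when no node is a perfect match.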
In one embodiment of the present invention, in the above method, selecting one or more target nodes from the cluster according to the node scheduling conditions in the Pod data and the node state of each node in the cluster includes: selecting multiple nodes located in the same availability zone as target nodes; and deploying a Pod on each target node according to the Pod data includes: deploying one Pod on each target node according to the Pod data, so as to form multiple instances of the job.
This embodiment provides one job scheduling idea: deploying multiple instances (i.e. Pods) of a job (a job here means an application or service, such as a log collector) within one AZ (Availability Zone), i.e. affinity at the AZ level among the instances of a single application. An availability zone is one or more data centers whose infrastructure, such as power and network, is mutually isolated. A region contains one or more availability zones, and the failure of one availability zone does not affect the use of the other availability zones.
Within the AZ, one instance is deployed on each target node, which also means that no node hosts two instances; this way, when a node fails, only one instance is affected. Another idea is to apply anti-affinity at the rack (cabinet) level, since a whole rack may fail at once. Essentially these are all labels on nodes, and at scheduling time the scheduler only needs to group nodes dynamically by these special labels to handle affinity and anti-affinity relationships.
In one embodiment of the present invention, in the above method, the node scheduling conditions further include a lower limit and an upper limit on the number of instances. Selecting multiple nodes located in the same availability zone as target nodes further includes: when the number of selected nodes is greater than or equal to the lower limit on the number of instances, taking the smaller of the number of selected nodes and the upper limit on the number of instances as the number of target nodes; when the number of selected nodes is less than the lower limit on the number of instances, terminating this scheduling attempt.
This is a problem that many scheduling frameworks, such as Slurm, cannot solve. Briefly: Slurm (originally Simple Linux Utility for Resource Management, abbreviated SLURM) is a free, open-source task scheduling tool for Linux and Unix-like systems, widely used by supercomputers and computer clusters worldwide. It provides three key functions. First, it allocates exclusive or non-exclusive resources (compute nodes) to users for a certain period of time so that users can perform work. Second, it provides a framework for starting, executing, and monitoring tasks (usually parallel tasks, such as MPI) running on the allocated nodes. Third, it arbitrates resources for the task queue in a reasonable way. About 60% of the TOP500 supercomputers run Slurm, including Tianhe-2, which was the fastest computer in the world until 2016. Slurm uses best-fit algorithms based on Hilbert curve scheduling or fat-tree network topology to optimize task allocation on parallel computers.
MPI is a cross-language communication protocol for programming parallel computers, supporting both point-to-point and broadcast communication. It is a message-passing application programming interface that includes protocol and semantic descriptions specifying how its features must behave in any implementation. MPI's goals are high performance, scalability, and portability, and it remains the dominant model in high-performance computing today. The main MPI-1 model has no shared-memory concept, and MPI-2 has only a limited distributed shared-memory concept; nevertheless, MPI programs are often run on shared-memory machines. Designing programs around the MPI model rather than a NUMA architecture has advantages, because MPI encourages memory locality. Although MPI belongs to layer 5 and above of the OSI reference model, implementations may cover most layers, with sockets and TCP used in the transport layer. Most MPI implementations consist of a specified API callable directly from C, C++, or Fortran, or from any language able to interface with such libraries, such as C#, Java, or Python. MPI's advantages over older message-passing libraries are portability and speed.
However, when using Slurm or an MPI framework, if there are 99 available nodes and a job requiring 100 instances is submitted, the job has to wait without using any of the available nodes. Worse, if an error occurs in the cluster, the entire task can be marked as failed, wasting a large amount of cluster resources.
According to this embodiment, by contrast, such problems do not arise, because a lower limit and an upper limit on the number of instances (also called the replica count, since the instances are realized from the same Pod data) are set. If the replica count of the submitted job is set to 80-100, the lower limit on the number of instances is 80, and 99 available nodes clearly meet the demand; the job can therefore be scheduled onto those 99 available nodes.
In one embodiment of the invention, in the above method, the soft schedulable condition includes a job affinity condition, and the node state includes the job corresponding to each Pod deployed on the node.
This embodiment gives an example of a soft schedulable condition, namely the job affinity condition, which may also be called an application affinity condition. For example, a business job has high affinity with the jobs that process its monitoring logs and local data; if the Pods hosting them are far apart, the network overhead of access causes low efficiency. The present embodiment therefore provides for deploying affine applications close together, for example on the same node, which naturally reduces the network overhead. Affinity is a mutual relationship; therefore, every job that may be affine with other jobs marks, by means of the job affinity condition (which may also be a label), which jobs it is affine or anti-affine with, and a check is performed at Pod deployment time, realizing symmetry.
Another common issue is the migration of affine applications. Two points should be noted. First, symmetry is taken into account in the algorithm design: whether on first deployment or on redeployment, even if an application crashes, when it is rebuilt and rescheduled the system still checks which Pods in the current system it is affine with, or which Pods are affine with it, and preferentially places it together with them. Second, for an RC/RS (replica set, i.e. a stateless application), a Pod is currently rebuilt only when its node goes down; if the node has not gone down, a Pod that exits abnormally is restarted in place. These two levels together guarantee the requirement that affine applications stay together and anti-affine applications stay apart.
In one embodiment of the invention, in the above method, the job corresponds to one or more of the following applications and/or services: a deep learning system, a Web service, a log collector, a distributed queue service, a log connector.
A deep learning project of this kind can be used for artificial intelligence (AI) research while meeting industrial requirements. Industrial users tend to run deep learning jobs as one stage of a larger data pipeline that also includes Web servers and log collectors. Such a general-purpose cluster needs flexible, priority-based scheduling: during periods of high network traffic the Web server jobs run more processes and deep learning runs less, while deep learning is prioritized when network traffic is low. Slurm or MPI cannot satisfy this demand for flexible scheduling.
The deep learning training framework itself needs to be designed to support distributed training. A deep learning cluster has three roles: the parameter server (Parameter Server), the trainer (Trainer) and the meta-information management node (Master). Each parameter server process maintains a shard of the global model. Each trainer has its own copy of a local model and updates the model with its local data. During training, each trainer sends its model updates to the parameter servers, which are responsible for aggregating these updates so that each trainer can synchronize its local copy with the global model.
Cluster training comprises the following modules. A single meta-information management node (master) is responsible for distributing tasks: it divides the dataset into tasks, distributes them among the trainers, and keeps training tasks traceable by using a task queue. Multiple trainers train the model by SGD (stochastic gradient descent): each trainer receives tasks from the master, processes them, computes and uploads gradients to the parameter servers, and downloads the latest gradients (also called parameters, or the model) into its own local model. Multiple parameter servers are responsible for storing and updating the training model; specifically, they obtain gradients from the trainers, update the parameters, and return the latest parameters to the trainers. The parameters are periodically stored to a distributed file system or etcd, overwriting the previous parameters. The specific training architecture is shown in Fig. 3, in which the deep learning model is divided into two shards, each managed by one of two parameter servers.
When the master starts, it takes a master lock and checks whether the task queue to be created already exists; if it exists, the task queue is restored, and if not, it is created. The master watches the /trainer/ directory to find existing trainers, distributes tasks to them, and updates the task queue at the same time. On master fault recovery, the master is restarted automatically and the corresponding data is restored from etcd.
When a trainer starts, it watches the parameter-server directory /ps/ and waits until the parameter servers reach the specified number. It generates a unique id and writes it under etcd /trainer/; because the entry carries a lease, the master knows whether the trainer is online or offline, and the trainer then waits for tasks to be assigned. On trainer fault recovery, the trainer is restarted automatically, pulls a task from the todo queue (the pending queue), and continues training.
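The master's task-queue behaviour described above (split the dataset into tasks, hand them out, and requeue a failed trainer's in-flight task) can be sketched as a toy in-memory model. All class and field names here are illustrative; a real implementation would persist the queues in etcd so the master itself can be restarted.

```python
from collections import deque

class Master:
    """Toy sketch of the meta-information management node: it splits
    a dataset into tasks, hands them out from a todo queue, and moves
    a failed trainer's task back so training can continue."""

    def __init__(self, dataset, chunk):
        self.todo = deque(dataset[i:i + chunk]
                          for i in range(0, len(dataset), chunk))
        self.pending = {}                # trainer id -> in-flight task

    def assign(self, trainer_id):
        task = self.todo.popleft()
        self.pending[trainer_id] = task
        return task

    def trainer_failed(self, trainer_id):
        # Put the in-flight task back on the todo queue.
        self.todo.append(self.pending.pop(trainer_id))

m = Master(list(range(10)), chunk=4)
print(m.assign("trainer-0"))   # -> [0, 1, 2, 3]
m.trainer_failed("trainer-0")
print(list(m.todo)[-1])        # -> [0, 1, 2, 3]
```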
When a parameter server starts, it reads the target total number of parameter servers and searches etcd for keys under /ps/ whose index is less than the target total, checking which keys do not yet exist; if there is one, it claims that slot and joins. The parameter server then reads the data stored under that path into memory and starts serving externally.
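The slot-claiming step just described can be sketched as follows. A plain dict stands in for etcd here; in a real deployment the read-then-write would be a single transactional compare-and-swap so that two servers cannot claim the same index. The key layout mirrors the /ps/ directory mentioned above; the addresses are made up.

```python
def claim_ps_slot(kv_store, target_total, my_addr):
    """Claim the first free parameter-server slot under /ps/.

    kv_store stands in for etcd; returns the claimed index,
    or None when all slots below target_total are taken.
    """
    for index in range(target_total):
        key = f"/ps/{index}"
        if key not in kv_store:          # slot not yet claimed
            kv_store[key] = my_addr
            return index
    return None

store = {"/ps/0": "10.0.0.1:8000"}
print(claim_ps_slot(store, 2, "10.0.0.2:8000"))  # -> 1
```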
In one embodiment of the invention, in the above method, the job is a deep learning training job, and running the job process in the deployed Pods includes: running one parameter server process and one trainer process in each deployed Pod; the trainer process obtains a deep learning task from the meta-information management node of the deep learning system, trains according to the local deep learning training model, sends the resulting gradients to the parameter server process, and obtains updated parameters from the parameter server process; the parameter server process saves a training snapshot to distributed storage at predetermined intervals, so that a restarted Pod, or process within a Pod, can resume training from the snapshot; and the parameter server process and/or the trainer process stores the deep learning training model to distributed storage.
The model-data checkpoint mechanism effectively guards against single-point or simultaneous multi-point failures of the parameter servers. A model-parameter checkpoint periodically saves to disk a complete image of the model data held in parameter server memory, guaranteeing that the training process can be restarted from an intermediate state. For a training task that cannot be interrupted and has no backup, periodically saving a data snapshot of each parameter server to a distributed storage service achieves disaster tolerance, for example keeping the latest snapshot every 10 minutes and deleting earlier snapshots. When a single-point failure occurs, only that one node needs to be recovered, or moved to and started on another node, for the training task to resume.
For example, using a lock mechanism: every 10 minutes the parameter server requests a read lock and saves a checkpoint. During this time write operations are stopped, waiting for the checkpoint save to complete. The parameter server then writes the latest snapshot to distributed storage and deletes the other, earlier snapshots; on completion it releases the read lock and write operations can continue.
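A minimal sketch of that checkpoint cycle follows. It simplifies in two labelled ways: a plain mutex stands in for the read lock (gradient updates and the snapshot writer simply take turns), and a local temporary directory stands in for distributed storage. The class and field names are illustrative, not from the patent.

```python
import os
import pickle
import tempfile
import threading
import time

class ParameterServer:
    """Sketch: holds model parameters, blocks writes while a
    checkpoint is saved, and keeps only the newest snapshot."""

    def __init__(self):
        self.params = {"w": 0.0}
        self.lock = threading.Lock()     # stands in for the read lock
        self.ckpt_dir = tempfile.mkdtemp()

    def apply_gradient(self, g):
        with self.lock:                  # write path, blocked during checkpoint
            self.params["w"] -= g

    def checkpoint(self):
        with self.lock:                  # writes wait until the save completes
            path = os.path.join(self.ckpt_dir, f"ckpt-{time.time_ns()}.pkl")
            with open(path, "wb") as f:
                pickle.dump(self.params, f)
        # Keep only the newest snapshot, deleting earlier ones.
        for old in sorted(os.listdir(self.ckpt_dir))[:-1]:
            os.remove(os.path.join(self.ckpt_dir, old))
        return path

ps = ParameterServer()
ps.apply_gradient(0.5)
path = ps.checkpoint()
with open(path, "rb") as f:
    print(pickle.load(f))                # -> {'w': -0.5}
```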
When a snapshot is read back, the checkpoint file identifier (uuid) is read from etcd, the checkpoint snapshot file is loaded from disk, and the parameters in it are loaded. If loading fails, the parameters are initialized from the original data.
A specific implementation can realize shared data storage using a public cloud's bosfs file system, or using the public cloud's latest NFS storage system. Data can be uniformly converted to the RecordIO format, which provides a standardized conversion interface.
For model storage there are two choices, i.e. the parameter server process and/or the trainer process stores the deep learning training model to distributed storage. Since the data in any one parameter server is only a shard, whereas a trainer performs dense updates of the model and therefore holds the entire model, for ease of use it is preferable for a trainer to store the model. Specifically, the trainers hold an election through etcd, selecting one node to export and store the model.
In one embodiment of the invention, in the above method, the node scheduling condition includes a resource request lower limit and a resource request upper limit corresponding to each computing resource; selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster includes: calculating, from the Pods already deployed on each node, the schedulable resource upper limit and schedulable resource lower limit of each computing resource; and selecting one or more target nodes from the cluster when the resource request lower limit of each computing resource in the node scheduling condition is less than or equal to the schedulable resource lower limit of the corresponding computing resource.
The present embodiment provides a resource-based scheduling mode: a Pod can set request conditions for CPU and memory, specifically a resource request upper limit and a resource request lower limit, with 0 ≤ resource request lower limit ≤ resource request upper limit ≤ infinity for each resource. If a container is successfully scheduled onto a node, its resource request can be guaranteed. The whole cluster can thus maintain and calculate the schedulable resource upper limit and schedulable resource lower limit of each computing resource; if the resource request lower limit of each computing resource in the node scheduling condition is less than or equal to the schedulable resource lower limit of the corresponding computing resource, resources are clearly sufficient. For example, if a Pod requires a memory lower limit of 1024 MB, the job cannot run unless 1024 MB of memory is provided; if at that moment a node can offer 2048 MB of memory as its schedulable resource lower limit, the job can clearly be scheduled onto that node.
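The feasibility check in this embodiment reduces to a per-resource comparison, which can be sketched as follows. The resource names and dict shapes are illustrative only.

```python
def node_fits(request_lower, schedulable_lower):
    """Scheduling condition of this embodiment: every resource's
    request lower limit must not exceed the node's schedulable
    resource lower limit for that resource."""
    return all(
        request_lower[r] <= schedulable_lower.get(r, 0)
        for r in request_lower
    )

# The example above: a Pod needing 1024 MB fits a node offering 2048 MB.
print(node_fits({"memory_mb": 1024},
                {"memory_mb": 2048, "cpu_m": 4000}))   # -> True
print(node_fits({"memory_mb": 4096}, {"memory_mb": 2048}))  # -> False
```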
In one embodiment of the invention, in the above method, the node scheduling condition further includes a job priority; selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster further includes: when a resource request lower limit in the node scheduling condition is greater than the schedulable resource lower limit, killing or blocking already-deployed Pods according to job priority, or terminating this job scheduling.
Through quality-of-service management (QoS), Pods can be divided into three priorities. Best-effort (best-effort level): no resource request lower limit or upper limit is written in the node scheduling condition; such Pods can use the most resources when resources are abundant (for example, a deep learning training job can be set to this priority, occupying as many resources as possible for training when business traffic is low at night), but are also killed first when resources are tight (for example, when business traffic is high during the day and the stability of the business must be guaranteed first). Burstable (burstable level): as long as one container in the Pod has a resource request lower limit set, or the resource request upper limits set for the containers are inconsistent, the QoS of this Pod is the Burstable level. Guaranteed (guaranteed level): all containers must uniformly have a resource request lower limit and a resource request upper limit set, and the two settings must be consistent.
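The three-level classification above can be sketched as a function over a Pod's containers. Each container is represented here as a dict with optional 'request' (lower limit) and 'limit' (upper limit) keys; this shape is illustrative, not the real Pod data format.

```python
def pod_qos(containers):
    """Classify a Pod's QoS level from its containers' resource
    request lower limits ('request') and upper limits ('limit'),
    following the three-level scheme described above."""
    if all(c.get("request") is None and c.get("limit") is None
           for c in containers):
        return "Best-effort"             # nothing requested at all
    if all(c.get("request") is not None
           and c.get("limit") == c.get("request")
           for c in containers):
        return "Guaranteed"              # all set, and consistent
    return "Burstable"                   # anything in between

print(pod_qos([{}]))                                   # -> Best-effort
print(pod_qos([{"request": 512, "limit": 512}]))       # -> Guaranteed
print(pod_qos([{"request": 512}, {"limit": 1024}]))    # -> Burstable
```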
In one embodiment of the invention, in the above method, killing or blocking a Pod includes: when the resource concerned in the node scheduling condition is a compressible resource, blocking the Pod; when it is an incompressible resource, killing the Pod. This follows from the nature of the computing resources: occupied memory must be explicitly released, so memory is an incompressible resource, whereas CPU usage can be adjusted dynamically, so CPU is a compressible resource; the corresponding handling therefore differs.
In one embodiment of the invention, the above method further includes: allocating computing resources, according to the node scheduling condition, to the Pods deployed on each node; wherein, if the sum of the resource request upper limits of a compressible resource of the Pods deployed on a node is less than the node's upper limit for that compressible resource, the unallocated compressible resource is distributed proportionally among the Pods deployed on the node.
The smallest CPU resource is limited to 10m (one hundredth of a core); this is determined by a Linux kernel limit. A container is guaranteed the amount of CPU it requests; whether it can obtain additional CPU time depends on the tasks of the other jobs running. Beyond the requested CPU quantities, additional CPU resources are shared. For example, suppose container A specifies that it needs 60% of a CPU and container B requests 30%, and both containers use as much CPU as possible: the additional 10% of CPU resource will be divided between container A and container B in the ratio 2:1. A container whose resource usage exceeds its resource limit is blocked; if no resource limit is specified, the container can use additional CPU whenever CPU resources are available.
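The 2:1 split in this example is simply spare CPU divided in proportion to the containers' requests, which can be sketched as follows (function and parameter names are illustrative).

```python
def cpu_shares(requests, total=1.0):
    """Split spare CPU among containers in proportion to their
    requests, as in the 60%/30% example above (10% spare, 2:1).
    `requests` maps container name to requested CPU fraction."""
    requested = sum(requests.values())
    spare = total - requested
    return {
        name: req + spare * req / requested
        for name, req in requests.items()
    }

shares = cpu_shares({"A": 0.60, "B": 0.30})
print(round(shares["A"], 4), round(shares["B"], 4))  # -> 0.6667 0.3333
```

Container A ends up with its 60% plus two thirds of the spare 10%, and B with its 30% plus one third, i.e. exactly the 2:1 division of the extra CPU described in the text.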
A container is guaranteed the amount of memory it requests. If it uses more than its memory request, the container may be killed (when other containers need memory), but if the container consumes less than its resource request lower limit, it will not be killed (unless a system task or daemon needs more memory). When a container's memory usage exceeds its memory resource request upper limit, the container is killed.
In one embodiment of the invention, the above method further includes: calculating a memory occupancy score for each job process, and killing a job process when its calculated memory occupancy score reaches a preset value corresponding to that job process.
In the present embodiment the memory occupancy score is also referred to as the OOM (out of memory) score. The OOM score of a process is 10 times the percentage of memory the process consumes, adjusted by OOM_SCORE_ADJ (the preset value); the process with the higher OOM score is killed. The basic OOM score is between 0 and 1000, and the final OOM score of a process is also between 0 and 1000. The OOM_SCORE_ADJ settings for the three priorities are as follows: Best-effort sets OOM_SCORE_ADJ to 1000, so the OOM_SCORE of processes in the container will be 1000; Guaranteed sets OOM_SCORE_ADJ to -998, so the OOM_SCORE of processes in the container is 0 or 1; Burstable sets OOM_SCORE_ADJ to 1000 - 10 × (percentage of total node memory occupied by the resource request lower limit), which ensures OOM_SCORE > 1 for Burstable replicas; if the memory request is 0, OOM_SCORE_ADJ is set to 999.
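These three OOM_SCORE_ADJ rules can be captured directly as a small function. The unit handling (request and node memory in the same unit) and the function name are assumptions for illustration.

```python
def oom_score_adj(qos, request_mem=0, node_mem=1):
    """OOM_SCORE_ADJ per the three QoS levels described above.
    request_mem and node_mem must be in the same unit (e.g. MB)."""
    if qos == "Best-effort":
        return 1000
    if qos == "Guaranteed":
        return -998
    # Burstable: 1000 - 10 * (request's percentage of node memory),
    # with the special case of a zero memory request.
    if request_mem == 0:
        return 999
    return int(1000 - 10 * (100.0 * request_mem / node_mem))

print(oom_score_adj("Best-effort"))            # -> 1000
print(oom_score_adj("Burstable", 2048, 8192))  # -> 750
```

In the second call the request occupies 25% of node memory, so the adjustment is 1000 - 10 × 25 = 750, keeping Burstable processes killable before Guaranteed ones but after Best-effort ones.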
In one embodiment of the invention, the above method further includes: obtaining the CPU utilization of each Pod deployed from the same Pod data, and calculating an adjusted Pod quantity according to the arithmetic mean of the CPU utilizations and the node scheduling condition in the Pod data.
This is also referred to as horizontal autoscaling of instances or replicas (which in practice are also Pods). The autoscaler (Autoscaler) is implemented as a control loop that periodically collects the CPU utilization of the Pod's replicas by querying node states. It then compares the arithmetic mean of the replicas' CPU utilizations with the target defined in the node scheduling condition, and adjusts the number of replicas as needed to match the target, preserving: MinReplicas (instance-number lower limit) ≤ Replicas (instance number) ≤ MaxReplicas (instance-number upper limit). The autoscaler's period is controlled by the controller manager's --horizontal-pod-autoscaler-sync-period flag; the default value is 30 seconds. CPU utilization is a replica's recent CPU usage (the average over the last 1 minute) divided by the CPU requested by the Pod.
The target number of Pods is calculated by the following formula: TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target), where ceil() is the ceiling operation, meaning the nearest integer greater than or equal to a number; sum is arithmetic summation; CurrentPodsCPUUtilization is the average CPU usage of a Pod over the last minute; and Target is the CPU resource request upper limit.
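The formula can be applied directly; the following sketch plugs in illustrative utilization numbers (not from the patent).

```python
import math

def target_num_of_pods(current_utilizations, target):
    """The autoscaling formula quoted above:
    TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target)."""
    return math.ceil(sum(current_utilizations) / target)

# Three replicas averaging 0.9, 0.8 and 0.7 CPU against a target of
# 0.5 call for ceil(2.4 / 0.5) = ceil(4.8) = 5 replicas.
print(target_num_of_pods([0.9, 0.8, 0.7], 0.5))  # -> 5
```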
Starting and stopping Pods within the measurement window may introduce noise into the measurements (for example, starting may temporarily raise CPU usage). Therefore, after each action the autoscaler should wait some time to obtain reliable data: it scales up only if there has been no rescaling within the last 3 minutes, and scale-down waits 5 minutes from the last rescaling. Moreover, any scaling is performed only when the ratio of the arithmetic mean of the replicas' CPU utilization to the resource request lower limit drops below 0.9 or rises above 1.1 (a 10% tolerance).
This approach has two benefits. On the one hand, the autoscaler acts conservatively: if new user load appears, it is important to increase the number of Pods quickly so as not to reject user requests, while reducing the number of Pods is not as urgent. On the other hand, the autoscaler must avoid thrashing: it prevents rapid execution of conflicting decisions while the load is unstable.
In one embodiment of the invention, the above method further includes: monitoring whether the cluster has Pods that were not successfully scheduled, and if so, further determining whether there are nodes whose capacity can be expanded; if there are, starting at least some of the expandable nodes and scheduling the unsuccessfully scheduled Pods onto the newly started nodes.
This embodiment provides a solution for Pods that were not successfully scheduled, by expanding the nodes of the cluster to meet demand, since the nodes in a cluster are not necessarily all in the started state. For example, using a scale-up component: the component creates a watch over all Pods and checks every 10 seconds whether there are Pods that cannot be scheduled. A Pod generally falls into the unschedulable state because no schedulable node is available; such Pods can be detected because their PodCondition (state) is unscheduled (not scheduled). If this situation occurs, the scale-up component finds a new node for the Pod to be scheduled on. It can also ensure that all Pods in the replica set containing that Pod are in the same node group, so that the newly created machine type is consistent with the other machines in the node group.
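The detection step the scale-up component performs every 10 seconds can be sketched as a simple filter. The field names below are illustrative stand-ins for the real Pod status fields, not an actual API.

```python
def pods_needing_scale_up(pods):
    """Pick out Pods whose condition is 'unscheduled', the signal
    the scale-up component watches for in its 10-second loop."""
    return [p["name"] for p in pods
            if p.get("condition") == "unscheduled"]

pods = [{"name": "web-1", "condition": "scheduled"},
        {"name": "train-7", "condition": "unscheduled"}]
print(pods_needing_scale_up(pods))  # -> ['train-7']
```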
Considered from the other direction, in one embodiment of the invention the above method further includes: judging from the node state of each node whether a node meets a scale-down condition; if so, closing the corresponding node and, when Pods have been deployed on that node, scheduling the deployed Pods onto other nodes in the cluster. This avoids wasting resources.
For example, using a scale-down component, every 10 seconds the component checks whether there are suitable nodes that can be removed. In one embodiment of the invention, in the above method, the scale-down condition includes one or more of the following: the computing resource utilization of the node is less than a preset value; the Pods deployed on the node can be scheduled onto other nodes in the cluster; the Pods deployed on the node are confirmed, according to the PodDisruptionBudget controller, to be able to drift; the node has no local storage.
In one embodiment of the invention, in the above method, scale-up and/or scale-down of the cluster is performed according to one or more of the following strategies: selecting nodes at random; selecting nodes according to the number of Pods deployed; selecting nodes according to computing resource utilization; selecting nodes according to the usage price of the physical machine; and suspending scale-up and/or scale-down when a preset number and/or preset proportion of nodes in the cluster are abnormal. For example, to prevent a large-scale node outage caused by the network or other problems from making Pods undeployable and thereby creating an avalanche of further unavailable nodes, rules can be formulated, such as: when 30% of the nodes, or at most 3 nodes, are abnormal, suspend the scaling function until the cluster nodes recover as a whole.
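The circuit-breaker rule in this example can be sketched as follows; the thresholds match the "30% of nodes or at most 3 nodes" figures above, and the function name is illustrative.

```python
def scaling_allowed(total_nodes, abnormal_nodes,
                    max_ratio=0.30, max_count=3):
    """Suspend scale-up/scale-down once abnormal nodes reach either
    the preset proportion or the preset absolute count."""
    if abnormal_nodes >= max_count:
        return False                     # absolute cap reached
    if total_nodes and abnormal_nodes / total_nodes >= max_ratio:
        return False                     # proportion cap reached
    return True

print(scaling_allowed(100, 2))   # -> True
print(scaling_allowed(100, 3))   # -> False (absolute cap hit)
print(scaling_allowed(10, 3))    # -> False
```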
Fig. 2 shows a schematic structural diagram of a device for scheduling jobs in a cluster according to an embodiment of the invention. As shown in Fig. 2, the device 200 for scheduling jobs in a cluster includes:
a Pod data acquisition unit 210, configured to obtain container pod (Pod) data corresponding to a job;
a scheduling unit 220, configured to select one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster;
a Pod deployment unit 230, configured to deploy a Pod according to the Pod data on each target node, and to run the job process of the job in the deployed Pods.
As it can be seen that device shown in Fig. 2, after getting container pod Pod data corresponding with operation, according to node therein
The node state of each node in schedulable condition and cluster, selects several destination nodes from cluster, and further in each target
Pod is disposed respectively according to Pod data on node, and the operation process of operation is run in the Pod of deployment.The technical solution is using appearance
The mode of device disposes operation process, different work is deployed in each independent container, so that on cluster
The operation respectively applied can be independent of each other, and the scene of interaction can needed to be communicated, and realize the effective of cluster resource
It utilizes, deep learning etc. is needed mostly to can be realized stable job scheduling using the project of pipelineization cooperation.
In one embodiment of the invention, in the above device, the node scheduling condition includes a hard schedulable condition and/or a soft schedulable condition; the scheduling unit 220 is configured to treat a node as a target node if its node state meets the hard schedulable condition; and/or to compute a scheduling score from the node state of each node and the soft schedulable condition, and select one or more target nodes according to the scoring result.
In one embodiment of the invention, in the above device, the hard schedulable condition includes hardware information of the node and/or zone information of the node.
In one embodiment of the invention, in the above device, the scheduling unit 220 is configured to select multiple nodes located in the same availability zone as target nodes; the Pod deployment unit 230 is configured to deploy one Pod according to the Pod data on each target node, so as to form multiple instances of the job.
In one embodiment of the invention, in the above device, the node scheduling condition further includes an instance-number lower limit and an instance-number upper limit; the scheduling unit 220 is configured to, when the number of nodes selected is greater than or equal to the instance-number lower limit, take the lesser of the number of nodes selected and the instance-number upper limit as the number of target nodes, and, when the number of nodes selected is less than the instance-number lower limit, terminate this job scheduling.
In one embodiment of the invention, in the above device, the soft schedulable condition includes a job affinity condition, and the node state includes the job corresponding to each Pod deployed on the node.
In one embodiment of the invention, in the above device, the job corresponds to one or more of the following applications and/or services: a deep learning system, a Web service, a log collector, a distributed queue service, a log connector.
In one embodiment of the invention, in the above device, the job is a deep learning training job; the deployment unit is configured to run one parameter server process and one trainer process in each deployed Pod; the trainer process obtains a deep learning task from the meta-information management node of the deep learning system, trains according to the local deep learning training model, sends the resulting gradients to the parameter server process, and obtains updated parameters from the parameter server process; the parameter server process saves a training snapshot to distributed storage at predetermined intervals, so that a restarted Pod, or process within a Pod, can resume training from the snapshot; and the parameter server process and/or the trainer process stores the deep learning training model to distributed storage.
In one embodiment of the invention, in the above device, the node scheduling condition includes a resource request lower limit and a resource request upper limit corresponding to each computing resource; the scheduling unit 220 is configured to calculate, from the Pods deployed on each node, the schedulable resource upper limit and schedulable resource lower limit of each computing resource, and to select one or more target nodes from the cluster when the resource request lower limit of each computing resource in the node scheduling condition is less than or equal to the schedulable resource lower limit of the corresponding computing resource.
In one embodiment of the invention, in the above device, the node scheduling condition further includes a job priority; the scheduling unit 220 is configured to, when a resource request lower limit in the node scheduling condition is greater than the schedulable resource lower limit, kill or block already-deployed Pods according to job priority, or terminate this job scheduling.
In one embodiment of the invention, in the above device, the scheduling unit 220 is configured to block a Pod when the resource concerned in the node scheduling condition is a compressible resource, and to kill the Pod when it is an incompressible resource.
In one embodiment of the invention, in the above device, the scheduling unit 220 is further configured to allocate computing resources, according to the node scheduling condition, to the Pods deployed on each node; wherein, if the sum of the resource request upper limits of a compressible resource of the Pods deployed on a node is less than the node's upper limit for that compressible resource, the unallocated compressible resource is distributed proportionally among the Pods deployed on the node.
In one embodiment of the invention, in the above device, the scheduling unit 220 is further configured to calculate the memory occupancy score of each job process, and to kill a job process when its calculated memory occupancy score reaches the preset value corresponding to that job process.
In one embodiment of the invention, in the above device, the scheduling unit 220 is configured to obtain the CPU utilization of each Pod deployed from the same Pod data, and to calculate an adjusted Pod quantity according to the arithmetic mean of the CPU utilizations and the node scheduling condition in the Pod data.
In one embodiment of the invention, in the above device, the scheduling unit 220 is further configured to monitor whether the cluster has Pods that were not successfully scheduled and, if so, further determine whether there are nodes whose capacity can be expanded; if there are, it starts at least some of the expandable nodes and schedules the unsuccessfully scheduled Pods onto the newly started nodes.
In one embodiment of the invention, in the above device, the scheduling unit 220 is configured to judge from the node state of each node whether a node meets the scale-down condition and, if so, to close the corresponding node and, when Pods have been deployed on that node, to schedule the deployed Pods onto other nodes in the cluster.
In one embodiment of the invention, in the above device, the scale-down condition includes one or more of the following: the computing resource utilization of the node is less than a preset value; the Pods deployed on the node can be scheduled onto other nodes in the cluster; the Pods deployed on the node are confirmed, according to the PodDisruptionBudget controller, to be able to drift; the node has no local storage.
In one embodiment of the invention, in the above device, the scheduling unit 220 is configured to perform scale-up and/or scale-down of the cluster according to one or more of the following strategies: selecting nodes at random; selecting nodes according to the number of Pods deployed; selecting nodes according to computing resource utilization; selecting nodes according to the usage price of the physical machine; and suspending scale-up and/or scale-down when a preset number and/or preset proportion of nodes in the cluster are abnormal.
Specific implementations of the above device embodiments may be realized with reference to the specific implementations of the preceding method embodiments, and are not repeated here.
It should be noted that, for the sake of simple description, the foregoing method embodiments are expressed as series of action combinations; however, those skilled in the art should understand that the present invention is not limited by the described sequence of actions, because according to the present invention certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in a given embodiment, reference may be made to the related descriptions of the other embodiments.
Fig. 4 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention. The computer system/server 12 shown in Fig. 4 is only an example and should not impose any restriction on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 4, the computer system/server 12 takes the form of a general-purpose computing device. The components of the computer system/server 12 may include, but are not limited to: one or more processors (processing units) 16, a memory 28, and a bus 18 connecting the different system components (including the memory 28 and the processor 16).
The bus 18 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer system/server 12 typically comprises a variety of computer-system-readable media. These media may be any available media accessible by the computer system/server 12, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, a storage system 34 may be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 4, commonly referred to as a "hard disk drive"). Although not shown in Fig. 4, a disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading and writing a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42 generally carry out the functions and/or methods of the embodiments described herein.
The computer system/server 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer system/server 12, and/or with any device (such as a network card, a modem, etc.) that enables the computer system/server 12 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 22. Moreover, the computer system/server 12 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in Fig. 4, the network adapter 20 communicates with the other modules of the computer system/server 12 through the bus 18. It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the computer system/server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and so on.
The processor 16, by running programs stored in the memory 28, executes various functional applications and data processing, for example implementing the method of the embodiment shown in Fig. 1.
The present invention also discloses a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method of the embodiment shown in Fig. 1 is implemented.
Any combination of one or more computer-readable media may be employed. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, method, and so on may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a logical functional division, and other divisions are possible in actual implementations.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods of the embodiments of the present invention. The aforementioned storage media include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), and magnetic or optical disks.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (38)
1. A method of scheduling a job in a cluster, characterized in that the method comprises:
obtaining container pod (Pod) data corresponding to the job;
selecting one or more target nodes from the cluster according to a node scheduling condition in the Pod data and the node state of each node in the cluster;
deploying a Pod on each target node according to the Pod data, and running a job process of the job in each deployed Pod.
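Purely as an illustrative sketch of the three claimed steps (not the patented implementation; the dict fields, the single-resource filter, and all names are assumptions):

```python
def schedule_job(pod_data, nodes):
    """pod_data: the job's Pod data with a scheduling condition;
    nodes: per-node state records for the cluster."""
    # Step 2: select target nodes whose state satisfies the scheduling condition
    # (here reduced to a single CPU check for illustration).
    targets = [n for n in nodes if n["free_cpu"] >= pod_data["cpu_request"]]
    # Step 3: deploy one Pod per target node and run the job process in it.
    return [{"node": n["name"], "job": pod_data["job"]} for n in targets]
```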
2. The method according to claim 1, characterized in that the node scheduling condition comprises a hard scheduling condition and/or a soft scheduling condition;
and the selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster comprises:
if the node state of a node satisfies the hard scheduling condition, treating the node as a target node;
and/or
scoring each node for scheduling according to its node state and the soft scheduling condition, and selecting one or more target nodes according to the scoring results.
3. The method according to claim 2, characterized in that the hard scheduling condition comprises hardware information of a node and/or zone information of a node.
4. The method according to claim 3, characterized in that the selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster comprises: selecting multiple nodes located in the same availability zone as target nodes;
and the deploying a Pod on each target node according to the Pod data comprises: deploying one Pod on each target node according to the Pod data, so as to form multiple instances of the job.
5. The method according to claim 4, characterized in that the node scheduling condition further comprises an instance count lower limit and an instance count upper limit; and the selecting multiple nodes in the same availability zone as target nodes further comprises:
when the number of selected nodes is greater than or equal to the instance count lower limit, taking the smaller of the number of selected nodes and the instance count upper limit as the number of target nodes;
when the number of selected nodes is less than the instance count lower limit, terminating this job scheduling.
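This instance-count rule can be sketched as a small function (names are assumptions; `None` stands in for terminating the scheduling round):

```python
def plan_instance_count(selected, lower, upper):
    # Fewer candidate nodes than the instance count lower limit: abort.
    if selected < lower:
        return None
    # Otherwise deploy the smaller of the candidate count and the upper limit.
    return min(selected, upper)
```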
6. The method according to claim 2, characterized in that the soft scheduling condition comprises a job affinity condition, and the node state comprises the job corresponding to each Pod deployed on the node.
7. The method according to claim 1, characterized in that the job corresponds to one or more of the following applications and/or services:
a deep learning system, a Web service, a log collector, a distributed queue service, a log connector.
8. The method according to claim 7, characterized in that the job is a deep learning training job, and the running a job process of the job in each deployed Pod comprises:
running a parameter server process and a trainer process in each deployed Pod, the trainer process obtaining deep learning tasks from a meta-information management node of the deep learning system, training a local deep learning model to obtain gradients, sending the gradients to the parameter server process, and obtaining updated parameters from the parameter server process;
the parameter server process saving a training snapshot to distributed storage at preset time intervals, so that training is resumed from the snapshot when the Pod or a process in the Pod restarts;
the parameter server process and/or the trainer process storing the deep learning model to distributed storage.
9. The method according to claim 1, characterized in that the node scheduling condition comprises a resource request lower limit and a resource request upper limit corresponding to each computing resource;
and the selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster comprises:
calculating, from the Pods already deployed on each node, a schedulable resource upper limit and a schedulable resource lower limit for each computing resource;
when the resource request lower limit of each computing resource in the node scheduling condition is less than or equal to the schedulable resource lower limit of the corresponding computing resource, selecting one or more target nodes from the cluster.
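The per-node feasibility test of this claim can be sketched as follows (the dict representation and resource names are assumptions, not part of the claim):

```python
def node_feasible(request_lower, schedulable_lower):
    # Every per-resource request lower limit must fit within the node's
    # schedulable resource lower limit for the corresponding resource.
    return all(request_lower[r] <= schedulable_lower.get(r, 0)
               for r in request_lower)
```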
10. The method according to claim 9, characterized in that the node scheduling condition further comprises a job priority;
and the selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster further comprises:
when a resource request lower limit in the node scheduling condition is greater than the schedulable resource lower limit, killing or throttling deployed Pods according to job priority, or terminating this job scheduling.
11. The method according to claim 10, characterized in that the killing or throttling of Pods comprises:
when the resource concerned by the node scheduling condition is a compressible resource, throttling the Pod;
when the resource concerned by the node scheduling condition is an incompressible resource, killing the Pod.
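This mirrors the common container-resource convention that compressible resources (e.g. CPU) can be throttled while incompressible ones (e.g. memory) cannot; a trivial sketch under that assumption:

```python
def preemption_action(resource_kind):
    # Compressible resources can simply be throttled back;
    # incompressible resources force the Pod to be killed.
    return "throttle" if resource_kind == "compressible" else "kill"
```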
12. The method according to claim 9, characterized in that the method further comprises:
allocating computing resources to the Pods deployed on each node according to the node scheduling condition; wherein, if the sum of the resource request upper limits of a compressible resource of the Pods deployed on a node is less than the node's upper limit for that compressible resource, the unallocated compressible resource is distributed proportionally to the Pods deployed on the node.
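The proportional distribution of the unallocated compressible resource can be sketched like this (function and parameter names are assumptions):

```python
def share_surplus(node_limit, pod_limits):
    """Distribute the surplus compressible resource among Pods in
    proportion to each Pod's resource request upper limit."""
    total = sum(pod_limits)
    surplus = node_limit - total
    if surplus <= 0:              # nothing unallocated: keep original limits
        return list(pod_limits)
    return [u + surplus * u / total for u in pod_limits]
```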
13. The method according to claim 12, characterized in that the method further comprises:
calculating a memory usage score for each job process, and killing a job process when its calculated memory usage score reaches a preset value corresponding to that job process.
14. The method according to claim 9, characterized in that the method further comprises:
obtaining the CPU utilization of each Pod deployed from the same Pod data, and calculating an adjusted Pod count according to the arithmetic mean of the CPU utilizations and the node scheduling condition in the Pod data.
15. The method according to claim 1, characterized in that the method further comprises:
monitoring whether there are Pods in the cluster that have not been successfully scheduled; if so, further determining whether there are nodes available for expansion; and if so, starting at least some of the nodes available for expansion, and scheduling the Pods that were not successfully scheduled onto the newly started nodes.
16. The method according to claim 1, characterized in that the method further comprises:
judging, from the node state of each node, whether the node satisfies a reduction condition; if so, shutting down the node and, when Pods are deployed on it, scheduling those Pods onto other nodes in the cluster.
17. The method according to claim 16, characterized in that the reduction condition comprises one or more of the following:
the computing-resource utilization of the node is less than a preset value;
the Pods deployed on the node can be scheduled onto other nodes in the cluster;
the Pods deployed on the node are determined, according to a PodDisruptionBudget controller, to be able to be migrated;
the node has no local storage.
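A sketch of a reduction-condition check (the record fields and the choice to require all four conditions are assumptions; the claim itself allows any subset):

```python
from dataclasses import dataclass

@dataclass
class Node:
    utilization: float          # computing-resource utilization
    pods_reschedulable: bool    # deployed Pods fit elsewhere in the cluster
    pdb_allows_migration: bool  # PodDisruptionBudget permits moving the Pods
    has_local_storage: bool

def reclaimable(node, util_threshold=0.5):
    # This sketch requires every condition to hold before shutting a node down.
    return (node.utilization < util_threshold
            and node.pods_reschedulable
            and node.pdb_allows_migration
            and not node.has_local_storage)
```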
18. The method according to any one of claims 15-17, characterized in that expansion and/or reduction of the cluster is performed according to one or more of the following strategies:
selecting a node at random;
selecting a node according to the number of Pods deployed on it;
selecting a node according to computing-resource utilization;
selecting a node according to the usage price of the physical machine;
suspending expansion and/or reduction when a preset number and/or a preset proportion of nodes in the cluster are abnormal.
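The strategy list above can be sketched as a single dispatcher (the node fields, the "pick the minimum" interpretation of each strategy, and the abnormal-fraction pause are all assumptions for illustration):

```python
import random

def pick_scale_node(nodes, strategy, abnormal_fraction=0.0, max_abnormal=0.3):
    """Choose a node for expansion/reduction; return None to suspend scaling
    when too many cluster nodes are abnormal."""
    if abnormal_fraction >= max_abnormal:
        return None
    if strategy == "random":
        return random.choice(nodes)
    key = {"pods": lambda n: n["pods"],              # fewest deployed Pods
           "utilization": lambda n: n["utilization"],  # lowest resource usage
           "price": lambda n: n["price"]}[strategy]    # cheapest physical machine
    return min(nodes, key=key)
```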
19. An apparatus for scheduling a job in a cluster, characterized in that the apparatus comprises:
a Pod data obtaining unit, configured to obtain container pod (Pod) data corresponding to the job;
a scheduling unit, configured to select one or more target nodes from the cluster according to a node scheduling condition in the Pod data and the node state of each node in the cluster;
a Pod deployment unit, configured to deploy a Pod on each target node according to the Pod data and to run a job process of the job in each deployed Pod.
20. The apparatus according to claim 19, characterized in that the node scheduling condition comprises a hard scheduling condition and/or a soft scheduling condition;
and the scheduling unit is configured to treat a node as a target node if its node state satisfies the hard scheduling condition; and/or to score each node for scheduling according to its node state and the soft scheduling condition, and to select one or more target nodes according to the scoring results.
21. The apparatus according to claim 20, characterized in that the hard scheduling condition comprises hardware information of a node and/or zone information of a node.
22. The apparatus according to claim 21, characterized in that:
the scheduling unit is configured to select multiple nodes located in the same availability zone as target nodes;
the Pod deployment unit is configured to deploy one Pod on each target node according to the Pod data, so as to form multiple instances of the job.
23. The apparatus according to claim 22, characterized in that the node scheduling condition further comprises an instance count lower limit and an instance count upper limit;
and the scheduling unit is configured to take the smaller of the number of selected nodes and the instance count upper limit as the number of target nodes when the number of selected nodes is greater than or equal to the instance count lower limit, and to terminate this job scheduling when the number of selected nodes is less than the instance count lower limit.
24. The apparatus according to claim 20, characterized in that the soft scheduling condition comprises a job affinity condition, and the node state comprises the job corresponding to each Pod deployed on the node.
25. The apparatus according to claim 19, characterized in that the job corresponds to one or more of the following applications and/or services:
a deep learning system, a Web service, a log collector, a distributed queue service, a log connector.
26. The apparatus according to claim 25, characterized in that the job is a deep learning training job;
and the deployment unit is configured to run a parameter server process and a trainer process in each deployed Pod, the trainer process obtaining deep learning tasks from a meta-information management node of the deep learning system, training a local deep learning model to obtain gradients, sending the gradients to the parameter server process, and obtaining updated parameters from the parameter server process; the parameter server process saving a training snapshot to distributed storage at preset time intervals, so that training is resumed from the snapshot when the Pod or a process in the Pod restarts; and the parameter server process and/or the trainer process storing the deep learning model to distributed storage.
27. The apparatus according to claim 19, characterized in that the node scheduling condition comprises a resource request lower limit and a resource request upper limit corresponding to each computing resource;
and the scheduling unit is configured to calculate, from the Pods already deployed on each node, a schedulable resource upper limit and a schedulable resource lower limit for each computing resource, and to select one or more target nodes from the cluster when the resource request lower limit of each computing resource in the node scheduling condition is less than or equal to the schedulable resource lower limit of the corresponding computing resource.
28. The apparatus according to claim 27, characterized in that the node scheduling condition further comprises a job priority;
and the scheduling unit is configured to kill or throttle deployed Pods according to job priority, or to terminate this job scheduling, when a resource request lower limit in the node scheduling condition is greater than the schedulable resource lower limit.
29. The apparatus according to claim 28, characterized in that the scheduling unit is configured to throttle a Pod when the resource concerned by the node scheduling condition is a compressible resource, and to kill a Pod when the resource concerned by the node scheduling condition is an incompressible resource.
30. The apparatus according to claim 27, characterized in that the scheduling unit is further configured to allocate computing resources to the Pods deployed on each node according to the node scheduling condition; wherein, if the sum of the resource request upper limits of a compressible resource of the Pods deployed on a node is less than the node's upper limit for that compressible resource, the unallocated compressible resource is distributed proportionally to the Pods deployed on the node.
31. The apparatus according to claim 30, characterized in that the scheduling unit is further configured to calculate a memory usage score for each job process, and to kill a job process when its calculated memory usage score reaches a preset value corresponding to that job process.
32. The apparatus according to claim 27, characterized in that the scheduling unit is configured to obtain the CPU utilization of each Pod deployed from the same Pod data, and to calculate an adjusted Pod count according to the arithmetic mean of the CPU utilizations and the node scheduling condition in the Pod data.
33. The apparatus according to claim 19, characterized in that the scheduling unit is further configured to monitor whether there are Pods in the cluster that have not been successfully scheduled; if so, to further judge whether there are nodes available for expansion; and if so, to start at least some of the nodes available for expansion and to schedule the Pods that were not successfully scheduled onto the newly started nodes.
34. The apparatus according to claim 19, characterized in that the scheduling unit is configured to judge, from the node state of each node, whether the node satisfies a reduction condition; if so, to shut down the node and, when Pods are deployed on it, to schedule those Pods onto other nodes in the cluster.
35. The apparatus according to claim 34, characterized in that the reduction condition comprises one or more of the following: the computing-resource utilization of the node is less than a preset value; the Pods deployed on the node can be scheduled onto other nodes in the cluster; the Pods deployed on the node are determined, according to a PodDisruptionBudget controller, to be able to be migrated; the node has no local storage.
36. The apparatus according to any one of claims 33-35, characterized in that the scheduling unit is configured to perform expansion and/or reduction of the cluster according to one or more of the following strategies: selecting a node at random; selecting a node according to the number of Pods deployed on it; selecting a node according to computing-resource utilization; selecting a node according to the usage price of the physical machine; suspending expansion and/or reduction when a preset number and/or a preset proportion of nodes in the cluster are abnormal.
37. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1-18.
38. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-18.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810761530.6A CN109117265A (en) | 2018-07-12 | 2018-07-12 | The method, apparatus, equipment and storage medium of schedule job in the cluster |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109117265A true CN109117265A (en) | 2019-01-01 |
Family
ID=64862839
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810761530.6A Pending CN109117265A (en) | 2018-07-12 | 2018-07-12 | The method, apparatus, equipment and storage medium of schedule job in the cluster |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109117265A (en) |
Cited By (43)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109508238A (en) * | 2019-01-05 | 2019-03-22 | 咪付(广西)网络技术有限公司 | A kind of resource management system and method for deep learning |
CN109960585A (en) * | 2019-02-02 | 2019-07-02 | 浙江工业大学 | A kind of resource regulating method based on kubernetes |
CN110166429A (en) * | 2019-04-12 | 2019-08-23 | 深圳壹账通智能科技有限公司 | Data package processing method, device, computer readable storage medium and server |
CN110297670A (en) * | 2019-05-17 | 2019-10-01 | 北京瀚海星云科技有限公司 | A kind of method and system improving distributed task scheduling training effectiveness on container cloud |
CN110347490A (en) * | 2019-07-12 | 2019-10-18 | 北京天云融创软件技术有限公司 | A kind of resource scalable combined schedule method in kubernetes |
CN110413391A (en) * | 2019-07-24 | 2019-11-05 | 上海交通大学 | Deep learning task service method for ensuring quality and system based on container cluster |
CN110515730A (en) * | 2019-08-22 | 2019-11-29 | 北京宝兰德软件股份有限公司 | Resource secondary dispatching method and device based on kubernetes container arranging system |
CN110580194A (en) * | 2019-08-29 | 2019-12-17 | 上海仪电(集团)有限公司中央研究院 | Container scheduling method based on memory hot plug technology and management node scheduler |
CN110597626A (en) * | 2019-08-23 | 2019-12-20 | 第四范式(北京)技术有限公司 | Method, device and system for allocating resources and tasks in distributed system |
CN110753107A (en) * | 2019-10-21 | 2020-02-04 | 中国科学院空间应用工程与技术中心 | Resource scheduling system, method and storage medium under space-based cloud computing architecture |
CN110825520A (en) * | 2019-10-18 | 2020-02-21 | 山东省计算中心(国家超级计算济南中心) | Cluster top-speed elastic expansion method for realizing efficient resource utilization |
CN110968424A (en) * | 2019-09-12 | 2020-04-07 | 广东浪潮大数据研究有限公司 | Resource scheduling method, device and storage medium based on K8s |
CN111045821A (en) * | 2019-12-06 | 2020-04-21 | 北京浪潮数据技术有限公司 | Container scheduling method and device, container scheduler and readable storage medium |
CN111147565A (en) * | 2019-12-22 | 2020-05-12 | 北京浪潮数据技术有限公司 | Cluster node control method, device and equipment and readable storage medium |
CN111399855A (en) * | 2020-03-09 | 2020-07-10 | 山东汇贸电子口岸有限公司 | Automatic application instance publishing method based on container technology |
CN111435299A (en) * | 2019-01-14 | 2020-07-21 | 阿里巴巴集团控股有限公司 | Application processing method and device |
CN111625420A (en) * | 2020-05-21 | 2020-09-04 | 浪潮电子信息产业股份有限公司 | Distributed training task processing method, device, equipment and storage medium |
CN111666034A (en) * | 2019-03-05 | 2020-09-15 | 北京京东尚科信息技术有限公司 | Container cluster disk management method and device |
CN111880936A (en) * | 2020-07-31 | 2020-11-03 | 广州华多网络科技有限公司 | Resource scheduling method and device, container cluster, computer equipment and storage medium |
CN112130991A (en) * | 2020-08-28 | 2020-12-25 | 北京思特奇信息技术股份有限公司 | Application program control method and system based on machine learning |
CN112148438A (en) * | 2019-06-28 | 2020-12-29 | 杭州海康威视数字技术股份有限公司 | Abnormal task processing method, abnormal task scheduling method, abnormal task processing device, abnormal task scheduling device and computer storage medium |
CN112291288A (en) * | 2019-07-24 | 2021-01-29 | 北京金山云网络技术有限公司 | Container cluster expansion method and device, electronic equipment and readable storage medium |
CN112350837A (en) * | 2019-08-06 | 2021-02-09 | 南京南瑞继保电气有限公司 | Cloud platform-based power application cluster management method and device |
CN112363820A (en) * | 2020-12-01 | 2021-02-12 | 成都精灵云科技有限公司 | Uniform resource pooling container scheduling engine based on heterogeneous hardware and scheduling method thereof |
CN112416368A (en) * | 2020-11-25 | 2021-02-26 | 中国科学技术大学先进技术研究院 | Cache deployment and task scheduling method, terminal and computer readable storage medium |
CN112835915A (en) * | 2019-11-25 | 2021-05-25 | 中国移动通信集团辽宁有限公司 | MPP database system, data storage method and data query method |
WO2021103790A1 (en) * | 2019-11-26 | 2021-06-03 | 北京京东尚科信息技术有限公司 | Container scheduling method and apparatus, and non-volatile computer-readable storage medium |
CN113037800A (en) * | 2019-12-09 | 2021-06-25 | 华为技术有限公司 | Job scheduling method and job scheduling device |
CN113190344A (en) * | 2021-03-26 | 2021-07-30 | 中国科学院软件研究所 | Method and device for dynamic reconfiguration and deployment of neural network for software-defined satellite |
CN113296870A (en) * | 2020-04-07 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Method and device for predicting Kubernetes cluster configuration |
CN113296868A (en) * | 2021-07-27 | 2021-08-24 | 杭州筋斗腾云科技有限公司 | Application platform and application management method |
CN113407305A (en) * | 2021-05-31 | 2021-09-17 | 北京达佳互联信息技术有限公司 | Task deployment method and device, electronic equipment and storage medium |
CN113452758A (en) * | 2021-06-04 | 2021-09-28 | 中国联合网络通信集团有限公司 | Service access method and device |
CN113760448A (en) * | 2021-04-30 | 2021-12-07 | 中科天玑数据科技股份有限公司 | Big data management platform based on Kubernetes |
CN113778610A (en) * | 2021-01-12 | 2021-12-10 | 北京沃东天骏信息技术有限公司 | Method and apparatus for determining resources |
CN113872997A (en) * | 2020-06-30 | 2021-12-31 | 华为技术有限公司 | Container group POD reconstruction method based on container cluster service and related equipment |
EP3955174A3 (en) * | 2021-03-10 | 2022-05-04 | Beijing Baidu Netcom Science And Technology Co. Ltd. | Method, apparatus and storage medium for training a deep learning framework |
CN114661247A (en) * | 2022-05-23 | 2022-06-24 | 武汉四通信息服务有限公司 | Automatic capacity expansion method and device, electronic equipment and storage medium |
CN114691050A (en) * | 2022-05-26 | 2022-07-01 | 深圳前海环融联易信息科技服务有限公司 | Cloud native storage method, device, equipment and medium based on Kubernetes |
CN116155750A (en) * | 2023-04-19 | 2023-05-23 | 之江实验室 | Deep learning job resource placement method, system, equipment and storage medium |
CN116339926A (en) * | 2023-05-22 | 2023-06-27 | 成都交控轨道科技有限公司 | Containerized deployment method of ATS software |
CN117632444A (en) * | 2024-01-26 | 2024-03-01 | 之江实验室 | NPU fault-tolerant scheduling system of computer cluster |
CN117806815A (en) * | 2023-11-27 | 2024-04-02 | 本原数据(北京)信息技术有限公司 | Data processing method, system, electronic device and storage medium |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102624919A (en) * | 2012-03-30 | 2012-08-01 | 电子科技大学 | Distributed service integrated system for service-oriented architecture and application method thereof |
US20150055499A1 (en) * | 2013-08-26 | 2015-02-26 | Vmware, Inc. | Networking stack of virtualization software configured to support latency sensitive virtual machines |
CN106027643A (en) * | 2016-05-18 | 2016-10-12 | 无锡华云数据技术服务有限公司 | Resource scheduling method based on Kubernetes container cluster management system |
CN106933650A (en) * | 2017-03-03 | 2017-07-07 | 北方工业大学 | load management method and system of cloud application system |
CN107105009A (en) * | 2017-03-22 | 2017-08-29 | 北京荣之联科技股份有限公司 | Job scheduling method and device based on Kubernetes system docking workflow engines |
CN107562545A (en) * | 2017-09-11 | 2018-01-09 | 南京奥之云信息技术有限公司 | Container scheduling method based on Docker technology |
CN108228354A (en) * | 2017-12-29 | 2018-06-29 | 杭州朗和科技有限公司 | Scheduling method, system, computer device and medium |
- 2018-07-12: application CN201810761530.6A filed in China (CN); published as CN109117265A; status: Pending
Non-Patent Citations (5)
Title |
---|
TANG, Rui: "Research on Resource Scheduling Strategy of Container Cloud Platform Based on Kubernetes", China Master's Theses Full-text Database, Information Science and Technology Series * |
容器时代 平凡: "Deploying Heapster in a Kubernetes Cluster", Kubernetes Chinese Community (KUBERNETES中文社区) * |
ZHANG, Xia: "DockOne WeChat Share (149): Kubernetes Scheduling Explained in Detail", DockOne * |
ZHANG, Xia: "Kubernetes Quality of Service (QoS) Guarantees", Kubernetes Chinese Community (KUBERNETES中文社区) * |
运维个西瓜: "k8s Technology Pre-research 7: In-depth Mastery of Kubernetes Pods", CSDN * |
Cited By (63)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109508238A (en) * | 2019-01-05 | 2019-03-22 | 咪付(广西)网络技术有限公司 | Resource management system and method for deep learning |
CN111435299B (en) * | 2019-01-14 | 2023-06-20 | 阿里巴巴集团控股有限公司 | Application processing method and device |
CN111435299A (en) * | 2019-01-14 | 2020-07-21 | 阿里巴巴集团控股有限公司 | Application processing method and device |
CN109960585A (en) * | 2019-02-02 | 2019-07-02 | 浙江工业大学 | Resource scheduling method based on Kubernetes |
CN111666034A (en) * | 2019-03-05 | 2020-09-15 | 北京京东尚科信息技术有限公司 | Container cluster disk management method and device |
CN110166429A (en) * | 2019-04-12 | 2019-08-23 | 深圳壹账通智能科技有限公司 | Data package processing method, device, computer readable storage medium and server |
CN110166429B (en) * | 2019-04-12 | 2022-03-22 | 深圳壹账通智能科技有限公司 | Data packet processing method and device, computer readable storage medium and server |
CN110297670A (en) * | 2019-05-17 | 2019-10-01 | 北京瀚海星云科技有限公司 | Method and system for improving training efficiency of distributed tasks on container cloud |
CN110297670B (en) * | 2019-05-17 | 2023-06-27 | 深圳致星科技有限公司 | Method and system for improving training efficiency of distributed tasks on container cloud |
CN112148438A (en) * | 2019-06-28 | 2020-12-29 | 杭州海康威视数字技术股份有限公司 | Abnormal task processing method, abnormal task scheduling method, abnormal task processing device, abnormal task scheduling device and computer storage medium |
CN110347490A (en) * | 2019-07-12 | 2019-10-18 | 北京天云融创软件技术有限公司 | Combined scheduling method for elastic resource scaling in Kubernetes |
CN112291288A (en) * | 2019-07-24 | 2021-01-29 | 北京金山云网络技术有限公司 | Container cluster expansion method and device, electronic equipment and readable storage medium |
CN112291288B (en) * | 2019-07-24 | 2022-10-04 | 北京金山云网络技术有限公司 | Container cluster expansion method and device, electronic equipment and readable storage medium |
CN110413391A (en) * | 2019-07-24 | 2019-11-05 | 上海交通大学 | Quality-guaranteed deep learning task service method and system based on container cluster |
CN112350837B (en) * | 2019-08-06 | 2022-10-28 | 南京南瑞继保电气有限公司 | Cloud platform-based power application cluster management method and device |
CN112350837A (en) * | 2019-08-06 | 2021-02-09 | 南京南瑞继保电气有限公司 | Cloud platform-based power application cluster management method and device |
CN110515730A (en) * | 2019-08-22 | 2019-11-29 | 北京宝兰德软件股份有限公司 | Secondary resource scheduling method and device based on Kubernetes container orchestration system |
CN110597626A (en) * | 2019-08-23 | 2019-12-20 | 第四范式(北京)技术有限公司 | Method, device and system for allocating resources and tasks in distributed system |
CN110597626B (en) * | 2019-08-23 | 2022-09-06 | 第四范式(北京)技术有限公司 | Method, device and system for allocating resources and tasks in distributed system |
CN110580194A (en) * | 2019-08-29 | 2019-12-17 | 上海仪电(集团)有限公司中央研究院 | Container scheduling method based on memory hot plug technology and management node scheduler |
CN110968424A (en) * | 2019-09-12 | 2020-04-07 | 广东浪潮大数据研究有限公司 | Resource scheduling method, device and storage medium based on K8s |
CN110968424B (en) * | 2019-09-12 | 2023-04-07 | 广东浪潮大数据研究有限公司 | Resource scheduling method, device and storage medium based on K8s |
CN110825520B (en) * | 2019-10-18 | 2023-08-29 | 山东省计算中心(国家超级计算济南中心) | Rapid elastic scaling method for clusters achieving efficient resource utilization |
CN110825520A (en) * | 2019-10-18 | 2020-02-21 | 山东省计算中心(国家超级计算济南中心) | Rapid elastic scaling method for clusters achieving efficient resource utilization |
CN110753107A (en) * | 2019-10-21 | 2020-02-04 | 中国科学院空间应用工程与技术中心 | Resource scheduling system, method and storage medium under space-based cloud computing architecture |
CN112835915B (en) * | 2019-11-25 | 2023-07-18 | 中国移动通信集团辽宁有限公司 | MPP database system, data storage method and data query method |
CN112835915A (en) * | 2019-11-25 | 2021-05-25 | 中国移动通信集团辽宁有限公司 | MPP database system, data storage method and data query method |
WO2021103790A1 (en) * | 2019-11-26 | 2021-06-03 | 北京京东尚科信息技术有限公司 | Container scheduling method and apparatus, and non-volatile computer-readable storage medium |
EP4068090A4 (en) * | 2019-11-26 | 2024-01-03 | Beijing Jingdong Shangke Information Technology Co Ltd | Container scheduling method and apparatus, and non-volatile computer-readable storage medium |
CN111045821A (en) * | 2019-12-06 | 2020-04-21 | 北京浪潮数据技术有限公司 | Container scheduling method and device, container scheduler and readable storage medium |
CN113037800A (en) * | 2019-12-09 | 2021-06-25 | 华为技术有限公司 | Job scheduling method and job scheduling device |
CN113037800B (en) * | 2019-12-09 | 2024-03-05 | 华为云计算技术有限公司 | Job scheduling method and job scheduling device |
CN111147565A (en) * | 2019-12-22 | 2020-05-12 | 北京浪潮数据技术有限公司 | Cluster node control method, device and equipment and readable storage medium |
CN111399855A (en) * | 2020-03-09 | 2020-07-10 | 山东汇贸电子口岸有限公司 | Automatic application instance publishing method based on container technology |
CN111399855B (en) * | 2020-03-09 | 2023-10-20 | 山东省电子口岸有限公司 | Automatic application instance publishing method based on container technology |
CN113296870A (en) * | 2020-04-07 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Method and device for predicting Kubernetes cluster configuration |
CN113296870B (en) * | 2020-04-07 | 2024-03-08 | 阿里巴巴集团控股有限公司 | Method and device for predicting Kubernetes cluster configuration |
CN111625420A (en) * | 2020-05-21 | 2020-09-04 | 浪潮电子信息产业股份有限公司 | Distributed training task processing method, device, equipment and storage medium |
CN113872997B (en) * | 2020-06-30 | 2022-08-26 | 华为技术有限公司 | Container group POD reconstruction method based on container cluster service and related equipment |
CN113872997A (en) * | 2020-06-30 | 2021-12-31 | 华为技术有限公司 | Container group POD reconstruction method based on container cluster service and related equipment |
CN111880936A (en) * | 2020-07-31 | 2020-11-03 | 广州华多网络科技有限公司 | Resource scheduling method and device, container cluster, computer equipment and storage medium |
CN111880936B (en) * | 2020-07-31 | 2023-08-08 | 广州华多网络科技有限公司 | Resource scheduling method, device, container cluster, computer equipment and storage medium |
CN112130991A (en) * | 2020-08-28 | 2020-12-25 | 北京思特奇信息技术股份有限公司 | Application program control method and system based on machine learning |
CN112416368B (en) * | 2020-11-25 | 2024-01-16 | 中国科学技术大学先进技术研究院 | Cache deployment and task scheduling method, terminal and computer readable storage medium |
CN112416368A (en) * | 2020-11-25 | 2021-02-26 | 中国科学技术大学先进技术研究院 | Cache deployment and task scheduling method, terminal and computer readable storage medium |
CN112363820A (en) * | 2020-12-01 | 2021-02-12 | 成都精灵云科技有限公司 | Uniform resource pooling container scheduling engine based on heterogeneous hardware and scheduling method thereof |
CN113778610A (en) * | 2021-01-12 | 2021-12-10 | 北京沃东天骏信息技术有限公司 | Method and apparatus for determining resources |
CN113778610B (en) * | 2021-01-12 | 2024-04-09 | 北京沃东天骏信息技术有限公司 | Method and device for determining resources |
EP3955174A3 (en) * | 2021-03-10 | 2022-05-04 | Beijing Baidu Netcom Science And Technology Co. Ltd. | Method, apparatus and storage medium for training a deep learning framework |
CN113190344A (en) * | 2021-03-26 | 2021-07-30 | 中国科学院软件研究所 | Method and device for dynamic reconfiguration and deployment of neural network for software-defined satellite |
CN113190344B (en) * | 2021-03-26 | 2023-12-15 | 中国科学院软件研究所 | Method and device for dynamic reconfiguration deployment of neural network for software defined satellite |
CN113760448A (en) * | 2021-04-30 | 2021-12-07 | 中科天玑数据科技股份有限公司 | Big data management platform based on Kubernetes |
CN113407305A (en) * | 2021-05-31 | 2021-09-17 | 北京达佳互联信息技术有限公司 | Task deployment method and device, electronic equipment and storage medium |
CN113452758A (en) * | 2021-06-04 | 2021-09-28 | 中国联合网络通信集团有限公司 | Service access method and device |
CN113296868A (en) * | 2021-07-27 | 2021-08-24 | 杭州筋斗腾云科技有限公司 | Application platform and application management method |
CN114661247B (en) * | 2022-05-23 | 2022-09-20 | 武汉四通信息服务有限公司 | Automatic capacity expansion method and device, electronic equipment and storage medium |
CN114661247A (en) * | 2022-05-23 | 2022-06-24 | 武汉四通信息服务有限公司 | Automatic capacity expansion method and device, electronic equipment and storage medium |
CN114691050A (en) * | 2022-05-26 | 2022-07-01 | 深圳前海环融联易信息科技服务有限公司 | Cloud native storage method, device, equipment and medium based on Kubernetes |
CN116155750A (en) * | 2023-04-19 | 2023-05-23 | 之江实验室 | Deep learning job resource placement method, system, equipment and storage medium |
CN116339926B (en) * | 2023-05-22 | 2023-08-08 | 成都交控轨道科技有限公司 | Containerized deployment method of ATS software |
CN116339926A (en) * | 2023-05-22 | 2023-06-27 | 成都交控轨道科技有限公司 | Containerized deployment method of ATS software |
CN117806815A (en) * | 2023-11-27 | 2024-04-02 | 本原数据(北京)信息技术有限公司 | Data processing method, system, electronic device and storage medium |
CN117632444A (en) * | 2024-01-26 | 2024-03-01 | 之江实验室 | NPU fault-tolerant scheduling system of computer cluster |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109117265A (en) | The method, apparatus, equipment and storage medium of schedule job in the cluster | |
CN107273185B (en) | Load balancing control method based on virtual machine | |
US9977689B2 (en) | Dynamic scaling of management infrastructure in virtual environments | |
US9684542B2 (en) | Smart cloud workload balancer | |
CN109034396B (en) | Method and apparatus for processing deep learning jobs in a distributed cluster | |
US10715460B2 (en) | Opportunistic resource migration to optimize resource placement | |
CN105049268B (en) | Distributed computing resource distribution system and task processing method | |
US11106508B2 (en) | Elastic multi-tenant container architecture | |
CN106933669B (en) | Apparatus and method for data processing | |
US20170060707A1 (en) | High availability dynamic restart priority calculator | |
EP3688585A1 (en) | Autonomous multitenant database cloud service framework | |
CN109075994A (en) | More depot complexes | |
CN107864211B (en) | Cluster resource dispatching method and system | |
CN107451853B (en) | Method, device and system for real-time red packet distribution and storage medium | |
CN117480494A (en) | Coordinated container scheduling for improved resource allocation in virtual computing environments | |
CN110399272A (en) | Log processing equipment, method, electronic equipment and computer readable storage medium | |
CN113342477A (en) | Container group deployment method, device, equipment and storage medium | |
CN111666158A (en) | Kubernetes-based container scheduling method and device, storage medium and electronic equipment | |
CN116450355A (en) | Multi-cluster model training method, device, equipment and medium | |
CN114090191A (en) | Method, device and equipment for scheduling storage resources and storage medium | |
US10416892B2 (en) | Fileset-based data locality enablement in distributed file systems | |
CN112148458A (en) | Task scheduling method and device | |
CN109005071A (en) | Decision-making and deployment method and control device | |
CN108196797A (en) | Data processing system based on cloud computing | |
CN117155804B (en) | Cloud server deployment method and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2019-01-01 |