CN109117265A - Method, apparatus, device and storage medium for scheduling jobs in a cluster - Google Patents

Method, apparatus, device and storage medium for scheduling jobs in a cluster

Info

Publication number
CN109117265A
CN109117265A (application CN201810761530.6A)
Authority
CN
China
Prior art keywords
node
pod
cluster
resource
condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810761530.6A
Other languages
Chinese (zh)
Inventor
周倜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810761530.6A
Publication of CN109117265A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources to service a request
    • G06F 9/5027 - Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5083 - Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, apparatus, device and storage medium for scheduling jobs in a cluster. The method includes: obtaining Pod data corresponding to a job; selecting one or more target nodes from the cluster according to a node scheduling condition in the Pod data and the node state of each node in the cluster; and deploying a Pod on each target node according to the Pod data, then running the job's processes in the deployed Pods. This technical solution deploys job processes in containers, placing different jobs in separate, independent containers, so that the jobs of different applications on the cluster do not interfere with one another yet can still communicate in scenarios that require interaction. This makes effective use of cluster resources, and projects that require the pipelined cooperation of many applications, such as deep learning, can achieve stable job scheduling.

Description

Method, apparatus, device and storage medium for scheduling jobs in a cluster
[Technical field]
The present invention relates to the field of job scheduling, and in particular to a method, apparatus, device and storage medium for scheduling jobs in a cluster.
[Background]
Building a cluster from multiple physical machines and deploying services on it is the conventional means by which Internet enterprises deliver projects. How to reasonably schedule the various kinds of jobs served on a cluster has therefore always been a subject of continuous study by engineers.
Taking a deep learning project as an example, engineers often wish to run all parts of the project on the same infrastructure platform, while the sample data for deep learning usually comes from the products of the service lines; that is, multiple types of jobs need to be scheduled together in the same cluster. Existing scheduling frameworks, however, cannot achieve this well.
[Summary of the invention]
In view of this, the present invention provides a method, apparatus, device and storage medium for scheduling jobs in a cluster, to solve the problem of scheduling the jobs of different applications or services in the same cluster.
The specific technical solution is as follows:
A method for scheduling jobs in a cluster, comprising:
obtaining container pod (Pod) data corresponding to a job;
selecting one or more target nodes from the cluster according to a node scheduling condition in the Pod data and the node state of each node in the cluster;
deploying a Pod on each target node according to the Pod data, and running the job's processes in the deployed Pods.
Optionally, the node scheduling condition includes a hard scheduling condition and/or a soft scheduling condition;
selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster includes:
if the node state of a node satisfies the hard scheduling condition, the node is a target node;
and/or
scoring each node according to its node state and the soft scheduling condition, and selecting one or more target nodes according to the scoring results.
Optionally, the hard scheduling condition includes hardware information of the node and/or zone information of the node.
Optionally, selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster includes:
selecting multiple nodes located in the same availability zone as target nodes;
and deploying a Pod on each target node according to the Pod data includes: deploying one Pod on each target node according to the Pod data, so as to form multiple instances of the job.
Optionally, the node scheduling condition further includes an instance count lower limit and an instance count upper limit; selecting multiple nodes located in the same availability zone as target nodes further includes:
when the number of selected nodes is greater than or equal to the instance count lower limit, taking the smaller of the number of selected nodes and the instance count upper limit as the number of target nodes;
when the number of selected nodes is less than the instance count lower limit, terminating this job scheduling.
Optionally, the soft scheduling condition includes a job affinity condition, and the node state includes the job corresponding to each Pod deployed on the node.
Optionally, the job corresponds to one or more of the following applications and/or services:
a deep learning system, a Web service, a log collector, a distributed queue service, a log connector.
Optionally, the job is a deep learning training job, and running the job's processes in the deployed Pods includes:
running one parameter server process and one trainer process in each deployed Pod; obtaining, by the trainer process, deep learning tasks from a metadata management node of the deep learning system, sending the gradients obtained by training the local deep learning model to the parameter server process, and obtaining updated parameters from the parameter server process;
saving, by the parameter server process, training snapshots to distributed storage at predetermined intervals, so that a restarted Pod, or process within a Pod, resumes training from the training snapshot;
storing, by the parameter server process and/or the trainer process, the deep learning training model to distributed storage.
Optionally, the node scheduling condition includes a resource request lower limit and a resource request upper limit corresponding to each computing resource;
selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster includes:
calculating the schedulable resource upper limit and the schedulable resource lower limit of each computing resource according to the Pods already deployed on each node;
when the resource request lower limit of each computing resource in the node scheduling condition is less than or equal to the schedulable resource lower limit of the corresponding computing resource, selecting one or more target nodes from the cluster.
Optionally, the node scheduling condition further includes a job priority;
selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster further includes:
when a resource request lower limit in the node scheduling condition is greater than the schedulable resource lower limit, killing or throttling already-deployed Pods according to job priority, or terminating this job scheduling.
Optionally, killing or throttling a Pod includes:
when the resource corresponding to the node scheduling condition is a compressible resource, throttling the Pod;
when the resource corresponding to the node scheduling condition is an incompressible resource, killing the Pod.
Optionally, the method further includes:
allocating computing resources to the Pods deployed on each node according to the node scheduling conditions; wherein, if the sum of the resource request upper limits for a compressible resource of the Pods deployed on a node is less than the node's upper limit for that compressible resource, the unallocated compressible resource is distributed proportionally among the Pods deployed on the node.
Optionally, the method further includes:
calculating a memory usage score for each job process, and killing a job process when its calculated memory usage score reaches the preset value corresponding to that job process.
Optionally, the method further includes:
obtaining the CPU utilization of each Pod deployed from the same Pod data, and calculating an adjusted Pod count according to the arithmetic mean of the CPU utilizations and the node scheduling condition in the Pod data.
Optionally, the method further includes:
monitoring whether there is a Pod in the cluster that has not been successfully scheduled, and if so, further determining whether there are nodes available for scale-up;
if so, starting at least some of the nodes available for scale-up, and scheduling the Pods that were not successfully scheduled onto the newly started nodes.
Optionally, the method further includes:
judging whether a node satisfies a scale-down condition according to the node state of each node, and if so, shutting down the corresponding node and, when Pods have been deployed on the corresponding node, rescheduling the deployed Pods onto other nodes in the cluster.
Optionally, the scale-down condition includes one or more of the following:
the computing resource utilization of the node is below a preset value;
the Pods deployed on the node can be scheduled onto other nodes in the cluster;
the Pods deployed on the node are confirmed as movable according to a PodDisruptionBudget controller;
the node has no local storage.
Optionally, cluster scale-up and/or scale-down is performed according to one or more of the following strategies:
selecting nodes at random;
selecting nodes according to the number of deployed Pods;
selecting nodes according to computing resource utilization;
selecting nodes according to the usage price of the physical machine;
pausing scale-up and/or scale-down when a preset number and/or preset proportion of nodes in the cluster are abnormal.
A device for scheduling jobs in a cluster, characterized in that the device includes:
a Pod data obtaining unit, configured to obtain container pod (Pod) data corresponding to a job;
a scheduling unit, configured to select one or more target nodes from the cluster according to a node scheduling condition in the Pod data and the node state of each node in the cluster;
a Pod deployment unit, configured to deploy a Pod on each target node according to the Pod data, and run the job's processes in the deployed Pods.
Optionally, the node scheduling condition includes a hard scheduling condition and/or a soft scheduling condition;
the scheduling unit treats a node as a target node if its node state satisfies the hard scheduling condition; and/or scores each node according to its node state and the soft scheduling condition, and selects one or more target nodes according to the scoring results.
Optionally, the hard scheduling condition includes hardware information of the node and/or zone information of the node.
Optionally, the scheduling unit is configured to select multiple nodes located in the same availability zone as target nodes;
the Pod deployment unit is configured to deploy one Pod on each target node according to the Pod data, so as to form multiple instances of the job.
Optionally, the node scheduling condition further includes an instance count lower limit and an instance count upper limit;
the scheduling unit is configured to, when the number of selected nodes is greater than or equal to the instance count lower limit, take the smaller of the number of selected nodes and the instance count upper limit as the number of target nodes, and, when the number of selected nodes is less than the instance count lower limit, terminate this job scheduling.
Optionally, the soft scheduling condition includes a job affinity condition, and the node state includes the job corresponding to each Pod deployed on the node.
Optionally, the job corresponds to one or more of the following applications and/or services:
a deep learning system, a Web service, a log collector, a distributed queue service, a log connector.
Optionally, the job is a deep learning training job;
the deployment unit is configured to run one parameter server process and one trainer process in each deployed Pod; the trainer process obtains deep learning tasks from the metadata management node of the deep learning system, sends the gradients obtained by training the local deep learning model to the parameter server process, and obtains updated parameters from the parameter server process; the parameter server process saves training snapshots to distributed storage at predetermined intervals, so that a restarted Pod, or process within a Pod, resumes training from the training snapshot; the parameter server process and/or the trainer process stores the deep learning training model to distributed storage.
Optionally, the node scheduling condition includes a resource request lower limit and a resource request upper limit corresponding to each computing resource;
the scheduling unit is configured to calculate the schedulable resource upper limit and the schedulable resource lower limit of each computing resource according to the Pods already deployed on each node, and, when the resource request lower limit of each computing resource in the node scheduling condition is less than or equal to the schedulable resource lower limit of the corresponding computing resource, select one or more target nodes from the cluster.
Optionally, the node scheduling condition further includes a job priority;
the scheduling unit is configured to, when a resource request lower limit in the node scheduling condition is greater than the schedulable resource lower limit, kill or throttle already-deployed Pods according to job priority, or terminate this job scheduling.
Optionally, the scheduling unit is configured to throttle a Pod when the resource corresponding to the node scheduling condition is a compressible resource, and to kill a Pod when the resource corresponding to the node scheduling condition is an incompressible resource.
Optionally, the scheduling unit is further configured to allocate computing resources to the Pods deployed on each node according to the node scheduling conditions; wherein, if the sum of the resource request upper limits for a compressible resource of the Pods deployed on a node is less than the node's upper limit for that compressible resource, the unallocated compressible resource is distributed proportionally among the Pods deployed on the node.
Optionally, the scheduling unit is further configured to calculate a memory usage score for each job process, and kill a job process when its calculated memory usage score reaches the preset value corresponding to that job process.
Optionally, the scheduling unit is configured to obtain the CPU utilization of each Pod deployed from the same Pod data, and calculate an adjusted Pod count according to the arithmetic mean of the CPU utilizations and the node scheduling condition in the Pod data.
Optionally, the scheduling unit is further configured to monitor whether there is a Pod in the cluster that has not been successfully scheduled, and if so, further determine whether there are nodes available for scale-up; if so, start at least some of the nodes available for scale-up, and schedule the Pods that were not successfully scheduled onto the newly started nodes.
Optionally, the scheduling unit is configured to judge whether a node satisfies a scale-down condition according to the node state of each node, and if so, shut down the corresponding node and, when Pods have been deployed on the corresponding node, reschedule the deployed Pods onto other nodes in the cluster.
Optionally, the scale-down condition includes one or more of the following: the computing resource utilization of the node is below a preset value; the Pods deployed on the node can be scheduled onto other nodes in the cluster; the Pods deployed on the node are confirmed as movable according to a PodDisruptionBudget controller; the node has no local storage.
Optionally, the scheduling unit is configured to perform cluster scale-up and/or scale-down according to one or more of the following strategies: selecting nodes at random; selecting nodes according to the number of deployed Pods; selecting nodes according to computing resource utilization; selecting nodes according to the usage price of the physical machine; pausing scale-up and/or scale-down when a preset number and/or preset proportion of nodes in the cluster are abnormal.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the method described above when executing the program.
A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described above.
Based on the above description, it can be seen that with the solution of the present invention, after container pod (Pod) data corresponding to a job is obtained, several target nodes are selected from the cluster according to the node scheduling condition in the data and the node state of each node in the cluster, a Pod is then deployed on each target node according to the Pod data, and the job's processes are run in the deployed Pods. This technical solution deploys job processes in containers, placing different jobs in separate, independent containers, so that the jobs of different applications on the cluster do not interfere with one another yet can still communicate in scenarios that require interaction. This makes effective use of cluster resources, and projects that require the pipelined cooperation of many applications, such as deep learning, can achieve stable job scheduling.
[Brief description of the drawings]
Fig. 1 shows a schematic flowchart of a method for scheduling jobs in a cluster according to an embodiment of the present invention.
Fig. 2 shows a schematic structural diagram of a device for scheduling jobs in a cluster according to an embodiment of the present invention.
Fig. 3 shows a schematic diagram of a deep learning system architecture according to an embodiment of the present invention.
Fig. 4 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention.
[Detailed description of the embodiments]
In order to make the technical solution of the present invention clearer, the solution is further described below with reference to the drawings and embodiments.
Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 shows a schematic flowchart of a method for scheduling jobs in a cluster according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S110: obtaining container pod (Pod) data corresponding to a job.
A container provides an isolated running environment, and in this respect it is similar to a virtual machine. In embodiments of the present invention, however, job processes are deployed following the idea of containerization rather than with virtual machines, because containers are more lightweight: their efficiency and utilization are significantly higher than those of virtual machines.
A container pod (Pod) is a set of one or more containers; generally it includes a root container plus the containers that run the job processes. In embodiments of the present invention, a Pod may correspond to one or more instances of a job, but in general a single Pod is not used to realize multiple instances.
Pod data can be stored in etcd, a key-value store used for shared configuration and service discovery. The Pod data of new jobs and of killed Pods can be stored in etcd and retrieved when the corresponding job is scheduled. Specifically, the Pod data can be generated from a job request submitted by a user.
Step S120: selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster.
There can be many kinds of node scheduling conditions; a concrete form of description is the label (Label), and node states can likewise be described as labels. A label is a key-value pair whose key and value are specified by the user. Labels can be attached to various resource objects, a resource object can define any number of labels, and resource objects can be queried and filtered with a LabelSelector (label selector).
Step S130: deploying a Pod on each target node according to the Pod data, and running the job's processes in the deployed Pods.
As can be seen, in the method shown in Fig. 1, after container pod (Pod) data corresponding to a job is obtained, several target nodes are selected from the cluster according to the node scheduling condition in the data and the node state of each node, a Pod is then deployed on each target node according to the Pod data, and the job's processes are run in the deployed Pods. This technical solution deploys job processes in containers, placing different jobs in separate, independent containers, so that the jobs of different applications on the cluster do not interfere with one another yet can still communicate in scenarios that require interaction. This makes effective use of cluster resources, and projects that require the pipelined cooperation of many applications, such as deep learning, can achieve stable job scheduling.
In one embodiment of the present invention, in the above method, the node scheduling condition includes a hard scheduling condition and/or a soft scheduling condition; and selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster includes: if the node state of a node satisfies the hard scheduling condition, the node is a target node; and/or scoring each node according to its node state and the soft scheduling condition, and selecting one or more target nodes according to the scoring results.
As can be seen, the hard scheduling condition is the stricter requirement. For example, in one embodiment of the present invention, the hard scheduling condition includes hardware information of the node and/or zone information of the node.
In one example, a user wants a job deployed only on nodes whose CPUs are Intel models; this is clearly a piece of node hardware information. In another example, a user wants a job deployed on nodes in zone A; this is a piece of node zone information. To summarize: when every item in a job's hard scheduling condition is a subset of the items in a node's state, that node is a selected target node. Concretely, both the hard scheduling condition and the node state can be marked as labels, and the matching can then be realized with a NodeSelector (node selector).
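To make the subset test concrete, here is a minimal Go sketch of hard-condition matching over labels; the map-based data structures are an assumption for illustration, not something the patent prescribes.

```go
// Minimal sketch of hard-condition matching: a node qualifies when every
// label demanded by the job's hard scheduling condition appears, with the
// same value, among the node's state labels. Data structures are assumed.
package main

import "fmt"

func matchesHardCondition(condition, nodeLabels map[string]string) bool {
	for key, want := range condition {
		if got, ok := nodeLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	condition := map[string]string{"cpu.vendor": "intel", "zone": "A"}
	node := map[string]string{"cpu.vendor": "intel", "zone": "A", "disk": "ssd"}
	fmt.Println(matchesHardCondition(condition, node)) // true: the condition is a subset of the node state
}
```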
Hard scheduling conditions help filter out usable nodes quickly, but they also easily lead to situations where no node is available. For jobs whose demands are less strict, a soft scheduling condition can therefore be set instead: each node is scored according to its node state and the soft scheduling condition, and one or more target nodes are selected according to the scoring results. Since matching is no longer strict, the node with the highest affinity to the job can be selected preferentially even when it is not a perfect match.
In one embodiment of the present invention, in the above method, selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster includes: selecting multiple nodes located in the same availability zone as target nodes; and deploying a Pod on each target node according to the Pod data includes: deploying one Pod on each target node according to the Pod data, so as to form multiple instances of the job.
This embodiment provides a job scheduling idea: deploy multiple instances (i.e. Pods) of a job (a job here means an application or service, such as a log collector) within one AZ (Availability Zone), i.e. affinity of a single application's instances at the AZ level. An availability zone is one or more data centers whose infrastructure, such as power and network, is isolated from the others. A region contains one or more availability zones, and the failure of one availability zone does not affect the use of the others.
Within the AZ, one instance is deployed on each target node, which also means that no node carries two instances, so the failure of one node affects only one instance. Another idea is to apply anti-affinity at the rack (cabinet) level, since an entire rack may fail at once. Essentially these are all just labels on nodes; at scheduling time it suffices to group dynamically by these special labels to handle the affinity and anti-affinity relations.
In one embodiment of the present invention, in the above method, the node scheduling condition further includes an instance count lower limit and an instance count upper limit; and selecting multiple nodes located in the same availability zone as target nodes further includes: when the number of selected nodes is greater than or equal to the instance count lower limit, taking the smaller of the number of selected nodes and the instance count upper limit as the number of target nodes; when the number of selected nodes is less than the instance count lower limit, terminating this job scheduling.
This is a problem that many scheduling frameworks, such as Slurm, cannot solve. A brief introduction: Slurm (originally Simple Linux Utility for Resource Management, taking the initials SLURM) is a free, open-source task scheduler for Linux and Unix-like kernels, widely used by supercomputers and computer clusters worldwide. It provides three key functions. First, it allocates exclusive or non-exclusive resources (compute nodes) to users for some period of time so that they can perform work. Second, it provides a framework for starting, executing, and monitoring tasks (usually parallel tasks, e.g. MPI) on the allocated nodes. Third, it arbitrates resources by managing a queue of pending tasks. About 60% of the TOP500 supercomputers run Slurm, including Tianhe-2, the world's fastest computer until 2016. Slurm uses a best-fit algorithm based on Hilbert curve scheduling or fat-tree network topology to optimize task assignment on parallel computers.
MPI is a cross-language communications protocol for programming parallel computers, supporting both point-to-point and broadcast communication. MPI is a message-passing application programming interface that includes protocol and semantic specifications of how its features must behave in any implementation. MPI's goals are high performance, scalability, and portability, and it remains the dominant model in high-performance computing today. The main MPI-1 model has no shared-memory concept, and MPI-2 has only a limited distributed shared-memory concept; nevertheless, MPI programs are routinely run on shared-memory machines. Designing programs around the MPI model (as opposed to explicit shared-memory models such as NUMA architectures) encourages memory locality. Although MPI belongs to layers 5 and higher of the OSI reference model, implementations may cover most layers, using sockets and the Transmission Control Protocol (TCP) in the transport layer. Most MPI implementations consist of a specified set of routines (an API) directly callable from C, C++, Fortran, and any language able to interface with such libraries, such as C#, Java, or Python. MPI's advantages over older message-passing libraries are portability and speed.
However, with a Slurm or MPI framework, when there are 99 usable nodes and a job needs 100 instances to be submitted, the job has to wait without using any of the usable nodes. And if an error occurs in the cluster, the entire task is marked as failed, wasting a large amount of cluster resources.
According to the present embodiment, no such problem arises. Because an instance count lower limit and an instance count upper limit are set (together also called the replica count, since the instances are realized from the same Pod data), if the replica count of the submitted job is set to 80 to 100, the instance count lower limit is 80, and 99 usable nodes clearly meet the demand, so the job can be scheduled onto those 99 usable nodes.
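As a minimal sketch of this instance-count rule (function and variable names are assumed for illustration): with 80 to 100 replicas requested and 99 usable nodes, it returns 99.

```go
// Sketch of the instance-count decision described above.
package main

import (
	"errors"
	"fmt"
)

func targetNodeCount(selected, minInstances, maxInstances int) (int, error) {
	if selected < minInstances {
		return 0, errors.New("too few usable nodes: terminate this scheduling round")
	}
	if selected > maxInstances {
		return maxInstances, nil // cap at the instance count upper limit
	}
	return selected, nil // otherwise use every selected node
}

func main() {
	n, err := targetNodeCount(99, 80, 100)
	fmt.Println(n, err) // 99 <nil>
}
```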
In one embodiment of the present invention, in the above method, the soft scheduling condition includes a job affinity condition, and the node state includes the job corresponding to each Pod deployed on the node.
This gives an example of a soft scheduling condition, namely the job affinity condition, which may also be called an application affinity condition. For example, a business job has high affinity with the jobs that process its monitoring logs and local data; if their Pods are far apart, the network overhead of access causes inefficiency. This embodiment therefore effectively provides for deploying affine applications close together, for example on the same node, which naturally reduces network overhead.
Affinity is a mutual relationship. Therefore, every job that may be affine with other jobs uses a job affinity condition (which may also be a label) to mark which jobs it is affine or anti-affine with, and the check is performed at Pod deployment time, realizing symmetry.
Another common question is what happens when an affine application is migrated. Two points need explaining. First, the algorithm design takes symmetry into account: whether an application was deployed first or later, even if it crashes, when it is rebuilt and rescheduled the system can still check which Pods it is affine with, or which Pods are affine with it, in the current system, and preferentially place it together with them. Second, at present an RC/RS (replica set, a stateless application) only rebuilds a Pod when its node dies; if the node is not dead, a Pod that exits abnormally is restarted in place. These two levels together guarantee the demand that affine applications stay together and anti-affine applications stay apart.
In one embodiment of the present invention, in the above method, the job corresponds to one or more of the following applications and/or services: a deep learning system, a Web service, a log collector, a distributed queue service, a log connector.
Such a deep learning project can be used for artificial intelligence (AI) research while meeting industrial requirements. Industrial users tend to run deep learning jobs as one stage of a larger data pipeline that includes web servers and log collectors. This kind of general-purpose cluster needs flexible, priority-based scheduling: it runs more web server processes, and less deep learning, during periods of higher network traffic, and preferentially runs deep learning when network traffic is low. Slurm and MPI cannot satisfy this demand for flexible scheduling.
The deep learning training framework itself needs to be designed to support distributed training. There are three roles in a deep learning cluster: the parameter server (Parameter Server), the trainer (Trainer), and the metadata management node (Master). Each parameter server process maintains a shard of the global model. Each trainer has its own local model copy and updates the model with its local data. During training, trainers send model updates to the parameter servers, and the parameter servers are responsible for aggregating these updates so that the trainers can synchronize their local copies with the global model.
Cluster training comprises the following modules. A single metadata management node is responsible for distributing tasks: it divides the dataset into tasks, distributes them to the trainers, and keeps training tasks traceable by using a task queue. Multiple trainers train the model by SGD (stochastic gradient descent): they receive tasks from the master, process them, compute and upload gradients to the parameter servers, and download the latest gradients (also called the parameters, or the model) into their own local models. Multiple parameter servers are responsible for storing and updating the training model: concretely, they obtain gradients from the trainers, update the parameters, return the latest parameters to the trainers, and periodically store the parameters to a distributed file system or etcd, overwriting the previous parameters. The concrete training architecture is shown in Fig. 3, where the deep learning model is divided into two shards, each managed by one of two parameter servers.
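The following Go sketch illustrates the gradient exchange between a trainer and a parameter server described above; every name, signature, and the learning-rate constant is a hypothetical stand-in for the real RPC-based implementation.

```go
// Schematic of one training step: compute gradients locally, upload them
// to the parameter server, then synchronize the local copy with the
// global model. All types and methods here are hypothetical.
package main

import "fmt"

type paramServer struct{ model []float32 } // one shard of the global model

func (ps *paramServer) push(grads []float32) {
	for i, g := range grads {
		ps.model[i] -= 0.01 * g // aggregate the update into the global shard
	}
}

func (ps *paramServer) pull() []float32 {
	return append([]float32(nil), ps.model...) // latest parameters
}

type trainer struct {
	local []float32 // the trainer's local model copy
	ps    *paramServer
}

func (t *trainer) step() {
	grads := make([]float32, len(t.local)) // placeholder for real SGD backprop
	t.ps.push(grads)
	t.local = t.ps.pull()
}

func main() {
	ps := &paramServer{model: make([]float32, 4)}
	tr := &trainer{local: ps.pull(), ps: ps}
	tr.step()
	fmt.Println(tr.local) // local copy now matches the global shard
}
```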
When the master starts, it takes a master lock and checks whether the task queue to be created already exists; if it does, the master restores that task queue, and if not, it creates one. It watches the /trainer/ directory to find existing trainers, distributes tasks to them, and updates the task queue at the same time. On master failure recovery, the master is restarted automatically and restores its data from etcd.
When a trainer starts, it watches the parameter server directory /ps/ and waits for the parameter servers to reach the specified count. It generates a unique id and writes it under /trainer/ in etcd; because of the lease, the master knows whether the trainer is online or offline. The trainer then waits for tasks to be assigned. On trainer failure recovery, the trainer is restarted automatically, pulls a task from the todo (pending) queue, and continues training.
When a parameter server starts, it reads the target total number of parameter servers, searches the etcd keys under /ps/ whose index is less than the target total, and checks which keys do not yet exist; if there is a vacancy, it fills it. The parameter server then reads the data stored under that path into memory and begins serving externally.
In one embodiment of the present invention, in the above method, the job is a deep learning training job, and running the job's processes in the deployed Pods includes: running one parameter server process and one trainer process in each deployed Pod; obtaining, by the trainer process, deep learning tasks from the metadata management node of the deep learning system, sending the gradients obtained by training the local deep learning model to the parameter server process, and obtaining updated parameters from the parameter server process; saving, by the parameter server process, training snapshots to distributed storage at predetermined intervals, so that a restarted Pod, or process within a Pod, resumes training from the training snapshot; and storing, by the parameter server process and/or the trainer process, the deep learning training model to distributed storage.
Implementing model data checkpoints effectively guards against single-point, or simultaneous multi-point, failures of the parameter servers. A model parameter checkpoint works by periodically saving to disk a complete image of the model data held in parameter server memory, guaranteeing that the training process can be restarted from an intermediate state. For an uninterruptible training task without backups, disaster recovery can be achieved by periodically saving a data snapshot of each parameter server to a distributed storage service, for example a fresh snapshot every 10 minutes, deleting the earlier snapshots. On a single-point failure, it suffices to recover this node, or to move it to another node and start it, to resume the training task.
For example, this can be realized with a lock mechanism: every 10 minutes, the parameter server requests a read lock and saves a checkpoint. Meanwhile the lock blocks write operations until the pending checkpoint completes. The parameter server then writes the newest snapshot to distributed storage and deletes the other, older snapshots; when the operation completes it releases the read lock, and writes can continue.
When a snapshot is read, the checkpoint file's uuid is read from etcd, the checkpoint snapshot file is loaded from disk, and the parameters in it are loaded. If loading is unsuccessful, the parameters are initialized from the original data.
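A minimal Go sketch of this periodic checkpointing under a read-write lock follows; the storage call is a hypothetical placeholder, and a real implementation would also record the snapshot uuid in etcd and delete older snapshots as described.

```go
// Sketch of the periodic checkpoint described above: take a read lock,
// copy the in-memory parameters, release the lock, then persist the copy.
package main

import (
	"sync"
	"time"
)

type shard struct {
	mu     sync.RWMutex
	params []float32 // this parameter server's slice of the global model
}

func (s *shard) checkpointLoop(interval time.Duration) {
	for range time.Tick(interval) {
		s.mu.RLock() // parameter writes block until the copy is taken
		snapshot := append([]float32(nil), s.params...)
		s.mu.RUnlock()
		saveToDistributedStorage(snapshot) // newest snapshot replaces older ones
	}
}

func saveToDistributedStorage(snapshot []float32) { /* placeholder */ }

func main() {
	s := &shard{params: make([]float32, 4)}
	go s.checkpointLoop(10 * time.Minute) // every 10 minutes, as in the text
	time.Sleep(10 * time.Millisecond)     // keep the demo process alive briefly
}
```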
A concrete implementation can use the public cloud's bosfs file system for shared data storage, or the public cloud's newest NFS storage system. Data can be uniformly converted to the RecordIO format, providing a standardized conversion interface.
For storing the model there are two choices: the parameter server process and/or the trainer process stores the deep learning training model to distributed storage. Since the data in each parameter server is only a shard, while a trainer holds dense updates and thus possesses the entire model, for ease of use it is preferable to have a trainer store the model. Concretely, the trainers hold an election through etcd to choose one of their nodes to export and store the model.
In one embodiment of the present invention, in the above method, the node scheduling condition includes a resource request lower limit and a resource request upper limit corresponding to each computing resource; and selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster includes: calculating the schedulable resource upper limit and the schedulable resource lower limit of each computing resource according to the Pods already deployed on each node; and, when the resource request lower limit of each computing resource in the node scheduling condition is less than or equal to the schedulable resource lower limit of the corresponding computing resource, selecting one or more target nodes from the cluster.
This embodiment provides a resource-based scheduling mode: a Pod can set request conditions for CPU and memory, specifically a resource request upper limit and a resource request lower limit. For each resource, 0 ≤ resource request lower limit ≤ resource request upper limit ≤ infinity. If a container is successfully scheduled onto a node, the container's resource request is guaranteed.
The whole cluster can then maintain and calculate the schedulable resource upper and lower limits of each computing resource; if the resource request lower limit of every computing resource in the node scheduling condition is less than or equal to the schedulable resource lower limit of the corresponding computing resource, the resources are clearly sufficient. For example, suppose a Pod requires a memory lower limit of 1024 MB, that is, the job cannot proceed unless 1024 MB of memory is provided; if a node can currently provide 2048 MB as its schedulable resource lower limit, the job can clearly be scheduled onto that node.
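As an illustration of this check, a small Go sketch (resource names and types are assumptions):

```go
// A node is schedulable for a Pod when, for every computing resource,
// the Pod's resource request lower limit fits within what the node can
// still guarantee (its schedulable resource lower limit).
package main

import "fmt"

type quantity = int64 // e.g. MB of memory or milli-CPU

func fits(requestLower, schedulableLower map[string]quantity) bool {
	for res, need := range requestLower {
		if need > schedulableLower[res] {
			return false // this node cannot guarantee the request
		}
	}
	return true
}

func main() {
	pod := map[string]quantity{"memoryMB": 1024}
	node := map[string]quantity{"memoryMB": 2048, "milliCPU": 4000}
	fmt.Println(fits(pod, node)) // true: 1024 MB fits within 2048 MB
}
```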
In one embodiment of the present invention, in the above method, the node scheduling condition further includes a job priority; and selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster further includes: when a resource request lower limit in the node scheduling condition is greater than the schedulable resource lower limit, killing or throttling already-deployed Pods according to job priority, or terminating this job scheduling.
Quality of Service (QoS) management divides Pods into three priorities. Best-effort: neither a resource request lower limit nor a resource request upper limit is written in the node scheduling condition; such Pods can use the most resources when resources are plentiful (for example, a deep learning training job can be set to this priority so that it occupies as many resources as possible at night when business traffic is low), but they are also the first to be killed when resources are tight (for example, during the day when business traffic is heavy and the stability of the business must be guaranteed first). Burstable: as long as one container in the Pod has a resource request lower limit, or the containers' resource request upper limits are set inconsistently, the QoS of the Pod is Burstable. Guaranteed: all containers must uniformly set both the resource request lower limit and the resource request upper limit, and the values must all be identical.
In one embodiment of the present invention, in the above method, killing or throttling a Pod includes: when the resource corresponding to the node scheduling condition is a compressible resource, throttling the Pod; when the resource corresponding to the node scheduling condition is an incompressible resource, killing the Pod. This follows from the nature of the computing resources: occupied memory must be released before it can be reused, so memory is an incompressible resource, whereas CPU usage can be adjusted dynamically, so CPU is a compressible resource. The corresponding handling therefore differs.
In one embodiment of the present invention, the above method further includes: allocating computing resources to the Pods deployed on each node according to the node scheduling conditions; wherein, if the sum of the resource request upper limits for a compressible resource of the Pods deployed on a node is less than the node's upper limit for that compressible resource, the unallocated compressible resource is distributed proportionally among the Pods deployed on the node.
The smallest CPU grant is 10m (milli-CPU), a limit determined by the Linux kernel. A container is guaranteed the amount of CPU it requests; whether it can obtain additional CPU time depends on the other running tasks. Beyond the requested CPU quantities, additional CPU is shared. For example, suppose container A requests 60% of the CPU and container B requests 30%, and both containers use as much CPU as they can; then the extra 10% of CPU is allocated to container A and container B in a 2:1 ratio. A container whose resource use exceeds its resource limit is throttled; if no resource limit is specified, the container may use extra CPU whenever CPU is available.
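A small Go sketch of the proportional split in the example above (a 2:1 division of the spare 10%):

```go
// Spare compressible resource is divided in the ratio of the requests.
package main

import "fmt"

func shareSpare(requests []float64, spare float64) []float64 {
	var total float64
	for _, r := range requests {
		total += r
	}
	extra := make([]float64, len(requests))
	for i, r := range requests {
		extra[i] = spare * r / total // each Pod's share of the spare CPU
	}
	return extra
}

func main() {
	// Containers A and B request 60% and 30% of the CPU; 10% is spare.
	fmt.Println(shareSpare([]float64{0.60, 0.30}, 0.10)) // ~[0.0667 0.0333], a 2:1 split
}
```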
A container is guaranteed the amount of memory it requests; if it exceeds its memory request it may be killed (when another container needs the memory), but if a container consumes less than its resource request lower limit it will not be killed (unless a system task or daemon needs more memory). When a container's memory usage exceeds its memory resource request upper limit, the container is killed.
In one embodiment of the present invention, the above method further includes: calculating a memory usage score for each job process, and killing a job process when its calculated memory usage score reaches the preset value corresponding to that job process.
In this embodiment the memory usage score is also called the OOM (out of memory) score. A process's OOM score is 10 times the percentage of memory the process consumes, adjusted by OOM_SCORE_ADJ (the preset value), and processes with higher OOM scores are killed. The base OOM score is between 0 and 1000, and a process's final OOM score is also between 0 and 1000. The OOM_SCORE_ADJ settings for the three priorities are as follows:
Best-effort: OOM_SCORE_ADJ is 1000, so the OOM_SCORE of the processes in the container will be 1000;
Guaranteed: OOM_SCORE_ADJ is -998, so the OOM_SCORE of the processes in the container will be 0 or 1;
Burstable: OOM_SCORE_ADJ is set to 1000 - 10 * (the percentage of the entire node's memory occupied by the memory resource request lower limit), which ensures OOM_SCORE > 1 for Burstable replicas. If the memory request is 0, OOM_SCORE_ADJ is set to 999.
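These three rules can be written down directly; the following Go sketch reproduces them (integer arithmetic and unit choices are assumptions):

```go
// OOM_SCORE_ADJ for the three QoS levels, per the rules above.
package main

import "fmt"

func oomScoreAdj(qos string, requestLowerBytes, nodeMemoryBytes int64) int {
	switch qos {
	case "Best-effort":
		return 1000
	case "Guaranteed":
		return -998
	default: // Burstable
		if requestLowerBytes == 0 {
			return 999
		}
		// 1000 - 10 * (request lower limit as a percent of node memory)
		return 1000 - int(10*100*requestLowerBytes/nodeMemoryBytes)
	}
}

func main() {
	// A Burstable Pod requesting 1 GiB on a 16 GiB node: 1000 - 62 = 938.
	fmt.Println(oomScoreAdj("Burstable", 1<<30, 16<<30))
}
```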
In one embodiment of the present invention, the above method further includes: obtaining the CPU utilization of each Pod deployed from the same Pod data, and calculating an adjusted Pod count according to the arithmetic mean of the CPU utilizations and the node scheduling condition in the Pod data.
This is also called horizontal autoscaling of instances or replicas (in practice, of Pods). The autoscaler (Autoscaler) is implemented as a control loop that periodically collects the CPU utilization of a Pod's replicas by querying the node states. It then compares the arithmetic mean of the replicas' CPU utilization with the target defined in the node scheduling condition, and adjusts the replica count as needed to match the target, subject to: MinReplicas (instance count lower limit) ≤ Replicas (instance count) ≤ MaxReplicas (instance count upper limit).
The autoscaler's period is controlled by the controller manager's --horizontal-pod-autoscaler-sync-period flag, whose default value is 30 seconds. CPU utilization is a replica's recent CPU usage (its average over the last minute) divided by the CPU requested by the Pod.
The target number of Pods is calculated by the formula TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target), where ceil() is the ceiling operation, i.e. the nearest integer greater than or equal to a number; sum is the arithmetic sum; CurrentPodsCPUUtilization is a Pod's average CPU usage over the last minute; and Target is the CPU resource request upper limit.
Starting and stopping Pods may add noise to the measurements within the window (for example, starting a Pod may temporarily raise CPU usage). Therefore, after each action the autoscaler should wait for a while to obtain reliable data: it scales up only if no rescaling has occurred within the past 3 minutes, and it waits 5 minutes after the last rescaling before scaling down. Moreover, any scaling happens only when the ratio of the arithmetic mean of the replicas' CPU utilization to the resource request lower limit drops below 0.9 or rises above 1.1 (a 10% tolerance).
This approach has two benefits. On one hand, the autoscaler works conservatively: if new user load appears, what matters is quickly increasing the number of Pods so as not to reject user requests, while decreasing the number of Pods is not as urgent. On the other hand, the autoscaler needs to avoid thrashing: it prevents rapid, conflicting decisions when the load is unstable.
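The formula and the 10% tolerance band combine into a small function; the following Go sketch is illustrative only and, for simplicity, measures the tolerance against the same target used in the formula:

```go
// TargetNumOfPods = ceil(sum(CurrentPodsCPUUtilization) / Target),
// skipped when the mean utilization is within 0.9 to 1.1 of the target.
package main

import (
	"fmt"
	"math"
)

func desiredReplicas(current int, podsCPUUtilization []float64, target float64) int {
	var sum float64
	for _, u := range podsCPUUtilization {
		sum += u
	}
	mean := sum / float64(len(podsCPUUtilization))
	if ratio := mean / target; ratio > 0.9 && ratio < 1.1 {
		return current // within the 10% tolerance: no scaling
	}
	return int(math.Ceil(sum / target))
}

func main() {
	// Three replicas averaging 0.9 utilization against a 0.5 target: scale to 6.
	fmt.Println(desiredReplicas(3, []float64{0.9, 0.8, 1.0}, 0.5)) // 6
}
```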
In one embodiment of the present invention, the above method further includes: monitoring whether there is a Pod in the cluster that has not been successfully scheduled, and if so, further determining whether there are nodes available for scale-up; if so, starting at least some of the nodes available for scale-up, and scheduling the Pods that were not successfully scheduled onto the newly started nodes.
This embodiment provides a way to resolve Pods that have not been successfully scheduled, namely meeting the demand by scaling up the nodes in the cluster, since the nodes of a cluster are not necessarily all in the started state. For example, it can be realized with a scale-up component that creates a watch on all Pods and checks every 10 seconds whether any Pod cannot be scheduled; a Pod generally falls into the unschedulable state because there is no node it can be scheduled onto. Unschedulable Pods can be detected by watching for their PodCondition (state) to be unscheduled (not scheduled). When this situation occurs, the scale-up component finds a new node for the Pod to be scheduled onto. It can also ensure that all Pods in the replica set containing that Pod reside in the same node group, so that the type of the newly created machine is consistent with the other machines in that node group.
Considered from the other direction, in one embodiment of the present invention the above method further includes: judging whether a node satisfies a scale-down condition according to the node state of each node, and if so, shutting down the corresponding node and, when Pods have been deployed on the corresponding node, rescheduling the deployed Pods onto other nodes in the cluster. That is, the waste of resources is avoided.
For example, it can be realized with a scale-down component that checks every 10 seconds whether there is a suitable node that can be removed. In one embodiment of the present invention, in the above method, the scale-down condition includes one or more of the following: the computing resource utilization of the node is below a preset value; the Pods deployed on the node can be scheduled onto other nodes in the cluster; the Pods deployed on the node are confirmed as movable according to a PodDisruptionBudget controller; the node has no local storage.
In one embodiment of the present invention, in the above method, cluster scale-up and/or scale-down is performed according to one or more of the following strategies: selecting nodes at random; selecting nodes according to the number of deployed Pods; selecting nodes according to computing resource utilization; selecting nodes according to the usage price of the physical machine; pausing scale-up and/or scale-down when a preset number and/or preset proportion of nodes in the cluster are abnormal. For example, to prevent large-scale node unavailability caused by network or other problems from making Pods undeployable, and thereby creating an avalanche of further unavailable nodes, certain rules can be formulated, e.g. when 30% of the nodes, or at most 3 nodes, are abnormal, the scaling function is paused until the cluster's nodes recover as a whole. A sketch of such a guard follows.
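A minimal Go sketch of such an avalanche guard, using the example thresholds from the text:

```go
// Pause scale-up/scale-down when too many nodes are abnormal.
package main

import "fmt"

func pauseScaling(abnormal, total int) bool {
	const maxAbnormalNodes = 3     // example absolute threshold from the text
	const maxAbnormalRatio = 0.30  // example proportional threshold from the text
	return abnormal >= maxAbnormalNodes ||
		float64(abnormal)/float64(total) >= maxAbnormalRatio
}

func main() {
	fmt.Println(pauseScaling(2, 10)) // false: 2 nodes, 20%, keep scaling
	fmt.Println(pauseScaling(4, 50)) // true: 4 abnormal nodes reaches the limit, pause
}
```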
Fig. 2 shows a schematic structural diagram of a device for scheduling jobs in a cluster according to an embodiment of the present invention. As shown in Fig. 2, the device 200 for scheduling jobs in a cluster includes:
a Pod data obtaining unit 210, configured to obtain container pod (Pod) data corresponding to a job;
a scheduling unit 220, configured to select one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster;
a Pod deployment unit 230, configured to deploy a Pod on each target node according to the Pod data, and run the job's processes in the deployed Pods.
As can be seen, with the device shown in Fig. 2, after container pod (Pod) data corresponding to a job is obtained, several target nodes are selected from the cluster according to the node scheduling condition in the data and the node state of each node, a Pod is then deployed on each target node according to the Pod data, and the job's processes are run in the deployed Pods. This technical solution deploys job processes in containers, placing different jobs in separate, independent containers, so that the jobs of different applications on the cluster do not interfere with one another yet can still communicate in scenarios that require interaction. This makes effective use of cluster resources, and projects that require the pipelined cooperation of many applications, such as deep learning, can achieve stable job scheduling.
In one embodiment of the invention, in above-mentioned apparatus, node scheduling condition includes rigid schedulable condition and/or soft Property schedulable condition;Scheduling unit 220 belongs to target section if the node state for a node meets rigid schedulable condition Point;And/or scoring is scheduled according to the node state of each node and soft schedulable condition, one is selected according to appraisal result Or multiple destination nodes.
In one embodiment of the invention, in the above device, the hard scheduling condition includes hardware information of the node and/or zone information of the node.
In one embodiment of the invention, in the above device, the scheduling unit 220 is configured to select multiple nodes located in the same availability zone as target nodes, and the Pod deployment unit 230 is configured to deploy one Pod on each target node according to the Pod data, so as to form multiple instances of the job.
In one embodiment of the invention, in the above device, the node scheduling condition further includes an instance-number lower limit and an instance-number upper limit; the scheduling unit 220 is configured to, when the number of selected nodes is greater than or equal to the instance-number lower limit, take the smaller of the number of selected nodes and the instance-number upper limit as the number of target nodes, and, when the number of selected nodes is less than the instance-number lower limit, terminate this job scheduling.
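This decision rule is simple enough to state directly; a sketch, with names of our own choosing:

    def decide_instance_count(selected: int, lower: int, upper: int):
        # Returns the number of target nodes, or None to terminate scheduling.
        if selected < lower:
            return None              # too few nodes: terminate this job scheduling
        return min(selected, upper)  # otherwise the smaller of the two values

    assert decide_instance_count(5, 2, 3) == 3   # capped by the instance upper limit
    assert decide_instance_count(2, 2, 3) == 2   # node count is the smaller value
    assert decide_instance_count(1, 2, 3) is None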
In one embodiment of the invention, in the above device, the soft scheduling condition includes a job affinity condition, and the node state includes the job corresponding to each Pod deployed on the node.
In one embodiment of the invention, in the above device, the job corresponds to one or more of the following applications and/or services: a deep learning system, a Web service, a log collector, a distributed queue service, a log connector.
In one embodiment of the invention, in the above device, the job is a deep learning training job; the deployment unit is configured to run a parameter server process and a trainer process in each deployed Pod. The trainer process obtains deep learning tasks from the meta-information management node of the deep learning system, trains the local deep learning model to obtain gradients, sends the gradients to the parameter server process, and obtains updated parameters from the parameter server process. The parameter server process saves training snapshots to distributed storage at predetermined intervals, so that training can resume from a snapshot when a Pod or a process in a Pod restarts. The parameter server process and/or the trainer process store the deep learning model to distributed storage.
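The following toy Python sketch mirrors this division of labor. The one-parameter "model", the gradient formula, and the local snapshot file standing in for distributed storage are all assumptions made for the sketch, not the deep learning framework the embodiment relies on.

    import json
    import os
    import time

    SNAPSHOT = "/tmp/ps_snapshot.json"   # local stand-in for distributed storage
    SNAPSHOT_INTERVAL = 5.0              # the "predetermined interval", in seconds

    class ParameterServer:
        def __init__(self):
            self.params = {"w": 0.0}
            self.last_snapshot = time.monotonic()
            if os.path.exists(SNAPSHOT):        # a restarted Pod/process
                with open(SNAPSHOT) as f:       # resumes from the snapshot
                    self.params = json.load(f)

        def apply_gradient(self, grad, lr=0.1):
            self.params["w"] -= lr * grad["w"]
            if time.monotonic() - self.last_snapshot >= SNAPSHOT_INTERVAL:
                with open(SNAPSHOT, "w") as f:  # periodic training snapshot
                    json.dump(self.params, f)
                self.last_snapshot = time.monotonic()
            return self.params                  # updated parameters

    class Trainer:
        def __init__(self, server: ParameterServer):
            self.server = server
            self.local_w = server.params["w"]

        def step(self, task: float):
            # "Train" on one task pulled from the meta-information management
            # node (here just a number): gradient of (w - task)**2 w.r.t. w.
            grad = {"w": 2 * (self.local_w - task)}
            updated = self.server.apply_gradient(grad)   # send the gradient,
            self.local_w = updated["w"]                  # pull updated parameters

    server = ParameterServer()
    trainer = Trainer(server)
    for task in [1.0, 1.0, 1.0]:
        trainer.step(task)
    print(server.params)                 # w has moved toward the task value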
In one embodiment of the invention, in the above device, the node scheduling condition includes a resource-request lower limit and a resource-request upper limit corresponding to each computing resource; the scheduling unit 220 is configured to calculate, according to the Pods already deployed on each node, the schedulable-resource upper limit and lower limit of each computing resource, and to select one or more target nodes from the cluster when the resource-request lower limit of each computing resource in the node scheduling condition is less than or equal to the schedulable-resource lower limit of the corresponding computing resource.
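The disclosure does not spell out how the schedulable bounds are computed; one plausible reading, sketched below under that assumption, derives them from the node capacity minus the requests of the Pods already deployed.

    from typing import List, Tuple

    def schedulable_bounds(node_capacity: float,
                           deployed: List[Tuple[float, float]]) -> Tuple[float, float]:
        # deployed holds the (lower, upper) resource requests of each Pod
        # already on the node. The schedulable lower limit assumes every
        # deployed Pod may consume up to its upper request; the schedulable
        # upper limit assumes each consumes only its guaranteed lower request.
        used_upper = sum(hi for _, hi in deployed)
        used_lower = sum(lo for lo, _ in deployed)
        return (max(0.0, node_capacity - used_upper),
                max(0.0, node_capacity - used_lower))

    def node_admits(job_request_lower: float, node_capacity: float,
                    deployed: List[Tuple[float, float]]) -> bool:
        schedulable_lower, _ = schedulable_bounds(node_capacity, deployed)
        # The job's resource-request lower limit must not exceed the
        # schedulable-resource lower limit of the node.
        return job_request_lower <= schedulable_lower

    print(node_admits(2.0, 8.0, [(1.0, 3.0), (1.0, 2.0)]))   # True: 8 - 5 = 3 >= 2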
In one embodiment of the invention, in the above device, the node scheduling condition further includes a job priority; the scheduling unit 220 is configured to, when a resource-request lower limit in the node scheduling condition is greater than the schedulable-resource lower limit, kill or block deployed Pods according to the job priority, or terminate this job scheduling.
In one embodiment of the invention, in the above device, the scheduling unit 220 is configured to block a Pod when the resource corresponding to the node scheduling condition is a compressible resource, and to kill the Pod when the resource is an incompressible resource.
In one embodiment of the invention, in the above device, the scheduling unit 220 is further configured to allocate computing resources to the Pods deployed on each node according to the node scheduling condition; if the sum of the resource-request upper limits of a compressible resource across the Pods deployed on a node is less than the node's upper limit for that compressible resource, the unallocated compressible resource is distributed proportionally among the Pods deployed on the node.
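A sketch of this proportional hand-out of the surplus compressible resource; we assume "proportionally" means in proportion to each Pod's resource-request upper limit, which the text does not state explicitly.

    from typing import Dict

    def distribute_compressible(node_limit: float,
                                pod_uppers: Dict[str, float]) -> Dict[str, float]:
        # Split a node's compressible resource (e.g. CPU) among its Pods;
        # pod_uppers maps Pod name -> resource-request upper limit.
        requested = sum(pod_uppers.values())
        if requested >= node_limit or requested == 0:
            return dict(pod_uppers)          # nothing left over to distribute
        surplus = node_limit - requested
        # Hand the unallocated resource out in proportion to each upper limit.
        return {name: upper + surplus * (upper / requested)
                for name, upper in pod_uppers.items()}

    print(distribute_compressible(10.0, {"pod-a": 2.0, "pod-b": 6.0}))
    # -> {'pod-a': 2.5, 'pod-b': 7.5}: the surplus of 2.0 is split 1:3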
In one embodiment of the invention, in the above device, the scheduling unit 220 is further configured to calculate a memory-occupation score for each job process and to kill a job process when its calculated memory-occupation score reaches the preset value corresponding to that process.
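The text states only that a memory-occupation score is computed and compared to a preset value; the scoring formula below (actual usage relative to the request) is therefore an illustrative assumption.

    from typing import List, Tuple

    def memory_score(rss_bytes: int, request_bytes: int) -> float:
        # Illustrative score: actual usage relative to the requested memory.
        return rss_bytes / max(request_bytes, 1)

    def processes_to_kill(processes: List[Tuple[str, int, int]],
                          preset: float = 1.0) -> List[str]:
        # processes holds (name, rss_bytes, request_bytes) per job process;
        # returns the names whose score reached the preset value.
        return [name for name, rss, req in processes
                if memory_score(rss, req) >= preset]

    procs = [("train-1", 900 << 20, 1024 << 20),    # within its request
             ("train-2", 2048 << 20, 1024 << 20)]   # double its request
    print(processes_to_kill(procs))                  # -> ['train-2']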
In one embodiment of the invention, in the above device, the scheduling unit 220 is configured to obtain the CPU utilization of each Pod deployed from the same Pod data, and to calculate an adjusted Pod count according to the arithmetic mean of the CPU utilizations and the node scheduling condition in the Pod data.
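The adjustment formula is not given in the text; the sketch below assumes the familiar proportional rule (desired count scales with mean utilization over a target level), which is consistent with, but not dictated by, the description.

    import math
    from typing import List

    def adjusted_pod_count(cpu_utilizations: List[float], target_utilization: float,
                           min_pods: int = 1, max_pods: int = 100) -> int:
        # cpu_utilizations: current utilization of each Pod deployed from
        # the same Pod data; target_utilization: the level implied by the
        # node scheduling condition.
        current = len(cpu_utilizations)
        mean = sum(cpu_utilizations) / current              # arithmetic mean
        desired = math.ceil(current * mean / target_utilization)
        return max(min_pods, min(max_pods, desired))

    print(adjusted_pod_count([0.9, 0.7, 0.8], target_utilization=0.5))   # -> 5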
In one embodiment of the invention, in the above device, the scheduling unit 220 is further configured to monitor whether there are Pods in the cluster that have not been successfully scheduled; if so, to further judge whether there are nodes available for scale-up; and if so, to start at least some of those nodes and schedule the unscheduled Pods onto the newly started nodes.
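A minimal sketch of this scale-up path, with start_node standing in for whatever mechanism actually boots a node:

    from typing import Callable, List

    def scale_up_if_needed(pending_pods: List[str], stopped_nodes: List[str],
                           start_node: Callable[[str], str]) -> List[str]:
        # pending_pods: Pods not yet successfully scheduled; stopped_nodes:
        # nodes whose capacity can still be brought online. Returns the nodes
        # actually started; the pending Pods would then be scheduled onto them.
        if not pending_pods or not stopped_nodes:
            return []
        # Start at most as many nodes as there are pending Pods.
        to_start = stopped_nodes[:len(pending_pods)]
        return [start_node(name) for name in to_start]

    started = scale_up_if_needed(["pod-a", "pod-b"],
                                 ["node-x", "node-y", "node-z"],
                                 start_node=lambda name: name)   # stand-in starter
    print(started)   # -> ['node-x', 'node-y']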
In one embodiment of the invention, in the above device, the scheduling unit 220 is configured to judge, according to the node state of each node, whether a node meets the scale-down condition; if so, to close the corresponding node and, when Pods have been deployed on that node, to schedule the deployed Pods to other nodes in the cluster.
In one embodiment of the invention, in the above device, the scale-down condition includes one or more of the following: the computing-resource utilization of the node is below a preset value; the Pods deployed on the node can be scheduled to other nodes in the cluster; the Pods deployed on the node are confirmed as able to drift according to the PodDisruptionBudget controller; the node has no local storage.
In one embodiment of the invention, in the above device, the scheduling unit 220 is configured to perform scale-up and/or scale-down of the cluster according to one or more of the following strategies: selecting nodes at random; selecting nodes according to the number of deployed Pods; selecting nodes according to computing-resource utilization; selecting nodes according to the usage price of the physical machines; and pausing scale-up and/or scale-down when a preset number and/or a preset proportion of nodes in the cluster become abnormal.
The specific implementations of the above device embodiments may refer to the corresponding method embodiments described earlier and are not repeated here.
It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as series of action combinations; however, those skilled in the art should understand that the present invention is not limited by the described order of actions, since according to the present invention certain steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
Each of the above embodiments is described with its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of the other embodiments.
Fig. 4 shows a block diagram of an exemplary computer system/server 12 suitable for implementing embodiments of the present invention. The computer system/server 12 shown in Fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 4, the computer system/server 12 takes the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors (processing units) 16, a memory 28, and a bus 18 connecting the different system components (including the memory 28 and the processor 16).
The bus 18 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer system/server 12 typically includes a variety of computer-system-readable media. These media may be any available media accessible by the computer system/server 12, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 may include computer-system-readable media in the form of volatile memory, such as random-access memory (RAM) 30 and/or cache memory 32. The computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. Merely as an example, the storage system 34 may be used to read and write non-removable, non-volatile magnetic media (not shown in Fig. 4, commonly referred to as a "hard disk drive"). Although not shown in Fig. 4, a disk drive for reading and writing removable non-volatile magnetic disks (such as "floppy disks") and an optical disk drive for reading and writing removable non-volatile optical disks (such as CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 may be stored in, for example, the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods in the embodiments described in the present invention.
The computer system/server 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer system/server 12, and/or with any device (such as a network card, a modem, etc.) that enables the computer system/server 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. Moreover, the computer system/server 12 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in Fig. 4, the network adapter 20 communicates with the other modules of the computer system/server 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer system/server 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processor 16 performs various functional applications and data processing by running programs stored in the memory 28, for example implementing the method in the embodiment shown in Fig. 1.
The present invention discloses a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method in the embodiment shown in Fig. 1 is implemented.
Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by, or in connection with, an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted using any suitable medium, including, but not limited to, wireless, wireline, optical cable, RF, or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages, or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In the several embodiments provided by the present invention, it should be understood that the disclosed devices, methods, and the like may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only one kind of logical functional division, and other division manners are possible in actual implementation.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The above integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, or an optical disk.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (38)

1. A method of scheduling jobs in a cluster, characterized in that the method comprises:
obtaining container group (Pod) data corresponding to a job;
selecting one or more target nodes from the cluster according to a node scheduling condition in the Pod data and the node state of each node in the cluster;
deploying Pods on the target nodes according to the Pod data, and running a job process of the job in the deployed Pods.
2. The method according to claim 1, characterized in that the node scheduling condition comprises a hard scheduling condition and/or a soft scheduling condition;
the selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster comprises:
if the node state of a node meets the hard scheduling condition, treating the node as a target node;
and/or
scoring each node according to its node state and the soft scheduling condition, and selecting one or more target nodes according to the scoring result.
3. The method according to claim 2, characterized in that the hard scheduling condition comprises hardware information of a node and/or zone information of a node.
4. The method according to claim 3, characterized in that the selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster comprises:
selecting multiple nodes located in the same availability zone as the target nodes;
and the deploying Pods on the target nodes according to the Pod data comprises: deploying one Pod on each target node according to the Pod data, so as to form multiple instances of the job.
5. The method according to claim 4, characterized in that the node scheduling condition further comprises an instance-number lower limit and an instance-number upper limit; the selecting multiple nodes located in the same availability zone as the target nodes further comprises:
when the number of selected nodes is greater than or equal to the instance-number lower limit, taking the smaller of the number of selected nodes and the instance-number upper limit as the number of target nodes;
when the number of selected nodes is less than the instance-number lower limit, terminating this job scheduling.
6. The method according to claim 2, characterized in that the soft scheduling condition comprises a job affinity condition, and the node state comprises the job corresponding to each Pod deployed on the node.
7. The method according to claim 1, characterized in that the job corresponds to one or more of the following applications and/or services:
a deep learning system, a Web service, a log collector, a distributed queue service, a log connector.
8. The method according to claim 7, characterized in that the job is a deep learning training job, and the running a job process of the job in the deployed Pods comprises:
running a parameter server process and a trainer process in each deployed Pod, the trainer process obtaining deep learning tasks from a meta-information management node of the deep learning system, training the local deep learning model to obtain gradients, sending the gradients to the parameter server process, and obtaining updated parameters from the parameter server process;
the parameter server process saving training snapshots to distributed storage at predetermined intervals, so that training is resumed from a training snapshot when a Pod or a process in a Pod restarts;
the parameter server process and/or the trainer process storing the deep learning model to distributed storage.
9. The method according to claim 1, characterized in that the node scheduling condition comprises a resource-request lower limit and a resource-request upper limit corresponding to each computing resource;
the selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster comprises:
calculating, according to the Pods already deployed on each node, a schedulable-resource upper limit and a schedulable-resource lower limit of each computing resource;
when the resource-request lower limit of each computing resource in the node scheduling condition is less than or equal to the schedulable-resource lower limit of the corresponding computing resource, selecting one or more target nodes from the cluster.
10. The method according to claim 9, characterized in that the node scheduling condition further comprises a job priority;
the selecting one or more target nodes from the cluster according to the node scheduling condition in the Pod data and the node state of each node in the cluster further comprises:
when a resource-request lower limit in the node scheduling condition is greater than the schedulable-resource lower limit, killing or blocking deployed Pods according to the job priority, or terminating this job scheduling.
11. The method according to claim 10, characterized in that the killing or blocking of Pods comprises:
blocking a Pod when the resource corresponding to the node scheduling condition is a compressible resource;
killing a Pod when the resource corresponding to the node scheduling condition is an incompressible resource.
12. The method according to claim 9, characterized in that the method further comprises:
allocating computing resources to the Pods deployed on each node according to the node scheduling condition; wherein, if the sum of the resource-request upper limits of a compressible resource across the Pods deployed on a node is less than the node's upper limit for that compressible resource, the unallocated compressible resource is distributed proportionally among the Pods deployed on the node.
13. The method according to claim 12, characterized in that the method further comprises:
calculating a memory-occupation score for each job process, and killing a job process when its calculated memory-occupation score reaches a preset value corresponding to that job process.
14. The method according to claim 9, characterized in that the method further comprises:
obtaining the CPU utilization of each Pod deployed from the same Pod data, and calculating an adjusted Pod count according to the arithmetic mean of the CPU utilizations and the node scheduling condition in the Pod data.
15. The method according to claim 1, characterized in that the method further comprises:
monitoring whether there are Pods in the cluster that have not been successfully scheduled, and if so, further judging whether there are nodes available for scale-up;
if so, starting at least some of the nodes available for scale-up, and scheduling the Pods that have not been successfully scheduled onto the newly started nodes.
16. The method according to claim 1, characterized in that the method further comprises:
judging, according to the node state of each node, whether a node meets a scale-down condition; if so, closing the corresponding node and, when Pods have been deployed on the node, scheduling the deployed Pods to other nodes in the cluster.
17. The method according to claim 16, characterized in that the scale-down condition comprises one or more of the following:
the computing-resource utilization of a node is below a preset value;
the Pods deployed on a node can be scheduled to other nodes in the cluster;
the Pods deployed on a node are confirmed as able to drift according to a PodDisruptionBudget controller;
a node has no local storage.
18. The method according to any one of claims 15-17, characterized in that scale-up and/or scale-down of the cluster is performed according to one or more of the following strategies:
selecting nodes at random;
selecting nodes according to the number of deployed Pods;
selecting nodes according to computing-resource utilization;
selecting nodes according to the usage price of the physical machine;
pausing scale-up and/or scale-down when a preset number and/or a preset proportion of nodes in the cluster are abnormal.
19. A device for scheduling jobs in a cluster, characterized in that the device comprises:
a Pod data acquisition unit, configured to obtain container group (Pod) data corresponding to a job;
a scheduling unit, configured to select one or more target nodes from the cluster according to a node scheduling condition in the Pod data and the node state of each node in the cluster;
a Pod deployment unit, configured to deploy Pods on the target nodes according to the Pod data, a job process of the job being run in the deployed Pods.
20. The device according to claim 19, characterized in that the node scheduling condition comprises a hard scheduling condition and/or a soft scheduling condition;
the scheduling unit is configured to treat a node as a target node if its node state meets the hard scheduling condition; and/or to score each node according to its node state and the soft scheduling condition and select one or more target nodes according to the scoring result.
21. The device according to claim 20, characterized in that the hard scheduling condition comprises hardware information of a node and/or zone information of a node.
22. The device according to claim 21, characterized in that:
the scheduling unit is configured to select multiple nodes located in the same availability zone as the target nodes;
the Pod deployment unit is configured to deploy one Pod on each target node according to the Pod data, so as to form multiple instances of the job.
23. The device according to claim 22, characterized in that the node scheduling condition further comprises an instance-number lower limit and an instance-number upper limit;
the scheduling unit is configured to, when the number of selected nodes is greater than or equal to the instance-number lower limit, take the smaller of the number of selected nodes and the instance-number upper limit as the number of target nodes, and, when the number of selected nodes is less than the instance-number lower limit, terminate this job scheduling.
24. The device according to claim 20, characterized in that the soft scheduling condition comprises a job affinity condition, and the node state comprises the job corresponding to each Pod deployed on the node.
25. The device according to claim 19, characterized in that the job corresponds to one or more of the following applications and/or services:
a deep learning system, a Web service, a log collector, a distributed queue service, a log connector.
26. The device according to claim 25, characterized in that the job is a deep learning training job;
the deployment unit is configured to run a parameter server process and a trainer process in each deployed Pod, the trainer process obtaining deep learning tasks from a meta-information management node of the deep learning system, training the local deep learning model to obtain gradients, sending the gradients to the parameter server process, and obtaining updated parameters from the parameter server process; the parameter server process saving training snapshots to distributed storage at predetermined intervals, so that training is resumed from a training snapshot when a Pod or a process in a Pod restarts; and the parameter server process and/or the trainer process storing the deep learning model to distributed storage.
27. The device according to claim 19, characterized in that the node scheduling condition comprises a resource-request lower limit and a resource-request upper limit corresponding to each computing resource;
the scheduling unit is configured to calculate, according to the Pods already deployed on each node, a schedulable-resource upper limit and a schedulable-resource lower limit of each computing resource, and to select one or more target nodes from the cluster when the resource-request lower limit of each computing resource in the node scheduling condition is less than or equal to the schedulable-resource lower limit of the corresponding computing resource.
28. The device according to claim 27, characterized in that the node scheduling condition further comprises a job priority;
the scheduling unit is configured to, when a resource-request lower limit in the node scheduling condition is greater than the schedulable-resource lower limit, kill or block deployed Pods according to the job priority, or terminate this job scheduling.
29. The device according to claim 28, characterized in that:
the scheduling unit is configured to block a Pod when the resource corresponding to the node scheduling condition is a compressible resource, and to kill the Pod when the resource corresponding to the node scheduling condition is an incompressible resource.
30. The device according to claim 27, characterized in that:
the scheduling unit is further configured to allocate computing resources to the Pods deployed on each node according to the node scheduling condition; wherein, if the sum of the resource-request upper limits of a compressible resource across the Pods deployed on a node is less than the node's upper limit for that compressible resource, the unallocated compressible resource is distributed proportionally among the Pods deployed on the node.
31. The device according to claim 30, characterized in that:
the scheduling unit is further configured to calculate a memory-occupation score for each job process, and to kill a job process when its calculated memory-occupation score reaches a preset value corresponding to that job process.
32. The device according to claim 27, characterized in that:
the scheduling unit is configured to obtain the CPU utilization of each Pod deployed from the same Pod data, and to calculate an adjusted Pod count according to the arithmetic mean of the CPU utilizations and the node scheduling condition in the Pod data.
33. The device according to claim 19, characterized in that:
the scheduling unit is further configured to monitor whether there are Pods in the cluster that have not been successfully scheduled, and if so, to further judge whether there are nodes available for scale-up; if so, to start at least some of the nodes available for scale-up and schedule the Pods that have not been successfully scheduled onto the newly started nodes.
34. The device according to claim 19, characterized in that:
the scheduling unit is configured to judge, according to the node state of each node, whether a node meets a scale-down condition, and if so, to close the corresponding node and, when Pods have been deployed on the node, to schedule the deployed Pods to other nodes in the cluster.
35. The device according to claim 34, characterized in that the scale-down condition comprises one or more of the following: the computing-resource utilization of a node is below a preset value; the Pods deployed on a node can be scheduled to other nodes in the cluster; the Pods deployed on a node are confirmed as able to drift according to a PodDisruptionBudget controller; a node has no local storage.
36. The device according to any one of claims 33-35, characterized in that:
the scheduling unit is configured to perform scale-up and/or scale-down of the cluster according to one or more of the following strategies: selecting nodes at random; selecting nodes according to the number of deployed Pods; selecting nodes according to computing-resource utilization; selecting nodes according to the usage price of the physical machine; pausing scale-up and/or scale-down when a preset number and/or a preset proportion of nodes in the cluster are abnormal.
37. A computer device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that the processor, when executing the program, implements the method according to any one of claims 1-18.
38. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-18.