CN114020584A - Operation distribution method and device and computing equipment - Google Patents


Publication number
CN114020584A
Authority
CN
China
Prior art keywords
resources
resource
job
application software
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210002720.6A
Other languages
Chinese (zh)
Other versions
CN114020584B (en)
Inventor
毛登峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Paratera Technology Co ltd
Original Assignee
Beijing Paratera Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Paratera Technology Co ltd
Priority to CN202210002720.6A
Publication of CN114020584A
Application granted
Publication of CN114020584B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F11/3017 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is implementing multitasking
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a job splitting method, a job splitting apparatus, and a computing device. The method comprises the following steps: detecting a job submitted by a user to identify one or more application software to be called when the job runs; determining, based on a resource pressure scoring table, the intensive resources corresponding to the job according to the one or more application software to be called when the job runs; calculating the total score of the pressure that all jobs in the cluster place on each resource; if there are resources whose total scores exceed the corresponding thresholds, determining whether those resources overlap with the intensive resources; and if they overlap, splitting the job.

Description

Operation distribution method and device and computing equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for job splitting, a computing device, and a storage medium.
Background
Because a single computer provides limited computing power, a job (i.e., a computing task) with a large computational scale is usually processed on a cluster. A cluster is a supercomputer composed of multiple computers that act as nodes and are interconnected via an internal high-speed network.
Users share the cluster and apply for resources as needed to complete their jobs. Although the cluster allocates available resources according to user requests, its resources are limited and cannot satisfy the resource requirements of all users at all times, so the cluster relies on a job scheduler to schedule and allocate resources reasonably. A user submits a job to the job scheduler, the job scheduler distributes the job to nodes for execution according to established rules, and the result of the job is finally obtained.
Different users on the cluster select application software suited to their research fields to run their jobs. For example, bioinformatics users perform similarity searches on DNA (deoxyribonucleic acid) sequences using FASTA (application software for integrated sequence analysis that finds homologous sequences in databases).
Different application software has different characteristics and can be classified by how it occupies (i.e., loads) resources, for example as compute-intensive, input/output-intensive, or network-intensive. A cluster is usually designed to keep computation, input/output, and networking in balance. If a large number of compute-intensive, input/output-intensive, or network-intensive users appear on one cluster, a single resource is exhausted and normal use by all users on the cluster is affected. Because the dense resources (i.e., load types) occupied by a job cannot be predicted in advance, it cannot be determined during job splitting which jobs should be forwarded to other clusters, which results in a serious job backlog on the cluster and low operating efficiency.
Therefore, a new job splitting method is needed to optimize the above process.
Disclosure of Invention
To this end, the present invention provides a job splitting scheme in an attempt to solve or at least alleviate the above-identified problems.
According to an aspect of the present invention, there is provided a job splitting method, including the steps of: detecting a job submitted by a user to identify one or more application software to be called when the job runs; determining, based on a resource pressure scoring table, the intensive resources corresponding to the job according to the one or more application software to be called when the job runs; calculating the total score of the pressure that all jobs in the cluster place on each resource; if there are resources whose total scores exceed the corresponding thresholds, determining whether those resources overlap with the intensive resources; and if they overlap, splitting the job.
Optionally, in the job splitting method according to the present invention, the step of detecting a job submitted by a user to identify one or more application software to be called when the job runs includes: acquiring the script of the job submitted by the user; and matching the script against the call identifier of each application software on the cluster to identify the one or more application software to be called when the job runs.
Optionally, in the job splitting method according to the present invention, the resource pressure scoring table includes scoring information for each resource occupied by a plurality of application software, where the resources include at least one of computing resources, input/output resources, network resources, memory resources, and authorized resources.
Optionally, in the job splitting method according to the present invention, the step of determining the intensive resources corresponding to the job includes: determining, based on the resource pressure scoring table, the score of the pressure that the job places on each resource according to the one or more application software to be called when the job runs; and taking the resource with the highest score as the intensive resource corresponding to the job.
Optionally, in the job splitting method according to the present invention, the step of calculating the total score of the pressure that all jobs in the cluster place on each resource includes: summing, for each resource, the scores of the pressure that all jobs in the cluster place on that resource to obtain the total score for that resource.
Optionally, in the job splitting method according to the present invention, the step of splitting the job includes: submitting the job to another cluster for execution, and acquiring the corresponding result from the cluster that runs the job.
Optionally, in the job splitting method according to the present invention, the method further includes generating the resource pressure scoring table in advance, which includes: monitoring each application software on the cluster according to its resource consumption characteristics to obtain related data; and generating the resource pressure scoring table based on the data for each application software.
Optionally, in the job splitting method according to the present invention, the method further includes: pre-creating calling identifiers of each application software; and generating a corresponding installation path based on the calling identifier of each application software, and installing each application software.
According to still another aspect of the present invention, a job splitting apparatus is provided, which includes a detection module, a determination module, a calculation module, a confirmation module, and a splitting module. The detection module is adapted to detect a job submitted by a user to identify one or more application software to be called when the job runs; the determination module is adapted to determine, based on the resource pressure scoring table, the intensive resources corresponding to the job according to the one or more application software to be called when the job runs; the calculation module is adapted to calculate the total score of the pressure that all jobs in the cluster place on each resource; the confirmation module is adapted to determine, when there are resources whose total scores exceed the corresponding thresholds, whether those resources overlap with the intensive resources; and the splitting module is adapted to split the job when the resources whose total scores exceed the corresponding thresholds overlap with the intensive resources.
According to yet another aspect of the present invention, there is provided a computing device comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor, the program instructions comprising instructions for performing the job splitting method as described above.
According to still another aspect of the present invention, there is provided a readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform the job splitting method as described above.
According to the job splitting scheme of the present invention, a job submitted by a user is detected, the application software to be called when the job runs is identified, the intensive resources corresponding to the job are determined based on a resource pressure scoring table, and the total score of the pressure that all jobs in the cluster place on each resource is calculated. If there are resources whose total scores exceed the corresponding thresholds and those resources overlap with the intensive resources, the job is split. In this technical scheme, once the application software to be called by a job is known, the intensive resources corresponding to the job are identified from how that application software occupies resources, so that the job can be dynamically scheduled and split to other clusters with idle resources. This avoids concentrated hot spots on the same intensive resource, reduces the operating pressure on the cluster, and improves the reliability of cluster operation.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a block diagram of a computing device 100, according to an embodiment of the invention;
FIG. 2 illustrates a flow diagram of a job splitting method 200 according to one embodiment of the invention; and
FIG. 3 shows a schematic diagram of a job splitting apparatus 300 according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
FIG. 1 shows a block diagram of a computing device 100, according to one embodiment of the invention.
As shown in FIG. 1, in a basic configuration 102, a computing device 100 typically includes a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to: a microprocessor (µP), a microcontroller (µC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level one cache 110 and a level two cache 112, a processor core 114, and registers 116. The example processor core 114 may include an Arithmetic Logic Unit (ALU), a Floating Point Unit (FPU), a digital signal processing core (DSP core), or any combination thereof. The example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some implementations, the application 122 can be arranged to execute instructions on an operating system with program data 124 by one or more processors 104.
Computing device 100 also includes a storage device 132, storage device 132 including removable storage 136 and non-removable storage 138.
Computing device 100 may also include a storage interface bus 134. The storage interface bus 134 enables communication from the storage devices 132 (e.g., removable storage 136 and non-removable storage 138) to the basic configuration 102 via the bus/interface controller 130. Operating system 120, applications 122, and at least a portion of program data 124 may be stored on removable storage 136 and/or non-removable storage 138, and loaded into system memory 106 via storage interface bus 134 and executed by one or more processors 104 when computing device 100 is powered on or applications 122 are to be executed.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via the bus/interface controller 130. The example output device 142 includes an image processing unit 148 and an audio processing unit 150, which may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communications with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, or program modules in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or private-wired network, and various wireless media such as acoustic, Radio Frequency (RF), microwave, Infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
Computing device 100 may be implemented as a personal computer, including both desktop and notebook computer configurations. Computing device 100 may also be implemented as part of a small-form-factor portable (or mobile) electronic device such as a cellular telephone, a digital camera, a Personal Digital Assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset, an application-specific device, or a hybrid device that includes any of the above functions. It may even be implemented as a server, such as a file server, a database server, an application server, or a web server. The embodiments of the present invention are not limited thereto.
In an embodiment according to the present invention, the computing device 100 may be implemented as a job scheduler, which is a server and may be configured to schedule a job submitted by a user in a cluster to which the job belongs, and to distribute the job to nodes in the cluster accordingly.
The computing device 100 is configured to execute a job splitting method 200 according to the present invention. Specifically, the application 122 disposed on the operating system contains a plurality of program instructions for executing the job splitting method 200, and these instructions may instruct the processor 104 to execute the method so that the computing device 100 splits jobs.
According to an embodiment of the present invention, the application 122 disposed on the operating system includes a job splitting apparatus 300, which contains the program instructions for executing the job splitting method 200, so that the method can be executed in the job splitting apparatus 300. Because the job scheduler provides a plug-in mechanism for checking job parameters when a user job is submitted, the job splitting apparatus 300 can be implemented as a plug-in that detects and identifies the application software called by a job, determines the intensive resources of the job, and splits the job when the pressure borne by the cluster exceeds its limits.
FIG. 2 shows a flow diagram of a job splitting method 200 according to one embodiment of the invention. The job splitting method 200 may be executed in the job splitting apparatus 300 of a computing device (e.g., the aforementioned computing device 100).
As shown in FIG. 2, the method 200 begins at step S210. According to one embodiment of the invention, before step S210 the method 200 further comprises generating a resource pressure scoring table in advance. In this embodiment, the table may be generated as follows: each application software on the cluster is monitored according to its resource consumption characteristics to obtain related data, and the resource pressure scoring table is generated from the data for each application software. The resource pressure scoring table contains scoring information for each resource occupied by the application software, and the resources include at least one of computing resources, input/output resources, network resources, memory resources, and authorized resources.
For example, the occupation of computing resources is concentrated on computation by the Central Processing Unit (CPU) and is typical of large-scale scientific and engineering computing, numerical simulation, and the like. The occupation of input/output resources mostly arises when files are read and written and is associated with digital libraries, data warehouses, data mining, computational visualization, and the like. The occupation of network resources mostly arises in network communication between nodes, as in cooperative work, grid computing, remote control, remote diagnosis, and the like.
For example, suppose the resource consumption characteristic of an application software A is that it occupies computing resources, which are usually measured by CPU occupancy. The whole process of a job that calls application software A is monitored. If the CPU occupancy is 100% throughout, the pressure score for application software A on computing resources can be 10. If the CPU occupancy is 100% for half of the period and 50% for the other half, the score can be 7.5. Scoring information for other resources, such as input/output, network, memory, and authorized resources, can be generated by analogy with the process for computing resources or adjusted according to actual conditions, and is not limited here.
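The two scores in this example (10 and 7.5) are consistent with a time-weighted average of CPU occupancy scaled to a 0 to 10 range. The following is a minimal sketch under that assumption; the function name `pressure_score` and the `(duration, utilization)` sample format are hypothetical and do not come from the patent text.

```python
def pressure_score(samples, max_score=10.0):
    """Scale the time-weighted average utilization to [0, max_score].

    samples: sequence of (duration, utilization) pairs, where utilization
    is a fraction in [0, 1]. This input format is an assumption.
    """
    samples = list(samples)
    total_time = sum(duration for duration, _ in samples)
    if total_time <= 0:
        raise ValueError("samples must cover a positive duration")
    weighted = sum(duration * util for duration, util in samples)
    return max_score * weighted / total_time


# CPU at 100% for the whole monitored run
full_load = pressure_score([(60, 1.0)])
# CPU at 100% for half of the run and 50% for the other half
half_and_half = pressure_score([(30, 1.0), (30, 0.5)])
print(full_load, half_and_half)
```

Other resources (input/output, network, and so on) could be scored the same way once a utilization measure is chosen for them, although the text leaves that choice open.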
In consideration of the fact that the resources that are likely to cause bottlenecks are mainly computing resources, input/output resources, and network resources, the following description will take these three resources as examples. Table 1 shows an example of a resource pressure rating table according to an embodiment of the present invention, which is specifically as follows:
Application software    Computing resources    Input/output resources    Network resources
FASTA                   4                      8                         2
GROMACS                 3                      8                         2
WRF                     9                      9                         9
TABLE 1
As shown in Table 1, the resource pressure scoring table includes scoring information for the computing resources, input/output resources, and network resources occupied by the application software FASTA, GROMACS (molecular dynamics application software for studying biomolecular systems), and WRF (application software for weather simulation and forecasting). The pressure scores formed by FASTA occupying the computing, input/output, and network resources are 4, 8, and 2 points, respectively; those formed by GROMACS are 3, 8, and 2 points, respectively; and those formed by WRF are 9 points each.
Detecting and identifying the application software used by a user is usually very difficult, so the application software on the cluster needs to be managed uniformly. Otherwise, even for the same application software softX, user U1 may follow the standard installation procedure, install softX under the path /home/userA/softX/bin/softX, and invoke it in a submitted job with the following command:
/home/userA/softX/bin/softX arg1 arg2 arg3
whereas user U2 may prefer a custom installation, install softX under the path /home/userB/abc/bin/softX1, and invoke it in a submitted job with the following command:
/home/userB/abc/bin/softX1 arg1 arg2 arg3
Therefore, if the application software on the cluster is not managed, different users' jobs may call even the same application software in different ways, which makes it difficult to accurately identify from a job the application software it calls.
According to an embodiment of the present invention, the method 200 further includes creating a call identifier for each application software in advance, generating a corresponding installation path based on the call identifier, and installing each application software at that path. The call identifier consists of the application software name and version, in the form "<application software name>/<version>", and each application software is installed accordingly on every node in the cluster.
For example, the application software FASTA, GROMACS, and WRF are installed according to the following rule:
/public1/software/<application software name>/<version>
/public1/software/FASTA/36.3.8
/public1/software/GROMACS/3.9.1
/public1/software/WRF/4.2
Since "<application software name>/<version>" uniquely identifies an application software, the call identifiers of FASTA, GROMACS, and WRF are FASTA/36.3.8, GROMACS/3.9.1, and WRF/4.2, respectively.
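The naming rule above can be sketched as two small helpers that derive the call identifier and the installation path from a name and a version. The helper names `call_identifier` and `install_path` are hypothetical; the root `/public1/software` and the three name/version pairs are the ones used in the example.

```python
from pathlib import PurePosixPath

# Root directory taken from the installation rule in the example
SOFTWARE_ROOT = PurePosixPath("/public1/software")


def call_identifier(name, version):
    """Build the "<application software name>/<version>" identifier."""
    return f"{name}/{version}"


def install_path(name, version, root=SOFTWARE_ROOT):
    """Derive the installation path that embeds the call identifier."""
    return root / name / version


catalog = [("FASTA", "36.3.8"), ("GROMACS", "3.9.1"), ("WRF", "4.2")]
identifiers = [call_identifier(n, v) for n, v in catalog]
paths = [str(install_path(n, v)) for n, v in catalog]
print(identifiers)
print(paths)
```

Because every installation path contains the call identifier as a substring, any job script that references the installed software also carries the identifier, which is what makes the later matching step possible.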
In step S210, a job submitted by a user is detected to identify one or more application software to be called when the job runs. According to one embodiment of the present invention, the job may be detected as follows: the script of the job submitted by the user is first obtained, and the script is then matched against the call identifiers of all application software on the cluster to identify the one or more application software to be called when the job runs.
For example, if a job submitted by a user needs to call application software FASTA during running, the script of the job usually includes the following contents:
# Set the application software path first, then run the application software
export PATH=/public1/software/FASTA/36.3.8/bin:$PATH
namd input.namd
Or:
# Run the application software directly by its full path
/public1/software/FASTA/36.3.8/bin/namd input.namd
The script of the job is matched against the call identifier of each application software on the cluster. The script matches the call identifier "FASTA/36.3.8", so the application software to be called when the job runs is determined to be FASTA, version 36.3.8.
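One simple way to realize this matching is a substring search of each non-comment part of the script for every known call identifier. The sketch below assumes that approach; the function name `detect_applications` and the comment-stripping rule are illustrative simplifications, not the patented implementation.

```python
# Call identifiers registered on the cluster (from the example above)
CALL_IDENTIFIERS = ["FASTA/36.3.8", "GROMACS/3.9.1", "WRF/4.2"]


def detect_applications(script_text, identifiers=CALL_IDENTIFIERS):
    """Return (name, version) pairs whose call identifier occurs in the script."""
    found = []
    for line in script_text.splitlines():
        # Drop shell comments; a naive split that ignores quoting (assumption)
        code = line.split("#", 1)[0]
        for ident in identifiers:
            if ident in code and ident not in found:
                found.append(ident)
    return [tuple(ident.split("/", 1)) for ident in found]


script = """\
# set the application software path first, then run it
export PATH=/public1/software/FASTA/36.3.8/bin:$PATH
namd input.namd
"""
apps = detect_applications(script)
print(apps)
```

Plain substring matching would also match a longer version string that merely starts with a registered one, so a production matcher would likely need to anchor the identifier on path separators.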
Subsequently, in step S220, the intensive resources corresponding to the job are determined, based on the resource pressure scoring table, according to the one or more application software to be called when the job runs. According to one embodiment of the invention, the score of the pressure that the job places on each resource is first determined from the resource pressure scoring table and the application software the job calls, and the resource with the highest score is then taken as the intensive resource corresponding to the job.
If step S210 determines that the application software to be called when the job runs is FASTA, then, referring to Table 1, the scores of the pressure that the job places on the computing resources, the input/output resources, and the network resources are 4, 8, and 2, respectively, and the input/output resource, which has the highest score, is taken as the intensive resource corresponding to the job.
The way intensive resources are determined may be adjusted according to actual conditions. For example, one or more resources whose scores exceed a certain threshold may be taken as the intensive resources corresponding to the job, or the two highest-scoring resources may be taken as the intensive resources. The present invention is not limited in this respect.
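The three selection rules mentioned for step S220 (highest score, score above a threshold, top two) can be sketched as follows. The resource keys `compute`, `io`, and `network`, the function names, and the decision to sum scores when a job calls several applications are assumptions; the summing rule is chosen because it matches how the text later totals a job that calls both FASTA and WRF, and the WRF row assumes 9 points for each resource.

```python
# Scores as described for Table 1; WRF is assumed to be 9 for every resource
PRESSURE_TABLE = {
    "FASTA": {"compute": 4, "io": 8, "network": 2},
    "GROMACS": {"compute": 3, "io": 8, "network": 2},
    "WRF": {"compute": 9, "io": 9, "network": 9},
}


def job_scores(apps, table=PRESSURE_TABLE):
    """Sum the per-resource scores of every application a job calls."""
    totals = {}
    for app in apps:
        for resource, score in table[app].items():
            totals[resource] = totals.get(resource, 0) + score
    return totals


def intensive_resources(scores, top_n=1, threshold=None):
    """Pick intensive resources by threshold if given, else the top_n scores."""
    if threshold is not None:
        return {r for r, s in scores.items() if s > threshold}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:top_n])


fasta_scores = job_scores(["FASTA"])
print(intensive_resources(fasta_scores))               # highest score
print(intensive_resources(fasta_scores, top_n=2))      # two highest scores
print(intensive_resources(fasta_scores, threshold=3))  # scores above 3
```

When several resources tie for the highest score, this sketch keeps whichever comes first in the table, which is an arbitrary choice the text does not address.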
Next, in step S230, the total score of the pressure that all jobs in the cluster form on each resource is calculated. Before the total score is calculated, the score of the pressure that each job in the cluster forms on each resource needs to be obtained; this can be done in the manner of steps S210 and S220 and is not described again here. According to one embodiment of the invention, once these scores have been obtained, the scores of all jobs are summed separately for each resource to give the total score of the pressure formed on that resource. Here, "all jobs in the cluster" means the jobs that the cluster has accepted and is currently processing; a job that has only just been submitted to the job scheduler is not counted among them.
Assume that there are 3 jobs in the cluster: job J1, job J2 and job J3. Job J1 calls application software FASTA when running, job J2 calls application software FASTA and WRF, and job J3 calls application software GROMACS. Then, using the resource pressure scoring table in Table 1, the total score of the pressure formed by the 3 jobs on computing resources is 4 + (4 + 9) + 3 = 20, the total score on input/output resources is 8 + (8 + 9) + 8 = 33, and the total score on network resources is 2 + (2 + 9) + 2 = 15.
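The per-resource summation of step S230 can be sketched with a `collections.Counter`. The FASTA scores are those given for Table 1 in the description; the scores used here for the other two applications are read off the individual terms of the sums in the worked example and should be treated as assumptions, as should the key names and the function name `cluster_totals`.

```python
from collections import Counter

# FASTA row from the description; the other rows are inferred from the
# terms of the worked sums and are assumptions as far as Table 1 goes.
SCORE_TABLE = {
    "FASTA": {"compute": 4, "io": 8, "network": 2},
    "WRF": {"compute": 9, "io": 9, "network": 9},
    "GROMACS": {"compute": 3, "io": 8, "network": 2},
}

# Jobs currently being processed by the cluster and the software each calls.
RUNNING_JOBS = {
    "J1": ["FASTA"],
    "J2": ["FASTA", "WRF"],
    "J3": ["GROMACS"],
}


def cluster_totals(jobs, table):
    """Sum, resource by resource, the pressure scores of all running jobs."""
    totals = Counter()
    for apps in jobs.values():
        for app in apps:
            totals.update(table[app])  # adds the scores key by key
    return dict(totals)


totals = cluster_totals(RUNNING_JOBS, SCORE_TABLE)
for resource, total in totals.items():
    print(resource, total)
```

A job that calls several applications contributes the sum of their scores, which matches the bracketed terms for job J2 in the example.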
In step S240, if there are resources whose total scores exceed the corresponding thresholds, it is determined whether those resources coincide with the intensive resources.
According to one embodiment of the invention, 1000 jobs exist in the current cluster, and the total scores of the pressure that these 1000 jobs form on computing resources, input/output resources and network resources are 5000, 15000 and 12000 points, respectively. Suppose that the upper limits of pressure on computing resources, input/output resources and network resources under which the cluster runs stably have been obtained through testing or similar means, and that the corresponding thresholds are 10000, 14000 and 13000 points, respectively. Comparing each total score with its threshold shows that one resource exceeds its threshold, namely the input/output resource (15000 > 14000).
From step S220, the intensive resource corresponding to the job submitted by the user is the input/output resource. This is the same resource whose total score exceeds its threshold, so it is confirmed that the resources whose total scores exceed the corresponding thresholds coincide with the intensive resources.
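The comparison in step S240 amounts to a per-resource threshold test followed by a set intersection. The sketch below uses the totals and thresholds from the 1000-job example; the resource key names and function names are hypothetical.

```python
# Totals and thresholds from the 1000-job example in the description.
TOTALS = {"compute": 5000, "io": 15000, "network": 12000}
THRESHOLDS = {"compute": 10000, "io": 14000, "network": 13000}


def overloaded_resources(totals, thresholds):
    """Resources whose total pressure score exceeds the cluster's threshold."""
    return {r for r, total in totals.items() if total > thresholds[r]}


def coincides(overloaded, intensive):
    """True if at least one overloaded resource is also an intensive resource."""
    return bool(overloaded & intensive)


overloaded = overloaded_resources(TOTALS, THRESHOLDS)
print(overloaded)                     # {'io'}
print(coincides(overloaded, {"io"}))  # True
```

Treating both sides as sets also covers the variants in which a job has more than one intensive resource.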
Finally, step S250 is executed: if the resources whose total scores exceed the corresponding thresholds coincide with the intensive resources, the job is shunted. According to an embodiment of the present invention, the job may be submitted to another cluster to run, and the corresponding result may be obtained from the cluster that runs the job, thereby implementing job shunting.
Of course, besides forwarding the job to another cluster with idle resources to complete the computation, the job may also be shunted by temporarily storing it in a buffer area. Other clusters can then fetch the job from the buffer area, run it, and put the result back into the buffer area after the run is finished. These are only examples of job shunting; the specific manner can be chosen according to the actual situation, and the present invention is not limited in this respect.
In addition, if the resources whose total scores exceed the corresponding thresholds do not coincide with the intensive resources, for example when the intensive resources corresponding to the job are computing resources and/or network resources, the current cluster may still run the job. If no resource has a total score exceeding its threshold, the current cluster places no restriction on running the job.
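The outcome of step S250 and the buffer-based variant of shunting can be sketched as follows. This is only one possible reading: the `Action` enum, the `decide` function and the in-memory `ShuntBuffer` class are hypothetical names, and a real buffer area would presumably be shared storage or a queue service rather than a Python object.

```python
from collections import deque
from enum import Enum


class Action(Enum):
    RUN_LOCALLY = "run_locally"
    SHUNT = "shunt"


def decide(overloaded, intensive):
    """Shunt only when an overloaded resource is also intensive for the job."""
    if overloaded & intensive:
        return Action.SHUNT
    # Covers both "no overlap" and "nothing is overloaded".
    return Action.RUN_LOCALLY


class ShuntBuffer:
    """In-memory stand-in for the buffer area shared between clusters."""

    def __init__(self):
        self._pending = deque()
        self._results = {}

    def put_job(self, job_id, script):
        self._pending.append((job_id, script))

    def take_job(self):
        """Called by another cluster; returns None when no job is waiting."""
        return self._pending.popleft() if self._pending else None

    def put_result(self, job_id, result):
        self._results[job_id] = result

    def get_result(self, job_id):
        return self._results.get(job_id)


print(decide({"io"}, {"io"}))       # Action.SHUNT
print(decide({"io"}, {"compute"}))  # Action.RUN_LOCALLY
```

Direct submission to another cluster would replace the buffer with a call to that cluster's scheduler; the decision function stays the same.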
Fig. 3 shows a schematic diagram of a job shunting device 300 according to an embodiment of the present invention. The job shunting device 300 resides in a computing device (e.g., the computing device 100 described above) and shunts jobs by executing the job shunting method 200 of the present invention.
As shown in Fig. 3, the job shunting device 300 includes a detection module 310, a determination module 320, a calculation module 330, a confirmation module 340, and a shunting module 350. The detection module 310 is connected to the determination module 320, and the determination module 320, the calculation module 330, and the shunting module 350 are all connected to the confirmation module 340.
The detection module 310 may detect a job submitted by a user to identify one or more application software to be called when the job runs. Subsequently, the determination module 320 may determine, based on the resource pressure scoring table, the intensive resources corresponding to the job according to the one or more application software to be called when the job runs. The calculation module 330 may calculate the total score of the pressure that all jobs in the cluster form on each resource. The confirmation module 340 may, when there are resources whose total scores exceed the corresponding thresholds, confirm whether those resources coincide with the intensive resources. The shunting module 350 may shunt the job when the resources whose total scores exceed the corresponding thresholds coincide with the intensive resources.
It should be noted that the detecting module 310 is configured to execute the aforementioned step S210, the determining module 320 is configured to execute the aforementioned step S220, the calculating module 330 is configured to execute the aforementioned step S230, the confirming module 340 is configured to execute the aforementioned step S240, and the shunting module 350 is configured to execute the aforementioned step S250. Here, for the execution logic of the detecting module 310, the determining module 320, the calculating module 330, the confirming module 340 and the shunting module 350, reference may be made to the detailed description of the steps S210 to S250 in the method 200, and no further description is given here.
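The division of work among the five modules can be illustrated by a single class whose methods mirror modules 310 to 350. This is a rough sketch and not the device's actual implementation: the class name, method names, data layouts and the small example thresholds are all assumptions.

```python
class JobShuntingDevice:
    """Hypothetical wiring of modules 310-350 as methods of one object."""

    def __init__(self, identifiers, score_table, thresholds):
        self.identifiers = identifiers  # call identifier -> application name
        self.score_table = score_table  # application -> {resource: score}
        self.thresholds = thresholds    # resource -> upper pressure limit

    def _pressure(self, apps):
        totals = {}
        for app in apps:
            for resource, score in self.score_table[app].items():
                totals[resource] = totals.get(resource, 0) + score
        return totals

    def detect(self, script):                      # detection module 310
        return [a for i, a in self.identifiers.items() if i in script]

    def determine(self, apps):                     # determination module 320
        pressure = self._pressure(apps)
        if not pressure:
            return set()
        top = max(pressure.values())
        return {r for r, s in pressure.items() if s == top}

    def calculate(self, running_jobs):             # calculation module 330
        return self._pressure(a for apps in running_jobs for a in apps)

    def confirm(self, totals, intensive):          # confirmation module 340
        over = {r for r, t in totals.items() if t > self.thresholds[r]}
        return bool(over & intensive)

    def should_shunt(self, script, running_jobs):  # input to shunting module 350
        intensive = self.determine(self.detect(script))
        return self.confirm(self.calculate(running_jobs), intensive)


device = JobShuntingDevice(
    identifiers={"FASTA/36.3.8": "FASTA"},
    score_table={"FASTA": {"compute": 4, "io": 8, "network": 2}},
    thresholds={"compute": 10, "io": 14, "network": 13},  # made-up limits
)
script = "/public1/software/FASTA/36.3.8/bin/namd input.namd"
print(device.should_shunt(script, [["FASTA"], ["FASTA"]]))  # True
```

With two FASTA jobs already running, the input/output total (16) exceeds the made-up limit of 14 and coincides with the new job's intensive resource, so the sketch reports that the job should be shunted.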
According to the job shunting scheme provided by the embodiment of the invention, a job submitted by a user is detected to identify the application software to be called when the job runs, the intensive resources corresponding to the job are determined based on the resource pressure scoring table, and the total score of the pressure that all jobs in the cluster form on each resource is calculated. If there are resources whose total scores exceed the corresponding thresholds, and those resources coincide with the intensive resources, the job is shunted. In this technical scheme, once the application software to be called when the job runs is known, the intensive resources corresponding to the job are identified from the way that software occupies resources, so that the job can be dynamically scheduled to another cluster with idle resources. This avoids concentrated hot spots on the same intensive resource, reduces the operating pressure on the cluster, and improves the reliability of cluster operation.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as removable hard drives, USB flash drives, floppy disks, CD-ROMs, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to execute the job shunting method of the present invention according to instructions in the program code stored in the memory.
By way of example, and not limitation, readable media may comprise readable storage media and communication media. Readable storage media store information such as computer readable instructions, data structures, program modules or other data. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. Combinations of any of the above are also included within the scope of readable media.
In the description provided herein, algorithms and displays are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with examples of this invention. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (10)

1. A job shunting method, comprising:
detecting a job submitted by a user to identify one or more application software to be called when the job runs;
determining, based on a resource pressure scoring table, intensive resources corresponding to the job according to the one or more application software to be called when the job runs;
calculating a total score of the pressure formed by all jobs in the cluster on each resource;
if there are resources whose total scores exceed the corresponding thresholds, confirming whether the resources whose total scores exceed the corresponding thresholds coincide with the intensive resources;
and if the resources whose total scores exceed the corresponding thresholds coincide with the intensive resources, shunting the job.
2. The method of claim 1, wherein the step of detecting a user-submitted job to identify one or more application software to be invoked when the job is run comprises:
acquiring a script of a job submitted by a user;
and matching and detecting the script and the calling identification of each application software on the cluster so as to identify one or more application software to be called when the job runs.
3. The method of claim 1 or 2, wherein the resource pressure scoring table comprises scoring information of each resource occupied by a plurality of application software, and the resource comprises at least one of a computing resource, an input/output resource, a network resource, a memory resource and an authorized resource.
4. The method of claim 1 or 2, wherein the step of determining the intensive resources corresponding to the job according to the one or more application software to be invoked when the job runs based on the resource pressure scoring table comprises:
determining, based on the resource pressure scoring table, the score of the pressure formed by the job on each resource according to the one or more application software to be called when the job runs;
and taking the resource with the highest score as the intensive resource corresponding to the job.
5. The method of claim 1 or 2, wherein the step of calculating a total score of the stress developed by all jobs in the cluster on each resource comprises:
and respectively summing the scores of the pressures formed by all the jobs on the resources in the cluster to calculate the total score of the pressures formed by all the jobs on the resources.
6. The method of claim 1 or 2, wherein the step of shunting the job comprises:
and submitting the job to another cluster to run, and acquiring the corresponding result from the cluster that runs the job.
7. The method of claim 1 or 2, further comprising pre-generating a resource pressure scoring table, the pre-generating a resource pressure scoring table comprising:
according to the resource consumption characteristics of each application software on the cluster, acquiring related data information by monitoring each application software;
and generating a resource pressure scoring table based on the data information of each application software.
8. A job shunting device, comprising:
a detection module, adapted to detect a job submitted by a user so as to identify one or more application software to be called when the job runs;
a determination module, adapted to determine, based on a resource pressure scoring table, intensive resources corresponding to the job according to the one or more application software to be called when the job runs;
a calculation module, adapted to calculate a total score of the pressure formed by all jobs in the cluster on each resource;
a confirmation module, adapted to confirm, when there are resources whose total scores exceed the corresponding thresholds, whether the resources whose total scores exceed the corresponding thresholds coincide with the intensive resources;
and a shunting module, adapted to shunt the job when the resources whose total scores exceed the corresponding thresholds coincide with the intensive resources.
9. A computing device, comprising:
at least one processor; and
a memory storing program instructions, wherein the program instructions are configured to be adapted to be executed by the at least one processor, the program instructions comprising instructions for performing the method of any of claims 1-7.
10. A readable storage medium storing program instructions that, when read and executed by a computing device, cause the computing device to perform the method of any of claims 1-7.
CN202210002720.6A 2022-01-05 2022-01-05 Operation distribution method and device and computing equipment Active CN114020584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210002720.6A CN114020584B (en) 2022-01-05 2022-01-05 Operation distribution method and device and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210002720.6A CN114020584B (en) 2022-01-05 2022-01-05 Operation distribution method and device and computing equipment

Publications (2)

Publication Number Publication Date
CN114020584A true CN114020584A (en) 2022-02-08
CN114020584B CN114020584B (en) 2022-05-03

Family

ID=80069258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210002720.6A Active CN114020584B (en) 2022-01-05 2022-01-05 Operation distribution method and device and computing equipment

Country Status (1)

Country Link
CN (1) CN114020584B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106502792A (en) * 2016-10-20 2017-03-15 华南理工大学 A kind of multi-tenant priority scheduling of resource method towards dissimilar load
US20170109205A1 (en) * 2015-10-20 2017-04-20 Nishi Ahuja Computing Resources Workload Scheduling
CN109167835A (en) * 2018-09-13 2019-01-08 重庆邮电大学 A kind of physics resource scheduling method and system based on kubernetes
CN110806928A (en) * 2019-10-16 2020-02-18 北京并行科技股份有限公司 Job submitting method and system
CN110908795A (en) * 2019-11-04 2020-03-24 深圳先进技术研究院 Cloud computing cluster mixed part job scheduling method and device, server and storage device
CN111813545A (en) * 2020-06-29 2020-10-23 北京字节跳动网络技术有限公司 Resource allocation method, device, medium and equipment

Also Published As

Publication number Publication date
CN114020584B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
US9870270B2 (en) Realizing graph processing based on the mapreduce architecture
CN102855216B (en) Improve the performance of multiprocessor computer system
EP3432157B1 (en) Data table joining mode processing method and apparatus
Zhang et al. Heterogeneity aware dominant resource assistant heuristics for virtual machine consolidation
CN103874982B (en) Determine the N number of or N number of data value in bottom in top
CN112395161A (en) Big data center energy consumption analysis method and computing equipment
CN114003291A (en) Application program running method and device, computing equipment and storage medium
Faheem Accelerating motif finding problem using grid computing with enhanced brute force
Ye et al. Reliability-aware and energy-efficient workflow scheduling in IaaS clouds
CN115391026A (en) Process migration method, computing device and readable storage medium
CN114691226A (en) Multi-operating-system switching operation method, computing device and storage medium
CN114020584B (en) Operation distribution method and device and computing equipment
CN111625367B (en) Method for dynamically adjusting read-write resources of file system
Tian et al. Efficient algorithms for VM placement in cloud data centers
CN114003290A (en) Application program running method and device related to instruction replacement
CN112561412B (en) Method, device, server and storage medium for determining target object identifier
Mao et al. A fine-grained and dynamic MapReduce task scheduling scheme for the heterogeneous cloud environment
CN114721672A (en) Application installation method, computing device and storage medium
CN114201729A (en) Method, device and equipment for selecting matrix operation mode and storage medium
CN113591031A (en) Low-power-consumption matrix operation method and device
Kaur et al. Hybrid application partitioning and process offloading method for the mobile cloud computing
Chhadva et al. Architecture for mobile cloud computing using five level offloading (armflora)
US20110295519A1 (en) Identification of ribosomal dna sequences
Al-Absi et al. Parallel MapReduce: maximizing cloud resource utilization and performance improvement using parallel execution strategies
CN115079771B (en) Waveform generation method, waveform storage device, waveform generation equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant