CN116541155A - Exploration and development cloud resource intelligent scheduling method based on machine learning
- Publication number
- CN116541155A (application number CN202210072238.XA)
- Authority
- CN
- China
- Prior art keywords
- module
- exploration
- data
- cpu
- resource
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/485—Task life-cycle, e.g. stopping, restarting, resuming execution
- G06F9/4856—Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration
- G06F9/4862—Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration the task being a mobile agent, i.e. specifically designed to migrate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/4557—Distribution of virtual machine instances; Migration and load balancing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45583—Memory management, e.g. access or allocation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a machine-learning-based intelligent scheduling method for exploration and development cloud resources, comprising: step 1, acquiring exploration and development cloud operation and maintenance data; step 2, preprocessing the exploration and development cloud operation and maintenance data samples and training an exploration and development cloud BP neural network model; step 3, predicting the type of each job module with the trained exploration and development cloud BP neural network model, outputting the predicted module type, and calculating indexes; and step 4, acquiring the module resource data of issued jobs and scheduling the cloud hosts. The method uses neural network and machine learning techniques to predict resource demand and proactively optimizes the operation of host and network resources, which reduces the impact of resource allocation on users' working efficiency and can effectively raise the informatization level of system management.
Description
Technical Field
The invention relates to the technical field of oilfield development, in particular to an intelligent scheduling method for exploration and development cloud resources based on machine learning.
Background
Against the background of the rapid development of cloud computing, processing resources are highly aggregated and host a large number of application systems. The allocation of hardware resources directly affects the operating efficiency of these application systems. Data published by Internet research companies show that, during software operation and maintenance, factors such as CPU, memory, network and temporary disk space strongly influence the operating efficiency of an application system; a shortage of these resources often slows the application system down, lowers operating efficiency and degrades the user experience.
Chinese patent application No. CN201911175029.2 relates to a reservoir evaluation model construction method and a reservoir identification method, belonging to the fields of geophysical exploration and artificial intelligence. Logging data of different types of reservoirs corresponding to start and stop depth sections are obtained; the logging data of existing oil-tested well sections are divided, recombined and calculated to form new reservoir evaluation samples, thereby expanding the reservoir evaluation sample set; the reservoir evaluation samples are trained to obtain a reservoir evaluation model, which is then used for reservoir prediction. The method addresses two problems: a lack of reservoir identification sample data in the research target area leaves machine learning and deep learning without sample support, and a model learned from few samples has a low coincidence rate in application.
Application No. CN201410450288.2 relates to a method for constructing a cloud desktop for exploration applications based on virtualization technology. On the basis of virtualization, resources such as hosts, storage, users and applications are managed uniformly in the cloud, and the foreground builds, for users of exploration software, a cloud desktop environment that integrates seismic data processing and interpretation software. The user logs in to this environment through a client under Windows or Linux and, by means of load balancing and scheduling, can start the various seismic data processing and interpretation software located in the cloud and carry out all interactive operations on the local cloud desktop, so that processing and interpretation are integrated without the user having to care about switching the current running environment. When the host running the cloud desktop fails or has a load problem, the cloud desktop can be migrated to another host and continue to run. The user can concentrate on the professional work and, starting from a single cloud desktop, efficiently complete all related seismic data processing and interpretation tasks.
Chinese patent application No. CN201911398934.4 relates to a method and an apparatus for constructing a knowledge graph in the field of oil and gas exploration and development. According to an ontology library of the oil and gas exploration and development field, knowledge points are extracted from unstructured, semi-structured and structured data covering different aspects of the field, and knowledge graphs for the respective aspects are constructed. The resulting knowledge graph fuses the knowledge points contained in the unstructured, semi-structured and structured data of the different aspects, thereby accomplishing knowledge management in the oil and gas exploration and development field and allowing users to make convenient use of that knowledge.
The above prior art differs greatly from the present invention and does not solve the technical problem that the present invention addresses. A new machine-learning-based intelligent scheduling method for exploration and development cloud resources has therefore been devised.
Disclosure of Invention
The invention aims to provide a machine-learning-based intelligent scheduling method for exploration and development cloud resources, which studies the resource occupation of servers on the basis of a BP neural network and a genetic algorithm and realizes intelligent scheduling of cloud resources.
The aim of the invention can be achieved by the following technical measures: the exploration and development cloud resource intelligent scheduling method based on machine learning comprises the following steps:
step 1, acquiring exploration and development cloud operation and maintenance data;
step 2, preprocessing the exploration and development cloud operation and maintenance data samples and training an exploration and development cloud BP neural network model;
step 3, predicting the type of each job module with the trained BP neural network model, outputting the predicted module type, and calculating indexes;
and step 4, acquiring the module resource data of issued jobs and scheduling the cloud hosts.
The aim of the invention can also be achieved by the following technical measures:
In step 1, exploration and development cloud operation and maintenance data are acquired, including data on the computer's CPU utilization, memory occupation, disk reads and writes, and network bandwidth.
In step 1, the acquired exploration and development cloud operation and maintenance data comprise:
CPU related fields: the cumulative number of seconds of system CPU used by the process corresponding to the module; the cumulative number of seconds of user CPU used by that process; the start time of the process, in seconds since 1970/01/01; the number of threads of the process whose current state is running; the number of threads whose current state is waiting; the number of threads whose current state is zombie; the number of threads in any other state; the difference of the module's index cpu_seconds_system_total between adjacent acquisition intervals; and the difference of the module's index cpu_seconds_user_total between adjacent acquisition intervals;
memory related fields: the number of bytes of PSS resident memory used by the process corresponding to the module; the number of bytes of PSS swap memory used by that process; the number of bytes of RSS memory used; the number of bytes of VSS memory used; the number of bytes of swap memory used; the cumulative number of major page faults; the cumulative number of minor page faults; the cumulative number of involuntary context switches; the cumulative number of voluntary context switches; the number of file descriptors currently opened by the process; the ratio of currently opened file descriptors to the maximum number of file descriptors; and the difference of the module's index major_page_fault_total between adjacent acquisition intervals;
disk related fields: the cumulative number of bytes read by the process corresponding to the module; the cumulative number of bytes written by that process; and the differences of the module's indexes write_bytes_total and read_bytes_total between adjacent acquisition intervals;
network bandwidth related fields: the total number of bytes received by the process corresponding to the module; the total number of bytes sent by that process; the total number of bytes received by the network card; the total number of bytes sent by the network card; the number of messages received over TCP; and the number of messages sent over TCP.
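Several of the fields above are differences of a cumulative counter between adjacent acquisition intervals. A minimal sketch of that calculation follows; the function name `interval_deltas` and the rule for handling a counter reset are assumptions made for illustration and are not taken from the patent.

```python
def interval_deltas(samples):
    """Return the differences between adjacent samples of a cumulative counter.

    Assumption: if the counter goes backwards (for example because the
    process restarted), the new value itself is used as the delta.
    """
    deltas = []
    for previous, current in zip(samples, samples[1:]):
        deltas.append(current - previous if current >= previous else current)
    return deltas


# Hypothetical readings of cpu_seconds_user_total at four acquisition times.
cpu_seconds_user_total = [10.0, 12.5, 12.5, 20.0]
print(interval_deltas(cpu_seconds_user_total))
```

The same helper could be applied to cpu_seconds_system_total, major_page_fault_total, read_bytes_total and write_bytes_total.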
In step 1, because modules start their processes in different ways depending on the job type, two different collection modes are used: data are collected separately for the case where the module name is unknown and the case where it is known.
In step 1, when the module name is unknown:
the job processes currently running on the child node are inspected; the directory structure of each job process is read to obtain its job directory on the temporary disk; the job directory is searched for a compressed file bearing the corresponding module name; if such a file exists, it is scanned, and the resource occupation data of the job process and its child processes are aggregated as a resource occupation sample of the module.
In step 1, when the module name is known:
the processes related to the module are looked up directly by module name, and the resource usage of the job process and its child processes is aggregated as a resource occupation sample of the module.
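In both collection modes the sample for a module is the combined usage of a job process and its child processes. The sketch below illustrates that aggregation on plain dictionaries; the process ids, the field names `rss_bytes` and `read_bytes_total`, and the function `aggregate_tree` are hypothetical, and how the process tree is actually read from the operating system is left out.

```python
def aggregate_tree(pid, children, usage):
    """Sum the usage of process `pid` and all of its descendants."""
    total = dict(usage.get(pid, {}))
    for child in children.get(pid, []):
        for key, value in aggregate_tree(child, children, usage).items():
            total[key] = total.get(key, 0) + value
    return total


# Hypothetical process tree: job process 100 with children 101 and 102,
# and grandchild 103 under 101.
children = {100: [101, 102], 101: [103]}
usage = {
    100: {"rss_bytes": 50, "read_bytes_total": 10},
    101: {"rss_bytes": 20, "read_bytes_total": 5},
    102: {"rss_bytes": 30, "read_bytes_total": 0},
    103: {"rss_bytes": 5, "read_bytes_total": 1},
}
module_sample = aggregate_tree(100, children, usage)
print(module_sample)
```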
In step 2, the acquired exploration and development cloud operation and maintenance data, including the cumulative seconds of system CPU used by the module's process, the cumulative seconds of user CPU used by that process, the bytes of PSS resident memory and PSS swap memory it uses, the cumulative bytes it has read and written, and the total bytes it has received and sent, are divided into a training set, a test set and a verification set in the proportions 80%, 10% and 10%. Module types confirmed by field testing and expert validation are used as the sample labels, and a sample library is established to supply training samples and test samples.
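A minimal sketch of an 80% / 10% / 10% split is given below. The shuffling, the fixed seed and the function name `split_samples` are assumptions for illustration; the patent does not state how samples are assigned to the three sets.

```python
import random


def split_samples(samples, seed=0):
    """Shuffle the samples and split them 80/10/10 into train, test, verification."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_test = len(shuffled) // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    verification = shuffled[n_train + n_test:]
    return train, test, verification


train, test, verification = split_samples(range(100))
print(len(train), len(test), len(verification))
```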
In step 2, feature extraction is performed on the exploration and development cloud operation and maintenance data: the maximum, mean and variance of each time series are computed as feature data. The data are then normalized and missing values are handled, feature selection is performed on the processed data set, and the 20 features most strongly correlated with the label are selected as the features for model training. Finally, the data are split.
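The preprocessing described above can be sketched as follows. The specific choices here are assumptions, since the patent does not name them: population variance, mean imputation for missing values, min-max normalization, and absolute Pearson correlation as the ranking criterion. All function names are hypothetical.

```python
import math
import statistics


def series_features(series):
    """Maximum, mean and (population) variance of one time series."""
    return [max(series), statistics.fmean(series), statistics.pvariance(series)]


def fill_missing(column):
    """Replace None entries with the mean of the observed values."""
    mean = statistics.fmean(v for v in column if v is not None)
    return [mean if v is None else v for v in column]


def min_max_normalize(column):
    """Scale a column to the range [0, 1]; a constant column maps to zeros."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0] * len(column)
    return [(v - low) / (high - low) for v in column]


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_top_k(columns, labels, k=20):
    """Names of the k feature columns most correlated with the labels."""
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], labels)),
                    reverse=True)
    return ranked[:k]


print(series_features([1, 2, 3]))
print(min_max_normalize(fill_missing([1.0, None, 3.0])))
```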
In step 2, the exploration and development cloud BP neural network model is constructed: the labeled training data set is used as the input sample of the BP neural network, the network is trained to obtain the model, and the module type is output. The trained BP neural network model is then verified with the test data set, its parameters are adjusted and training is iterated; once the model meets the accuracy requirement, the trained exploration and development cloud BP neural network model is obtained.
In step 2, the configuration of the constructed BP neural network model comprises the training batches, the number of iterations, the learning rate, the network units and the activation function of each unit.
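To make the role of these parameters concrete, here is a minimal sketch of a back-propagation (BP) network with one hidden layer. It is not the patent's model: the sigmoid activation, the squared-error loss, the per-sample updates, the binary output (the patent predicts a module type, which may have more classes) and all names and default values are assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_bp(samples, n_hidden=4, learning_rate=0.5, epochs=2000, seed=0):
    """Train a 1-hidden-layer network; returns (predict, initial_loss, final_loss)."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Each weight row ends with a bias term.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w1]
        y = sigmoid(sum(w * v for w, v in zip(w2, h)) + w2[-1])
        return h, y

    def loss():
        return sum((forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

    initial_loss = loss()
    for _ in range(epochs):
        for x, t in samples:
            h, y = forward(x)
            # Back-propagate the error from the output to the hidden layer.
            delta_out = (y - t) * y * (1 - y)
            delta_hidden = [delta_out * w2[j] * h[j] * (1 - h[j])
                            for j in range(n_hidden)]
            for j in range(n_hidden):
                w2[j] -= learning_rate * delta_out * h[j]
                for i in range(n_in):
                    w1[j][i] -= learning_rate * delta_hidden[j] * x[i]
                w1[j][-1] -= learning_rate * delta_hidden[j]
            w2[-1] -= learning_rate * delta_out
    return (lambda x: forward(x)[1]), initial_loss, loss()


# Hypothetical normalized features with two module types labeled 0 and 1.
data = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.9, 0.8], 1), ([0.8, 0.9], 1)]
predict, before, after = train_bp(data)
print(before > after, [round(predict(x)) for x, _ in data])
```

In this sketch `epochs` plays the role of the number of iterations, `n_hidden` the number of network units and `sigmoid` the activation function; the batch size is fixed at one sample per update.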
In step 3, the module types are predicted by classification, and the CPU utilization, memory occupancy and IO wait time of each module are calculated and used as the feature labels of the job hosts for cloud host scheduling.
In step 4, the predicted module type is used to analyze the resource occupation of each job. Based on the job types currently running on each node, the resource occupation of the node hosting a newly issued job is calculated. From index data of the current physical machines and virtual machine nodes, such as the number of CPU cores, CPU load, memory, cache and IO delay, a genetic algorithm calculates the optimal scheme for deploying virtual machines to physical machines and gives a specific virtual machine migration scheduling path. The virtual machines are then migrated so that, after migration, virtual machines with complementary resource demands are deployed on the same physical machine.
In step 4, data on existing jobs, newly issued jobs and the physical resource consumption of the hosts are obtained, and the allocation scheme of job-node virtual machines on the hosts is calculated by a genetic algorithm. The specific steps are as follows:
the resource occupation of each host and its future resource trend are obtained, including the number of CPUs, the CPU load trend, the memory trend, the cache pressure and the IO blocking delay of the host;
the current running indexes of each physical machine are calculated from the module type results obtained by the BP neural network classification algorithm, and a real-time cloud host scheduling model is established. A physical machine / virtual machine resource use matrix is generated from index data of the current physical machines and virtual machine nodes, such as the number of CPU cores, CPU load, memory, cache and IO delay. In the embodiment of the invention, the main aim is for the utilization rates of the CPU, memory, disk and network of a physical machine to differ as little as possible. The smaller the variance among the utilization rates of a physical machine's indexes, the more evenly its resources are used; at the same time, no individual utilization rate may be too high. The optimal deployment scheme is obtained by solving for the optimal solution with a genetic algorithm, and a specific scheduling path is given.
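The objective described above can be sketched as follows: for a candidate placement, compute each physical machine's utilization rates and the variance among them. Averaging that variance over the physical machines, the resource names, and the functions `host_utilization` and `placement_objective` are assumptions for illustration; the patent does not give the exact formula.

```python
import statistics

RESOURCES = ("cpu", "memory", "disk", "network")


def host_utilization(placement, vm_usage, host_capacity):
    """Utilization rate per host and resource; placement[i] is the host of VM i."""
    used = [{r: 0.0 for r in RESOURCES} for _ in host_capacity]
    for vm, host in enumerate(placement):
        for r in RESOURCES:
            used[host][r] += vm_usage[vm][r]
    return [[used[h][r] / host_capacity[h][r] for r in RESOURCES]
            for h in range(len(host_capacity))]


def placement_objective(placement, vm_usage, host_capacity):
    """Mean over hosts of the variance among that host's utilization rates."""
    rows = host_utilization(placement, vm_usage, host_capacity)
    return sum(statistics.pvariance(row) for row in rows) / len(rows)


# Hypothetical data: one CPU-heavy VM and one VM heavy on everything else.
capacity = [{r: 100.0 for r in RESOURCES} for _ in range(2)]
vms = [
    {"cpu": 60.0, "memory": 20.0, "disk": 20.0, "network": 20.0},
    {"cpu": 20.0, "memory": 60.0, "disk": 60.0, "network": 60.0},
]
print(placement_objective([0, 0], vms, capacity))  # complementary VMs together
print(placement_objective([0, 1], vms, capacity))  # one VM per host
```

With these numbers, placing the two complementary virtual machines on the same physical machine gives the lower objective value, which matches the stated goal. The cap on absolute utilization is a separate constraint and is not part of this function.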
In the embodiment of the invention, the cloud host scheduling problem is solved with a genetic algorithm, which consists of the following parts.
(1) Initializing a population
N individuals are generated at random according to the population size and the encoding rule. The index data of the virtual machines on each physical machine are summarized and encoded as character strings, and the strings are concatenated in the order of the physical machine numbers. Each such string is called an individual (chromosome), and a number of individuals make up a population. Initializing the population creates a set of individuals in which the position of a character within an individual identifies the corresponding physical machine. One character of the individual's code, that is, one gene, corresponds to one index of a virtual machine. In this example the initialized population contains 10 individuals.
In creating individuals, real-number encoding is used, and a matrix FieldDR of 3 rows and n columns serves as the decoding matrix, where n is the number of control variables expressed by the chromosome. Its structure is

$$\mathrm{FieldDR} = \begin{pmatrix} x_1\ \text{lower bound} & \cdots & x_n\ \text{lower bound} \\ x_1\ \text{upper bound} & \cdots & x_n\ \text{upper bound} \\ \mathrm{varTypes}_1 & \cdots & \mathrm{varTypes}_n \end{pmatrix}$$

Here x_1 … x_n are the n control variables. The first row holds the lower bounds of the n control variable values and the second row their upper bounds. varTypes_1 … varTypes_n are the types of the n control variables: a type of 0 indicates that the corresponding control variable is continuous, and a type of 1 indicates that it is discrete. The decoding matrix for permutation encoding requires that all elements of the first row of FieldDR be equal, that all elements of the second row be equal, and that all elements of the third row be 1 (permutation-encoded variables are discrete). The number of columns of FieldDR is Lind, the chromosome length.
Requirement: upper bound - lower bound + 1 >= Lind.
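The layout of the decoding matrix and the permutation-encoding requirement can be checked with a few lines of code. This is only an illustration on plain lists; the helper names `build_field_dr` and `is_valid_permutation_field` are hypothetical and do not belong to any library.

```python
def build_field_dr(lower, upper, var_types):
    """Return the 3 x n matrix [lower bounds, upper bounds, variable types]."""
    if not (len(lower) == len(upper) == len(var_types)):
        raise ValueError("all three rows must have the same length")
    return [list(lower), list(upper), list(var_types)]


def is_valid_permutation_field(field_dr):
    """Check the conditions stated for permutation encoding."""
    lower, upper, var_types = field_dr
    lind = len(lower)  # chromosome length = number of columns
    return (
        len(set(lower)) == 1                   # first row: all equal
        and len(set(upper)) == 1               # second row: all equal
        and all(t == 1 for t in var_types)     # third row: all discrete
        and upper[0] - lower[0] + 1 >= lind    # enough distinct values
    )


field = build_field_dr([0] * 4, [3] * 4, [1] * 4)
print(is_valid_permutation_field(field))
```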
(2) Selection operation
The selection operation is based on the fitness evaluation of the individuals in the population, which is based on the minimization of the objective function, so that the inverse of the objective function is taken as the fitness function. And calculating the fitness of each individual according to the fitness function, and sequencing the individuals according to the sequence from high to low, so that the individual with high adaptability, namely the smallest physical machine resource index variance, can survive with higher probability, thereby improving the overall convergence and the calculation efficiency. The selection of individuals should also satisfy the following rules:
Var(Time)<0.95*Var(Fix)
That is, the real-time value of each physical machine index must be less than 0.95 times its fixed (total) value. In the above formula, Time denotes real time and Var(Time) denotes the real-time resource consumption of the physical machine's CPU, memory, network, and temporary disk; Fix denotes fixed and Var(Fix) denotes the total value of the physical machine's CPU, memory, network, temporary disk, and so on. Each index (CPU, memory, network, temporary disk, etc.) is evaluated independently, and the real-time value of each must be less than 0.95 times its total value.
(3) In embodiments of the present invention, an improved roulette-wheel algorithm is employed. The specific implementation steps are as follows. First, the fitness of each individual is calculated from the fitness function. Next, the selection probability and the cumulative probability of each individual are calculated. Then a random number is generated in the interval [0, 1], and the individual whose cumulative-probability interval contains that number is selected.
(4) Crossover algorithm
Crossover is the main process by which a genetic algorithm generates new individuals, that is, encodings that did not appear in the previous generation's population: with a certain probability, part of the chromosomes of two individuals is exchanged.
The specific implementation steps are as follows: firstly, individuals in the population are randomly paired, secondly, the positions of the crossing points are randomly set, and finally, partial genes between paired chromosomes are exchanged with each other.
(5) Mutation operation
Mutation is an operation method in which a gene value of a gene or genes of an individual is changed with a small probability, and a new individual is generated.
In this example, randomly generated mutation sites are used. The specific procedure is as follows: first, the mutation position of each individual is determined, that is, the position that represents a given index of a virtual machine and corresponds to an actual parameter; then, with a certain probability, the gene at the mutation point is replaced by a value different from its original value.
(6) Termination condition. Since the optimal solution of the problem cannot be known in advance, this embodiment terminates the algorithm with an approximate convergence criterion: when the best solution has remained unchanged after the population has evolved for N generations, that solution is taken as the optimal solution and the algorithm terminates.
The objective function of this embodiment is:

$$\min S^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \quad \text{subject to} \quad \mathrm{Var}(\mathrm{Time}) < 0.95 \cdot \mathrm{Var}(\mathrm{Fix})$$

where X is the utilization rate of each index of the physical machine, \bar{X} is the average of those utilization rates, S^2 is the variance of the utilization rates of the indexes of the physical machine, n is the number of indexes, and Min indicates that the smaller the variance, the more balanced the resources. Time denotes real time and Var(Time) denotes the real-time resource consumption of the physical machine's CPU, memory, network, and temporary disk; Fix denotes fixed and Var(Fix) denotes the total value of the physical machine's CPU, memory, network, temporary disk, and so on. Each index (CPU, memory, network, temporary disk, etc.) is evaluated independently, and the real-time value of each must be less than 0.95 times its total value.
The calculation steps are as follows: 1. Sum the real-time data of each index over the virtual machines. 2. Calculate the running state of each index of the physical machine. 3. Calculate the utilization rate of each index of the physical machine. 4. Calculate the variance of the indexes of the physical machine to check how balanced the resource shares are. The smaller the variance, the more balanced the resources.
The invention discloses a machine-learning-based intelligent scheduling method for exploration and development cloud resources. It addresses the computing resource demands of application software, such as CPU, memory, network, and temporary disk, and realizes intelligent cloud resource deployment based on a BP (back propagation) neural network, a genetic algorithm, and the study of resource occupation types. The method classifies job modules with a machine learning model. Based on the main host hardware resource indexes, such as CPU, memory, disk IO, and network IO, it derives a virtual machine migration scheme by genetic algorithm rules and thereby schedules the cloud hosts. The method jointly considers index data such as the number of CPU computing cores, CPU load, memory, cache, and IO delay, replacing manual scheduling with a new multi-parameter intelligent scheduling mode. It tracks how resource occupation changes while modules run and combines this with service characteristics to obtain better classification results. The classification results are then used to calculate resource allocation on the cloud host nodes and to guide rational virtual machine migration, which reduces the impact of resource allocation problems on users' working efficiency and effectively improves the informatization level of system management.
Drawings
FIG. 1 is a flow chart of one embodiment of a machine learning based exploration and development cloud resource intelligent scheduling method of the present invention;
FIG. 2 is a flow chart of acquiring exploration, development and operation and maintenance data in accordance with an embodiment of the present invention;
FIG. 3 is a flow chart of a classification and scheduling method based on BP neural network model according to an embodiment of the invention;
FIG. 4 is a schematic diagram of a BP neural network according to an embodiment of the invention;
FIG. 5 is a schematic diagram of a scheduling scheme according to an embodiment of the present invention.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular forms also are intended to include the plural forms unless the context clearly indicates otherwise, and furthermore, it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, and/or combinations thereof.
The cloud platform runs a large number of application software modules, and resource allocation among them is unreasonable to a greater or lesser degree. The method collects and mines data on the resource usage of application software modules over a period of time, calibrates the known resource requirements, explores the regular factors of resource consumption in depth by means of machine learning and artificial intelligence, establishes monitoring indexes and a prediction model for the resource demands of running application software, predicts resource demands with neural network and machine learning techniques, and actively optimizes the operation of host and network resources. This reduces the influence of resource allocation on users' working efficiency and effectively improves the informatization level of system management.
FIG. 1 is a flowchart of the machine-learning-based intelligent scheduling method for exploration and development cloud resources of the present invention. The method comprises the following steps:
1. Acquire exploration and development cloud operation and maintenance data;
2. Preprocess the exploration and development cloud operation data samples and train an exploration and development cloud BP neural network model;
3. Classify and predict the resource occupation of professional software modules and calculate the indexes;
4. Acquire the module resource data published with the job and schedule the cloud hosts.
The following are several specific examples of the application of the present invention.
Example 1
In a specific embodiment 1 to which the present invention is applied, the exploration and development cloud resource intelligent scheduling method based on machine learning of the present invention specifically includes the following steps:
Step 1: acquire exploration and development cloud operation and maintenance data, collecting relevant data such as computer CPU utilization, memory occupation, disk reads and writes, and network bandwidth. The collected data include, but are not limited to, the following fields:
CPU-related fields: the cumulative system CPU seconds used by the module's process; the cumulative user CPU seconds used by the module's process; the start time of the module's process, in total seconds since 1970/01/01; the number of threads of the module's process currently in the running state; the number currently in the waiting state; the number currently in the zombie state; the number currently in other states; the difference of the module's cpu_seconds_system_total index between acquisition intervals; and the difference of the module's cpu_seconds_user_total index between acquisition intervals;
Memory-related fields: the PSS resident memory bytes used by the module's process; the PSS swap memory bytes used by the module's process; the RSS memory bytes used by the module's process; the VSS memory bytes used by the module's process; the swap memory bytes used by the module's process; the cumulative number of major page faults of the module's process; the cumulative number of minor page faults of the module's process; the cumulative number of involuntary context switches of the module's process; the cumulative number of voluntary context switches of the module's process; the number of file descriptors currently open in the module's process; the ratio of currently open file descriptors to the maximum number of file descriptors; and the difference of the module's major_page_fault_total index between acquisition intervals;
Disk-related fields: the cumulative bytes read by the module's process; the cumulative bytes written by the module's process; the difference of the module's write_bytes_total index between adjacent acquisition intervals; and the difference of the module's read_bytes_total index between adjacent acquisition intervals;
Network-bandwidth-related fields: the total bytes received by the module's process; the total bytes sent by the module's process; the total bytes received by the network card; the total bytes sent by the network card; the number of messages received over TCP; and the number of messages sent over TCP.
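Several of the fields above are differences of a cumulative counter between acquisition intervals. A minimal Python sketch of that calculation follows. The function name `interval_deltas` and the counter-reset rule (a drop in the counter is treated as a process restart) are assumptions for illustration and are not stated in the source.

```python
from typing import List


def interval_deltas(samples: List[float]) -> List[float]:
    """Differences between adjacent samples of a cumulative counter.

    Assumption: if the counter drops (for example, the process restarted),
    the new value itself is used as the delta for that interval.
    """
    deltas = []
    for prev, curr in zip(samples, samples[1:]):
        deltas.append(curr - prev if curr >= prev else curr)
    return deltas


# Hypothetical samples of cpu_seconds_system_total at successive collections.
cpu_seconds_system_total = [10.0, 12.5, 15.0, 1.0, 3.0]
print(interval_deltas(cpu_seconds_system_total))
```

The same helper would apply to cpu_seconds_user_total, read_bytes_total, write_bytes_total, and major_page_fault_total.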
Because different job types start module processes in different ways, two different methods are used to collect module data: one for the case where the module name is unknown, and one for the case where it is known.
Step 2: preprocess the data samples and train the BP neural network model. The acquired exploration and development cloud operation and maintenance data include the cumulative system CPU seconds of the module's process, the cumulative user CPU seconds of the module's process, the PSS resident memory bytes used by the module's process, the PSS swap memory bytes used by the module's process, the cumulative bytes read by the module's process, the cumulative bytes written by the module's process, the total bytes received by the module's process, the total bytes sent by the module's process, and so on. The data are divided into a training set, a test set, and a validation set in the proportions 80%, 10%, and 10%. Module types determined by field testing and expert confirmation are used as sample labels, and a sample library is established to provide the training and test samples.
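The 80% / 10% / 10% division can be sketched as below. The shuffling with a fixed seed and the function name `split_samples` are assumptions; the source gives only the proportions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle and split into training (80%), test (10%), and validation (rest)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_test = len(shuffled) // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


train, test, validation = split_samples(range(100))
print(len(train), len(test), len(validation))
```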
Feature extraction is performed on the exploration and development operation and maintenance data: the time-series data are converted into feature data by extracting the maximum, mean, variance, and so on. The data are then normalized and missing values are handled; feature selection is performed on the processed data set, keeping the 20 features most strongly correlated with the label as model training features; finally the data are split.
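As an illustration of turning time series into feature data, the sketch below computes the maximum, mean, and variance of each collected series. The feature naming scheme and the use of the population variance are assumptions.

```python
from statistics import fmean, pvariance
from typing import Dict, List


def extract_features(series: Dict[str, List[float]]) -> Dict[str, float]:
    """Summarize each time series by its maximum, mean, and (population) variance."""
    features: Dict[str, float] = {}
    for name, values in series.items():
        features[f"{name}_max"] = max(values)
        features[f"{name}_mean"] = fmean(values)
        features[f"{name}_var"] = pvariance(values)
    return features


# Hypothetical per-interval samples for one module.
sample = {"cpu_seconds_delta": [1.0, 2.0, 3.0], "rss_bytes": [100.0, 100.0, 100.0]}
print(extract_features(sample))
```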
An exploration and development cloud BP neural network model is constructed. The labeled training data set is used as the input sample of the BP neural network, which is trained to obtain a model that outputs the module type. The trained model is then validated with the test data set, parameters are adjusted, and training is iterated; once the model meets the accuracy requirement, the trained exploration and development cloud BP neural network model is obtained.
Constructing the exploration and development cloud BP neural network model involves settings such as the training batch size, the number of iterations, the learning rate, the network units, and the activation function of each unit.
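To make the role of these settings concrete, here is a minimal back-propagation network in plain Python. It is a sketch under stated assumptions, not the patent's model: one hidden layer, sigmoid activations, squared-error loss, a batch size of 1, and a toy data set in which each of four output bits is 1 when the matching input exceeds 0.5. The class `TinyBPNetwork` and all hyperparameter values are hypothetical.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyBPNetwork:
    """One hidden layer, sigmoid units, trained by plain backpropagation."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each row holds the weights of one unit; the last entry is its bias.
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x: List[float]) -> Tuple[List[float], List[float]]:
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in self.w2]
        return h, y

    def train_step(self, x: List[float], target: List[float], lr: float) -> float:
        h, y = self.forward(x)
        # Output deltas for squared error with sigmoid units.
        d_out = [(yo - t) * yo * (1 - yo) for yo, t in zip(y, target)]
        # Hidden deltas: propagate the output deltas back through w2.
        d_hid = [hj * (1 - hj) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for k, row in enumerate(self.w2):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * d_out[k] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * d_hid[j] * v
        return sum((yo - t) ** 2 for yo, t in zip(y, target))


def make_toy_data(n: int, seed: int = 1) -> List[Tuple[List[float], List[float]]]:
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(4)]
        data.append((x, [1.0 if v > 0.5 else 0.0 for v in x]))
    return data


def train(net: TinyBPNetwork, data, epochs: int, learning_rate: float) -> List[float]:
    """Return the mean squared error observed in each epoch."""
    losses = []
    for _ in range(epochs):
        total = sum(net.train_step(x, t, learning_rate) for x, t in data)
        losses.append(total / len(data))
    return losses


net = TinyBPNetwork(n_in=4, n_hidden=8, n_out=4)
losses = train(net, make_toy_data(40), epochs=200, learning_rate=0.5)
print(f"mean squared error: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

In practice the number of hidden units, the learning rate, and the number of epochs would be tuned against the test set, as the text describes.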
Step 3: predict the type of the job module with the trained exploration and development cloud BP neural network model and output the predicted module type.
After the module types are classified and predicted, the CPU utilization, memory occupancy, IO waiting time, and so on of each module are calculated and used as feature labels of the job hosts for cloud host scheduling.
Step 4: acquire the module resource data published with the job and schedule the cloud hosts. The predicted module type is used to analyze the job's resource occupation; based on the job types currently on each node, the resource occupation of the node hosting the newly issued job is calculated; from index data of the current physical machines and virtual machine nodes, such as the number of CPU computing cores, CPU load, memory, cache, and IO delay, a genetic algorithm computes the optimal scheme for deploying virtual machines onto physical machines; and a specific virtual machine migration scheduling path is given. The virtual machines are then migrated so that virtual machines with complementary resource usage end up on the same physical machine.
Example 2
In a specific embodiment 2 of the present invention, as shown in FIG. 2, different professional software uses cluster, stand-alone, parallel, and serial job types and starts its professional processing modules in different ways, so two different methods are used to collect the modules' operational data samples.
1) Without knowing the module name:
Check the job processes currently running on the child node, read the directory structure of each job process, obtain the corresponding job directory on the temporary disk, and look in that directory for the compressed file with the corresponding module name. The file is scanned, and if it exists, the resource occupation data of the job process and its child processes are aggregated as a resource occupation sample of the module.
2) In case of known module name:
Search directly for the processes related to the module name, and aggregate the resource usage of the job process and its child processes as a resource occupation sample of the module.
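Both cases end by aggregating the resource usage of a job process and its child processes. The sketch below shows that aggregation over an in-memory process table for the known-module-name case. The `ProcSample` record, its fields, and the substring match on the process name are assumptions; how the table is collected from the operating system is outside the sketch.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class ProcSample(NamedTuple):
    pid: int
    ppid: int
    name: str
    rss_bytes: int
    cpu_seconds: float


def aggregate_module_usage(procs: List[ProcSample], module_name: str) -> Dict[str, float]:
    """Sum usage over every process matching module_name and all its descendants."""
    children = defaultdict(list)
    for p in procs:
        children[p.ppid].append(p)
    total = {"rss_bytes": 0, "cpu_seconds": 0.0}
    seen = set()
    stack = [p for p in procs if module_name in p.name]
    while stack:
        p = stack.pop()
        if p.pid in seen:  # avoid counting a process twice
            continue
        seen.add(p.pid)
        total["rss_bytes"] += p.rss_bytes
        total["cpu_seconds"] += p.cpu_seconds
        stack.extend(children[p.pid])
    return total


# Hypothetical process table; "migrate_mod" stands in for a module name.
table = [
    ProcSample(1, 0, "init", 10, 0.1),
    ProcSample(100, 1, "migrate_mod", 1000, 5.0),
    ProcSample(101, 100, "worker", 500, 2.0),
    ProcSample(102, 101, "helper", 250, 1.0),
    ProcSample(200, 1, "other", 999, 9.0),
]
print(aggregate_module_usage(table, "migrate_mod"))
```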
Example 3
In a specific embodiment 3 to which the present invention is applied, FIG. 3 is a flowchart of a classification and scheduling method based on a BP neural network model according to an embodiment of the present invention. It mainly includes 9 steps.
Step 01: using the index data collected as in FIG. 2, such as the number of CPU computing cores, CPU load, memory, cache, and IO delay, divide the whole data set into a training set, a test set, and a validation set. The training and test data are labeled with module types by field testing and expert confirmation, to serve as training samples and test samples.
Step 02: the preprocessing mode of the data is as follows:
Feature extraction: the maximum, mean, variance, and so on are extracted from the time-series data and converted into feature data.
Normalization: to eliminate the influence of differing dimensions among the indexes, the data must be standardized so that the indexes are comparable. After standardization, all indexes are on the same order of magnitude and are suitable for comprehensive comparison and evaluation. Data processed with the Z-score standardization method have a mean of 0 and a standard deviation of 1.
Missing value processing: samples with missing values are removed so that they do not distort the sample features.
Feature selection: similar features are removed, and features highly correlated with the labels are selected.
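The normalization, missing-value, and feature-selection steps above can be sketched as follows. The use of the absolute Pearson correlation as the ranking score is an assumption (the source says only "high correlation with the labels"), and the removal of mutually similar features is not shown. All function names are hypothetical.

```python
import math
from statistics import fmean, pstdev
from typing import List, Optional


def drop_missing(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Remove every sample that has at least one missing value."""
    return [row for row in rows if all(v is not None for v in row)]


def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Standardize each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # constant columns are left centered
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def pearson(xs, ys) -> float:
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def top_k_features(rows: List[List[float]], labels: List[float], k: int) -> List[int]:
    """Column indexes of the k features most correlated with the labels."""
    cols = list(zip(*rows))
    scores = [abs(pearson(c, labels)) for c in cols]
    return sorted(range(len(cols)), key=lambda i: scores[i], reverse=True)[:k]


rows = drop_missing([[1.0, 7.0], [2.0, 3.0], [9.0, None], [3.0, 9.0], [4.0, 1.0]])
print(top_k_features(zscore_columns(rows), [0, 0, 1, 1], k=1))
```

With k set to 20 this would correspond to the feature count mentioned in Example 1.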
Step 03: sample construction. The sample data are divided into a training set, a test set, and a validation set.
Step 04: and determining the structure of the exploration and development cloud BP neural network according to the sample characteristics.
FIG. 4 shows the model structure of the exploration and development cloud BP neural network. The network comprises multiple fully connected layers and extracts features at different levels of the data from fields such as the cumulative system CPU seconds of the module's process, the PSS resident memory bytes used by the module's process, the PSS swap memory bytes used by the module's process, the cumulative bytes read by the module's process, the cumulative bytes written by the module's process, the total bytes received by the module's process, and the total bytes sent by the module's process. The classification categories are encoded as four bits: the first bit indicates whether the module is CPU-intensive, the second whether it is memory-intensive, the third whether it is disk-intensive, and the fourth whether it is network-intensive. Examples: 0100 is memory-intensive; 1100 is CPU- and memory-intensive.
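The four-bit category code can be decoded with a few lines of Python. The function name `decode_module_type` and the resource labels are hypothetical; the bit order follows the text.

```python
from typing import List

# Bit order as described: CPU, memory, disk, network.
RESOURCES = ("CPU", "memory", "disk", "network")


def decode_module_type(code: str) -> List[str]:
    """Return the resources for which a 4-bit class code marks the module as intensive."""
    if len(code) != 4 or set(code) - {"0", "1"}:
        raise ValueError("expected a 4-character string of 0s and 1s")
    return [name for bit, name in zip(code, RESOURCES) if bit == "1"]


print(decode_module_type("0100"))
print(decode_module_type("1100"))
```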
Step 05: model training. 80% of the labeled data are taken at random as the training set and fed into the model for training.
Step 06: model evaluation. The model is validated on the test set to confirm that its accuracy is good.
Step 07: model prediction. The model that performed well in evaluation is used to predict the unlabeled data.
Step 08: acquire the usage of the virtual machines and the type data of newly submitted job modules.
Step 09: determine whether virtual machines need to be migrated, and use a genetic algorithm to calculate how to migrate them, as shown in FIG. 5.
Existing data, newly issued job data, and data on the physical resource consumption of the hosts are obtained, and the allocation of job-node virtual machines to hosts is determined by inference. The specific steps are as follows:
The resource occupation of each host and information on its future resource trends are gathered, including the number of host CPUs, the CPU load trend, the memory trend, the cache pressure, and the IO blocking delay;
The current running indexes of each physical machine are calculated from the module types produced by the exploration and development cloud BP neural network classification algorithm, and a real-time cloud host scheduling model is established. A physical machine/virtual machine resource usage matrix is generated from index data of the current physical machines and virtual machine nodes, such as the number of CPU computing cores, CPU load, memory, cache, and IO delay. In this embodiment, the main goal is to keep the utilization rates of a physical machine's CPU, memory, disk, and network as close to one another as possible. The smaller the variance of the utilization rates across a physical machine's indexes, the more balanced the resources, although no individual utilization rate may be too high. A genetic algorithm searches for the optimal solution to obtain the optimal deployment scheme and a specific scheduling path.
In the embodiment of the invention, the problem of cloud host scheduling is solved based on a genetic algorithm, and the method is divided into the following parts.
(1) Initializing a population
N individuals are randomly generated according to the population size and the coding rule. The index data of the virtual machines on each physical machine are summarized and encoded as character strings, and the strings are concatenated in order of physical machine number. Each resulting string is called an individual (chromosome), and a set of individuals forms a population. Initializing the population creates these individuals; the position of a character within an individual identifies the physical machine it belongs to. Each character in the individual's code is one gene and corresponds to one index of a virtual machine. In this example, the initial population contains 10 individuals.
In creating individuals, real-number encoding is used, and a matrix FieldDR with 3 rows and n columns serves as the decoding matrix, where n is the number of control variables expressed by a chromosome. Its structure is:

$$\mathrm{FieldDR} = \begin{pmatrix} x_1^{\text{lower}} & \cdots & x_n^{\text{lower}} \\ x_1^{\text{upper}} & \cdots & x_n^{\text{upper}} \\ \mathrm{varTypes}_1 & \cdots & \mathrm{varTypes}_n \end{pmatrix}$$

In the above, x_1 … x_n are the n control variables. The lower and upper bounds delimit the values each control variable may take: the first row holds the lower bounds of the n control variables and the second row their upper bounds. varTypes_1 … varTypes_n are the types of the n control variables; a type of 0 indicates a continuous variable, and a type of 1 indicates a discrete variable. For permutation encoding, the decoding matrix requires all elements of the first row of FieldDR to be equal, all elements of the second row to be equal, and all elements of the third row to be 1 (permutation-encoded variables are discrete). FieldDR has Lind columns, where Lind is the chromosome length.
The requirement is: upper bound - lower bound + 1 >= Lind.
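A small Python sketch of a permutation-encoded decoding matrix and of population initialization follows. It represents FieldDR as three plain lists (lower bounds, upper bounds, variable types) and enforces the requirement upper - lower + 1 >= Lind, which is what makes drawing Lind distinct values possible. The function names and the example bounds are assumptions; the population size of 10 comes from the text.

```python
import random
from typing import List

FieldDR = List[List[int]]


def make_permutation_field_dr(lower: int, upper: int, lind: int) -> FieldDR:
    """Rows: lower bounds, upper bounds, variable types (1 = discrete)."""
    if upper - lower + 1 < lind:
        raise ValueError("need upper - lower + 1 >= Lind for a permutation")
    return [[lower] * lind, [upper] * lind, [1] * lind]


def random_permutation(field_dr: FieldDR, rng: random.Random) -> List[int]:
    """Draw Lind distinct integer genes from [lower, upper]."""
    lower, upper, lind = field_dr[0][0], field_dr[1][0], len(field_dr[0])
    return rng.sample(range(lower, upper + 1), lind)


def init_population(field_dr: FieldDR, size: int = 10, seed: int = 0) -> List[List[int]]:
    rng = random.Random(seed)
    return [random_permutation(field_dr, rng) for _ in range(size)]


field_dr = make_permutation_field_dr(lower=1, upper=6, lind=4)
population = init_population(field_dr)
print(field_dr)
print(population[0])
```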
(2) Selection operation
The selection operation is based on evaluating the fitness of the individuals in the population. Because the goal is to minimize the objective function, the reciprocal of the objective function is taken as the fitness function. The fitness of each individual is calculated from the fitness function, and the individuals are sorted from high to low fitness, so that individuals with high fitness, that is, those with the smallest variance of physical machine resource indexes, survive with higher probability, which improves overall convergence and computational efficiency. The selected individuals must also satisfy the following rule:
Var(Time)<0.95*Var(Fix)
That is, the real-time value of each physical machine index must be less than 0.95 times its fixed (total) value. In the above formula, Time denotes real time and Var(Time) denotes the real-time resource consumption of the physical machine's CPU, memory, network, and temporary disk; Fix denotes fixed and Var(Fix) denotes the total value of the physical machine's CPU, memory, network, temporary disk, and so on. Each index (CPU, memory, network, temporary disk, etc.) is evaluated independently, and the real-time value of each must be less than 0.95 times its total value.
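The rule Var(Time) < 0.95*Var(Fix) is checked per index. A minimal sketch, with hypothetical index names and capacities:

```python
from typing import Dict


def within_capacity(real_time: Dict[str, float], fixed: Dict[str, float],
                    ratio: float = 0.95) -> bool:
    """True only if every index's real-time value is below ratio * its total value."""
    return all(real_time[name] < ratio * fixed[name] for name in fixed)


fixed = {"cpu": 32.0, "memory": 128.0}
print(within_capacity({"cpu": 30.0, "memory": 100.0}, fixed))
print(within_capacity({"cpu": 30.0, "memory": 122.0}, fixed))
```

A candidate deployment that fails this check for any physical machine would be excluded from selection.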
(3) In embodiments of the present invention, an improved roulette-wheel algorithm is employed. The specific implementation steps are as follows. First, the fitness of each individual is calculated from the fitness function. Next, the selection probability and the cumulative probability of each individual are calculated. Then a random number is generated in the interval [0, 1], and the individual whose cumulative-probability interval contains that number is selected.
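The three steps can be sketched as below. The source does not say what the "improved" variant changes, so this is the plain roulette-wheel selection only; the function name and the sample fitness values are assumptions.

```python
import bisect
import itertools
import random
from typing import List


def roulette_select(fitness: List[float], rng: random.Random) -> int:
    """Return the index of the individual whose cumulative interval contains a random draw."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("fitness values must have a positive sum")
    probabilities = [f / total for f in fitness]
    cumulative = list(itertools.accumulate(probabilities))
    r = rng.random()  # uniform in [0, 1)
    index = bisect.bisect_right(cumulative, r)
    return min(index, len(fitness) - 1)  # guard against rounding in the last sum


rng = random.Random(0)
fitness = [9.0, 1.0]  # e.g. reciprocals of the objective value
picks = [roulette_select(fitness, rng) for _ in range(1000)]
print(picks.count(0), picks.count(1))
```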
(4) Crossover algorithm
Crossover is the main process by which a genetic algorithm generates new individuals, that is, encodings that did not appear in the previous generation's population: with a certain probability, part of the chromosomes of two individuals is exchanged.
The specific implementation steps are as follows: firstly, individuals in the population are randomly paired, secondly, the positions of the crossing points are randomly set, and finally, partial genes between paired chromosomes are exchanged with each other.
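A sketch of these three steps as single-point crossover follows. The crossover probability of 0.8 is an assumed value, and this plain version does not repair duplicate genes, which a permutation encoding would require.

```python
import random
from typing import List, Tuple

Chromosome = List[int]


def single_point_crossover(a: Chromosome, b: Chromosome, rng: random.Random,
                           p_cross: float = 0.8) -> Tuple[Chromosome, Chromosome]:
    """Swap the gene tails of two parents after a random crossover point."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    if len(a) < 2 or rng.random() >= p_cross:
        return a[:], b[:]
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]


def crossover_population(population: List[Chromosome], rng: random.Random,
                         p_cross: float = 0.8) -> List[Chromosome]:
    """Pair individuals at random and cross each pair."""
    shuffled = population[:]
    rng.shuffle(shuffled)
    offspring: List[Chromosome] = []
    for i in range(0, len(shuffled) - 1, 2):
        offspring.extend(single_point_crossover(shuffled[i], shuffled[i + 1], rng, p_cross))
    if len(shuffled) % 2:  # an unpaired individual is carried over unchanged
        offspring.append(shuffled[-1][:])
    return offspring


rng = random.Random(0)
print(single_point_crossover([0, 0, 0, 0], [1, 1, 1, 1], rng, p_cross=1.0))
```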
(5) Mutation operation
Mutation is an operation method in which a gene value of a gene or genes of an individual is changed with a small probability, and a new individual is generated.
In this example, randomly generated mutation sites are used. The specific procedure is as follows: first, the mutation position of each individual is determined, that is, the position that represents a given index of a virtual machine and corresponds to an actual parameter; then, with a certain probability, the gene at the mutation point is replaced by a value different from its original value.
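A sketch of this mutation for integer genes follows. The per-gene mutation probability of 0.05 and the bounds are assumed values; each mutated gene is replaced by a different value within its bounds.

```python
import random
from typing import List


def mutate(individual: List[int], lower: List[int], upper: List[int],
           rng: random.Random, p_mut: float = 0.05) -> List[int]:
    """With probability p_mut per gene, replace the gene by another value in its range."""
    child = individual[:]
    for i in range(len(child)):
        if rng.random() < p_mut:
            choices = [v for v in range(lower[i], upper[i] + 1) if v != child[i]]
            if choices:
                child[i] = rng.choice(choices)
    return child


rng = random.Random(0)
print(mutate([0, 1, 2], [0, 0, 0], [3, 3, 3], rng, p_mut=1.0))
```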
(6) Termination condition. Since the optimal solution of the problem cannot be known in advance, this embodiment terminates the algorithm with an approximate convergence criterion: when the best solution has remained unchanged after the population has evolved for N generations, that solution is taken as the optimal solution and the algorithm terminates.
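The approximate convergence criterion can be sketched as a check on the history of best objective values. The name `has_converged`, the tolerance, and the choice of comparing the last N generations against everything before them are assumptions.

```python
from typing import List


def has_converged(best_history: List[float], n_stall: int, tol: float = 1e-12) -> bool:
    """True when the best (lowest) objective has not improved in the last n_stall generations."""
    if len(best_history) <= n_stall:
        return False
    earlier_best = min(best_history[:-n_stall])
    recent_best = min(best_history[-n_stall:])
    return recent_best >= earlier_best - tol


print(has_converged([5.0, 4.0, 3.0, 3.0, 3.0, 3.0], n_stall=3))
print(has_converged([5.0, 4.0, 3.0, 2.0, 1.0, 0.5], n_stall=3))
```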
The objective function of this embodiment is:

$$\min S^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 \quad \text{subject to} \quad \mathrm{Var}(\mathrm{Time}) < 0.95 \cdot \mathrm{Var}(\mathrm{Fix})$$

where X is the utilization rate of each index of the physical machine, \bar{X} is the average of those utilization rates, S^2 is the variance of the utilization rates of the indexes of the physical machine, n is the number of indexes, and Min indicates that the smaller the variance, the more balanced the resources. The calculation steps are as follows: 1. Sum the real-time data of each index over the virtual machines. 2. Calculate the running state of each index of the physical machine. 3. Calculate the utilization rate of each index of the physical machine. 4. Calculate the variance of the indexes of the physical machine to check how balanced the resource shares are. The smaller the variance, the more balanced the resources. In the above formula, Time denotes real time and Var(Time) denotes the real-time resource consumption of the physical machine's CPU, memory, network, and temporary disk; Fix denotes fixed and Var(Fix) denotes the total value of the physical machine's CPU, memory, network, temporary disk, and so on. Each index (CPU, memory, network, temporary disk, etc.) is evaluated independently, and the real-time value of each must be less than 0.95 times its total value.
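The calculation steps can be sketched for a single physical machine as follows. The index names, the sample values, and the use of the population variance (dividing by n) are assumptions made for illustration.

```python
from statistics import fmean
from typing import Dict, List

INDEXES = ("cpu", "memory", "network", "temp_disk")


def sum_vm_usage(vms: List[Dict[str, float]]) -> Dict[str, float]:
    """Step 1: sum each index over the virtual machines on one physical machine."""
    return {name: sum(vm[name] for vm in vms) for name in INDEXES}


def utilization(used: Dict[str, float], capacity: Dict[str, float]) -> Dict[str, float]:
    """Steps 2 and 3: utilization rate of each index of the physical machine."""
    return {name: used[name] / capacity[name] for name in INDEXES}


def balance_variance(rates: Dict[str, float]) -> float:
    """Step 4: variance of the utilization rates; smaller means more balanced."""
    x = list(rates.values())
    mean = fmean(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


vms = [
    {"cpu": 4, "memory": 16, "network": 1, "temp_disk": 10},
    {"cpu": 4, "memory": 48, "network": 1, "temp_disk": 30},
]
capacity = {"cpu": 16, "memory": 128, "network": 4, "temp_disk": 80}
rates = utilization(sum_vm_usage(vms), capacity)
print(rates, balance_variance(rates))
```

The reciprocal of this variance could then serve as the fitness of a candidate deployment, subject to the 0.95 capacity rule.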
Finally, it should be noted that: the foregoing description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but although the present invention has been described in detail with reference to the foregoing embodiment, it will be apparent to those skilled in the art that modifications may be made to the technical solution described in the foregoing embodiment, or equivalents may be substituted for some of the technical features thereof. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Other than the technical features described in the specification, all are known to those skilled in the art.
Claims (14)
1. The exploration and development cloud resource intelligent scheduling method based on machine learning is characterized by comprising the following steps of:
step 1, acquiring exploration and development cloud operation and maintenance data;
step 2, preprocessing the exploration and development cloud operation data samples and training an exploration and development cloud BP neural network model;
step 3, predicting the type of the operation module by utilizing the trained exploration and development cloud BP neural network model, giving out the predicted type of the module, and calculating indexes;
and step 4, acquiring module resource data published by the job, and scheduling the cloud host.
2. The intelligent scheduling method for exploration and development cloud resources based on machine learning according to claim 1, wherein in step 1, exploration and development cloud operation and maintenance data are acquired, and relevant exploration and development cloud operation and maintenance data such as computer CPU utilization rate, memory occupation, disk reading and writing and network bandwidth are acquired.
3. The machine learning based exploration and development cloud resource intelligent scheduling method of claim 2, wherein in step 1, the collected exploration and development operation and maintenance data comprise, but are not limited to, the following fields:
CPU-related fields: the cumulative number of seconds of system CPU used by the module's process; the cumulative number of seconds of user CPU used by the module's process; the start time of the module's process, as total seconds since 1970/01/01; the number of threads of the module's process currently in the running state; the number of threads currently in the waiting state; the number of threads currently in the zombie state; the number of threads currently in other states; the difference in the module's cpu_seconds_system_total index between adjacent acquisition intervals; and the difference in the module's cpu_seconds_user_total index between adjacent acquisition intervals;
memory-related fields: the number of PSS resident memory bytes used by the module's process; the number of PSS swap memory bytes used by the module's process; the number of RSS memory bytes used by the module's process; the number of VSS memory bytes used by the module's process; the number of swap memory bytes used by the module's process; the cumulative number of major page faults of the module's process; the cumulative number of minor page faults of the module's process; the cumulative number of involuntary context switches of the module's process; the cumulative number of voluntary context switches of the module's process; the number of file descriptors currently opened by the module's process; the ratio of currently opened file descriptors to the maximum number of file descriptors; and the difference in the module's major_page_fault_total index between adjacent acquisition intervals;
disk-related fields: the cumulative number of bytes read by the module's process; the cumulative number of bytes written by the module's process; the difference in the module's write_bytes_total index between adjacent acquisition intervals; and the difference in the module's read_bytes_total index between adjacent acquisition intervals;
network-bandwidth-related fields: the total number of bytes received by the module's process; the total number of bytes sent by the module's process; the total number of bytes received by the network card; the total number of bytes sent by the network card; the number of messages received by TCP; the number of messages received by TCP; the number of messages sent by TCP; and the number of messages sent by TCP.
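Several of the fields above are differences of a cumulative counter between adjacent acquisition intervals (for example cpu_seconds_system_total or read_bytes_total). A minimal sketch of that derivation is given below; the function name `interval_deltas` and the handling of a counter reset are assumptions made for illustration, since the source does not describe how resets are treated.

```python
def interval_deltas(samples):
    """Differences between adjacent samples of a cumulative counter."""
    deltas = []
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            deltas.append(current - previous)
        else:
            # Assumption: a drop means the process restarted and the counter
            # began again from zero, so the new value is the whole increase.
            deltas.append(current)
    return deltas


# Hypothetical cumulative system-CPU seconds sampled at four acquisition times.
cpu_seconds_system_total = [10.0, 12.5, 15.0, 1.0]
cpu_deltas = interval_deltas(cpu_seconds_system_total)
```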
4. The intelligent scheduling method for exploration and development cloud resources based on machine learning according to claim 3, wherein in step 1, because the way a module's process is started differs between job types, module data are collected in two different ways: one for the case where the module name is unknown and one for the case where the module name is known.
5. The intelligent scheduling method for exploration and development cloud resources based on machine learning of claim 4, wherein in step 1, when the module name is unknown:
the job processes currently running on the child node are checked; the directory structure of each job process is read to obtain its job directory on the temporary disk; the job directory is searched for a compressed file bearing the corresponding module name and the file is scanned; and, if the file exists, the resource occupation data of the job process and its child processes are aggregated as the resource occupation sample of the module.
6. The intelligent scheduling method for exploration and development cloud resources based on machine learning as claimed in claim 4, wherein in step 1, when the module name is known:
the processes related to the module are looked up directly by module name, and the resource usage of the job process and its child processes is aggregated as the resource occupation sample of the module.
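Both collection modes end by aggregating the resource usage of a job process and its child processes into one module sample. The sketch below illustrates that aggregation on an in-memory process table; the table layout (`ppid`, `metrics`), the metric names, and the function name `aggregate_module_usage` are hypothetical, and a real collector would have to read this information from the operating system.

```python
from collections import defaultdict


def aggregate_module_usage(processes, root_pid):
    """Sum the metrics of a job process and all of its descendant processes."""
    # Build a parent -> children index from the process table.
    children = defaultdict(list)
    for pid, info in processes.items():
        children[info["ppid"]].append(pid)

    total = defaultdict(float)
    stack = [root_pid]
    while stack:  # depth-first walk over the process tree
        pid = stack.pop()
        for name, value in processes[pid]["metrics"].items():
            total[name] += value
        stack.extend(children[pid])
    return dict(total)


# Hypothetical process table: pid 100 is the job process, 101 and 102 descend from it.
processes = {
    100: {"ppid": 1, "metrics": {"rss_bytes": 10, "cpu_seconds": 1.5}},
    101: {"ppid": 100, "metrics": {"rss_bytes": 20, "cpu_seconds": 0.5}},
    102: {"ppid": 101, "metrics": {"rss_bytes": 5, "cpu_seconds": 1.0}},
    200: {"ppid": 1, "metrics": {"rss_bytes": 99, "cpu_seconds": 9.0}},
}
module_sample = aggregate_module_usage(processes, 100)
```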
7. The intelligent scheduling method for exploration and development cloud resources based on machine learning according to claim 1, wherein in step 2, the acquired exploration and development operation and maintenance data, comprising the cumulative number of seconds of system CPU used by the module's process, the cumulative number of seconds of user CPU used by the module's process, the number of PSS resident memory bytes used by the module's process, the cumulative number of bytes read by the module's process, the cumulative number of bytes written by the module's process, the total number of bytes received by the module's process, and the total number of bytes sent by the module's process, are divided into a training set, a test set and a validation set in proportions of 80%, 10% and 10%; and module types determined by field testing and expert validation are used as sample data to build a sample library serving as training samples and test samples.
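The 80%/10%/10% division can be illustrated as below. This is a sketch under assumptions: the source does not say whether samples are shuffled before splitting, so the shuffle, the `seed` parameter, and the function name `split_samples` are choices made for the example.

```python
import random


def split_samples(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split samples into (training, test, validation) lists by the given ratios."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumption: shuffle before splitting
    n_train = int(len(shuffled) * ratios[0])
    n_test = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # remainder goes to validation
    return train, test, validation


train_set, test_set, validation_set = split_samples(range(100))
```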
8. The intelligent scheduling method for exploration and development cloud resources based on machine learning according to claim 7, wherein in step 2, feature extraction is performed on the exploration and development cloud operation data, converting the time series data into feature data by extracting the maximum, mean and variance; the data are normalized and missing values are processed; feature selection is performed on the processed data set, selecting the 20 features most strongly correlated with the label as the features for model training; and finally the data are split.
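A minimal sketch of the preprocessing described in this claim follows. The helper names, the choice of min-max normalization, the use of the population variance, and the use of the absolute Pearson correlation against a numerically encoded label are all assumptions; the source only states that maximum, mean and variance are extracted, that data are normalized, and that the 20 features best correlated with the label are kept. Missing-value handling is omitted.

```python
from statistics import mean, pvariance


def extract_features(series):
    """Collapse one metric's time series into [maximum, mean, variance]."""
    return [max(series), mean(series), pvariance(series)]


def min_max_normalize(column):
    """Scale a feature column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_top_features(columns, labels, k=20):
    """Indices of the k feature columns with the largest |correlation| to the label."""
    ranked = sorted(range(len(columns)),
                    key=lambda i: abs(pearson(columns[i], labels)),
                    reverse=True)
    return ranked[:k]


# Three hypothetical feature columns over four samples, with numeric labels.
columns = [[1, 2, 3, 4], [1, 1, 1, 1], [4, 1, 3, 2]]
labels = [1, 2, 3, 4]
selected = select_top_features(columns, labels, k=2)
```

With k=2 the constant column is dropped because its correlation with the label is zero.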
9. The intelligent scheduling method for exploration and development cloud resources based on machine learning according to claim 8, wherein in step 2, an exploration and development cloud BP neural network model is constructed, a labeled training data set is used as an input sample of the BP neural network, the BP neural network is trained, a model is obtained, and a module type is output; and then verifying the trained BP neural network model by using the test data set, adjusting parameters, iterating the training model, and obtaining the trained exploration and development cloud BP neural network model if the model meets the precision requirement.
10. The intelligent scheduling method for exploration and development cloud resources based on machine learning according to claim 9, wherein in step 2, the parameters of the constructed exploration and development cloud BP neural network model comprise the training batches, the number of iterations, the learning rate, the network units, and the activation functions of the units.
11. The intelligent scheduling method for exploration and development cloud resources based on machine learning according to claim 1, wherein in step 3, the module types are predicted by classification, and the CPU utilization rate, memory occupancy rate and IO waiting time of each module are calculated as feature labels of the job hosts for subsequent cloud host scheduling.
12. The intelligent scheduling method for exploration and development cloud resources based on machine learning according to claim 1, wherein in step 4, the predicted module type is used to analyze the resource occupation of the job; the resource occupation of the node hosting the newly issued job is calculated from the job types currently running on each node; a genetic algorithm is used to calculate the optimal scheme for deploying virtual machines to physical machines from index data such as the number of CPU cores, CPU load, memory, cache and IO delay of the current physical machine and virtual machine nodes; a specific virtual machine migration scheduling path is provided; and the virtual machines are migrated so that, after migration, virtual machines with complementary resource usage are deployed on the same physical machine.
13. The intelligent scheduling method for exploration and development cloud resources based on machine learning according to claim 12, wherein in step 4, the existing data, the newly issued job data and the data on host physical resource consumption are obtained, and the allocation of job node virtual machines to hosts is calculated by a genetic algorithm, with the following specific steps:
obtaining the resource occupation of each host and information on its future resource trend, including the number of host CPUs, the CPU load trend, the memory change trend, the cache pressure and the IO blocking delay of the host;
calculating the current operation indexes of each physical machine from the module type results produced by the exploration and development cloud BP neural network classification algorithm, and establishing a real-time cloud host scheduling model; and generating a physical machine/virtual machine resource usage matrix from index data such as the number of CPU cores, CPU load, memory, cache and IO delay of the current physical machine and virtual machine nodes.
14. The intelligent scheduling method for exploration and development cloud resources based on machine learning according to claim 1, wherein in step 4, cloud host scheduling is performed based on a genetic algorithm, comprising the steps of:
(1) Initializing a population
N individuals are randomly generated according to the population size and the coding rule. The index data of the virtual machines in each physical machine are summarized and encoded into character strings, which are concatenated in order of physical machine number; each such string is called an individual, i.e. a chromosome, and a number of individuals form a population. Initializing the population means creating a group of individuals, with character positions in an individual corresponding to the relevant physical machines. Each character in an individual's code is a gene and corresponds to one index of a virtual machine. Individuals are encoded with real numbers;
(2) Selection operation
The selection operation prevents the loss of useful genes. It is based on a fitness evaluation of the individuals in the population; because the objective function in this scenario is minimized, the reciprocal of the objective function is taken as the fitness function. The fitness of each individual is calculated from the fitness function and the individuals are sorted in descending order of fitness, so that the fittest individuals, i.e. those with the smallest variance of physical machine resource indexes, survive with higher probability, which improves overall convergence and computational efficiency. The selection of individuals must also satisfy the following rule:
Var(Time)<0.95*Var(Fix)
that is, the real-time index of the physical machine is less than 0.95 times the fixed index of the physical machine; in the above formula, Time denotes real time and Var(Time) is the real-time value of the resource consumption of the physical machine (CPU, memory, network, temporary disk, etc.); Fix denotes fixed and Var(Fix) is the total value of the corresponding physical machine resource; each index (CPU, memory, network, temporary disk, etc.) is calculated independently, and the real-time value of each index must be smaller than 0.95 times its total value;
(3) An improved roulette algorithm is adopted, implemented as follows: first, the fitness of each individual is calculated from the fitness function; second, the selection probability and cumulative probability of each individual are calculated; then a number is randomly generated in the interval [0, 1], and the individual whose cumulative-probability interval contains that number is selected;
(4) Crossover operation
The crossover operation is the main way a genetic algorithm generates new individuals, i.e. individual codes that did not appear in the population of the previous iteration; it exchanges parts of the chromosomes of two individuals with a certain probability;
the specific implementation steps are as follows: first, the individuals in the population are paired at random; second, the crossover point is set at random; finally, the paired chromosomes exchange part of their genes;
(5) Mutation operation
The mutation operation changes the value of one or more genes of an individual with a small probability, and is another way of generating new individuals;
mutation sites are generated randomly; the specific procedure is as follows: first, the mutation position of each individual is determined, i.e. the position index of a particular virtual machine index, which corresponds to an actual parameter; then, with a certain probability, the original gene value at the mutation point is replaced with another value;
(6) Termination condition: because the optimal solution of the problem cannot be known in advance, an approximate convergence criterion is used to terminate the algorithm; when the best solution remains unchanged after the population has evolved for N generations, that solution is taken as the optimal solution and the algorithm terminates;
The objective function is:
$$\min S^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
where $X_i$ is the utilization rate of the i-th index of the physical machine, $\bar{X}$ is the mean utilization rate across the indexes, $S^2$ is the variance of the utilization rates, n is the number of indexes, and Min indicates that the smaller the variance, the more balanced the resources; Time denotes real time and Var(Time) is the real-time value of the resource consumption of the physical machine (CPU, memory, network, temporary disk, etc.); Fix denotes fixed and Var(Fix) is the total value of the corresponding physical machine resource; each index (CPU, memory, network, temporary disk, etc.) is calculated independently, and the real-time value of each index must be smaller than 0.95 times its total value;
the calculation steps are as follows: 1. summing each index of the real-time data over the virtual machines; 2. calculating the running state of each index of the physical machine; 3. calculating the utilization rate of each index of the physical machine; 4. calculating the variance of the indexes of the physical machine to check how balanced the resource shares are; the smaller the variance, the more uniform the resource usage.
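The genetic algorithm steps of this claim can be sketched as below. This is a simplified illustration and not the claimed implementation. It makes several assumptions: each gene holds the index of the physical machine assigned to one virtual machine (an integer assignment encoding, instead of the real-number encoding of index data described in the claim); the objective is summed over physical machines using the population variance; selection is plain roulette-wheel selection, since the source does not say how its roulette algorithm is "improved"; the best individual is carried over unchanged (elitism); and all parameter values are hypothetical.

```python
import random
from statistics import pvariance


def objective(individual, vms, capacities):
    """Sum over physical machines of the variance of per-index utilization."""
    n_res = len(capacities[0])
    totals = [[0.0] * n_res for _ in capacities]
    for vm, pm in zip(vms, individual):
        for r in range(n_res):
            totals[pm][r] += vm[r]
    score = 0.0
    for used, cap in zip(totals, capacities):
        if any(u >= 0.95 * c for u, c in zip(used, cap)):
            return None  # violates Var(Time) < 0.95 * Var(Fix)
        score += pvariance([u / c for u, c in zip(used, cap)])
    return score


def fitness(individual, vms, capacities):
    """Reciprocal of the objective; infeasible individuals get a tiny fitness."""
    value = objective(individual, vms, capacities)
    return 1e-9 if value is None else 1.0 / (value + 1e-9)


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.random() * sum(fitnesses)
    cumulative = 0.0
    for individual, fit in zip(population, fitnesses):
        cumulative += fit
        if pick <= cumulative:
            return individual
    return population[-1]


def crossover(a, b, rng):
    """Single-point crossover; parents shorter than 2 genes are copied."""
    if len(a) < 2:
        return a[:], b[:]
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(individual, n_pms, rate, rng):
    """Reassign each gene to a random physical machine with probability rate."""
    return [rng.randrange(n_pms) if rng.random() < rate else g for g in individual]


def schedule(vms, capacities, pop_size=30, patience=20, max_gen=500,
             cx_rate=0.8, mut_rate=0.05, seed=0):
    rng = random.Random(seed)
    n_pms = len(capacities)
    population = [[rng.randrange(n_pms) for _ in vms] for _ in range(pop_size)]
    best, best_fit, stale = None, -1.0, 0
    for _ in range(max_gen):
        fits = [fitness(ind, vms, capacities) for ind in population]
        top = max(range(pop_size), key=fits.__getitem__)
        if fits[top] > best_fit:
            best, best_fit, stale = population[top][:], fits[top], 0
        else:
            stale += 1
        if stale >= patience:  # best solution unchanged for N generations
            break
        next_pop = [best[:]]  # assumption: keep the best individual (elitism)
        while len(next_pop) < pop_size:
            p1 = roulette_select(population, fits, rng)
            p2 = roulette_select(population, fits, rng)
            if rng.random() < cx_rate:
                p1, p2 = crossover(p1, p2, rng)
            next_pop.append(mutate(p1, n_pms, mut_rate, rng))
            next_pop.append(mutate(p2, n_pms, mut_rate, rng))
        population = next_pop[:pop_size]
    return best, objective(best, vms, capacities)


# Hypothetical data: four VMs with (CPU cores, memory GB), two physical machines.
vms = [(2, 4), (2, 4), (4, 2), (1, 8)]
capacities = [(100, 200), (100, 200)]
assignment, score = schedule(vms, capacities)
```

Here `assignment[i]` is the physical machine chosen for virtual machine `i`, and `score` is the objective value of that assignment; `score` is `None` only if no feasible assignment was found.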
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210072238.XA CN116541155A (en) | 2022-01-21 | 2022-01-21 | Exploration and development cloud resource intelligent scheduling method based on machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210072238.XA CN116541155A (en) | 2022-01-21 | 2022-01-21 | Exploration and development cloud resource intelligent scheduling method based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116541155A true CN116541155A (en) | 2023-08-04 |
Family
ID=87444035
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210072238.XA Pending CN116541155A (en) | 2022-01-21 | 2022-01-21 | Exploration and development cloud resource intelligent scheduling method based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116541155A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118014098A (en) * | 2024-02-04 | 2024-05-10 | 贝格迈思(深圳)技术有限公司 | Machine learning training data scheduling method and equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110389820B (en) | Private cloud task scheduling method for resource prediction based on v-TGRU model | |
CN110674636B (en) | Power consumption behavior analysis method | |
CN112910690A (en) | Network traffic prediction method, device and equipment based on neural network model | |
CN110188919A (en) | A kind of load forecasting method based on shot and long term memory network | |
CN111950622B (en) | Behavior prediction method, device, terminal and storage medium based on artificial intelligence | |
CN114584406B (en) | Industrial big data privacy protection system and method for federated learning | |
CN113190670A (en) | Information display method and system based on big data platform | |
CN111191825A (en) | User default prediction method and device and electronic equipment | |
CN111210332A (en) | Method and device for generating post-loan management strategy and electronic equipment | |
CN112836750A (en) | System resource allocation method, device and equipment | |
CN111582645B (en) | APP risk assessment method and device based on factoring machine and electronic equipment | |
CN116541155A (en) | Exploration and development cloud resource intelligent scheduling method based on machine learning | |
CN116578436A (en) | Real-time online detection method based on asynchronous multielement time sequence data | |
CN114897085A (en) | Clustering method based on closed subgraph link prediction and computer equipment | |
CN116976318A (en) | Intelligent auditing system for switching operation ticket of power grid based on deep learning and model reasoning | |
CN117155771B (en) | Equipment cluster fault tracing method and device based on industrial Internet of things | |
CN113743453A (en) | Population quantity prediction method based on random forest | |
CN112231299A (en) | Method and device for dynamically adjusting feature library | |
CN111737319B (en) | User cluster prediction method, device, computer equipment and storage medium | |
US11823066B2 (en) | Enterprise market volatility predictions through synthetic DNA and mutant nucleotides | |
CN115913992A (en) | Anonymous network traffic classification method based on small sample machine learning | |
CN116155835B (en) | Cloud resource service quality assessment method and system based on queuing theory | |
CN113486354B (en) | Firmware security assessment method, system, medium and electronic equipment | |
CN117369954B (en) | JVM optimization method and device of risk processing framework for big data construction | |
US11823064B2 (en) | Enterprise market volatility prediction through synthetic DNA and mutant nucleotides |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||