CN111309489A

CN111309489A - Cloud computing resource scheduling method and system for geographic big data

Info

Publication number: CN111309489A
Application number: CN202010222154.0A
Authority: CN
Inventors: 刘静静; 陈曦; 刘敏; 蒋捷; 李志强; 刘小平; 李庆利; 方涛; 霍宏
Original assignee: East China Normal University
Current assignee: East China Normal University
Priority date: 2020-03-26
Filing date: 2020-03-26
Publication date: 2020-06-19
Also published as: CN112035264A; CN112035264B

Abstract

The invention provides a cloud computing resource scheduling method and system facing geographic big data.

Description

Cloud computing resource scheduling method and system for geographic big data

Technical Field

The invention relates to the field of geographic big data application, in particular to a cloud computing resource scheduling method and system for geographic big data.

Background

With the rapid development of technology and ever-growing application scale, current geographic big data applications need to process massive amounts of data. Big data is generally characterized by the "5V" features: Volume, Velocity (update speed), Variety, Value, and Veracity. Geographic big data shares the "5V" features; what distinguishes it from other big data is that it carries spatiotemporal attributes.

According to the type of the sensor and the recorded object, the geographic big data can be divided into two kinds of big data, namely earth observation and human behavior.

The earth observation data records the characteristics of earth surface elements, and the sensor types of the earth observation data comprise aerospace sensors and earth surface monitoring sensors, and active acquisition is mainly used. The corresponding data comprises satellite, unmanned aerial vehicle remote sensing images and monitoring station data.

Human behavior data records human activities such as social interaction, movement, and consumption. Its sensors include smart cards, mobile phone terminals, social media, navigation devices, and the like, and it is acquired mainly in a passive mode. The data produced include mobile phone signaling, taxi trajectories, Internet of Things data, social media data, and the like.

The objects of interest of the two types of data are the "land" and "people", respectively. Geographic big data combines earth observation big data with human behavior big data, and provides new impetus, new resources, and new perspectives for the study of the human-land relationship in geography. The two types of data differ in structure, granularity, and mode of expression, which poses new challenges for the analysis and processing of geographic big data. In addition, the great majority of geographic big data, particularly human behavior data, is not collected through purposeful observation, which often results in bias and uncertainty. Geographic big data can therefore be summarized by five "degree" characteristics: spatiotemporal breadth, spatiotemporal granularity, spatiotemporal skewness, spatiotemporal density, and spatiotemporal precision. How to schedule different data vertically or horizontally along the spatiotemporal and attribute dimensions becomes the key to parallel computation on geographic big data.

In order to process massive geographic big data and provide high-quality service, cloud computing technology oriented to geographic big data is needed, providing convenient and fast storage and computing services for geographic big data through technologies such as parallel computing, distributed computing, virtualization, and load balancing. Compared with traditional distributed and parallel computing, the resource pool of cloud computing is usually built in advance from dedicated servers, and cloud computing serves many types of users, so some traditional management and resource scheduling techniques are not suitable for the cloud computing environment. Moreover, resource scheduling in cloud computing has new characteristics such as large resource flows, on-demand allocation, energy saving and consumption reduction, and support for heterogeneous environments, and these characteristics bring new challenges to the resource scheduling problem in the cloud. How to handle resource scheduling in a cloud computing environment with a more efficient method is a focus of current research.

According to the mode of resource scheduling, traditional resource scheduling for distributed and parallel computing falls into three types: decentralized, hierarchical, and centralized. The hierarchical and decentralized modes are generally suited to distributed and parallel systems. In a cloud computing environment, by contrast, cloud computing resources are currently managed mainly through a virtual resource pool, and task processing and resource scheduling are completed by a data center. Centralized resource management and scheduling is therefore better suited to cloud computing. Research on cloud computing resource scheduling methods has so far achieved a number of results.

The economic-model-based resource scheduling method proposed by Rajkumar Buyya et al. in Australia is one of the main resource scheduling methods at present. By analogy between the resource supply-and-demand relationship in a cloud computing environment and a market economy model, it allows resource buyers and sellers to trade through negotiation, and dynamically adjusts competition for and allocation of resources through price, thereby achieving optimal resource allocation. Many resource scheduling methods can only select the most appropriate scheduling policy according to the resource demand of a task at the time it arrives; in practice, the state of a task changes during its execution, so dynamic scheduling of resources is also a very important problem. Jean-Marc Menaud, Hien Nguyen Van, and other scholars in France convert the problems of selecting suitable virtual machines for tasks and suitable physical machines for virtual machines into a constraint satisfaction problem, and thereby realize dynamic rescheduling of resources. Fabien Hermenier et al. studied how to allocate and migrate virtual machines to physical machines and proposed a dynamic resource management and scheduling method that jointly considers reconfiguration computation time and virtual machine migration time. Wei Guiyi et al. proposed a game-theory-based resource scheduling algorithm that jointly considers optimization and fairness. The algorithm first uses a dynamic programming algorithm to solve the independent optimization problem of each single participant, and then uses an evolutionary algorithm to solve the joint optimization problem of multiple participants.

Although researchers have done much work on the resource scheduling problem of cloud computing, these methods are all oriented to the underlying physical resources: they balance system load and improve resource utilization mainly by optimizing the physical resource configuration of virtual resources or by dynamic migration of virtual machines. Implementing these methods requires stopping the cloud application in order to complete the dynamic scheduling of resources, which is a limitation. In addition, some methods assume that the tasks to be scheduled for all users are of the same type, whereas actual cloud computing, and geographic big data computing in particular, involves many task types. If the resources of every user are scheduled under the same framework, an optimal scheduling result generally cannot be obtained. How to dynamically schedule cloud computing resources for different task types without stopping the cloud application has become a technical problem in urgent need of a solution.

Disclosure of Invention

The invention aims to provide a cloud computing resource scheduling method and system for geographic big data, so as to achieve dynamic scheduling of cloud computing resources of different task types under the condition of not stopping cloud application.

In order to achieve the purpose, the invention provides the following scheme:

a cloud computing resource scheduling method facing geographic big data comprises the following steps:

the method comprises the steps of obtaining cloud computing scheduling trigger events of geographic big data and one or more resource scheduling templates corresponding to each trigger event in an off-line mode, and establishing a cloud computing resource scheduling rule base;

monitoring the cloud computing process of the geographic big data, and determining a current trigger event;

determining one or more resource scheduling templates of the current trigger event according to the cloud computing resource scheduling rule base;

generating one or more resource scheduling actions according to one or more resource scheduling templates of the current trigger event and the current load flow data information;

acquiring load flow data of J-1 historical moments when a current trigger event occurs and before the occurrence to form a load flow matrix sequence, wherein the load flow matrix sequence comprises the load flow data of the J-1 historical moments when the current trigger event occurs and before the occurrence;

inputting the load flow matrix sequence into a Convolutional LSTM (ConvLSTM) load flow prediction model, predicting each resource scheduling action, and obtaining a load flow prediction value of each resource scheduling action;

and selecting the resource scheduling action with the minimum load flow predicted value as an optimal scheduling scheme to perform cloud resource scheduling.

Optionally, the obtaining, offline, a cloud computing scheduling trigger event of the geographic big data and one or more resource scheduling templates corresponding to each trigger event, and establishing a cloud computing resource scheduling rule base specifically include:

determining cloud computing key performance index data, and establishing a cloud computing key performance index database;

acquiring a cloud computing dynamic resource scheduling target of geographic big data from a cloud environment;

extracting all trigger events of cloud computing off line according to a cloud computing key performance index database and a cloud computing dynamic resource scheduling target;

acquiring cloud computing distribution information of the geographic big data and all resource scheduling data instructions from a cloud environment; the cloud computing distribution information comprises resource information of a physical machine, running state information of a virtual machine and running state information of a component deployed on the virtual machine;

obtaining a cloud computing resource scheduling template of each resource scheduling data instruction according to the cloud computing distribution information, and establishing a cloud computing resource scheduling template set;

and determining one or more cloud computing resource scheduling templates corresponding to each trigger event according to the resource scheduling template set and the resource scheduling data instruction corresponding to each trigger event to form a cloud computing resource scheduling rule.

Optionally, the inputting the load traffic matrix sequence into a ConvLSTM load traffic prediction model, predicting each resource scheduling action, and obtaining a load traffic prediction value of each resource scheduling action, includes:

superposing a plurality of ConvLSTM layers to form a coding prediction structure which is used as an end-to-end training model of load flow;

acquiring historical cloud computing load-carrying capacity data and historical cloud computing resource scheduling instructions to form a training sample set;

and optimizing a training model by adopting a weighted mean square error loss function based on the training sample set to obtain a ConvLSTM load flow prediction model.

Optionally, the ConvLSTM load traffic prediction model includes a coding network and a prediction network.

A cloud computing resource scheduling system for geographic big data, the scheduling system comprising:

the cloud computing resource scheduling rule base establishing module is used for acquiring cloud computing scheduling trigger events of the geographic big data and one or more resource scheduling templates corresponding to each trigger event in an off-line manner and establishing a cloud computing resource scheduling rule base;

the trigger event monitoring module is used for monitoring the cloud computing process of the geographic big data and determining the current trigger event;

the resource scheduling template determining module is used for determining one or more resource scheduling templates of the current trigger event according to the cloud computing resource scheduling rule base;

the resource scheduling action generating module is used for generating one or more resource scheduling actions according to one or more resource scheduling templates of the current trigger event and the current load flow data information;

the load flow matrix sequence acquisition module is used for acquiring load flow data of J-1 historical moments when and before the current trigger event occurs to form a load flow matrix sequence; the load flow matrix sequence comprises load flow data of J-1 historical moments when and before a current trigger event occurs;

the load flow prediction module is used for inputting the load flow matrix sequence into a Convolutional LSTM (ConvLSTM) load flow prediction model, predicting each resource scheduling action and obtaining a load flow prediction value of each resource scheduling action;

and the optimal scheduling scheme selecting module is used for selecting the resource scheduling action with the minimum load flow predicted value as the optimal scheduling scheme to perform cloud resource scheduling.

Optionally, the cloud computing resource scheduling rule base establishing module specifically includes:

the cloud computing key performance index database establishing submodule is used for determining cloud computing key performance index data and establishing a cloud computing key performance index database;

the cloud computing dynamic resource scheduling target obtaining submodule is used for obtaining a cloud computing dynamic resource scheduling target of geographic big data from a cloud environment;

the trigger event extraction sub-module is used for extracting all trigger events of the cloud computing in an off-line mode according to the cloud computing key performance index database and the cloud computing dynamic resource scheduling target;

the cloud computing distribution information and resource scheduling data instruction acquisition submodule is used for acquiring cloud computing distribution information of the geographic big data and all resource scheduling data instructions from a cloud environment; the cloud computing distribution information comprises resource information of a physical machine, running state information of a virtual machine and running state information of a component deployed on the virtual machine;

the cloud computing resource scheduling template set establishing submodule is used for obtaining a cloud computing resource scheduling template of each resource scheduling data instruction according to the cloud computing distribution information and establishing a cloud computing resource scheduling template set;

and the cloud computing resource scheduling rule establishing submodule is used for determining one or more cloud computing resource scheduling templates corresponding to each trigger event according to the resource scheduling template set and the resource scheduling data instruction corresponding to each trigger event to form a cloud computing resource scheduling rule.

Optionally, the scheduling system further includes:

the training model establishing module is used for superposing a plurality of ConvLSTM layers to form a coding prediction structure as an end-to-end training model of load flow;

the training sample set establishing module is used for acquiring historical cloud computing load-carrying capacity data and historical cloud computing resource scheduling instructions to form a training sample set;

and the model optimization module is used for optimizing the training model by adopting a weighted mean square error loss function based on the training sample set to obtain a ConvLSTM load flow prediction model.

According to the specific embodiment provided by the invention, the invention discloses the following technical effects:

The invention provides a cloud computing resource scheduling method and system for geographic big data. The scheduling method first obtains, offline, the cloud computing scheduling trigger events of the geographic big data and one or more resource scheduling templates corresponding to each trigger event, and establishes a cloud computing resource scheduling rule base; it then monitors the cloud computing process of the geographic big data and determines the current trigger event; determines one or more resource scheduling templates of the current trigger event according to the cloud computing resource scheduling rule base; generates one or more resource scheduling actions according to the one or more resource scheduling templates of the current trigger event and the current load flow data; and finally obtains load flow data at the moment the current trigger event occurs and at the J-1 preceding historical moments to form a load flow matrix sequence, inputs the load flow matrix sequence into a Convolutional LSTM (ConvLSTM) load flow prediction model, predicts each resource scheduling action to obtain a load flow prediction value for each resource scheduling action, and selects the resource scheduling action with the minimum load flow prediction value as the optimal scheduling scheme for cloud resource scheduling. In this method, the resource scheduling actions for a trigger event are determined from the cloud computing resource scheduling rule base established offline and the trigger event monitored online, and the optimal scheduling scheme is determined by the ConvLSTM load flow prediction model, so that cloud computing resources of different task types (the monitored trigger events) are dynamically scheduled without stopping the cloud application.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.

Fig. 1 is a flowchart of a cloud computing resource scheduling method for geographic big data according to the present invention;

FIG. 2 is a schematic structural diagram of a ConvLSTM load flow prediction model provided by the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

In order to realize efficient scheduling of a plurality of types of tasks, different scheduling modes are selected according to the types of the tasks to be scheduled. In addition, the resource consumption type of the task may change with the execution of the task, if a static resource allocation and scheduling method is adopted, resources may be insufficient or wasted, and manual dynamic resource adjustment has serious hysteresis, so that the execution state of the task needs to be monitored in real time. In addition, task resources need to be dynamically rescheduled at a proper time to achieve balanced use of each task resource (CPU, memory, disk, I/O, network), so as to avoid the occurrence of a single resource bottleneck, increase the density of virtual machines, eliminate hot spots, and improve the service processing capability, which is not involved in the existing task resource scheduling technology.

In order to achieve the purpose, the invention provides the following scheme:

as shown in fig. 1, the present invention provides a cloud computing resource scheduling method for geographic big data, where the scheduling method includes the following steps:

and acquiring cloud computing scheduling trigger events of the geographic big data and one or more resource scheduling templates corresponding to each trigger event in an off-line manner, and establishing a cloud computing resource scheduling rule base.

As shown in fig. 1, the obtaining, offline, a cloud computing scheduling trigger event of geographic big data and one or more resource scheduling templates corresponding to each trigger event and establishing a cloud computing resource scheduling rule base specifically includes:

determining cloud computing key performance index data, and establishing a cloud computing key performance index database; acquiring a cloud computing dynamic resource scheduling target of geographic big data from a cloud environment; extracting all trigger events of cloud computing off line according to a cloud computing key performance index database and a cloud computing dynamic resource scheduling target; specifically, the key performance indexes mainly include CPU utilization, memory utilization, storage space utilization, network resource utilization, and other relevant load-carrying capacity data. The cloud computing resource scheduling target refers to a flow resource utilization rate type self-adaptive dynamic adjustment target. The corresponding distribution information comprises physical machine distribution information, virtual machine distribution information and component service distribution information.

The cloud computing resource scheduling target specifies an adjustment range using the keywords Above, Below, and Between, and is described in terms of key performance indicator (KPI) data. The specific format is as follows:

Target target_name:KPI_name Above thr_l;

Target target_name:KPI_name Below thr_h;

Target target_name:KPI_name Between thr_l thr_h;

where Above indicates that the adjustment should be made above the given threshold thr_l, Below indicates that the adjustment should be made below the given threshold thr_h, and Between indicates that the adjustment should be made between the given thresholds thr_l and thr_h.

The trigger events are used to trigger cloud computing dynamic resource scheduling. Each trigger event corresponds to one dynamic resource scheduling target and indicates the target trend that resource scheduling should achieve after the event is triggered. The judgment condition of each trigger event is established according to the keyword (Above, Below, Between) of its resource scheduling target.
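The target format and trigger condition described above can be illustrated with a minimal Python sketch. The names `Target`, `parse_target`, and `is_triggered` are hypothetical, and the trigger semantics (a trigger fires when the KPI lies outside the range named by the target) are an assumption made for illustration; the source does not spell out the exact judgment condition.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    name: str
    kpi: str
    keyword: str              # "Above", "Below" or "Between"
    thr_l: Optional[float]    # lower threshold (Above, Between)
    thr_h: Optional[float]    # upper threshold (Below, Between)


_PATTERN = re.compile(
    r"Target\s+(\w+)\s*:\s*(\w+)\s+(Above|Below|Between)"
    r"\s+([\d.]+)(?:\s+([\d.]+))?\s*;?\s*$"
)


def parse_target(line: str) -> Target:
    """Parse one 'Target target_name:KPI_name <keyword> <thresholds>;' line."""
    m = _PATTERN.match(line.strip())
    if m is None:
        raise ValueError(f"not a scheduling target: {line!r}")
    name, kpi, keyword, first, second = m.groups()
    if keyword == "Between":
        if second is None:
            raise ValueError("Between needs two thresholds")
        return Target(name, kpi, keyword, float(first), float(second))
    if second is not None:
        raise ValueError(f"{keyword} takes exactly one threshold")
    if keyword == "Above":
        return Target(name, kpi, keyword, float(first), None)
    return Target(name, kpi, keyword, None, float(first))


def is_triggered(target: Target, value: float) -> bool:
    """Assumed semantics: fire when the KPI is outside the target range."""
    if target.keyword == "Above":
        return value <= target.thr_l
    if target.keyword == "Below":
        return value >= target.thr_h
    return not (target.thr_l <= value <= target.thr_h)
```

For example, `parse_target("Target cpu_ok:cpu_util Between 0.2 0.8;")` yields a target whose trigger fires for a CPU utilization of 0.95 but not for 0.5; the target and KPI names here are made up.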

Cloud computing distribution information of the geographic big data and all resource scheduling data instructions are acquired from the cloud environment; the distribution information comprises resource information of the physical machines, running state information of the virtual machines, and running state information of the components deployed on the virtual machines. A cloud computing resource scheduling template is obtained for each resource scheduling data instruction according to the distribution information, and a cloud computing resource scheduling template set is established. Specifically, a resource scheduling data instruction is a concrete execution instruction for performing dynamic resource scheduling in cloud computing, and each resource corresponds to specific resource scheduling actions. The instruction format is as follows:

Add_CPU(VM,Num);

Reduce_CPU(VM,Num);

The above two instructions increase or decrease the number of CPUs of the target virtual machine, where VM is the name of the target virtual machine and Num is the number of CPUs to add or remove.

The resource scheduling template comprises a resource scheduling function target, a scheduling mode, and scheduling-related constraint conditions. The form of the template is as follows:

Resource Schedule Template: Add_VM_i_CPU;

ObjectName (scheduling object): VM_i;

Operation (scheduling instruction): Add_CPU(VM,Num);

Constraints: VM_i.status = RUNNING

VM_i.CPU.count + Num ≤ VM_i.CPU.Thr.count

The above example describes the case in which VM_i is in the running state and a CPU needs to be scheduled for VM_i because of resource demand: the scheduling instruction Add_CPU(VM,Num) is applied to the scheduling object VM_i, and the constraints on execution are the state of VM_i (VM_i.status) and the upper limit on the CPU count of the virtual machine VM_i (VM_i.CPU.Thr.count).
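As an illustration of how the two constraints of the Add_VM_i_CPU template might be checked before the instruction is applied, here is a minimal Python sketch. The `VM` class and both function names are hypothetical; only the two constraint conditions are taken from the template above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VM:
    name: str
    status: str       # e.g. "RUNNING"
    cpu_count: int    # VM_i.CPU.count
    cpu_limit: int    # VM_i.CPU.Thr.count


def add_cpu_allowed(vm: VM, num: int) -> bool:
    """Both template constraints: VM is running and the CPU cap is respected."""
    return vm.status == "RUNNING" and vm.cpu_count + num <= vm.cpu_limit


def add_cpu(vm: VM, num: int) -> VM:
    """Apply Add_CPU(VM, Num) only when the constraints hold."""
    if num <= 0:
        raise ValueError("Num must be positive")
    if not add_cpu_allowed(vm, num):
        raise ValueError(f"constraints not satisfied for {vm.name}")
    return replace(vm, cpu_count=vm.cpu_count + num)
```

Returning a new `VM` rather than mutating the old one is a design choice of this sketch, so that a candidate action can be evaluated without changing the recorded state.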

One or more cloud computing resource scheduling templates corresponding to each trigger event are determined according to the resource scheduling template set and the resource scheduling data instructions corresponding to each trigger event, forming the cloud computing resource scheduling rules. Specifically, each cloud computing resource scheduling rule includes a trigger event and its corresponding set of resource scheduling templates. In the online stage, a trigger event is fired after a cloud computing KPI exceeds its threshold; the trigger event is used to filter the resource scheduling template set, the resource scheduling templates that match the trigger event are retained, and the retained templates together with the trigger event form a cloud computing resource scheduling rule.

The resource scheduling rules describe which resource scheduling actions can be executed to achieve the resource scheduling target when the load flow needs to be reallocated. Each cloud computing resource scheduling rule can be described as:

Condition: Trigger Event

Resource Schedules: {Template_1, Template_2, …, Template_i}

where {Template_1, Template_2, …, Template_i} is the set of resource scheduling templates and Trigger Event is the corresponding trigger event.

determining one or more resource scheduling templates of the current trigger event according to the cloud computing resource scheduling rule base; specifically, each rule includes a plurality of resource scheduling templates, wherein each template can be automatically generated by a scheduling action target, a scheduling mode and a resource scheduling data instruction. And after the template is automatically generated, screening the template set according to the type of the trigger event, reserving the resource scheduling template meeting the requirement, and forming a resource scheduling rule with the trigger event.
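The filtering of the template set by trigger event can be sketched as a simple lookup table. This is a minimal sketch under stated assumptions: the event names, the mapping of events to instructions, and the functions `build_rule_base` and `templates_for` are all hypothetical, since the source gives only the rule structure (a trigger event plus a set of templates).

```python
from typing import Dict, List

# Rule base: trigger event name -> names of the templates retained for it.
RuleBase = Dict[str, List[str]]


def build_rule_base(
    template_instruction: Dict[str, str],
    event_instructions: Dict[str, List[str]],
) -> RuleBase:
    """Keep, for each trigger event, the templates whose instruction matches.

    template_instruction: template name -> instruction the template wraps.
    event_instructions:   trigger event -> instructions suitable for it.
    """
    return {
        event: [
            name
            for name, instruction in template_instruction.items()
            if instruction in instructions
        ]
        for event, instructions in event_instructions.items()
    }


def templates_for(rule_base: RuleBase, event: str) -> List[str]:
    """Online lookup: templates of the current trigger event (empty if none)."""
    return rule_base.get(event, [])
```

With templates `{"Add_VM_i_CPU": "Add_CPU", "Reduce_VM_i_CPU": "Reduce_CPU"}` and a made-up event `"cpu_util_high"` mapped to `["Add_CPU"]`, the rule for that event retains only `Add_VM_i_CPU`.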

One or more resource scheduling actions are generated according to the one or more resource scheduling templates of the current trigger event and the current load flow data. Specifically, a group of resource scheduling actions is generated from the determined resource scheduling templates and the current load flow data; different resource adjustment amounts correspond to different resource scheduling actions, so generating a resource scheduling action amounts to determining the parameter (the resource adjustment amount) of the resource scheduling data instruction.
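One way to picture action generation is to enumerate every adjustment amount that the template constraint permits. The sketch below does this for Add_CPU only; the function name, the `max_step` cap, and the tuple representation of an action are assumptions, and the sketch does not model how the current load flow data narrows the candidates, which the source leaves unspecified.

```python
from typing import List, Tuple

# (instruction, target VM name, adjustment amount Num)
Action = Tuple[str, str, int]


def generate_add_cpu_actions(
    vm_name: str, cpu_count: int, cpu_limit: int, max_step: int = 4
) -> List[Action]:
    """One candidate action per admissible adjustment amount.

    Amounts run from 1 up to the remaining headroom under the CPU cap,
    limited by max_step (a hypothetical bound on a single adjustment).
    """
    headroom = cpu_limit - cpu_count
    return [
        ("Add_CPU", vm_name, num)
        for num in range(1, min(headroom, max_step) + 1)
    ]
```

A virtual machine with 2 CPUs and a cap of 4 yields two candidates, `Add_CPU(vm, 1)` and `Add_CPU(vm, 2)`; a machine already at its cap yields none.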

Load flow data at the moment the current trigger event occurs and at the J-1 preceding historical moments are acquired to form a load flow matrix sequence. Suppose the dynamic system is observed over a spatial region represented by a grid of M rows and M columns, with P measurements in each cell of the grid. The measurements (load data) at any moment, which carry spatial information, can then be represented by a tensor X ∈ R^{P×M×M}, where R denotes the domain of the observed features. Recording the measurements periodically yields a sequence of such tensors.

The load flow matrix sequence is input into the ConvLSTM load flow prediction model, each resource scheduling action is predicted, and a load flow predicted value is obtained for each resource scheduling action. Specifically, different resource scheduling actions have different effects when executed, so in order to schedule resources more reasonably and use computing resources in a balanced way, a prediction model is needed that predicts inter-region load flow in advance and captures the spatiotemporal characteristics of geographic big data well. The load flow matrix sequence is taken as the input of the prediction model: each input is a tensor X ∈ R^{P×M×M}, and the prediction model outputs O ∈ R^{M×M}, i.e. the predicted load flow matrix at the next moment. As shown in fig. 2, the ConvLSTM load flow prediction model includes a coding network and a prediction network. The initial states and cell outputs of the prediction network are copied from the last state of the coding network. Both networks are formed by stacking multiple ConvLSTM layers. Because the prediction target has the same dimensionality as the input, all states of the prediction network are concatenated and fed into a 1×1 convolutional layer to generate the final prediction.
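The final selection step (choose the action with the minimum predicted load flow) can be sketched as follows. Two assumptions are made loudly here: `predict` is a hypothetical stand-in for the trained ConvLSTM model, since the source does not say how a candidate action is fed to the model, and the predicted M×M matrix is reduced to a single score by summing its entries, which the source also does not specify.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = List[List[float]]
Action = Tuple[str, str, int]   # (instruction, VM name, adjustment amount)
Predictor = Callable[[Sequence[Matrix], Action], Matrix]


def total_load(matrix: Matrix) -> float:
    """Assumed scalar score: sum of all entries of the predicted matrix."""
    return sum(sum(row) for row in matrix)


def select_best_action(
    history: Sequence[Matrix],
    actions: Sequence[Action],
    predict: Predictor,
) -> Action:
    """Return the action whose predicted load flow score is smallest."""
    if not actions:
        raise ValueError("no candidate resource scheduling actions")
    return min(actions, key=lambda a: total_load(predict(history, a)))


def toy_predict(history: Sequence[Matrix], action: Action) -> Matrix:
    """Toy predictor for demonstration only: more CPUs, lower load."""
    scale = 1.0 / (1 + action[2])
    return [[v * scale for v in row] for row in history[-1]]
```

With `toy_predict`, a history whose last matrix is `[[1.0, 2.0], [3.0, 4.0]]` and candidates adding 1 or 2 CPUs, the selection returns the action that adds 2 CPUs.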

As a preferred embodiment, before inputting the load traffic matrix sequence into a ConvLSTM load traffic prediction model to predict each resource scheduling action and obtain a load traffic predicted value of each resource scheduling action, the method further includes:

superposing a plurality of ConvLSTM layers to form a coding prediction structure which is used as an end-to-end training model of load flow; acquiring historical cloud computing load-carrying capacity data and historical cloud computing resource scheduling instructions to form a training sample set; and optimizing a training model by adopting a weighted mean square error loss function based on the training sample set to obtain a ConvLSTM load flow prediction model. In order to improve the performance of the training model and obtain a better prediction result, a weighted mean square error loss function is used for optimizing the prediction model:

Here the two values being compared are the ground-truth value and the predicted value, respectively, and w_i is the weight. The weights place more emphasis on errors that occur at significant pixels of the ground truth S. Verification shows that a w_i value of 2 to 3 is optimal.
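Since the loss formula itself is not reproduced in the text, the following Python sketch shows one plausible weighted mean square error, not necessarily the one used in the invention. It assumes that a pixel is "significant" when its ground-truth value exceeds a threshold, that significant pixels receive weight w_i = `weight` and all others weight 1, and that the weighted squared errors are averaged over all pixels; the function name and the `threshold` parameter are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def weighted_mse(
    truth: Matrix, pred: Matrix, weight: float = 2.0, threshold: float = 0.0
) -> float:
    """Mean of w_i * (s_i - s_hat_i)^2 over all pixels.

    Assumption: w_i = weight where the ground truth exceeds threshold,
    and w_i = 1 elsewhere.
    """
    if len(truth) != len(pred):
        raise ValueError("shape mismatch")
    total, count = 0.0, 0
    for t_row, p_row in zip(truth, pred):
        if len(t_row) != len(p_row):
            raise ValueError("shape mismatch")
        for s, s_hat in zip(t_row, p_row):
            w = weight if s > threshold else 1.0
            total += w * (s - s_hat) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty input")
    return total / count
```

For `truth = [[0.0, 1.0]]` and `pred = [[0.5, 0.0]]` with `weight=2.0`, the error on the significant pixel counts double, giving (1·0.25 + 2·1.0) / 2 = 1.125.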

Specifically, ConvLSTM is an LSTM unit that incorporates convolution operations: the matrix-vector multiplications of the conventional LSTM are replaced with convolutions, so that a matrix or higher-dimensional data can be accepted as input at each time step. Because the convolution operation applies to arrays of any dimensionality, ConvLSTM can learn features of the input data along both the spatial and the temporal dimensions, which makes it well suited to predicting inter-region flow in future time periods.

A coding prediction structure is formed by superposing a plurality of ConvLSTM layers, an end-to-end training model of load flow is established, and then the model is optimized by a weighted mean square error loss function to form a load flow prediction model.

Within a single time period, the computed correlation information is contained in a flow vector. Because the flow pattern changes over time, the time-varying correlation information is contained in a continuous sequence of flow matrices.

Assume that the dynamic system observed in a spatial region is represented by an M × M grid of M rows and M columns, and that each cell of the grid contains P time-varying measured values. The measured values at any time can then be represented by a tensor $X \in \mathbb{R}^{P \times M \times M}$, where $\mathbb{R}$ represents the domain of the observed features. The observations at different times may be represented as a set of such tensors, one per time step.

The space-time computation load sequence of the future time period and the interval is expressed as:

where K represents the number of predicted time segments and J represents the number of historical time segments (including the current time).
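The expression referred to here is not reproduced in this text. Assuming it follows the usual spatio-temporal sequence forecasting formulation that ConvLSTM models are trained for, it can be written as below. The hat notation for observed tensors and the tilde notation for predicted tensors are assumptions introduced for illustration.

```latex
\tilde{X}_{t+1}, \ldots, \tilde{X}_{t+K}
  = \underset{X_{t+1}, \ldots, X_{t+K}}{\arg\max}\;
    p\left( X_{t+1}, \ldots, X_{t+K} \;\middle|\;
            \hat{X}_{t-J+1}, \hat{X}_{t-J+2}, \ldots, \hat{X}_{t} \right)
```

Read this way, the model looks for the most likely sequence of K future load matrices given the J most recent observations, the last of which is the current one.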

In ConvLSTM, the inputs $X_1, \ldots, X_t$, the corresponding cell states $C_1, \ldots, C_t$, the hidden states $H_1, \ldots, H_t$, and the gating signals $i_t, f_t, o_t$ are all 3D tensors whose last two dimensions are spatial dimensions (rows and columns). By using convolution operators in the state-to-state and input-to-state transitions, ConvLSTM determines the future state of a cell in the grid from the inputs and past states of its local neighbors. The key equations of ConvLSTM are as follows, where $*$ denotes the convolution operator and $\circ$ denotes the Hadamard product:

$$i_t = \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i)$$

$$f_t = \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f)$$

$$C_t = f_t \circ C_{t-1} + i_t \circ \tanh(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c)$$

$$o_t = \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o)$$

$$H_t = o_t \circ \tanh(C_t)$$
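The five equations above can be traced in a few lines of plain Python. This is a minimal single-channel sketch, not the patent's implementation: it assumes one feature per cell (P = 1), odd-sized kernels with zero padding so the output keeps the M × M shape, scalar biases, and a parameter dictionary `p` whose key names (`W_xi`, `W_ci`, `b_i`, and so on) are illustrative. As in most deep learning code, the "convolution" is computed as a cross-correlation.

```python
import math
from typing import Any, Dict, List, Tuple

Grid = List[List[float]]


def conv2d_same(x: Grid, kernel: Grid) -> Grid:
    """2D cross-correlation with zero padding; output has the same shape as x."""
    rows, cols = len(x), len(x[0])
    kr, kc = len(kernel), len(kernel[0])
    pr, pc = kr // 2, kc // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(kr):
                for j in range(kc):
                    rr, cc = r + i - pr, c + j - pc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[i][j] * x[rr][cc]
            out[r][c] = acc
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def combine(fn, *grids: Grid) -> Grid:
    """Apply fn cell by cell across grids of identical shape."""
    return [[fn(*cells) for cells in zip(*rows)] for rows in zip(*grids)]


def convlstm_step(x: Grid, h_prev: Grid, c_prev: Grid,
                  p: Dict[str, Any]) -> Tuple[Grid, Grid]:
    """One ConvLSTM time step; returns (H_t, C_t)."""
    def pre(gate: str) -> Grid:
        # W_x{gate} * X_t + W_h{gate} * H_{t-1} + b_{gate}
        return combine(lambda a, b: a + b + p["b_" + gate],
                       conv2d_same(x, p["W_x" + gate]),
                       conv2d_same(h_prev, p["W_h" + gate]))

    # Input and forget gates peek at C_{t-1} through a Hadamard product.
    i_t = combine(lambda z, w, c: sigmoid(z + w * c), pre("i"), p["W_ci"], c_prev)
    f_t = combine(lambda z, w, c: sigmoid(z + w * c), pre("f"), p["W_cf"], c_prev)
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = combine(lambda f, c, i, z: f * c + i * math.tanh(z),
                  f_t, c_prev, i_t, pre("c"))
    # Output gate peeks at the new cell state C_t.
    o_t = combine(lambda z, w, c: sigmoid(z + w * c), pre("o"), p["W_co"], c_t)
    h_t = combine(lambda o, c: o * math.tanh(c), o_t, c_t)
    return h_t, c_t
```

Because every gate is computed by a small kernel sliding over the grid, the state of a zone at time t depends only on that zone and its neighbors at time t-1, which is the local-neighborhood behavior described above.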

The invention also provides a cloud computing resource scheduling system for geographic big data, which comprises:

The cloud computing resource scheduling rule base establishing module is used for acquiring, in an off-line mode, the cloud computing scheduling trigger events of the geographic big data and the one or more resource scheduling templates corresponding to each trigger event, and for establishing a cloud computing resource scheduling rule base.

The cloud computing resource scheduling rule base establishing module specifically comprises:

the cloud computing key performance index database establishing submodule, used for determining cloud computing key performance index data and establishing a cloud computing key performance index database;

the cloud computing dynamic resource scheduling target obtaining submodule, used for obtaining a cloud computing dynamic resource scheduling target of geographic big data from a cloud environment;

the trigger event extraction submodule, used for extracting all trigger events of the cloud computing in an off-line mode according to the cloud computing key performance index database and the cloud computing dynamic resource scheduling target;

the cloud computing related subsection information and resource scheduling data instruction acquisition submodule, used for acquiring cloud computing related subsection information of geographic big data and all resource scheduling data instructions from a cloud environment, wherein the cloud computing related subsection information comprises resource information of a physical machine, running state information of a virtual machine and running state information of the components deployed on the virtual machine;

the cloud computing resource scheduling template set establishing submodule, used for obtaining a cloud computing resource scheduling template of each resource scheduling data instruction according to the cloud computing related subsection information and establishing a cloud computing resource scheduling template set; and

the cloud computing resource scheduling rule establishing submodule, used for determining one or more cloud computing resource scheduling templates corresponding to each trigger event according to the resource scheduling template set and the resource scheduling data instruction corresponding to each trigger event, to form a cloud computing resource scheduling rule.

The trigger event monitoring module is used for monitoring the cloud computing process of the geographic big data and determining the current trigger event.

The resource scheduling action generating module is used for generating one or more resource scheduling actions according to the one or more resource scheduling templates of the current trigger event and the current load flow data information.
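The relationship between the rule base, the current trigger event and the generated actions can be illustrated with a small sketch. Everything concrete in it is hypothetical: the event names (`cpu_overload`, `cpu_idle`), the template fields, and the rule of aiming each action at the busiest or idlest zone are assumptions made for illustration, not details given in the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SchedulingTemplate:
    name: str        # hypothetical template name, e.g. "add_vm"
    direction: str   # "horizontal" or "vertical" scheduling
    target: str      # "busiest" or "idlest" zone (assumed policy)
    step: int        # resource units changed per action (assumed)


@dataclass(frozen=True)
class SchedulingAction:
    template: str
    target_zone: int
    amount: int


# Rule base built offline: trigger event -> one or more templates.
RULE_BASE: Dict[str, List[SchedulingTemplate]] = {
    "cpu_overload": [
        SchedulingTemplate("add_vm", "horizontal", "busiest", 1),
        SchedulingTemplate("resize_vm", "vertical", "busiest", 2),
    ],
    "cpu_idle": [
        SchedulingTemplate("remove_vm", "horizontal", "idlest", 1),
    ],
}


def generate_actions(event: str, zone_load: List[float]) -> List[SchedulingAction]:
    """Instantiate every template registered for the event against current load."""
    templates = RULE_BASE.get(event, [])
    if not templates or not zone_load:
        return []
    zones = range(len(zone_load))
    busiest = max(zones, key=zone_load.__getitem__)
    idlest = min(zones, key=zone_load.__getitem__)
    return [
        SchedulingAction(t.name, busiest if t.target == "busiest" else idlest, t.step)
        for t in templates
    ]
```

Each trigger event can yield several candidate actions, which is why a later prediction step is needed to compare them.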

The load flow matrix sequence obtaining module is used for obtaining the load flow data at the moment the current trigger event occurs and at the J-1 historical moments preceding it, to form a load flow matrix sequence.
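A minimal sketch of how such a sequence might be cut from a stored history follows. It assumes the history is an ordered list of M × M matrices whose last entry is the current moment; the function names and the sliding-window construction of training pairs (J inputs followed by K targets) are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def load_flow_sequence(history: Sequence[Matrix], j: int) -> List[Matrix]:
    """Return the J most recent matrices: J-1 historical moments plus the current one."""
    if j < 1:
        raise ValueError("J must be at least 1")
    if len(history) < j:
        raise ValueError("not enough history for the requested window")
    return list(history[-j:])


def training_windows(history: Sequence[Matrix], j: int,
                     k: int) -> List[Tuple[List[Matrix], List[Matrix]]]:
    """Slide over the history, pairing J input matrices with the K that follow."""
    samples = []
    for start in range(len(history) - j - k + 1):
        inputs = list(history[start:start + j])
        targets = list(history[start + j:start + j + k])
        samples.append((inputs, targets))
    return samples
```

The same windowing serves both purposes: at run time only the latest window is fed to the model, while during training every window in the historical record becomes one sample.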

The load flow prediction module is used for inputting the load flow matrix sequence into a Convolutional LSTM (ConvLSTM) load flow prediction model, predicting each resource scheduling action, and obtaining a load flow prediction value of each resource scheduling action. The ConvLSTM load flow prediction model comprises a coding network and a prediction network.

As a preferred embodiment, the scheduling system further includes: the training model establishing module is used for superposing a plurality of ConvLSTM layers to form a coding prediction structure as an end-to-end training model of load flow; the training sample set establishing module is used for acquiring historical cloud computing load-carrying capacity data and historical cloud computing resource scheduling instructions to form a training sample set; and the model optimization module is used for optimizing the training model by adopting a weighted mean square error loss function based on the training sample set to obtain a ConvLSTM load flow prediction model.

Few cloud computing resource allocation and scheduling technologies currently exist for geographic big data. Geographic big data processing often has to handle data with different geographic patterns, distributions and processes, and needs to schedule different data vertically or horizontally along the space-time and attribute dimensions.

The technical problem addressed by the invention is to provide a cloud computing dynamic resource scheduling method based on a deep learning prediction algorithm. The method can perform space-time two-dimensional cloud computing on geographic big data, comprehensively calculates in advance the computing resources required by different types of geographic big data according to their spatio-temporal characteristics, schedules computer resources reasonably and balances their use, reduces data transmission consumption at lower cost, achieves better load balancing, and improves computing efficiency.

The geographic big data comprise data of different natures, such as satellite and unmanned aerial vehicle remote sensing images and monitoring station data. Human behavior data is closely tied to time, so computing resources need to be allocated by time period. Earth observation data is closely tied to geographic position, so computing resources need to be allocated by location. Time-series prediction is common today, but geographic big data requires joint space-time series prediction.

The invention provides a load flow prediction model for comprehensively calculating required computing resources in advance according to the time-space characteristics of different types of geographic big data, and a cloud computing dynamic resource scheduling method based on the model.

The invention deconstructs the geographical big data and the load flow data into a time-space two-dimensional sequence, and comprehensively considers the data space-time attributes of the geographical big data and the load flow.

Over given two-dimensional space-time intervals, the model takes load flow data and geographic big data as an input matrix sequence and computes the load flow matrix sequence for several future time slices. An optimal load balancing strategy is then obtained from the prediction result of the model, realizing reasonable scheduling of resources.
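How a strategy might be chosen from the predictions can be sketched as follows. The selection criterion is an assumption: the text does not state how predictions are compared, so this sketch scores each candidate action by the population variance of its predicted load flow matrix and picks the lowest. The names `imbalance` and `pick_action` are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List

Matrix = List[List[float]]


def imbalance(pred: Matrix) -> float:
    """Population variance of all cells; 0 means a perfectly even predicted load."""
    return pvariance([value for row in pred for value in row])


def pick_action(predictions: Dict[str, Matrix]) -> str:
    """Return the action whose predicted load flow matrix is the most balanced."""
    if not predictions:
        raise ValueError("no candidate actions")
    return min(predictions, key=lambda name: imbalance(predictions[name]))
```

Other criteria, such as peak load or total data transfer, could be substituted for the variance without changing the overall predict-then-select flow.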

The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims

1. A cloud computing resource scheduling method for geographic big data is characterized by comprising the following steps:

acquiring load flow data of J-1 historical moments when and before a current trigger event occurs to form a load flow matrix sequence; the load flow matrix sequence comprises load flow data of J-1 historical moments when and before a current trigger event occurs;

inputting the load flow matrix sequence into a ConvLSTM load flow prediction model, predicting each resource scheduling action, and obtaining a load flow prediction value of each resource scheduling action;

2. The method for scheduling cloud computing resources for big geographic data according to claim 1, wherein the step of obtaining the cloud computing scheduling trigger events for big geographic data and the one or more resource scheduling templates corresponding to each trigger event offline and establishing the cloud computing resource scheduling rule base specifically comprises:

acquiring a cloud computing resource scheduling template of each resource scheduling data instruction according to the relevant distribution information of the cloud computing, and establishing a cloud computing resource scheduling template set;

3. The method for scheduling cloud computing resources for geographic big data according to claim 1, wherein, before the load traffic matrix sequence is input into a ConvLSTM load traffic prediction model, each resource scheduling action is predicted, and a load traffic predicted value of each resource scheduling action is obtained, the method further comprises:

4. The geographic big data-oriented cloud computing resource scheduling method according to claim 3, wherein the ConvLSTM load traffic prediction model comprises a coding network and a prediction network.

5. A cloud computing resource scheduling system for geographic big data is characterized in that the scheduling system comprises:

the load flow prediction module is used for inputting the load flow matrix sequence into a ConvLSTM load flow prediction model, predicting each resource scheduling action and obtaining a load flow prediction value of each resource scheduling action;

6. The cloud computing resource scheduling system for geographic big data according to claim 5, wherein the cloud computing resource scheduling rule base establishing module specifically includes:

7. The cloud computing resource scheduling system facing geographic big data according to claim 5, wherein the scheduling system further comprises:

8. The cloud computing resource scheduling system facing geographic big data according to claim 7, wherein the ConvLSTM load traffic prediction model comprises a coding network and a prediction network.