CN117492934B - Data processing method and system based on cloud service intelligent deployment - Google Patents

Data processing method and system based on cloud service intelligent deployment

Info

Publication number
CN117492934B
CN117492934B · Application CN202410001773.5A
Authority
CN
China
Prior art keywords
node
virtual
representing
communication
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410001773.5A
Other languages
Chinese (zh)
Other versions
CN117492934A (en)
Inventor
丁新云
杨作铭
孙军远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Eden Information Service Ltd
Original Assignee
Eden Information Service Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eden Information Service Ltd
Priority to CN202410001773.5A
Publication of application CN117492934A
Application granted
Publication of granted patent CN117492934B
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 Hypervisors; Virtual machine monitors
    • G06F 9/45558 Hypervisor-specific management and integration aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/094 Adversarial learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/50 Network services
    • H04L 67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 Hypervisors; Virtual machine monitors
    • G06F 9/45558 Hypervisor-specific management and integration aspects
    • G06F 2009/45595 Network integration; Enabling network access in virtual machine instances
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of cloud service deployment and discloses a data processing method and a system based on cloud service intelligent deployment.

Description

Data processing method and system based on cloud service intelligent deployment
Technical Field
The invention belongs to the technical field of cloud service deployment, and particularly relates to a data processing method and system based on intelligent cloud service deployment.
Background
As more and more cloud service providers integrate big data operations with their cloud service platforms, performance optimization of big data applications on cloud platforms has gradually become an important problem. Given that the physical node configuration and overall scale of a cloud service platform are already fixed, optimizing the allocation strategy for virtual resources, the deployment strategy for applications, and the invocation of service requirements becomes an effective means of further improving application performance. Although the cloud service platform's novel service model saves economic and technical expense for ordinary users, the growing volume of big data applications deployed on the platform means that the job-request processing mode in multi-user scenarios poses new challenges for the design of job scheduling mechanisms. Because different applications are deployed together on the cloud service platform, cloud service providers often need to process multiple application requests from different users simultaneously, and different users have different requirements on big data application quality of service (QoS). These new influencing factors for job scheduling on the cloud platform increase the difficulty of job scheduling.
Disclosure of Invention
In view of the above, the present invention provides a data processing method and system based on intelligent deployment of cloud services, which can improve the overall performance of big data jobs, optimize big data jobs on the cloud service platform, and improve service quality, so as to solve the above technical problems.
In a first aspect, the present invention provides a data processing method based on intelligent deployment of cloud services, including the following steps:
Acquiring a call request by which a SaaS user submits a big data job to a cloud service platform, counting the computation amount of the big data job according to the call request, and connecting the resource layer's virtual machines distributed across the physical computing nodes through a virtual network to form a virtual computing cluster for executing the big data job, wherein the physical servers of the cloud service platform comprise computing nodes and storage nodes, and the cloud service platform involves SaaS users, SaaS providers and IaaS providers;
Constructing a service optimization model for the SaaS user and a resource allocation model for the IaaS provider according to the virtual computing cluster, and determining a virtual cluster optimization model for the big data job based on the service optimization model and the resource allocation model, wherein the virtual cluster optimization model comprises the number of communication agents to deploy, the deployment position of each communication agent, and the mapping relation between communication agents and virtual machines;
selecting a corresponding image copy on a storage node according to the virtual cluster optimization model, and loading the image copy into the virtual computing cluster constructed at the resource layer, so that the virtual computing cluster becomes a computing platform for executing the big data job;
and constructing a virtual big data job platform on the physical cluster to execute the big data job, outputting the computation result, and inputting the computation result into a trained adversarial neural network to predict QoS, thereby completing the data processing for intelligent deployment of cloud services.
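The claimed four-step flow can be sketched end to end. Every function name and sizing rule below is a hypothetical placeholder used only to show how the steps chain together, not the patent's implementation.

```python
def estimate_compute(job):
    """Step 1a: count the job's computation (the blocks-per-GB rule is illustrative)."""
    return {"blocks": job["size_gb"] * 8}

def build_virtual_cluster(workload):
    """Step 1b: connect resource-layer VMs into a virtual computing cluster."""
    return {"vms": max(2, workload["blocks"] // 16)}

def optimize_cluster(cluster):
    """Step 2: decide communication-agent count/placement (placeholder heuristic)."""
    return {"agents": max(1, cluster["vms"] // 4)}

def load_image_copy(cluster, plan):
    """Step 3: load the matching image copy from a storage node into the cluster."""
    return {**cluster, **plan, "image": "bigdata-runtime"}

def run_job(platform, job):
    """Step 4: execute the big data job; QoS prediction would consume this record."""
    return {"job": job["name"], "vms": platform["vms"], "agents": platform["agents"]}

def deploy_and_run(job):
    cluster = build_virtual_cluster(estimate_compute(job))
    platform = load_image_copy(cluster, optimize_cluster(cluster))
    return run_job(platform, job)

result = deploy_and_run({"name": "wordcount", "size_gb": 4})
# result == {"job": "wordcount", "vms": 2, "agents": 1}
```

The point of the sketch is the data flow: the request sizes the cluster, the cluster feeds the optimization model, and the optimized plan selects the image copy that turns the cluster into an execution platform.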
As a further preferred aspect of the above technical solution, constructing the service optimization model of the SaaS user and the resource allocation model of the IaaS provider according to the virtual computing cluster includes:
SaaS user optimization is performed at the application layer of the SaaS provider. Let I denote the set of users and S the set of service providers, let C denote the total service capacity of the service provider, let x_i denote the service amount of user i, and let U_i(x_i) denote the utility user i derives from its service amount. The optimized utility obtained by the SaaS users is then the solution of max Σ_{i∈I} U_i(x_i) subject to Σ_{i∈I} x_i ≤ C, x_i ≥ 0;
The Lagrangian function is preset as L(x, λ) = Σ_{i∈I} U_i(x_i) + λ(C − Σ_{i∈I} x_i), where the Lagrangian factor λ is the price charged by the SaaS provider for providing each unit of service. Solving the Lagrangian function yields the optimal solution for the optimal utility; the Lagrangian function can be rewritten as L(x, λ) = Σ_{i∈I} (U_i(x_i) − λ·x_i) + λ·C, and the objective function of the dual problem of the SaaS-layer quality-of-service optimization model is D(λ) = max_{x≥0} L(x, λ);
The dual problem D1 is min_{λ≥0} D(λ), where λ represents the price charged by the SaaS provider for each unit of service, x_i represents the service amount offered by the SaaS provider to user i, λ·x_i represents the service charge incurred by user i at service amount x_i, and U_i(x_i) − λ·x_i represents the benefit of user i under the current price λ. Given any price λ, each user i can choose the x_i that maximizes U_i(x_i) − λ·x_i.
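The user-level model above is the classic network utility maximization problem solved by price-based dual decomposition. A minimal numeric sketch, assuming logarithmic utilities U_i(x_i) = w_i·log(x_i) (an illustrative choice the patent does not specify), iterates on the price λ: each user maximizes U_i(x_i) − λ·x_i independently, and λ is adjusted by a subgradient step until demand meets capacity.

```python
def solve_num_dual(weights, capacity, steps=4000, lr=0.01):
    """Price-based dual decomposition for max sum(w_i*log(x_i)) s.t. sum(x_i) <= C.

    With U_i(x) = w_i*log(x), user i's best response to price lam is
    x_i = w_i / lam (the maximizer of U_i(x_i) - lam*x_i); the price is
    then raised when total demand exceeds capacity and lowered otherwise.
    """
    lam = 1.0
    for _ in range(steps):
        demand = [w / lam for w in weights]                    # each user maximizes U_i - lam*x_i
        lam = max(1e-9, lam + lr * (sum(demand) - capacity))   # subgradient step on the dual
    return lam, [w / lam for w in weights]

price, alloc = solve_num_dual([1.0, 2.0, 3.0], capacity=3.0)
# Analytically the optimal price is sum(w_i)/C = 2, giving allocations w_i/2 = [0.5, 1.0, 1.5].
```

This mirrors the dual problem D1: the outer loop minimizes over λ while each inner best response is the per-user maximization of U_i(x_i) − λ·x_i.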
As a further preferable aspect of the above technical solution, let S be the set of SaaS providers and P the set of IaaS infrastructure cloud resource providers. The optimized utility the SaaS providers obtain from the cloud resource infrastructure is the solution of max Σ_{s∈S} U_s(c_s, m_s, d_s) subject to Σ_{s∈S} c_s ≤ C_max, Σ_{s∈S} m_s ≤ M_max and Σ_{s∈S} d_s ≤ D_max, where c_s represents the CPU computing resources provided by the IaaS provider, m_s represents the memory resources provided by the IaaS provider, d_s represents the storage resources provided by the IaaS provider, C_max, M_max and D_max represent the maximum CPU computing, memory and storage resources the IaaS provider can offer, and U_s represents the utility function of the SaaS provider over its service requirements.
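The IaaS-side capacity constraints can be exercised with a small admission sketch: each SaaS provider requests a (CPU, memory, storage) bundle, and requests are admitted only while all three capacity maxima hold. The request values and the first-come policy are illustrative assumptions, not the patent's allocation rule.

```python
def admit_bundles(requests, cpu_max, mem_max, sto_max):
    """Admit (cpu, mem, storage) bundles while all three IaaS capacity limits hold."""
    used_cpu = used_mem = used_sto = 0
    admitted = []
    for cpu, mem, sto in requests:
        if (used_cpu + cpu <= cpu_max and used_mem + mem <= mem_max
                and used_sto + sto <= sto_max):
            used_cpu, used_mem, used_sto = used_cpu + cpu, used_mem + mem, used_sto + sto
            admitted.append((cpu, mem, sto))
    return admitted, (used_cpu, used_mem, used_sto)

admitted, used = admit_bundles(
    [(4, 8, 100), (8, 16, 200), (6, 8, 50)],   # three SaaS providers' requests
    cpu_max=12, mem_max=32, sto_max=300)
# The third request is rejected: it would push CPU usage to 18 > 12.
```

A full solver would maximize Σ U_s over the same feasible region; the admission check above is exactly that feasible region's membership test.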
As a further preferred aspect of the above technical solution, determining a virtual cluster optimization model of a big data job based on a service optimization model and a resource allocation model includes:
Preset that the task execution times of the jth virtual machine on a specific ith computing node in the three Map-stage sub-stages are t1_ij, t2_ij and t3_ij respectively, so that the overall Map-stage task time is T_ij = t1_ij + t2_ij + t3_ij. Task execution in the Map stage comprises three sub-stages: a data import stage, in which the input file is split into a number of data blocks and transmitted to each physical node on which a communication agent is deployed; a data forwarding stage, in which the data blocks are further forwarded through the communication agents to the target virtual machines, i.e. the Mapper nodes; and a data processing stage, in which each Mapper node processes the forwarded data blocks and generates intermediate results. The task execution time of each sub-stage is modeled quantitatively as follows:
The time consumption t1_ij of the data import stage is determined by the size S of the input file and the average transmission rate B of the network; since the data transmission delay between any two computing nodes is approximately equal, the lower bound of the import-stage time consumption is approximately quantized in terms of S and B;
If a communication agent is deployed on a computing node, that agent is responsible for all virtual-machine communication on the node. If a communication agent is deployed on the host of the target Mapper node, the data transmission time within the physical node is negligible; if the Mapper node's host has no communication agent, the Mapper node additionally undergoes a transmission across physical nodes, and the time consumption t2_ij of the data forwarding stage is determined by the traffic across physical nodes;
The time consumption t3_ij of the data processing stage is determined by the size of the data block processed by the Mapper node and the computing performance of the virtual node. The data processing efficiency of each Mapper node is determined by the working efficiency v0 of a VM on an idle host and the VM concurrency n_i on the host; fitting the variation trend of the average VM performance yields the performance function v_ij of the jth virtual machine on the specific ith computing node;
When computing resources are idle, the corresponding Map tasks are executed in sequence, and the data volume processed by a Mapper node is proportional to the node's average performance; combining the performance function of the specific node yields the Mapper node's time consumption t3_ij in the data processing sub-stage. From the expressions of t1_ij, t2_ij and t3_ij, the overall Map-stage task execution time T_ij of the jth Mapper node on the specific ith computing node is obtained. Presetting that k communication agents are deployed among the computing nodes and that N_a VMs in total are deployed on the same computing nodes as the communication agents, the total Map-stage task time of all Mapper nodes of the whole virtual computing cluster is obtained by summing T_ij over the cluster;
where m represents the number of physical computing nodes in the cloud service platform; N represents the total number of VMs deployed on the physical computing nodes; k represents the number of communication agents deployed in the virtual computing cluster; n_i represents the number of VMs deployed on the ith computing node; S represents the file size the job needs to process; B represents the average rate of data transfer between computing nodes; x_i indicates whether a communication agent is deployed on the ith computing node, taking the value 1 if deployed and 0 otherwise; v0 represents the average data processing efficiency of a virtual machine on an unloaded computing node; α represents the decay rate of VM computational performance when working concurrently with a communication agent; v_ij represents the average data processing efficiency of the jth virtual machine on the ith computing node; n_i^max represents the maximum number of deployable VMs on the ith computing node; p_i represents the deployment location of the ith communication agent; a_ij represents the deployment location of the agent responsible for the jth virtual machine's communication on the ith computing node; and N_a represents the total number of VMs deployed on the same computing nodes as the communication agents.
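A toy version of the Map-stage cost model can make the symbols concrete. The formulas below (even data split across VMs, one extra cross-node hop for VMs on agent-less hosts, linear efficiency decay at rate alpha next to an agent) are illustrative stand-ins, since the patent's exact expressions are not reproduced in the source text.

```python
def map_stage_time(S, B, vms_per_node, has_agent, v0, alpha):
    """Illustrative Map-stage cost: import + forwarding + processing per Mapper.

    S: input file size; B: average network rate; vms_per_node[i]: VMs on
    physical node i; has_agent[i]: 1 if a communication agent runs on node i;
    v0: VM efficiency on an unloaded node; alpha: efficiency decay when a VM
    shares its host with a communication agent. Data is split evenly over VMs.
    """
    N = sum(vms_per_node)
    block = S / N                                   # data block handled by each Mapper
    t_import = S / B                                # import bound: input crosses the network once
    total = 0.0
    for n_i, a_i in zip(vms_per_node, has_agent):
        t_forward = 0.0 if a_i else block / B       # extra cross-node hop on agent-less hosts
        v_ij = v0 * (1.0 - alpha * a_i)             # performance decay next to an agent
        total += n_i * (t_import + t_forward + block / v_ij)
    return total

with_agents = map_stage_time(100, 10, [2, 2], [1, 1], v0=5, alpha=0.0)
without_agents = map_stage_time(100, 10, [2, 2], [0, 0], v0=5, alpha=0.0)
```

With alpha = 0 the model shows the forwarding trade-off in isolation: deploying agents everywhere removes the cross-node hop, while a positive alpha would charge each agent-hosting node in processing efficiency, which is exactly the tension the optimization model resolves.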
As a further preferred aspect of the above technical solution, the intermediate results output by each Mapper node in the Map stage are gathered to the Reducer nodes in the virtual cluster; the intermediate results are aggregated on each Reducer node, and the Reducer nodes then process the aggregated results to generate the final result. Data transmission in the Reduce stage comprises three steps: the Mapper node transmits the intermediate result data of the Map stage to its corresponding communication agent; the communication agent of the Mapper node further transmits the data to the communication agent of the Reducer node; and the communication agent of the Reducer node transmits the data to the Reducer node to complete the aggregation of intermediate results;
In the first Reduce stage, the intermediate results output by the Map tasks are transmitted from the Mapper nodes to the communication agents, and the overall communication performance of the cluster in this stage is determined by N_a, the total number of VMs co-hosted with a communication agent. Data transmission by a VM on the same host as its communication agent is intra-node communication and need not experience cross-node communication; the overall first-stage data transmission time of the cluster is therefore determined by d, the average size of the intermediate file on each virtual node at the end of the Map stage, and by r, the number of Reducer nodes in the Reduce-stage virtual cluster;
In the second Reduce stage, data is transmitted from the communication agent of each Mapper node to the communication agents of the Reducer nodes. Presetting that r Reducer nodes are randomly selected in the cluster to execute the Reduce tasks, the agents responsible for the communication of those VMs become the destination nodes of the second-stage transmission, and their traffic differs from that of the other, non-destination agent nodes. For each of the k communication agents there is a probability that one of the VMs it serves contains a Reducer, from which the expected number of destination agents among the k communication agents follows. Likewise, each VM co-hosted with a communication agent experiences one less data transfer across physical nodes in the second stage; combining the expectation of the number of destination agents yields the total second-stage data transmission time of the virtual cluster formed by the N VMs;
In the third Reduce stage, data is transmitted from the communication agent of each Reducer to the Reducer node for summarization and processing. If a Reducer shares a host with its communication agent, it need not experience cross-physical-node communication; the remaining Reducers, whose hosts have no agent deployed, undergo cross-physical-node data communication in the third stage, which determines the overall third-stage data transmission time. Combining the first, second and third stages yields the overall time overhead of the virtual cluster's Reduce task execution stage.
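The expectations over random Reducer placement can be checked by simulation. This sketch assumes, illustratively, that Reducers are drawn uniformly among the N VMs and that a Reducer co-hosted with its communication agent skips one cross-physical-node transfer; by symmetry the analytic expectation is r·(N − N_a)/N.

```python
import random

def expected_cross_node_transfers(N, n_cohosted, r, trials=20000, seed=7):
    """Monte Carlo estimate of E[cross-node transfers] in the third Reduce stage.

    N: total VMs; n_cohosted: VMs on the same host as their communication
    agent; r: number of Reducer nodes drawn uniformly without replacement.
    A co-hosted Reducer skips one cross-physical-node transfer.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        reducers = rng.sample(range(N), r)
        # By labeling, VMs 0..n_cohosted-1 are the co-hosted ones.
        total += sum(1 for v in reducers if v >= n_cohosted)
    return total / trials

est = expected_cross_node_transfers(10, 4, 3)
# Analytic value: 3 * (10 - 4) / 10 = 1.8.
```

The same sampling scheme extends to the second-stage quantity (the expected number of destination agents) by grouping the sampled VMs under their agents.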
As a further preferred aspect of the above technical solution, the optimal construction of the virtual cluster is achieved by determining an optimal number of communication agents, an optimal deployment location of each communication agent, and an optimal mapping relationship between the communication agents and the virtual machine.
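Determining the optimal agent count, locations, and mapping is a combinatorial search. A brute-force sketch over a toy cost function (an illustrative stand-in for the patent's cluster time model) enumerates every non-empty set of agent-hosting nodes and keeps the cheapest; a real deployment would swap in the Map + Reduce time model as the cost.

```python
from itertools import combinations

def best_agent_placement(vms_per_node, cost_fn):
    """Enumerate every non-empty set of agent-hosting nodes; return the cheapest."""
    nodes = range(len(vms_per_node))
    best = None
    for k in range(1, len(vms_per_node) + 1):          # number of agents to deploy
        for placement in combinations(nodes, k):       # their deployment locations
            c = cost_fn(vms_per_node, set(placement))
            if best is None or c < best[0]:
                best = (c, placement)
    return best

def toy_cost(vms_per_node, agents):
    # Illustrative: each agent costs 1; each VM on an agent-less node costs 2.
    return len(agents) + 2 * sum(n for i, n in enumerate(vms_per_node) if i not in agents)

cost, placement = best_agent_placement([3, 1, 2], toy_cost)
# For this toy cost, covering every node is cheapest: cost 3 with agents on nodes 0, 1, 2.
```

Exhaustive enumeration is exponential in the node count; the structure of the real cost model is what would justify a greedy or dynamic-programming search at scale.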
As a further preferred aspect of the foregoing technical solution, selecting a corresponding image copy on a storage node according to the virtual cluster optimization model, and loading the image copy into the virtual computing cluster constructed at the resource layer, so that the virtual computing cluster becomes a computing platform for executing big data jobs, includes:
The optimized construction of the virtual computing cluster is completed within the physical computing cluster, providing the resource basis for executing big data jobs; the management node of the cloud service platform searches for a suitable application copy, according to the application type required by the user, in the storage cluster that stores the images of the various applications;
The application image is loaded into the virtual cluster, realizing the integration of computing resources with the upper-layer application and forming a platform for executing the specific application; the virtual platform can then execute the application requested by the user and output the execution result.
As a further preferred aspect of the above solution, the training process of the generative adversarial network includes:
In a cloud service invocation scenario, there is a user sequence and a cloud service sequence; the users invoke the services, generating a number of QoS matrices. If the QoS metric is response time, the entry at row i and column j of the matrix represents the response time generated when user i invokes service j;
The generator is trained with the call time series up to the current moment; each time the call characteristic information of the next moment is input, the weights of the hidden neurons transform the noise data and produce a QoS prediction for the user service at the next moment. The generator's input is the time series and its output is the QoS predicted value at the future moment. The mapping from input noise to output QoS is given by the generator function G, which is trained on the actual call sequences over time and contains the weights of all hidden neurons; its output is the predicted QoS value of user u at the next moment, and its input comprises the t real call sequences generated by the u users' invocations of the s servers, together with the characteristic information of each call sequence;
The generator's loss function is the error between the predicted QoS value and the true QoS value; the loss of generator G is computed from the true QoS value at time t and the predicted QoS value at time t;
The fully connected layer maps the dimensionality of the real data to the same dimensionality as the input layer of the GRU network, learning the distribution characteristics of each feature in the real data and the fitting process of the QoS, and constructing a function from historical variables to the current value of a given variable. The fitting process takes the form ŷ_t = w_tᵀ·x_t + ε_t, where ŷ_t represents the predicted QoS value at time t, w_t represents the weight vector at time t, x_t represents the feature vector of the user-and-service call records at the current moment, and ε_t indicates the error at the current time.
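At its core, the fitting step ŷ_t = w_tᵀ·x_t + ε_t learns a weight vector mapping call features to a QoS value. A minimal stand-in, gradient-descent least squares on a linear model rather than the GRU the patent describes, shows the shape of that computation under the same squared-error objective as the generator loss.

```python
def fit_linear_qos(X, y, steps=5000, lr=0.01):
    """Gradient-descent least squares: learn w so that w.x approximates QoS y."""
    d = len(X[0])
    n = len(X)
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi   # prediction error on one record
            for j in range(d):
                grad[j] += 2 * err * xi[j] / n                 # d/dw of mean squared error
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# Toy call-feature records generated noiselessly from true weights [2.0, -1.0].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, -1.0, 1.0, 3.0]
w = fit_linear_qos(X, y)
# Gradient descent recovers w close to [2.0, -1.0].
```

A GRU replaces the fixed linear map with gated recurrent state, but the training loop, forward pass, error, gradient step, is the same pattern.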
As a further preferred aspect of the above solution, the adversarial network further includes a discriminator model. The discriminator D distinguishes true from false between the real data and the predicted data generated by the generator, assigning a probability value between 0 and 1 to each record input to D; the input time series of the discriminator is mapped to this probability through the fully connected neural network. The larger the probability value, the more likely the currently input time series is real; conversely, the smaller the value, the higher the probability that the input is a predicted sequence.
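The adversarial objective can be made concrete with the standard GAN losses; the patent's exact loss expressions are not reproduced in the source, so the usual binary cross-entropy forms are assumed here: the discriminator pushes D(real) toward 1 and D(fake) toward 0, while the generator pushes D(fake) toward 1.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy GAN discriminator loss over matched batches.

    d_real: D's probabilities on true QoS sequences (should approach 1);
    d_fake: D's probabilities on generator outputs (should approach 0).
    """
    eps = 1e-12
    n = len(d_real)
    return -sum(math.log(p + eps) + math.log(1.0 - q + eps)
                for p, q in zip(d_real, d_fake)) / n

def generator_loss(d_fake):
    """Non-saturating generator loss: reward fooling D into high probabilities."""
    eps = 1e-12
    return -sum(math.log(q + eps) for q in d_fake) / len(d_fake)

# A confident, correct discriminator has near-zero loss; a fooled one is penalized.
```

Training alternates the two: one step lowering the discriminator loss on mixed real/predicted QoS sequences, one step lowering the generator loss so its predictions become harder to tell from real call records.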
In a second aspect, the present invention further provides a data processing system based on intelligent deployment of cloud services, including:
a request acquisition unit, configured to acquire a call request by which a SaaS user submits a big data job to the cloud service platform, count the computation amount of the big data job according to the call request, and connect the resource layer's virtual machines distributed across the physical computing nodes through a virtual network to form a virtual computing cluster for executing the big data job, wherein the physical servers of the cloud service platform comprise computing nodes and storage nodes, and the cloud service platform involves SaaS users, SaaS providers and IaaS providers;
a model construction unit, configured to construct a service optimization model for the SaaS user and a resource allocation model for the IaaS provider according to the virtual computing cluster, and to determine a virtual cluster optimization model for the big data job based on the service optimization model and the resource allocation model, wherein the virtual cluster optimization model comprises the number of communication agents to deploy, the deployment position of each communication agent, and the mapping relation between communication agents and virtual machines;
a platform determining unit, configured to select a corresponding image copy on the storage node according to the virtual cluster optimization model and load the image copy into the virtual computing cluster constructed at the resource layer, so that the virtual computing cluster becomes a computing platform for executing the big data job;
and a data processing unit, configured to construct a virtual big data job platform on the physical cluster to execute the big data job, output the computation result, and input the computation result into the trained adversarial neural network to predict QoS, thereby completing the data processing for intelligent deployment of cloud services.
The invention provides a data processing method and system based on intelligent deployment of cloud services. A call request by which a SaaS user submits a big data job to the cloud service platform is acquired, the computation amount of the big data job is counted according to the request, and the resource layer's virtual machines on the physical computing nodes are connected through a virtual network to form a virtual computing cluster for executing the big data job. A service optimization model for the SaaS user and a resource allocation model for the IaaS provider are built according to the virtual computing cluster, and a virtual cluster optimization model for the big data job is determined based on the service optimization model and the resource allocation model. A corresponding image copy is selected on a storage node according to the virtual cluster optimization model and loaded into the virtual computing cluster constructed at the resource layer, so that the virtual computing cluster becomes a computing platform for executing the big data job. A virtual big data job platform is constructed on the physical cluster to execute the big data job and output the computation result, and the computation result is input into a trained adversarial neural network for QoS prediction, completing the data processing for intelligent deployment of cloud services. The method thereby improves the overall performance of big data jobs, optimizes big data jobs on the cloud service platform, improves service quality, and saves expense for users.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data processing method based on intelligent deployment of cloud services;
FIG. 2 is a block diagram of a data processing system based on intelligent deployment of cloud services.
Description of the embodiments
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the like or similar elements throughout or elements having like or similar functionality; the embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
Referring to fig. 1, the invention provides a data processing method based on intelligent deployment of cloud services, which comprises the following steps:
S1: acquiring a call request by which a SaaS user submits a big data job to a cloud service platform, counting the computation amount of the big data job according to the call request, and connecting the resource layer's virtual machines distributed across the physical computing nodes through a virtual network to form a virtual computing cluster for executing the big data job, wherein the physical servers of the cloud service platform comprise computing nodes and storage nodes, and the cloud service platform involves SaaS users, SaaS providers and IaaS providers;
S2: constructing a service optimization model of the SaaS user and a resource allocation model of the IaaS provider according to the virtual computing cluster, and determining a virtual cluster optimization model of the big data job based on the service optimization model and the resource allocation model, wherein the virtual cluster optimization model comprises the deployment number of communication agents, the deployment positions of the communication agents and the mapping relation between the communication agents and the virtual machines;
S3: selecting a corresponding mirror image copy on a storage node according to the virtual cluster optimization model, and loading the mirror image copy into the virtual computing cluster constructed by the resource layer, so that the virtual computing cluster becomes a computing platform for executing the big data job;
S4: constructing a virtual big data job platform on the physical cluster to execute the big data job, outputting a calculation result, and inputting the calculation result into a trained generative adversarial neural network for QoS prediction so as to complete the data processing of intelligent cloud service deployment.
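The four steps S1 to S4 can be read as a scheduling pipeline: queue the request, build the virtual cluster, load the matching image copy, execute, and predict QoS. A minimal sketch; all class names, the `build_virtual_cluster` sizing rule, and the `predict_qos` placeholder are hypothetical, since the patent does not specify an API:

```python
from dataclasses import dataclass
from collections import deque

@dataclass
class Job:
    user: str
    app: str
    input_size_gb: float  # used to estimate the computation amount (S1)

class CloudPlatform:
    """Hypothetical sketch of the S1-S4 flow: queue jobs, build a virtual
    cluster, load the matching image copy, execute, then predict QoS."""
    def __init__(self):
        self.queue = deque()                       # unified scheduling queue (S1)
        self.images = {"wordcount": "img-wc-v1"}   # image copies on storage nodes

    def submit(self, job):                         # S1: accept the call request
        self.queue.append(job)

    def run_next(self):
        job = self.queue.popleft()                 # scheduling strategy: FIFO here
        cluster = self.build_virtual_cluster(job)  # S2: virtual cluster optimization
        image = self.images[job.app]               # S3: select the image copy
        result = self.execute(cluster, image, job) # S4: run the big data job
        return result, self.predict_qos(result)   # S4: QoS prediction

    def build_virtual_cluster(self, job):
        # placeholder sizing: VM count proportional to the job's computation amount
        return {"vms": max(1, int(job.input_size_gb)), "agents": 1}

    def execute(self, cluster, image, job):
        return {"image": image, "vms": cluster["vms"]}

    def predict_qos(self, result):
        # stands in for the trained adversarial network of step S4
        return {"response_time_est": 1.0 / result["vms"]}

platform = CloudPlatform()
platform.submit(Job("saas-user-1", "wordcount", 4.0))
result, qos = platform.run_next()
```

The queue plus repeat of cluster construction and copy loading mirrors the execution loop described later in the embodiment.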
In this embodiment, constructing a service optimization model of the SaaS user and a resource allocation model of the IaaS provider according to the virtual computing cluster includes: SaaS user optimization is performed at the application layer of the SaaS provider, and the optimized utility obtained by the SaaS user is expressed as max Σ_{i∈I} V_i(x_i) s.t. Σ_{i∈I(S)} x_i ≤ T_s, x_i ≥ 0, i ∈ I, wherein the set of users i is I, the set of service providers s is S, T_s represents the total service capacity of the service provider, x_i represents the service volume of user i, V_i(x_i) represents the utility of user i with respect to the service volume, and I(S) represents the set of services that the service provider can provide to the user. The Lagrangian function is preset as L(x_i, λ) = Σ_{i∈I} V_i(x_i) − Σ_{s∈S} λ_s(Σ_{i∈I(S)} x_i − T_s), wherein λ_s is a Lagrangian factor, the price charged by the SaaS provider for providing a unit of service; solving the Lagrangian function yields the optimal solution of the optimal utility, and the Lagrangian function is rewritten as L(x_i, λ) = Σ_{i∈I}(V_i(x_i) − x_i Σ_{s∈S} λ_s) + Σ_{s∈S} λ_s T_s, so that the objective function of the dual problem of the SaaS-layer quality-of-service optimization model is D(λ) = max L(x_i, λ) = Σ_{i∈I} A(λ) + Σ_{s∈S} λ_s T_s; wherein A(λ) = max (V_i(x_i) − x_i Σ_{s∈S} λ_s), and the dual problem D1 is expressed as min D(λ) s.t. λ ≥ 0, wherein λ_s represents the price charged by the SaaS provider for providing a unit of service, x_i represents the amount of service provided by the SaaS provider to the user, x_i Σ_{s∈S} λ_s represents the service charge incurred by user i at service volume x_i, A(λ) represents the maximum benefit of user i under the current price λ, and λ represents all obtained service prices for user i; each user i can obtain the maximum value of A(λ) according to λ.
It should be noted that the set of SaaS providers s is S, and the set of IaaS infrastructure cloud resource providers p is P. The optimized utility the SaaS provider obtains from the cloud resource infrastructure is expressed as max U_s(r_p^cpu, r_p^mem, r_p^sto) s.t. r_p^cpu ≤ R_p^cpu, r_p^mem ≤ R_p^mem, r_p^sto ≤ R_p^sto, wherein r_p^cpu represents the CPU computing resources provided by the IaaS provider, r_p^mem represents the memory resources provided by the IaaS provider, r_p^sto represents the storage resources provided by the IaaS provider, R_p^cpu represents the maximum CPU computing resources the IaaS provider can offer, R_p^mem represents the maximum memory resources the IaaS provider can offer, R_p^sto represents the maximum storage resources the IaaS provider can offer, and U_s(·) represents the utility function of the SaaS provider over the service requirements. The optimal construction of the virtual cluster is achieved by determining the optimal number of communication agents, the optimal deployment location of each communication agent, and the optimal mapping relation between communication agents and virtual machines.
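The dual problem D1 can be solved by a price (subgradient) iteration on λ: each user independently maximizes V_i(x_i) − λ·x_i, and the provider adjusts λ until total demand Σx_i meets the capacity T_s. A small numerical sketch under the assumption V_i(x) = a_i·log(1+x), a concave utility the patent does not itself prescribe:

```python
import math

def solve_dual(a, T, steps=5000, lr=0.01):
    """Subgradient iteration on the dual of max sum a_i*log(1+x_i) s.t. sum x_i <= T.
    For this utility the per-user best response to price lam solves
    dV/dx = a_i/(1+x) = lam, i.e. x_i = max(0, a_i/lam - 1)."""
    lam = 1.0
    for _ in range(steps):
        x = [max(0.0, ai / lam - 1.0) for ai in a]       # each user's A(lam) maximizer
        lam = max(1e-6, lam + lr * (sum(x) - T))          # raise price if over capacity
    x = [max(0.0, ai / lam - 1.0) for ai in a]            # demand at the final price
    return lam, x

lam, x = solve_dual(a=[2.0, 1.0], T=3.0)
# at the optimum the capacity binds (x1 + x2 = T) and both users face the
# common clearing price lam = a_i/(1 + x_i)
```

With a = [2, 1] and T = 3 the closed-form optimum is λ = 0.6, x = (7/3, 2/3), which the iteration approaches geometrically.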
It should be understood that, before deploying big data applications, a cloud service provider first needs to build a cloud service platform of a certain scale. Typically, the physical servers of the cloud service platform are divided into computing nodes and storage nodes, which are respectively used by the upper-layer application to allocate virtual computing resources and to provide application-copy storage services. Each physical computing node of the cloud service platform pre-allocates different numbers of virtual machines for platform users so as to provide computing capability for big data jobs; meanwhile, mirror image copies of the big data applications that can be called are deployed in advance on the storage nodes of the cloud service platform by the block-storage cloud service provider. When different users submit call requests for big data jobs to the cloud service platform, the job scheduling module of the platform stores the user requests in a unified scheduling queue and then determines the execution order of the job requests in the queue according to a specific job scheduling strategy; when the scheduling module determines the application to be executed at a specific moment, the big data job platform constructed on the underlying physical cluster, namely the computing platform, executes the specific big data processing job and outputs the calculation result. After the current job is completed, the job-layer scheduling module continues to schedule the job requests of other users, and the resource layer and the platform layer repeat the virtual cluster construction and application-copy loading processes so as to complete the execution of the other big data jobs. The method comprises: acquiring a call request in which a SaaS user submits a big data job to the cloud service platform; counting the computation amount of the big data job according to the scheduling request; connecting the virtual machines that the resource layer has allocated on the physical computing nodes through a virtual network to form a virtual computing cluster for executing the big data job; constructing a service optimization model of the SaaS user and a resource allocation model of the IaaS provider according to the virtual computing cluster; determining a virtual cluster optimization model of the big data job based on the service optimization model and the resource allocation model; selecting corresponding mirror image copies on the storage nodes according to the virtual cluster optimization model and loading them into the virtual computing cluster constructed by the resource layer, so that the virtual computing cluster becomes a computing platform for executing the big data job; and constructing a virtual big data job platform on the physical cluster to execute the big data job, outputting the calculation result, and inputting the calculation result into a trained generative adversarial neural network for QoS prediction so as to complete the data processing of intelligent cloud service deployment, thereby improving the overall performance of big data jobs, saving the user the cost of platform construction and maintenance, and allowing big data jobs on the cloud service platform to be better optimized while also improving service quality.
Optionally, determining a virtual cluster optimization model of the big data job based on the service optimization model and the resource allocation model includes:
Presetting that, on a specific ith computing node, the task execution time consumption of the jth virtual machine in the three sub-stages of the Map stage is t_ij^(1), t_ij^(2) and t_ij^(3) respectively, the overall Map-stage task time of that virtual machine is T_ij = t_ij^(1) + t_ij^(2) + t_ij^(3); the task execution time consumption of the three sub-stages is quantitatively modeled as follows:
The time consumption of the data import stage is determined by the size S of the input file and the average transmission rate B of the network; since the data transmission delay between any two computing nodes is approximately equal, the lower bound of the data import time consumption is approximately quantized as t_ij^(1) ≥ S/B;
If a communication agent is deployed on a computing node, the communication agent is responsible for all virtual machine communication on that node. If a communication agent is deployed on the host of the target virtual machine, i.e. the Mapper node, the time consumption of data transmission within the physical node is negligible; if the host of the Mapper node carries no communication agent, the Mapper node additionally undergoes a transmission across physical nodes, and the time consumption of the data forwarding stage is determined by the traffic across physical nodes, approximately t_ij^(2) = S_ij/B, wherein S_ij is the size of the data forwarded to the jth virtual machine on the ith computing node;
The time consumption of the data processing stage is determined by the size of the data block processed by the Mapper node and the computing performance of the virtual node. The data processing efficiency of each Mapper node is determined by the working efficiency w_0 of a VM on an idle host and the number n_i of VMs working concurrently on the host; fitting the variation trend of the average VM performance yields the performance function w_ij of the jth virtual machine on the specific ith computing node, a fitted function of w_0, n_i and the decay rate γ;
When the computing resources are idle, the corresponding Map tasks are executed in sequence, and the data volume processed by a Mapper node is proportional to the average performance of that node; combining the performance function of the specific node yields the time consumption t_ij^(3) of the Mapper node in the data processing sub-stage. From the expressions of t_ij^(1), t_ij^(2) and t_ij^(3), the overall task execution time T_ij of the jth Mapper node on the specific ith computing node in the Map stage is obtained. Presetting that k communication agents are deployed in the computing nodes and that N_a VMs in total are deployed on the same computing nodes as the communication agents, the total Map-stage task time T^map of all Mapper nodes of the whole virtual computing cluster is calculated;
The Map stage task execution comprises three sub-stages: a data importing stage, in which an input file is split into a plurality of data blocks and transmitted to each physical node deploying a communication agent; in the data forwarding stage, the data block is further forwarded to a target virtual machine, namely a Mapper node, through a communication proxy; in the data processing stage, the Mapper node processes the forwarded data block and generates an intermediate processing result;
Wherein m represents the number of physical computing nodes in the cloud service platform, N represents the total number of VMs deployed on the physical computing nodes, k represents the number of communication agents deployed in the virtual computing cluster, N_i represents the number of VMs deployed on the ith computing node, S represents the size of the file the job needs to process, and B represents the average rate of data transmission between computing nodes; x_i indicates whether a communication agent is deployed on the ith computing node, taking the value 1 if deployed and 0 if not; w_0 represents the average data processing efficiency of a virtual machine on an unloaded computing node, γ represents the decay rate of VM computing performance when working concurrently with a communication agent, w_ij represents the average data processing efficiency of the jth virtual machine on the ith computing node, N_i^max represents the maximum number of VMs deployable on the ith computing node, l(i) represents the deployment location of the ith communication agent, A_ij represents the deployment location of the agent responsible for the communication of the jth virtual machine on the ith computing node, and N_a represents the total number of VMs deployed on the same computing nodes as the communication agents.
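The three sub-stage costs can be combined into a simple per-node calculator. The closed forms below are illustrative assumptions consistent with the verbal model (import bounded by S/B; forwarding S_ij/B only when the host lacks a communication agent; processing time proportional to block size over a per-VM efficiency that decays with concurrency); the patent's exact fitted expressions are not recoverable from the text:

```python
def map_stage_time(S, B, blocks, agent_on_host, w0, gamma, n_vms):
    """Estimate the Map-stage makespan for one compute node.
    S: total input file size; B: average network rate;
    blocks: data block size handled by each Mapper VM on the node;
    agent_on_host: True if a communication agent is co-located on this node;
    w0: per-VM efficiency on an unloaded node; gamma: decay per concurrent VM;
    n_vms: number of VMs on the node. All closed forms are assumptions."""
    t_import = S / B                                  # lower bound: whole file at rate B
    per_vm_rate = w0 / (1.0 + gamma * (n_vms - 1))    # assumed concurrency decay law
    times = []
    for b in blocks:
        t_forward = 0.0 if agent_on_host else b / B   # extra cross-node hop if no agent
        t_process = b / per_vm_rate
        times.append(t_import + t_forward + t_process)
    return max(times)  # Mappers run in parallel: makespan = slowest VM

t = map_stage_time(S=100.0, B=10.0, blocks=[20.0, 30.0],
                   agent_on_host=False, w0=5.0, gamma=0.5, n_vms=2)
```

Taking the cluster-wide maximum of this quantity over all nodes gives the total Map-stage task time T^map described above.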
In this embodiment, the performance of data communication and data processing in the Map and Reduce stages is quantitatively modeled, and a performance optimization model of MapReduce jobs on the cluster is then designed according to the result of the performance modeling, providing a model foundation for the subsequent construction of the virtual cluster topology; to facilitate modeling, a virtual computing cluster composed of VMs is formed. In the scenario where a public cloud platform provides big data applications to small-scale users, the amount of data a user needs to process is relatively small, and the performance of the data import stage becomes the bottleneck of application execution; the overall execution efficiency of the big data application is jointly determined by the computing performance and the communication performance of the virtual cluster, which ensures the robustness of the model optimization.
Optionally, the intermediate results of data processing output by each Mapper node in the Map stage are aggregated to the Reducer nodes in the virtual cluster; the intermediate results are aggregated on each Reducer node, and the Reducer nodes then process the aggregated results to generate the final result. Data transmission in the Reduce stage includes: the Mapper node transmits the intermediate result data of the Map stage to its corresponding communication agent; the communication agent of the Mapper node further transmits the data to the communication agent of the Reducer node; and the communication agent of the Reducer node transmits the data to the Reducer node to complete the aggregation of the intermediate results. In the first Reduce stage, the intermediate results output by the Map tasks are transmitted from the Mapper nodes to the communication agents, and the overall communication performance of the cluster in this stage is determined by N_a, i.e. the total number of VMs sharing a host with a communication agent; for the VMs co-located with a communication agent, the first-stage data transmission is intra-node communication and need not experience the time consumption of inter-node communication, so the overall data transmission time of the cluster in the first stage is approximately T_1 = (N − N_a)·S_m/B, wherein S_m represents the average size of the intermediate file on each virtual node at the end of the Map stage, and r represents the number of Reducer nodes in the Reduce-stage virtual cluster;
In the second Reduce stage, the data are transmitted from the communication agent of each Mapper node to the communication agents of the Reducer nodes. Presetting that r Reducer nodes are selected at random in the cluster to execute the Reduce tasks, the agents responsible for the communication of these VMs become the destination nodes of the second-stage transmission, and their traffic differs from that of the other, non-destination agent nodes. Among the k communication agents, the probability p that the VM group an agent is responsible for contains a Reducer determines the expected number E[k_d] of destination agents among the k agents; correspondingly, the VMs sharing a host with a destination communication agent each experience one less data transmission across physical nodes in the second stage. Combining the expected value of E[k_d] yields the total data transmission time T_2 of the virtual cluster composed of N VMs in the second Reduce stage;
In the third Reduce stage, the data are transmitted from the communication agents of the Reducer nodes to the Reducer nodes for data summarization and processing. If a Reducer shares its host with a communication agent, the Reducer need not experience cross-physical-node communication; denoting by r_a the number of Reducers whose host carries an agent, the remaining r − r_a Reducers undergo data communication across physical nodes in the third stage, which determines the overall data transmission time T_3 of the third Reduce stage; combining the first, second and third stages yields the overall time overhead of the Reduce task execution stage of the virtual cluster, T^reduce = T_1 + T_2 + T_3.
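The expected number of destination agents in the second Reduce stage follows a hypergeometric argument: if r Reducers are drawn uniformly from the N VMs and each of the k agents is responsible for a group of VMs, an agent whose group has size g becomes a destination with probability 1 − C(N−g, r)/C(N, r). A sketch; the even grouping and the exact expectation formula are an assumed reconstruction, since the patent's expression is garbled:

```python
from math import comb

def expected_destination_agents(N, k, r):
    """Expected number of the k communication agents whose VM group contains
    at least one of r uniformly chosen Reducer nodes, with the N VMs split as
    evenly as possible into k groups. Assumed reconstruction of E[k_d]."""
    sizes = [N // k + (1 if i < N % k else 0) for i in range(k)]
    # P(group of size g holds >= 1 Reducer) = 1 - C(N-g, r)/C(N, r)
    p_hit = [1 - comb(N - g, r) / comb(N, r) for g in sizes]
    return sum(p_hit)  # linearity of expectation over the k agents

e = expected_destination_agents(N=12, k=3, r=12)
# every VM is a Reducer here, so every agent group is hit: e equals k = 3
```

With r = 1 exactly one agent is a destination in expectation, and with r = N all k agents are, matching the boundary behavior described in the text.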
It should be noted that selecting a corresponding mirror image copy on a storage node according to the virtual cluster optimization model, and loading the mirror image copy into the virtual computing cluster constructed by the resource layer so that the virtual computing cluster becomes a computing platform for executing big data jobs, includes: completing the optimized construction of the virtual computing cluster within the physical computing cluster, which provides the resource basis for executing the big data job, while the management node of the cloud service platform searches, among the storage clusters holding various application images, for the application copy suitable for the application type the user requires; and loading the application image into the virtual cluster, integrating the computing resources with the upper-layer application to form a platform for executing the specific application, whereby the virtual platform can execute the application requested by the user and produce the execution result. The MapReduce data processing flow can be understood approximately as: first grouping and sorting the unordered input data according to a certain characteristic, and then further processing the grouped and sorted intermediate data to obtain the final result.
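The two-phase flow just described, grouping the unordered input by a key and then further processing each group, is the classic MapReduce pattern. A minimal single-process word-count sketch:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit (key, 1) pairs from the raw, unordered input
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # grouping/sorting step: collect intermediate pairs by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each group into the final result
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["a b a", "b a"])))
```

In the patent's setting, `map_phase` runs on the Mapper VMs, the `shuffle` traffic is what the three Reduce-stage transmissions model, and `reduce_phase` runs on the selected Reducer nodes.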
Optionally, the training process of the generative adversarial network includes:
In a cloud service invocation scenario, there is a user sequence U = {u_1, u_2, …, u_m} and a cloud service sequence S = {s_1, s_2, …, s_n}; users invoking services generate a number of QoS matrices. Taking response time as the QoS attribute, the entry r_ij of the matrix R represents the response time generated when user i invokes service j;
The generator is trained on the time series of the first t moments; each time the call feature information of the next moment is input, the weights of the hidden neurons convert the noise data and generate a QoS prediction for the user's service at the next moment. The generator takes the time series as input and outputs the QoS prediction for the future moment; the mapping from input noise to output QoS is expressed as q̂_u^{t+1} = G(X_{1:t}), wherein q̂_u^{t+1} denotes the predicted QoS value of user u at moment t+1, X_{1:t} denotes the t real call sequences generated by the u users' invocations on the s servers, each call sequence containing the feature information of the invocations, and G(·) denotes the generator function trained on the actual call sequences over the t moments, containing the weights of all hidden neurons;
The loss function of the generator is the error between the QoS predicted value and the QoS true value; the loss of generator G is calculated as L_G = Σ_t (q_t − q̂_t)², wherein q_t represents the true QoS value at moment t and q̂_t represents the predicted QoS value at moment t;
A fully connected layer maps the dimension of the real data into the same dimension as the input layer of the GRU network, the distribution characteristics of each feature in the real data and the fitting process of QoS are learned, and a function from the historical variables to the current value of a given dimension variable is constructed; the fitting process is expressed as q̂_t = Σ_k w_{t,k}·x_{t,k} + ε_t, wherein q̂_t represents the predicted QoS value at moment t, w_{t,k} represents the weight vector of the kth dimension at moment t, x_{t,k} represents the feature vector of the kth-dimension user and service call record at the current moment, and ε_t represents the error at the current moment.
In this embodiment, the adversarial network further includes a discriminator model D for discriminating between the real data and the predicted data set generated by the generator; each record input to D is given a probability value between 0 and 1, the input time series being mapped through the fully connected neural network into a probability in (0, 1), and the loss of D takes the standard real-versus-fake log-loss form L_D = −(log D(x_real) + log(1 − D(x_fake))). A larger probability value indicates that the currently input time series is real; otherwise, the probability that the input is a predicted value is high. Adopting the adversarial neural network allows the QoS value to be predicted accurately even when the data density is low. Service recommendation based on the non-functional attributes of services can be understood as screening, from services with the same functionality, those with higher quality of service, so as to improve the user's experience when invoking services; the QoS generated when a user invokes a service is an important reference for service recommendation, and among services with the same functionality, the higher the QoS, the higher the possibility of a service being invoked.
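The generator and discriminator objectives described above can be sketched numerically: the generator is penalized by the error between predicted and true QoS, while the discriminator outputs a probability in (0, 1) via a sigmoid and is trained with a real-versus-fake log loss. The network architecture (GRU, fully connected layers) is omitted, and the squared-error and log-loss shapes are the standard GAN forms assumed where the patent text is garbled:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def generator_loss(q_true, q_pred):
    # mean squared error between true and predicted QoS series (assumed form)
    return sum((a - b) ** 2 for a, b in zip(q_true, q_pred)) / len(q_true)

def discriminator_loss(score_real, score_fake):
    # scores are pre-sigmoid logits; D is pushed toward D(real)=1, D(fake)=0
    d_real = sigmoid(score_real)
    d_fake = sigmoid(score_fake)
    return -(math.log(d_real) + math.log(1.0 - d_fake))

g = generator_loss([0.8, 1.2], [0.9, 1.0])   # small prediction error
d = discriminator_loss(score_real=2.0, score_fake=-2.0)
```

As the discriminator grows more confident (logits farther from zero in the correct directions), its loss decreases, which is what drives the generator to produce QoS series the discriminator cannot tell from real records.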
Referring to fig. 2, the present invention further provides a data processing system based on intelligent deployment of cloud services, including:
The request acquisition unit is used for acquiring a call request in which a SaaS user submits a big data job to the cloud service platform, counting the computation amount of the big data job according to the scheduling request, and connecting the virtual machines that the resource layer has allocated on the physical computing nodes through a virtual network to form a virtual computing cluster for executing the big data job, wherein the physical servers of the cloud service platform comprise computing nodes and storage nodes, and the cloud service platform comprises the SaaS user, a SaaS provider and an IaaS provider;
The model construction unit is used for constructing a service optimization model of the SaaS user and a resource allocation model of the IaaS provider according to the virtual computing cluster, and determining a virtual cluster optimization model of the big data job based on the service optimization model and the resource allocation model, wherein the virtual cluster optimization model comprises the deployment number of communication agents, the deployment positions of the communication agents and the mapping relation between the communication agents and the virtual machines;
The platform determining unit is used for selecting a corresponding mirror image copy on the storage node according to the virtual cluster optimization model and loading the mirror image copy into the virtual computing cluster constructed by the resource layer, so that the virtual computing cluster becomes a computing platform for executing big data operation;
The data processing unit is used for constructing a virtual big data job platform on the physical cluster to execute the big data job, outputting the calculation result, and inputting the calculation result into the trained generative adversarial neural network for QoS prediction so as to complete the data processing of intelligent cloud service deployment.
In this embodiment, the cloud service platform mainly includes three layers: the SaaS user, the SaaS provider and the IaaS provider. The lowest layer is the cloud computing resource layer operated by the physical machines; the top layer is the SaaS user layer, on which the SaaS provider exposes an interface for the SaaS users' requests; in the middle layer, the SaaS provider obtains a configuration of cloud resources to provide the corresponding services for the SaaS users, while the IaaS provider is responsible for virtualizing the physical resources at the resource allocation layer through the management node and for scheduling and allocating the cloud resources of the physical machines via the virtual machines on the nodes. The system acquires a call request in which a SaaS user submits a big data job to the cloud service platform; counts the computation amount of the big data job according to the scheduling request; connects the virtual machines that the resource layer has allocated on the physical computing nodes through a virtual network to form a virtual computing cluster for executing the big data job; constructs a service optimization model of the SaaS user and a resource allocation model of the IaaS provider according to the virtual computing cluster; determines a virtual cluster optimization model of the big data job based on the service optimization model and the resource allocation model; selects corresponding mirror image copies on the storage nodes according to the virtual cluster optimization model and loads them into the virtual computing cluster constructed by the resource layer, so that the virtual computing cluster becomes a computing platform for executing the big data job; and constructs a virtual big data job platform on the physical cluster to execute the big data job, outputs the calculation result, and inputs the calculation result into the trained generative adversarial neural network for QoS prediction so as to complete the data processing of intelligent cloud service deployment, thereby improving the overall performance of big data jobs, saving the user the cost of platform construction and maintenance, and allowing big data jobs on the cloud service platform to be better optimized while also improving service quality.
Any particular values in all examples shown and described herein are to be construed as merely illustrative and not a limitation, and thus other examples of exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The above examples merely represent a few embodiments of the present invention, which are described in more detail and are not to be construed as limiting the scope of the present invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention.

Claims (2)

1. The data processing method based on intelligent deployment of cloud service is characterized by comprising the following steps:
Acquiring a call request in which a SaaS user submits a big data job to the cloud service platform, counting the computation amount of the big data job according to the scheduling request, and connecting the virtual machines that the resource layer has allocated on the physical computing nodes through a virtual network to form a virtual computing cluster for executing the big data job, wherein the physical servers of the cloud service platform comprise computing nodes and storage nodes, and the cloud service platform comprises the SaaS user, a SaaS provider and an IaaS provider;
Constructing a service optimization model of the SaaS user and a resource allocation model of the IaaS provider according to the virtual computing cluster, and determining a virtual cluster optimization model of the big data job based on the service optimization model and the resource allocation model, wherein the virtual cluster optimization model comprises the deployment number of communication agents, the deployment positions of the communication agents and the mapping relation between the communication agents and the virtual machines;
selecting a corresponding mirror image copy on a storage node according to the virtual cluster optimization model, and loading the mirror image copy into a virtual computing cluster constructed by a resource layer, so that the virtual computing cluster becomes a computing platform for executing big data operation;
Constructing a virtual big data job platform on the physical cluster to execute the big data job, outputting a calculation result, and inputting the calculation result into a trained generative adversarial neural network for QoS prediction so as to complete the data processing of intelligent cloud service deployment;
Constructing the service optimization model of the SaaS user and the resource allocation model of the IaaS provider according to the virtual computing cluster comprises the following steps:
SaaS user optimization is performed at the application layer of the SaaS provider, and the optimized utility obtained by the SaaS user is expressed as max Σ_{i∈I} V_i(x_i) s.t. Σ_{i∈I(S)} x_i ≤ T_s, x_i ≥ 0, i ∈ I, wherein the set of users i is I, the set of service providers s is S, T_s represents the total service capacity of the service provider, x_i represents the service volume of user i, V_i(x_i) represents the utility of user i with respect to the service volume, and I(S) represents the set of services that the service provider can provide to the user;
presetting the Lagrangian function as L(x_i, λ) = Σ_{i∈I} V_i(x_i) − Σ_{s∈S} λ_s(Σ_{i∈I(S)} x_i − T_s), solving the Lagrangian function to obtain the optimal solution of the optimized utility, and rewriting the Lagrangian function as L(x_i, λ) = Σ_{i∈I}(V_i(x_i) − x_i Σ_{s∈S} λ_s) + Σ_{s∈S} λ_s T_s, so that the objective function of the dual problem of the SaaS-layer quality-of-service optimization model is D(λ) = max L(x_i, λ) = Σ_{i∈I} A(λ) + Σ_{s∈S} λ_s T_s;
wherein A(λ) = max (V_i(x_i) − x_i Σ_{s∈S} λ_s), and the dual problem D1 is expressed as D1: min D(λ) s.t. λ ≥ 0, wherein λ_s represents the price charged by the SaaS provider for providing a unit of service, x_i represents the amount of service provided by the SaaS provider to the user, x_i Σ_{s∈S} λ_s represents the service charge incurred by user i at service volume x_i, A(λ) represents the maximum benefit of user i under price λ, and λ represents all obtained service prices of user i, each user i being able to obtain the maximum value of A(λ) according to λ;
the set of SaaS providers s is S, the set of IaaS infrastructure cloud resource providers p is P, and the optimized utility the SaaS provider obtains from the cloud resource infrastructure is expressed as max U_s(r_p^cpu, r_p^mem, r_p^sto) s.t. r_p^cpu ≤ R_p^cpu, r_p^mem ≤ R_p^mem, r_p^sto ≤ R_p^sto, wherein r_p^cpu represents the CPU computing resources provided by the IaaS provider, r_p^mem represents the memory resources provided by the IaaS provider, r_p^sto represents the storage resources provided by the IaaS provider, R_p^cpu represents the maximum CPU computing resources offered by the IaaS provider, R_p^mem represents the maximum memory resources offered by the IaaS provider, R_p^sto represents the maximum storage resources offered by the IaaS provider, and U_s(·) represents the utility function of the SaaS provider in the service requirements;
determining a virtual cluster optimization model of the big data job based on the service optimization model and the resource allocation model, comprising:
Presetting that task execution time consumption of the jth virtual machine in three sub-stages in a Map stage is respectively as follows on a specific ith computing node And/> Then there is/> Quantitative modeling is performed on task execution time consumption of three sub-phases respectively:
the time consumption of the data import stage is determined by the size S 'of the input file and the average transmission rate B' of the network, the data transmission delay between any two computing nodes is approximately equal, and the lower bound of the time consumption of the data import stage is approximately quantized into
If a communication agent is deployed on a certain computing node, all virtual machine communication on the computing node is responsible for the communication agent; if the target virtual machine is a host machine of the Mapper node, deploying a communication proxy, wherein the time consumption of data transmission in the physical node is negligible; if the host of the Mapper node does not have a communication proxy, the Mapper node may additionally undergo a transmission across physical nodes, and the time consumption in the data forwarding stage is determined by the traffic across physical nodes, specifically Wherein V is ij The number of target virtual machines representing the jth physical node on the ith host machine, i and j respectively represent the ith host machine and the jth physical node, S '' ij representing the size of an input file communicated by a jth virtual machine on an ith compute node,/> representing a number of physical nodes at a proxy deployment location for the jth virtual machine communication at the ith compute node;
the time consumption of the data processing stage is determined by the size of the data block processed by the Mapper node and the computing performance of the virtual node; the data processing efficiency of each Mapper node is determined by the working efficiency w_0 of a VM on an idle host and the number n_i of concurrent VMs on host i, and fitting the variation trend of the average VM performance yields the performance function w_ij of the jth virtual machine on the specific ith computing node;
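One plausible shape for the performance function, consistent with the stated dependence on w_0, n_i and the decay rate gamma but not taken from the patent's (unreproduced) formula: base efficiency divided by the VM concurrency, decayed by gamma when an agent shares the node. The functional form and all names here are assumptions:

```python
def vm_performance(w0: float, n_i: int, agent_on_node: bool, gamma: float) -> float:
    """Assumed performance function for a VM on compute node i: base
    efficiency w0 is shared among the n_i concurrent VMs and decays by
    factor gamma when a communication agent runs on the same node."""
    w = w0 / n_i                      # concurrency dilutes per-VM efficiency
    if agent_on_node:
        w *= gamma                    # decay from the co-located agent workload
    return w

w_ij = vm_performance(w0=100.0, n_i=4, agent_on_node=True, gamma=0.9)  # -> 22.5
```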
when computing resources are idle, the corresponding Map tasks are executed in sequence; the data volume processed by a Mapper node is in direct proportion to its average performance, and combining the performance function of the specific node yields the time consumption of the Mapper node in the data processing sub-stage; from the performance expressions, the overall Map-stage task execution time of the jth Mapper node on the specific ith computing node is obtained; presetting that k communication agents are deployed among the computing nodes and that a total of N_a VMs are deployed on the same computing nodes as communication agents, the total Map-stage task time of all Mapper nodes of the whole virtual computing cluster is calculated;
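The Map-stage aggregation can be sketched as: each VM's Map time is the sum of its three sub-stage times, and, under a parallel-execution assumption (not stated explicitly in the text), the cluster's Map stage finishes when the slowest Mapper finishes. Names are hypothetical:

```python
def map_stage_total(sub_stage_times):
    """Overall Map-stage time sketch: per-VM time is the sum of the
    import, forwarding and processing sub-stage times; the cluster
    total is the maximum over all Mapper nodes (parallel-execution
    assumption).

    sub_stage_times -- list of (t_in, t_fw, t_pr) triples, one per VM
    """
    per_vm = [t_in + t_fw + t_pr for (t_in, t_fw, t_pr) in sub_stage_times]
    return max(per_vm)

total = map_stage_total([(8.0, 0.0, 12.0), (8.0, 3.0, 10.0)])  # -> 21.0
```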
the Map-stage task execution comprises three sub-stages: a data import stage, in which the input file is split into a plurality of data blocks and transmitted to each physical node on which a communication agent is deployed; a data forwarding stage, in which each data block is further forwarded through a communication agent to its target virtual machine, namely a Mapper node; and a data processing stage, in which the Mapper node processes the forwarded data block and generates an intermediate processing result;
wherein m represents the number of physical computing nodes in the cloud service platform, N represents the total number of VMs deployed on the physical computing nodes, k represents the number of communication agents deployed in the virtual computing cluster, and n_i represents the number of VMs deployed on the ith computing node, bounded by the maximum number of VMs that can be deployed on that node; x'_i indicates whether a communication agent is deployed on the ith computing node, taking the value 1 if deployed and 0 otherwise; w_0 represents the average data processing efficiency of a virtual machine on an unloaded computing node, gamma represents the decay rate of VM computing performance when working concurrently with a communication agent, and w_ij represents the average data processing efficiency of the jth virtual machine on the ith computing node; l(i) represents the deployment location of the ith communication agent, A_ij represents the proxy deployment location responsible for the jth virtual machine's communication on the ith computing node, N_a represents the total number of VMs deployed on the same computing node as a communication agent, and S''_ij represents the size of the input file communicated by the jth virtual machine on the ith computing node;
the intermediate results output by each Mapper node in the Map stage are summarized to the Reducer nodes in the virtual cluster, aggregated on each Reducer node, and then processed by the Reducer nodes to generate the final result; the data transmission of the Reduce stage comprises: the Mapper node transmitting the intermediate result data of the Map stage to its corresponding communication agent; the communication agent of the Mapper node further transmitting the data to the communication agent of the Reducer node; and the communication agent of the Reducer node transmitting the data to the Reducer node to complete the aggregation of the intermediate results;
in the first Reduce stage, the intermediate results output by the Map tasks are transmitted by the Mapper nodes to the communication agents; the overall communication performance of the cluster in this stage is determined by the value of N_a, the total number of VMs co-located with a communication agent, N_a = n_l(1) + n_l(2) + ... + n_l(k); the first-stage data transmission of a VM co-located with a communication agent is intra-node communication and does not experience the time consumption of inter-node communication, from which the overall data transmission time consumption of the cluster in the first stage is obtained, wherein S_r represents the average size of the intermediate file on each virtual node at the end of the Map stage, and n_r represents the number of Reducer nodes in the Reduce-stage virtual cluster;
in the second Reduce stage, data are uniformly transmitted from the communication agent of each Mapper node to the communication agent of the corresponding Reducer node; n_r Reducer nodes are randomly selected from the preset cluster to execute the Reduce tasks, and the agents responsible for the communication of these n_r VMs become the destination nodes of the second-stage transmission, whose traffic differs from that of the other non-destination agent nodes; the probability that the VMs for whose communication a given agent is responsible contain a Reducer determines the expected value of k_r, the number of destination agents among the k communication agents; the N_r VMs co-located with a communication agent each experience one fewer data transfer across physical nodes in the second stage, and the expected value of N_r is computed accordingly; combining the expected value of N_r yields the total data transmission time of the virtual cluster formed by the N VMs in the second Reduce stage;
in the third Reduce stage, data are transmitted by the communication agent of each Reducer to the Reducer node for data summarization and processing; if a Reducer shares a host with its communication agent, it need not undergo cross-physical-node communication; the number n_a of Reducers whose host has an agent deployed has the expectation E(n_a) = N_a * n_r / N, and the remaining n_r - n_a Reducers undergo cross-physical-node data communication in the third stage, from which the overall data transmission time of the third Reduce stage is obtained; combining the first, second and third stages yields the overall time overhead of the Reduce task execution stage of the virtual cluster, wherein the time consumed executing the Reduce task at the proxy deployment location responsible for the jth virtual machine's communication on the ith computing node is also accounted for;
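The two expectations used above can be sketched directly. E(n_a) = N_a * n_r / N is stated in the text; for E(k_r), a hypergeometric sketch is one natural reading of "the probability of containing a Reducer in the VMs each agent is responsible for" (the exact formula is not reproduced in the source, so this form is an assumption, as are all names):

```python
from math import comb

def expected_destination_agents(group_sizes, n_total, n_r):
    """Expected number k_r of destination agents: for an agent serving a
    group of g VMs, the probability the group contains at least one of
    the n_r randomly chosen Reducers is 1 - C(N-g, n_r)/C(N, n_r)
    (hypergeometric sketch); summing over agents gives E(k_r)."""
    total = comb(n_total, n_r)
    return sum(1 - comb(n_total - g, n_r) / total for g in group_sizes)

def expected_colocated_reducers(n_a_total, n_r, n_total):
    """E(n_a) = N_a * n_r / N: expected number of Reducers whose host
    also runs a communication agent, as stated in the text."""
    return n_a_total * n_r / n_total

e_na = expected_colocated_reducers(n_a_total=6, n_r=4, n_total=12)  # -> 2.0
```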
the optimal construction of the virtual cluster is realized by determining the optimal number of communication agents, the optimal deployment position of each communication agent, and the optimal mapping relation between the communication agents and the virtual machines;
selecting a corresponding mirror image copy on a storage node according to the virtual cluster optimization model and loading the mirror image copy into the virtual computing cluster constructed by the resource layer, so that the virtual computing cluster becomes a computing platform for executing the big data job, comprises:
the optimized construction of the virtual computing cluster is completed in the physical computing cluster, providing a resource basis for the execution of big data jobs; according to the application type required by the user, the management node of the cloud service platform searches for a suitable application copy in the storage clusters storing the various different application images;
the application image is loaded into the virtual cluster, integrating the computing resources with the upper-layer application to form a platform for executing the specific application; the virtual platform can then execute the application requested by the user and produce the execution result;
the training process of the generative adversarial network comprises:
in a cloud service invocation scenario, there is a user sequence U = {u_1, u_2, ..., u_m} and a cloud service sequence S_2 = {s_1, s_2, ..., s_n}; when users call services, a plurality of QoS matrices are generated; if the QoS is response time, the element r_{i,j} of the matrix R_{m×n} represents the response time generated when user i calls service j, wherein m represents the maximum number of users and n represents the maximum number of cloud services;
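A toy instance of such a QoS matrix (the values and dimensions here are illustrative, not from the source):

```python
import numpy as np

# Toy QoS matrix R (m users x n services): R[i, j] is the response time
# (seconds) observed when user i calls service j.
m, n = 3, 4
rng = np.random.default_rng(0)                      # fixed seed for repeatability
R = np.round(rng.uniform(0.1, 2.0, size=(m, n)), 2)

r_0_2 = R[0, 2]   # response time of user 0 calling service 2
```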
the generator trains its weights to each hidden neuron using the time series within the time interval 0 to t, converts the call feature information of each next moment into noise data, and generates a QoS predicted value for the user service at the next moment; the generator takes the time series as input and outputs the QoS predicted value at the future moment, wherein the predicted QoS value of user u at time t+1 is produced from t_{u,t+1}, the t real call sequences generated by user u over s service calls, and x_{u,t+1}, the feature information of the (t+1)th calling sequence, and G() represents the generator function trained on the calling sequences within the actual 0 to t interval, the function containing the weights of all hidden neurons;
the loss function of generator G is computed from the error between the QoS predicted value and the QoS true value, wherein y_t represents the true QoS value at time t and ŷ_t represents the predicted QoS value at time t;
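The generator loss described above can be sketched as a mean-squared error between the true and predicted QoS series; the squared-error form is an assumption (the source states only "the error of the predicted and true values"), and the function name is hypothetical:

```python
def generator_loss(y_true, y_pred):
    """Assumed mean-squared-error form of the generator loss: the error
    between the true QoS series y_t and the predicted series y_hat_t."""
    if len(y_true) != len(y_pred):
        raise ValueError("series must be the same length")
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / len(y_true)

loss = generator_loss([0.5, 0.8, 1.1], [0.6, 0.7, 1.1])
```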
a fully connected layer maps the dimension of the real data to the same dimension as the input layer of the GRU network; the distribution characteristics of each feature in the real data and the fitting process of the QoS are learned, constructing a function from historical variables to the current value of a variable in a given dimension, wherein ŷ_t represents the predicted QoS value at time t, θ_n represents the 1×n-dimensional weight vector at time t, the n×1-dimensional feature vector represents the user and service invocation record at the current time, and e_t represents the error at the current time;
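Given the 1×n weight vector, n×1 feature vector and error term defined above, a linear fitting step ŷ_t = θ_n · x + e_t is one consistent reading; this form, and the names `fit_step`, `theta`, `x`, are assumptions:

```python
import numpy as np

def fit_step(theta, x, e_t=0.0):
    """One step of the assumed fitting form y_hat_t = theta @ x + e_t:
    theta is the 1xn weight row-vector at time t, x the nx1 feature
    vector of the current user/service invocation record, and e_t the
    current-time error term."""
    return float(theta @ x) + e_t

theta = np.array([0.2, 0.5, 0.3])
x = np.array([1.0, 2.0, 1.0])
y_hat = fit_step(theta, x)  # -> 1.5
```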
the adversarial network further comprises a discriminator model D for discriminating true from false between the real data and the predicted data set generated by the generator, giving each record in the input of D a probability value between 0 and 1; the input time series of the discriminator is mapped through the fully connected neural network to a probability value between 0 and 1, where the larger the probability value, the higher the probability that the input is real data, and otherwise the probability that the input is a predicted value is high; D() represents the operation by which the discriminator assigns a probability value between 0 and 1 to a record in its input.
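The discriminator's probability mapping and loss can be sketched with a sigmoid output and a binary cross-entropy loss; the cross-entropy form is a standard GAN discriminator loss and an assumption here (the source does not reproduce its formula), as are all names:

```python
import math

def discriminator_output(score: float) -> float:
    """Map an unbounded fully-connected-network score to a probability in
    (0, 1) via a sigmoid; values near 1 mean 'likely real data', values
    near 0 'likely a generated prediction'."""
    return 1.0 / (1.0 + math.exp(-score))

def discriminator_loss(p_real: float, p_fake: float) -> float:
    """Assumed binary cross-entropy form of the discriminator loss over
    one real sample (label 1) and one generated sample (label 0)."""
    return -(math.log(p_real) + math.log(1.0 - p_fake))

p = discriminator_output(0.0)          # -> 0.5
d_loss = discriminator_loss(0.9, 0.1)
```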
2. A data processing system based on cloud service intelligent deployment, implementing the data processing method based on cloud service intelligent deployment of claim 1, comprising:
a request acquisition unit, configured to acquire a call request in which a SaaS user submits a big data job to the cloud service platform, count the computation amount of the big data job according to the scheduling request, and connect the virtual machines distributed by the resource layer over the physical computing nodes through a virtual network to form a virtual computing cluster for executing the big data job, wherein the physical servers of the cloud service platform comprise computing nodes and storage nodes, and the cloud service platform comprises SaaS users, SaaS providers and IaaS providers;
a model construction unit, configured to construct a service optimization model of the SaaS user and a resource allocation model of the IaaS provider according to the virtual computing cluster, and determine the virtual cluster optimization model of the big data job based on the service optimization model and the resource allocation model, wherein the virtual cluster optimization model comprises the number of communication agents deployed, the deployment positions of the communication agents, and the mapping relation between the communication agents and the virtual machines;
a platform determining unit, configured to select a corresponding mirror image copy on a storage node according to the virtual cluster optimization model and load the mirror image copy into the virtual computing cluster constructed by the resource layer, so that the virtual computing cluster becomes a computing platform for executing the big data job;
and a data processing unit, configured to construct the virtual big data job platform on the physical cluster to execute the big data job, output the calculation result, and input the calculation result into the trained adversarial neural network to predict QoS, thereby completing the data processing of cloud service intelligent deployment.
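The four claimed units form a simple pipeline: request acquisition, model construction, platform determination, data processing. A stub sketch of that flow, in which every class, method and value is hypothetical and each body merely stands in for the behaviour described in the claims:

```python
class CloudDeploymentPipeline:
    """Sketch of the four units of the claimed system (all names and
    return values are illustrative stubs)."""

    def acquire_request(self, job):
        # request acquisition unit: receive the SaaS user's big-data job
        # call and count its computation amount
        return {"job": job, "compute_amount": len(str(job))}

    def build_models(self, request):
        # model construction unit: service-optimization and resource-
        # allocation models yield the virtual-cluster optimization model
        # (agent count, agent locations, agent-to-VM mapping)
        return {"agents": 2, "locations": [0, 3], "mapping": {}}

    def select_platform(self, model):
        # platform determining unit: select an image copy and load it
        # into the virtual computing cluster
        return {"image": "app-image", "model": model}

    def process(self, platform):
        # data processing unit: execute the job, then feed the result to
        # the trained adversarial network for QoS prediction
        return {"result": "ok", "qos_prediction": 0.42}

p = CloudDeploymentPipeline()
out = p.process(p.select_platform(p.build_models(p.acquire_request("wordcount"))))
```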
CN202410001773.5A 2024-01-02 2024-01-02 Data processing method and system based on cloud service intelligent deployment Active CN117492934B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410001773.5A CN117492934B (en) 2024-01-02 2024-01-02 Data processing method and system based on cloud service intelligent deployment

Publications (2)

Publication Number Publication Date
CN117492934A CN117492934A (en) 2024-02-02
CN117492934B true CN117492934B (en) 2024-04-16

Family

ID=89683325


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104009904A (en) * 2014-05-23 2014-08-27 清华大学 Method and system for establishing virtual network for big data processing of cloud platform
CN104219226A (en) * 2014-08-12 2014-12-17 重庆大学 Method for determining number of optimal communication agent nodes in cloud platform
CN105843670A (en) * 2016-03-22 2016-08-10 浙江大学 Cloud platform based virtual cluster deployment and integration method
CN107404523A (en) * 2017-07-21 2017-11-28 中国石油大学(华东) Cloud platform adaptive resource dispatches system and method
CN111314120A (en) * 2020-01-23 2020-06-19 福州大学 Cloud software service resource self-adaptive management framework based on iterative QoS model
CN115086249A (en) * 2022-05-23 2022-09-20 华东师范大学 Cloud data center resource allocation method based on deep reinforcement learning
CN116360921A (en) * 2023-03-14 2023-06-30 中国电力科学研究院有限公司 Cloud platform resource optimal scheduling method and system for electric power Internet of things



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant