CN116050235A - Workflow data layout method under cloud side environment and storage medium - Google Patents

Publication number
CN116050235A
Authority
CN
China
Prior art keywords
data
copy
particles
data center
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310176231.7A
Other languages
Chinese (zh)
Inventor
张舜民
林兵
郑裕恒
吴克涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Zhenshi Information Technology Co ltd
Fujian Normal University
Original Assignee
Fujian Zhenshi Information Technology Co ltd
Fujian Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Zhenshi Information Technology Co ltd, Fujian Normal University filed Critical Fujian Zhenshi Information Technology Co ltd
Priority to CN202310176231.7A priority Critical patent/CN116050235A/en
Publication of CN116050235A publication Critical patent/CN116050235A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/20: Design optimisation, verification or simulation
    • G06F30/25: Design optimisation, verification or simulation using particle-based methods
    • G06F30/27: Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N3/006: Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N3/12: Computing arrangements based on biological models using genetic models
    • G06N3/126: Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physiology (AREA)
  • Genetics & Genomics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a storage medium for workflow data layout in a cloud-edge environment. The cloud-edge environment is represented mathematically and, based on the copy generation cost and the data transmission cost, the data layout problem is modeled as a 0-1 integer programming problem that minimizes the total time delay, yielding a mathematical problem model. The model is solved with a nonlinear inertia weight discrete particle swarm optimization algorithm based on genetic algorithm operators: the crossover operator and the mutation operator of the genetic algorithm are introduced into the particle swarm algorithm, and the inertia weight is adaptively adjusted according to the difference between each particle and the global best particle. The workflow data is then laid out according to the solution, which effectively reduces the time delay. Introducing the crossover and mutation operators strengthens the search capability of the particle swarm algorithm and avoids premature convergence, and adapting the inertia weight to the difference between the current particle and the global best particle makes the optimization process more efficient.

Description

Workflow data layout method under cloud side environment and storage medium
Technical Field
The invention relates to the technical field of workflow data layout, in particular to a method and a storage medium for workflow data layout in a cloud edge environment.
Background
A workflow model is an effective way to describe business processes. It consists of multiple interrelated tasks, and workflows are commonly used in astronomy, physics, bioinformatics and other scientific fields. Because scientific workflows are data-intensive applications, deploying them places stringent demands on the computing power and storage capacity of the environment.
Cloud computing offers strong storage and computing capabilities, provides personalized services for users, and guarantees the resource supply of scientific workflows. However, running a scientific workflow involves large-scale data transmission, and remotely deployed cloud computing can cause serious data transmission delay. Edge computing moves computation to the edge of the network, close to the user, which reduces data transmission delay and allows the user's private data to be stored at the edge. Edge computing resources are limited, however, and cannot store all the data needed and generated while a scientific workflow executes. Combining cloud computing with edge computing therefore provides a secure and efficient way to deploy scientific workflows.
Because of the presence of private data, a large amount of data is transmitted while a scientific workflow executes, which causes serious time delay. As storage costs fall, data copies are frequently used in cloud computing and edge computing, and accessing a nearby copy reduces the number of data transmissions. However, laying out data copies in a cloud-edge environment poses many challenges: generating, transmitting and storing copies all incur overhead, so suitable data must be selected and an appropriate number of copies generated, and the placement of each copy is difficult to choose.
How to lay out data copies so as to reduce latency is therefore an important problem.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method and a storage medium for workflow data layout in a cloud-edge environment that can effectively reduce time delay.
To solve this technical problem, the invention adopts the following technical scheme:
A method for workflow data layout in a cloud-edge environment comprises the following steps:
S1, representing the cloud-edge environment mathematically and, based on the copy generation cost and the data transmission cost, modeling the data layout problem as a 0-1 integer programming problem that minimizes the total time delay, to obtain a mathematical problem model;
S2, solving the mathematical problem model with a nonlinear inertia weight discrete particle swarm optimization algorithm based on genetic algorithm operators, in which the crossover operator and the mutation operator of the genetic algorithm are introduced into the particle swarm algorithm and the inertia weight is adaptively adjusted according to the difference between each particle and the global best particle;
S3, carrying out the workflow data layout according to the solution.
To solve the technical problem, the invention adopts another technical scheme:
A storage medium having stored thereon a computer program which, when executed, performs the steps of the method for workflow data layout in a cloud-edge environment described above.
The beneficial effects of the invention are as follows: in the method and the storage medium for workflow data layout in a cloud-edge environment, the data copy layout is modeled as a 0-1 integer programming problem that minimizes the total time delay, and a nonlinear inertia weight discrete particle swarm optimization algorithm based on genetic algorithm operators is used to solve the data layout problem, which effectively reduces the time delay. Introducing the crossover and mutation operators of the genetic algorithm into the particle swarm algorithm strengthens its search capability and avoids premature convergence, and adapting the inertia weight to the difference between the current particle and the global best particle makes the optimization process more efficient.
Drawings
FIG. 1 is a schematic diagram of a scientific workflow example of a method for workflow data layout in a cloud-edge environment according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an example of a workflow data layout in a cloud-edge environment;
FIG. 3 is a schematic diagram of a one-dimensional encoding example of a data layout of a method for workflow data layout in a cloud-edge environment according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a two-dimensional encoding example of a data layout of a method for workflow data layout in a cloud-edge environment according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a mutation operator example of a method for workflow data layout in a cloud-edge environment according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a crossover operator example of a method for workflow data layout in a cloud-edge environment according to an embodiment of the present invention;
FIG. 7 is a flowchart of a method for workflow data layout in a cloud-edge environment according to an embodiment of the present invention.
Detailed Description
To describe the technical content, objects and effects of the present invention in detail, the following description refers to the embodiments in conjunction with the accompanying drawings.
Referring to FIG. 1, FIG. 2 and FIG. 4 to FIG. 7, a method for workflow data layout in a cloud-edge environment includes the following steps:
S1, representing the cloud-edge environment mathematically and, based on the copy generation cost and the data transmission cost, modeling the data layout problem as a 0-1 integer programming problem that minimizes the total time delay, to obtain a mathematical problem model;
S2, solving the mathematical problem model with a nonlinear inertia weight discrete particle swarm optimization algorithm based on genetic algorithm operators, in which the crossover operator and the mutation operator of the genetic algorithm are introduced into the particle swarm algorithm and the inertia weight is adaptively adjusted according to the difference between each particle and the global best particle;
S3, carrying out the workflow data layout according to the solution.
From the above description, the beneficial effects of the invention are as follows: the data copy layout is modeled as a 0-1 integer programming problem that minimizes the total time delay, and a nonlinear inertia weight discrete particle swarm optimization algorithm based on genetic algorithm operators is used to solve the data layout problem, which effectively reduces the time delay. Introducing the crossover and mutation operators of the genetic algorithm into the particle swarm algorithm strengthens its search capability and avoids premature convergence, and adapting the inertia weight to the difference between the current particle and the global best particle makes the optimization process more efficient.
Further, in step S1, the mathematical representation of the cloud-edge environment is specifically as follows.
The cloud-edge environment is expressed as:
$S = \{S_{cld}, S_{edg}\}$;
where the cloud $S_{cld}$ comprises $j$ data centers, denoted as:
$S_{cld} = \{s_1, s_2, \ldots, s_j\}$;
and the edge $S_{edg}$ comprises $k$ data centers, denoted as:
$S_{edg} = \{s_{j+1}, s_{j+2}, \ldots, s_{j+k}\}$.
Each data center $s_i$ is expressed as:
$s_i = \langle c_i, \gamma_i, a_i \rangle$;
where $c_i$ is its storage capacity; $\gamma_i \in \{0, 1\}$ is the data center type, with $\gamma_i = 0$ denoting a cloud data center, which can store only public data, and $\gamma_i = 1$ denoting an edge data center, which can store both public data and private data whose storage location is fixed; and $a_i$ is the speed at which the data center replicates data.
The network bandwidth between data centers is expressed as:
Figure BDA0004101636750000041
where $b_{ij}$ is the bandwidth between data center $s_i$ and data center $s_j$.
The scientific workflow is expressed as:
$G = (V, E, D)$;
where $V$ is the set of tasks in the scientific workflow:
$V = \{v_1, v_2, \ldots, v_w\}$;
$E$ is the set of task dependencies in the scientific workflow:
Figure BDA0004101636750000042
and $D$ is the set of data copy sets:
$D = \{d_1, d_2, \ldots, d_m\}$.
The data set related to each task $v_i$ is expressed as $\langle D_i, D_o \rangle$, where $D_i$ is its input data set and $D_o$ is its output data set, each consisting of one or more data items. A dependency $e_{ij} \in E$ indicates that task $v_j$ is a successor of task $v_i$ and can execute only after $v_i$ completes; otherwise $v_j$ does not depend on $v_i$. Each data copy set $d_i$ contains the copies of the $i$-th data item: $d_{ij}$ is the $j$-th copy of the $i$-th data item, and $d_{i1}$ is the original $i$-th data item. Each data copy carries the attributes $\langle z_{i1}, n_{i1}, f_{i1}, l_{i1} \rangle$: $z_{i1}$ is the data size; $n_{i1}$ is the number of copies of the data, an integer greater than 0, where $n_{i1} = 1$ means the data has no other copies; $f_{i1}$ identifies the task that generates data $d_{i1}$ and is 0 for initial data; and $l_{i1}$ records the fixed location of data $d_{ij}$: for private data it is the data center to which the data belongs, and for public data it is 0.
From the above description, the cloud edge environment is mathematically represented through the above steps.
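The tuples $\langle c_i, \gamma_i, a_i \rangle$ and $\langle z_{i1}, n_{i1}, f_{i1}, l_{i1} \rangle$ can be pictured as simple records. The following is a minimal Python sketch under stated assumptions: the class names, field names and the helper `can_store` are hypothetical and are not part of the patent text; data centers are numbered from 1 so that `home = 0` can mean public data, as in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCenter:
    capacity: float    # c_i, storage capacity
    is_edge: bool      # gamma_i: False = cloud (0), True = edge (1)
    copy_speed: float  # a_i, speed at which the data center replicates data


@dataclass(frozen=True)
class DataItem:
    size: float        # z_i1, data size
    n_copies: int = 1  # n_i1, 1 means no copy other than the original
    producer: int = 0  # f_i1, generating task; 0 marks initial data
    home: int = 0      # l_i1, 0 = public data, otherwise the fixed data center

    @property
    def is_private(self) -> bool:
        return self.home != 0


def can_store(dc: DataCenter, dc_index: int, item: DataItem) -> bool:
    """Private data may only sit on its fixed edge data center (assumed rule)."""
    if item.is_private:
        return dc.is_edge and item.home == dc_index
    return True  # public data may be laid out on any data center
```

Keeping the privacy rule in one predicate makes it easy to reuse when particles are initialized or mutated.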
Further, in step S1, the mathematical problem model is built as follows.
The copy overhead $t_{copy}$ of replicating data $d_{i1}$ in data center $s_k$ is:
Figure BDA0004101636750000051
where $z_{i1}$ is the size of data $d_{ij}$ and $a_k$ is the speed at which data center $s_k$ replicates data.
The transmission overhead $t_{tran}$ of sending data $d_{ij}$ from data center $s_k$ to data center $s_l$ is:
Figure BDA0004101636750000052
where $b_{kl}$ is the bandwidth between data center $s_k$ and data center $s_l$; if the copy is generated and laid out on the current data center, there is no transmission overhead.
The data layout is expressed as $\{S, D, Y, T_{total}\}$, where $S$ is the set of data centers, $D$ is the data set, and $Y$ is the set of layout positions of the data; every data copy $d_{ij} \in D$ corresponds to a unique data center:
Figure BDA0004101636750000053
$T_{total}$ is the total time delay of the data layout scheme, the sum of the data replication time $T_{copy}$ and the data transmission time $T_{tran}$:
$T_{total} = T_{copy} + T_{tran}$
The data replication time $T_{copy}$ is expressed as:
Figure BDA0004101636750000054
where
Figure BDA0004101636750000055
is the layout position of data $d_{i1}$ and $n_{i1}$ is the number of copies of data $d_i$.
The data transmission time $T_{tran}$ is expressed as:
Figure BDA0004101636750000056
where $h(i, j, k, l) \in \{0, 1\}$; $h(i, j, k, l) = 1$ indicates that the $l$-th copy $d_{kl}$ of data $k$ is transmitted from data center $s_i$ to data center $s_j$, and $h(i, j, k, l) = 0$ otherwise.
The objective of the data layout strategy is expressed as:
Figure BDA0004101636750000061
where $\beta(i, j, k) \in \{0, 1\}$, and $\beta(i, j, k) = 1$ indicates that the $k$-th copy of data $j$ is stored on data center $i$.
From the above description, it can be seen that, through the above steps, a mathematical problem model of a data layout strategy is obtained that aims to minimize the total delay.
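The equation images for the two overheads are not reproduced in this text, so the sketch below rests on an assumption: each overhead is taken to be the data size divided by the relevant rate (replication speed $a_k$ or bandwidth $b_{kl}$), which is what the surrounding definitions suggest. The function names are hypothetical.

```python
def copy_overhead(size: float, copy_speed: float) -> float:
    """t_copy, assumed to be size / replication speed."""
    return size / copy_speed


def transfer_overhead(size: float, bandwidth: float, same_center: bool = False) -> float:
    """t_tran, assumed to be size / bandwidth; zero when no transfer is needed."""
    return 0.0 if same_center else size / bandwidth


def total_delay(copies, transfers) -> float:
    """T_total = T_copy + T_tran.

    copies:    iterable of (size, copy_speed) pairs, one per generated copy
    transfers: iterable of (size, bandwidth) pairs, one per transmission
    """
    t_copy = sum(copy_overhead(z, a) for z, a in copies)
    t_tran = sum(transfer_overhead(z, b) for z, b in transfers)
    return t_copy + t_tran
```

Under this assumption, minimizing $T_{total}$ amounts to choosing which copies to create so that the added replication time is outweighed by the transfers that the copies make unnecessary.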
Further, step S2 includes the following steps:
The data layout strategy is encoded as a two-dimensional array to construct candidate particles.
The data layout scheme of particle $i$ at the $t$-th iteration,
Figure BDA0004101636750000062
is as follows:
Figure BDA0004101636750000063
Each bit
Figure BDA0004101636750000064
represents the storage locations of the copy set of data $j$ for the $i$-th particle at the $t$-th iteration:
Figure BDA0004101636750000065
where $q_k \in \{0, 1\}$; $q_k = 1$ indicates that a copy of data $j$ is laid out on data center $k$, and $q_k = 0$ indicates that it is not. The number of entries with $q_k = 1$ in $x_{ij}^{t}$ is the number of copies of data $j$.
From the above description, using data copies raises two questions: (1) how to represent the different copies of a data item, and (2) how to represent the storage locations of those copies. The above encoding answers both while satisfying completeness and non-redundancy.
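The two-dimensional encoding can be illustrated as a binary matrix with one row per data item and one column per data center. The sketch below is illustrative only: `random_particle`, `copy_count` and `locations` are hypothetical names, indices are 0-based, and the initialization rule (private data on its fixed data center, public data on one random data center with no extra copies) follows the initialization described for the initial population.

```python
import random


def random_particle(n_data, n_centers, fixed, rng):
    """Build one particle as an n_data x n_centers 0/1 matrix.

    fixed maps the index of each private data item to its fixed data center.
    """
    particle = []
    for j in range(n_data):
        row = [0] * n_centers
        k = fixed[j] if j in fixed else rng.randrange(n_centers)
        row[k] = 1  # exactly one copy (the original) at initialization
        particle.append(row)
    return particle


def copy_count(particle, j):
    """Number of copies of data j, i.e. the number of q_k equal to 1."""
    return sum(particle[j])


def locations(particle, j):
    """Data centers that hold a copy of data j."""
    return [k for k, q in enumerate(particle[j]) if q == 1]
```

For example, `random_particle(4, 3, {0: 1}, random.Random(0))` always places data item 0 on data center 1 and gives every other item a single randomly chosen location.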
Further, solving the mathematical problem model with the nonlinear inertia weight discrete particle swarm optimization algorithm based on genetic algorithm operators (NPSO-GA) comprises the following steps:
analyzing the scientific workflow and topologically sorting the tasks to obtain a task queue that can be executed in sequence;
initializing the maximum capacity of each data center and generating an initial population according to the private data set, wherein private data in the initial population is laid out on its corresponding data center and public data is laid out randomly without generating additional copies;
simulating the data layout process and judging whether each particle is a feasible solution; if so, calculating its total time delay, and if not, recording its infeasible data set;
calculating the fitness of the particles, setting every individual in the initial population as its own individual historical best, and setting the particle with the best fitness in the initial population as the population historical best;
iterating the population: mutating the population according to the inertia weight factor $w$, crossing the population with the individual historical best according to the acceleration factor $\alpha_1$, crossing the population with the population historical best according to the acceleration factor $\alpha_2$, calculating the fitness of the new population, and updating the global information;
outputting the total time delay of the population historical best when the iteration ends.
From the above description, the data copy layout strategy based on NPSO-GA is realized through the above steps.
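The first step, turning the task dependencies into an executable task queue, is a standard topological sort. A minimal sketch using Python's standard-library `graphlib` (Python 3.9 or later) is given below; the function name `task_queue` and the dictionary layout are assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def task_queue(predecessors):
    """Return tasks in an order that respects every dependency.

    predecessors maps each task to the set of tasks that must finish first.
    """
    return list(TopologicalSorter(predecessors).static_order())


# Hypothetical four-task workflow: v1 feeds v2 and v3, which both feed v4.
deps = {"v2": {"v1"}, "v3": {"v1"}, "v4": {"v2", "v3"}}
queue = task_queue(deps)
```

Any order in which `v1` comes first and `v4` comes last is valid here; `graphlib` raises `CycleError` if the dependencies are not acyclic.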
Further, the calculation of the fitness includes:
establishing the fitness function $F$ according to which of the following cases applies when two particles are compared:
if both particles are feasible solutions, the particle with the lower total time delay has the better fitness, and the fitness function is defined as:
$F = T_{total}$
if neither particle is a feasible solution, the particle whose infeasible data set $D_{inf}$ is smaller has the better fitness, because more of its data is laid out in feasible locations and it can more easily become a feasible solution in subsequent iterations; the fitness function is:
$F = |D_{inf}|$;
if a feasible particle is compared with an infeasible particle, the feasible particle is selected, and the fitness function is:
Figure BDA0004101636750000071
From the above description, because the encoding of the data layout strategy can produce infeasible particles, the fitness needs to be defined differently for the different cases.
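The three comparison cases can be written as a single predicate. The sketch below is an illustration, not the patent's formula for the mixed case (that equation image is not reproduced here); it only encodes the stated rule that a feasible particle beats an infeasible one. `Evaluated` and `better` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Evaluated:
    total_delay: Optional[float] = None           # T_total, set only if feasible
    infeasible: set = field(default_factory=set)  # D_inf, empty if feasible

    @property
    def feasible(self) -> bool:
        return not self.infeasible


def better(a: Evaluated, b: Evaluated) -> bool:
    """Return True if particle a has strictly better fitness than particle b."""
    if a.feasible and b.feasible:
        return a.total_delay < b.total_delay          # F = T_total
    if not a.feasible and not b.feasible:
        return len(a.infeasible) < len(b.infeasible)  # F = |D_inf|
    return a.feasible  # mixed case: the feasible particle is selected
```

A predicate like this is enough to update the individual and population historical bests without ever comparing a delay against a set size.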
Further, the data layout process includes the following steps:
initializing a task position list, which records the execution position of every task, and an overrun flag, which records whether any data center exceeds its capacity limit during task execution;
calculating the capacity of each data center after the initial data set has been laid out, then traversing the task queue, calculating the execution position of each task and recording it in the task position list;
when a task generates an output data set, temporarily storing the input data set and the output data set of the task on the data center, judging whether the data center exceeds its capacity limit at that moment, then placing the output data of the task on the data center designated for it and updating the capacity of that data center;
if a data center exceeds its capacity limit during task execution, recording the data laid out on that data center in the infeasible data set $D_{inf}$; otherwise, calculating and recording the total time delay.
From the above description, the data layout process is realized through the above steps.
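The capacity check at the heart of this process can be sketched in a simplified, static form. The version below is an assumption-laden illustration: it only sums the sizes of the copies placed on each data center and ignores the temporary storage of task inputs and outputs during execution, which the described process also accounts for. The name `infeasible_data` is hypothetical.

```python
def infeasible_data(particle, sizes, capacities):
    """Return the indices of data items laid out on an overloaded data center.

    particle:   0/1 matrix, particle[j][k] == 1 if data j has a copy on center k
    sizes:      sizes[j] is the size of data j
    capacities: capacities[k] is the storage capacity of data center k
    """
    used = [0.0] * len(capacities)
    for j, row in enumerate(particle):
        for k, q in enumerate(row):
            if q:
                used[k] += sizes[j]
    overloaded = {k for k, u in enumerate(used) if u > capacities[k]}
    return {j for j, row in enumerate(particle)
            if any(row[k] for k in overloaded)}
```

An empty result corresponds to a feasible particle, for which the total time delay would then be calculated.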
Further, in the nonlinear inertia weight discrete particle swarm optimization algorithm based on genetic algorithm operators in step S2, introducing the crossover and mutation operators of the genetic algorithm into the particle swarm algorithm comprises the following steps:
The velocity and position of the particles are iterated as:
Figure BDA0004101636750000081
Figure BDA0004101636750000082
The update policy for the $i$-th particle at the $t$-th iteration is:
Figure BDA0004101636750000083
where $C_g$ and $C_p$ are crossover operators, $M_u$ is the mutation operator,
Figure BDA0004101636750000084
is the individual historical best of particle $i$ at the $t$-th iteration, $g^t$ is the population historical best at the $t$-th iteration, and $\alpha_1$, $\alpha_2$ and $w$, each between 0 and 1, are the two acceleration factors and the inertia weight factor.
The inertia part of the particle swarm algorithm is replaced by the mutation operator of the genetic algorithm:
Figure BDA0004101636750000085
A random number $r_w$ between 0 and 1 is generated; if it is smaller than the inertia weight factor $w$, the particle undergoes mutation:
acquiring the infeasible data set $D_{inf}$ of particle $X_i$, and obtaining the mutation position from the infeasible data set $D_{inf}$ and the private data set $D_{fix}$:
if $D_{inf}$ is empty, selecting the position of a data item that is not in $D_{fix}$; if $D_{inf}$ is not empty, selecting the position of a public data item in $D_{inf}$;
counting the number of copies of the data corresponding to the mutation position of particle $X_i$: if
$X_i[muIndex][j] = 1$,
the data corresponding to the mutation position of particle $X_i$ has a copy on data center $j$;
updating the copy count by increasing or decreasing it from its original value according to a probability, while ensuring that at least one copy exists and that the copy count does not exceed the number of data centers;
generating, at the mutation position of particle $X_i$, a data copy layout scheme with the updated copy count.
The individual cognition and social cognition parts of the particle swarm algorithm are replaced by the crossover operator of the genetic algorithm:
Figure BDA0004101636750000091
Figure BDA0004101636750000092
where
Figure BDA0004101636750000093
represents the crossover of the particle with its individual historical best, and
Figure BDA0004101636750000094
represents the crossover of the particle with the population historical best.
From the above description, introducing the crossover and mutation operators of the genetic algorithm into the particle swarm algorithm strengthens the search capability of the particle swarm algorithm and avoids premature convergence.
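The mutation and crossover steps can be sketched as follows. Several details are assumptions, because the corresponding equation images and figures are not reproduced in this text: the copy count changes by exactly one, the new locations are drawn uniformly at random, the crossover copies a contiguous segment of rows from the guide particle, and each crossover is applied with probability $\alpha_1$ or $\alpha_2$. The names `mutate`, `crossover` and `update` are hypothetical, and the caller is expected to pass a `mu_index` that points at public data.

```python
import random


def mutate(particle, mu_index, rng, p_increase=0.5):
    """Re-generate the copy layout of one data item with a changed copy count."""
    n_centers = len(particle[mu_index])
    count = sum(particle[mu_index])             # current number of copies
    count += 1 if rng.random() < p_increase else -1
    count = max(1, min(n_centers, count))       # at least 1, at most n_centers
    chosen = set(rng.sample(range(n_centers), count))
    child = [row[:] for row in particle]
    child[mu_index] = [1 if k in chosen else 0 for k in range(n_centers)]
    return child


def crossover(particle, guide, rng):
    """Copy a random contiguous segment of rows from the guide particle."""
    n = len(particle)
    lo, hi = sorted(rng.sample(range(n + 1), 2))
    return [guide[j][:] if lo <= j < hi else particle[j][:] for j in range(n)]


def update(particle, p_best, g_best, w, a1, a2, mu_index, rng):
    """One update: mutation, then crossover with p_best, then with g_best."""
    x = mutate(particle, mu_index, rng) if rng.random() < w else particle
    x = crossover(x, p_best, rng) if rng.random() < a1 else x
    x = crossover(x, g_best, rng) if rng.random() < a2 else x
    return x
```

Because both operators work on whole rows, every row of the result still encodes at least one copy, so the "at least one copy" constraint is preserved by construction.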
Further, in the nonlinear inertia weight discrete particle swarm optimization algorithm based on genetic algorithm operators in step S2, adaptively adjusting the inertia weight according to the difference between the particle and the global best particle comprises the following steps:
The inertia weight is adjusted nonlinearly, based on the degree of difference between the current particle and the global best particle:
Figure BDA0004101636750000095
Figure BDA0004101636750000096
where
Figure BDA0004101636750000097
represents the difference between the particle and the population best particle.
The acceleration factors $\alpha_1$ and $\alpha_2$ are adjusted using a linear variation strategy:
Figure BDA0004101636750000101
Figure BDA0004101636750000102
From the above description, through the above steps the inertia weight is adaptively adjusted according to the difference between the current particle and the global best particle, so that the optimization process is more efficient.
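Since the equation images for the inertia weight and the acceleration factors are not reproduced in this text, the sketch below only illustrates the idea and should not be read as the patent's formulas. It assumes that the difference is the fraction of data rows in which a particle differs from the global best, uses an exponential mapping chosen purely for illustration, and uses plain linear interpolation for the acceleration factors. All names and the default bounds 0.4 and 0.9 are hypothetical.

```python
import math


def difference(particle, g_best):
    """Fraction of data rows whose layout differs from the global best."""
    differing = sum(1 for a, b in zip(particle, g_best) if a != b)
    return differing / len(particle)


def inertia_weight(d, w_min=0.4, w_max=0.9):
    """Illustrative nonlinear mapping: small difference -> small w, large -> large w."""
    return w_max - (w_max - w_min) * math.exp(d / (d - 1.01))


def linear(start, end, t, t_max):
    """Linear schedule from start (t = 0) to end (t = t_max)."""
    return start + (end - start) * t / t_max
```

With this mapping, a particle that already resembles the global best mutates rarely, while a particle far from it mutates often; a schedule such as `linear(0.9, 0.2, t, t_max)` for $\alpha_1$ and `linear(0.2, 0.9, t, t_max)` for $\alpha_2$ would shift the emphasis from individual to population experience over the iterations.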
A storage medium having stored thereon a computer program which when executed performs the steps of a method of workflow data layout in a cloud-edge environment as described above.
The method and the storage medium for workflow data layout in a cloud-edge environment are suitable for laying out workflow data in a cloud-edge environment.
Referring to FIG. 1 to FIG. 7, a first embodiment of the present invention is as follows:
A method for workflow data layout in a cloud-edge environment comprises the following steps:
S1, representing the cloud-edge environment mathematically and, based on the copy generation cost and the data transmission cost, modeling the data layout problem as a 0-1 integer programming problem that minimizes the total time delay, to obtain a mathematical problem model;
In step S1, the mathematical representation of the cloud-edge environment is specifically as follows.
The cloud-edge environment is expressed as:
$S = \{S_{cld}, S_{edg}\}$;
where the cloud $S_{cld}$ comprises $j$ data centers, denoted as:
$S_{cld} = \{s_1, s_2, \ldots, s_j\}$;
and the edge $S_{edg}$ comprises $k$ data centers, denoted as:
$S_{edg} = \{s_{j+1}, s_{j+2}, \ldots, s_{j+k}\}$.
Each data center $s_i$ is expressed as:
$s_i = \langle c_i, \gamma_i, a_i \rangle$;
where $c_i$ is its storage capacity; $\gamma_i \in \{0, 1\}$ is the data center type, with $\gamma_i = 0$ denoting a cloud data center, which can store only public data, and $\gamma_i = 1$ denoting an edge data center, which can store both public data and private data whose storage location is fixed; and $a_i$ is the speed at which the data center replicates data.
The network bandwidth between data centers is expressed as:
Figure BDA0004101636750000111
where $b_{ij}$ is the bandwidth between data center $s_i$ and data center $s_j$.
The scientific workflow is expressed as:
$G = (V, E, D)$;
where $V$ is the set of tasks in the scientific workflow:
$V = \{v_1, v_2, \ldots, v_w\}$;
$E$ is the set of task dependencies in the scientific workflow:
Figure BDA0004101636750000112
and $D$ is the set of data copy sets:
$D = \{d_1, d_2, \ldots, d_m\}$.
A task is a unit of computation that can run in a data center; tasks take data sets as input and generate new data sets in a certain order of execution. The data set related to each task $v_i$ is expressed as $\langle D_i, D_o \rangle$, where $D_i$ is its input data set and $D_o$ is its output data set, each consisting of one or more data items. A dependency $e_{ij} \in E$ indicates that task $v_j$ is a successor of task $v_i$ and can execute only after $v_i$ completes; otherwise $v_j$ does not depend on $v_i$. Each data copy set $d_i$ contains the copies of the $i$-th data item: $d_{ij}$ is the $j$-th copy of the $i$-th data item, and $d_{i1}$ is the original $i$-th data item. Each data copy carries the attributes $\langle z_{i1}, n_{i1}, f_{i1}, l_{i1} \rangle$: $z_{i1}$ is the data size; $n_{i1}$ is the number of copies of the data, an integer greater than 0, where $n_{i1} = 1$ means the data has no other copies; $f_{i1}$ identifies the task that generates data $d_{i1}$ and is 0 for initial data; and $l_{i1}$ records the fixed location of data $d_{ij}$: for private data it is the data center to which the data belongs, and for public data it is 0.
Different copies can be laid out on different data centers to shorten the data transmission delay; private data cannot be copied. Using data copies incurs additional overhead, namely the data copy overhead $t_{copy}$ and the copy transmission overhead $t_{tran}$, and also occupies storage resources of the data center. The copy overhead $t_{copy}$ of replicating data $d_{i1}$ in data center $s_k$ is:
Figure BDA0004101636750000121
where $z_{i1}$ is the size of data $d_{ij}$ and $a_k$ is the speed at which data center $s_k$ replicates data.
The transmission overhead $t_{tran}$ of sending data $d_{ij}$ from data center $s_k$ to data center $s_l$ is:
Figure BDA0004101636750000122
where $b_{kl}$ is the bandwidth between data center $s_k$ and data center $s_l$; if the copy is generated and laid out on the current data center, there is no transmission overhead.
In this embodiment, copying all public data would generate a large amount of overhead, so the number of copies of each data item is dynamic and depends on how many times the data item is used as task input. FIG. 1 illustrates the data replication model of the invention, which replicates data selectively, trading the overhead of generating copies for reduced transmission overhead and thereby reducing the total time delay.
The data layout is expressed as {S, D, Y, T_total}, where S is the set of data centers, D is the data set, and Y is the set of data layout positions; every data item d_ij ∈ D corresponds to a unique data center:
Figure BDA0004101636750000123
Before a task in a scientific workflow is executed, all input data required by the task must be transmitted to the data center that executes it. Because the data volume in a scientific workflow is huge, the task scheduling time is far smaller than the data transmission time and is therefore ignored. T_total, the total delay corresponding to a data layout scheme, is the sum of the data replication time T_copy and the data transmission time T_tran:
$$T_{total} = T_{copy} + T_{tran}$$
The data replication time T_copy is expressed as:
Figure BDA0004101636750000124
where
Figure BDA0004101636750000125
denotes the layout position of data d_i1, and n_i1 is the number of copies of data d_i;
The data transmission time T_tran is expressed as:
Figure BDA0004101636750000131
where h(i, j, k, l) ∈ {0, 1}; h(i, j, k, l) = 1 indicates that the l-th copy d_kl of data k is transmitted from data center s_i to data center s_j, and h(i, j, k, l) = 0 otherwise;
The objective of the data layout strategy is expressed as:
Figure BDA0004101636750000132
where β (i, j, k) ∈ {0,1}, β (i, j, k) =1 represents that the kth copy of data j is stored on data center i.
In this embodiment, FIG. 2 shows data layouts for the scientific workflow of FIG. 1. The scientific workflow contains a task set V = {v_1, v_2, v_3, v_4, v_5, v_6, v_7} and a data set D = {d_1, d_2, d_3, d_4, d_5, d_6, d_7} with data sizes {6GB,10GB,4GB,3GB, 5GB,11GB}; the data set is divided into a public data set D_flex = {d_2, d_6, d_7} and a private data set D_fix = {d_1, d_3, d_4, d_5}. The data centers comprise two edge data centers, each with a capacity of 25 GB, and one cloud data center with unlimited storage space. The bandwidths between data centers {b_12, b_13, b_23} are set to {10 M/s, 20 M/s, 100 M/s} respectively, and the data replication speed of each data center is set to 800 M/s. Private data d_1 and d_3 are laid out on edge data center 2, and private data d_4 and d_5 are laid out on edge data center 3. Since every task involves private data as input or output, tasks v_1, v_2, v_3, v_6 are executed on edge data center 2 and tasks v_4, v_5, v_7 are executed on edge data center 3.
FIG. 2a and FIG. 2b show two layout schemes that do not use data copies; they differ in that scheme a lays out data d_2 on data center 2, while scheme b lays out data d_2 on data center 3. Both schemes require two cross-data-center transmissions of data d_2 and one cross-data-center transmission of data d_7, causing a delay of about 6144 s. FIG. 2c shows the dynamic-copy-number data layout scheme of the invention: when v_2 generates data d_2, the data is copied once and the copy is transmitted to data center 3, so that only one copy and one cross-data-center transmission of d_2 and one cross-data-center transmission of d_7 are needed, causing a delay of about 5427 s. Copying all public data, by contrast, would cause unnecessary time overhead and could even exceed the capacity of the edge data centers. Within the available capacity, the invention trades transmission overhead for copy-generation overhead and reduces the total delay through judicious use of data copies.
S2, adopting a nonlinear inertial weight discrete particle swarm optimization algorithm based on a genetic algorithm operator, introducing a crossover operator and a mutation operator of the genetic algorithm into the particle swarm algorithm, and adaptively adjusting the inertial weight according to the difference between particles and global particles so as to solve the mathematical problem model.
The overall goal of the data placement strategy is to achieve a mapping of the data set D to the data center S such that the overall latency is minimized, as allowed by the data center capacity. In this embodiment, a data copy layout strategy of a nonlinear inertial weight discrete particle swarm optimization algorithm (Nonlinear inertial weight discrete Particle Swarm Optimization algorithm based on Genetic Algorithm's operators, NPSO-GA) based on genetic algorithm operators is provided, which considers the cost of generating data copies, selectively copies data according to the task requirement, and determines the layout position of the data.
Problem coding:
Two questions must be answered when data copies are used: (1) how to represent the different copies of a data item, and (2) how to represent the storage locations of the copies. The problem encoding should satisfy completeness, non-redundancy and robustness as far as possible.
FIG. 3 shows a conventional static-copy-number encoding, in which the same number of copies (2 in the figure) is generated for every public data item and a one-dimensional array represents the data layout scheme of a scientific workflow in a cloud environment, each position giving the layout position of one data copy. This encoding is complete, since every candidate solution in the problem space can be encoded as a particle, but it is neither non-redundant nor robust. For example, particle X_1 = (2,2,3,2,3,3,2,3,1,1) and particle X_2 = (2,3,2,2,3,3,2,3,1,1) correspond to the same solution in the problem space: data d_2 is copied once and its two copies are laid out on data center 2 and data center 3 respectively. In addition, this encoding requires the number of copies to be fixed in advance, so the number of copies cannot be adjusted to how often the data is used.
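The redundancy of the static encoding can be checked directly. The sketch below assumes a bit order in which private data d1, d3, d4, d5 occupy one position each and public data d2, d6, d7 occupy two adjacent positions each; this order is inferred from the example particles and is not stated in the source.

```python
# Assumed position-to-data mapping for the ten-position static encoding.
SLOTS = ["d1", "d2", "d2", "d3", "d4", "d5", "d6", "d6", "d7", "d7"]


def decode_static(particle):
    """Map a one-dimensional particle to {data: sorted data-center list}."""
    layout = {}
    for name, center in zip(SLOTS, particle):
        layout.setdefault(name, []).append(center)
    # The order of the copies carries no meaning, so sort it away.
    return {name: sorted(centers) for name, centers in layout.items()}


x1 = (2, 2, 3, 2, 3, 3, 2, 3, 1, 1)
x2 = (2, 3, 2, 2, 3, 3, 2, 3, 1, 1)

# Two different particles that decode to one and the same layout.
same_solution = decode_static(x1) == decode_static(x2)
```

Under this mapping the two particles differ only in the order of the two copies of d2, which is why they describe one layout.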
In this embodiment, a new coding scheme is proposed, and a two-dimensional array is used to construct candidate solution particles.
Step S2 includes the following steps:
encoding the data layout strategy by adopting a two-dimensional array to construct candidate particles:
The data layout scheme
Figure BDA0004101636750000141
of particle i at the t-th iteration is as follows:
Figure BDA0004101636750000142
Each position
Figure BDA0004101636750000151
represents the storage locations of the copy set of data j for the i-th particle in the t-th iteration:
Figure BDA0004101636750000152
where q_k ∈ {0, 1}; q_k = 1 indicates that a copy of data j is laid out on data center k, and q_k = 0 indicates that it is not. The number of entries with q_k = 1 in x_ij^t is the number of copies of data j.
This encoding is complete and non-redundant, and the number and locations of copies can change as the particle iterates. The data layout of FIG. 2c corresponds to the encoding shown in FIG. 4 (assuming three data centers).
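As a minimal sketch of the two-dimensional encoding, a particle can be held as a list of 0/1 rows, one row per data item and one flag per data center. The particle below is a made-up example and does not reproduce FIG. 4; the helper names are illustrative.

```python
# Hypothetical particle for three data centers. Column k (0-based) set to 1
# means a copy of that row's data is laid out on data center k + 1.
particle = [
    [0, 1, 0],  # one copy on data center 2
    [0, 1, 1],  # two copies: data centers 2 and 3
    [0, 0, 1],  # one copy on data center 3
]


def copy_count(row):
    """Number of copies of a data item = number of flags set in its row."""
    return sum(row)


def centers_of(row):
    """1-based data centers that hold a copy of the data item."""
    return [k + 1 for k, q in enumerate(row) if q == 1]


counts = [copy_count(row) for row in particle]
locations = [centers_of(row) for row in particle]
```

Because each row is a set of flags rather than an ordered list of positions, swapping two copies cannot produce a second particle for the same layout.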
Fitness function:
The invention aims to reduce the total delay of the data layout of a scientific workflow: the lower the total delay, the better the particle. However, the encoding of the invention is not robust and can produce infeasible particles. A particle can be infeasible for two reasons, privacy leakage and violation of the capacity constraint, so fitness must be defined differently for different cases. Privacy leakage means that at least one private data item is copied or laid out on a data center other than its own; violation of the capacity constraint means that at least one edge data center stores more data than its capacity allows. The infeasible data set D_inf describes the data that make a particle infeasible. Comparing the fitness value F of two particles, each of which may be feasible or infeasible, falls into three cases.
Based on the comparison of the fitness values F of the two types of particles, the fitness function is established as follows:
If both particles are feasible solutions, the particle with the lower total delay has the better fitness, and the fitness function is defined as:
$$F = T_{total}$$
If both particles are infeasible solutions, the particle with the smaller infeasible data set D_inf has the better fitness, since more of its data is laid out in feasible locations and it can more easily become a feasible particle in subsequent iterations; the fitness function is:
$$F = |D_{inf}|$$
If a feasible particle is compared with an infeasible particle, the feasible particle is selected; the fitness function is:
Figure BDA0004101636750000161
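The three comparison cases can be sketched as a single comparison function. The `Evaluation` record, its field names and the tie-breaking rule are assumptions made for illustration; the source only specifies which particle wins in each case.

```python
from dataclasses import dataclass, field


@dataclass
class Evaluation:
    """Result of simulating one particle (field names are illustrative)."""
    total_delay: float = 0.0                      # T_total, used if feasible
    infeasible: set = field(default_factory=set)  # D_inf, empty if feasible

    @property
    def feasible(self) -> bool:
        return not self.infeasible


def better(a: Evaluation, b: Evaluation) -> Evaluation:
    """Return the fitter of two evaluated particles (ties go to a)."""
    if a.feasible and b.feasible:          # case 1: lower total delay wins
        return a if a.total_delay <= b.total_delay else b
    if not a.feasible and not b.feasible:  # case 2: smaller |D_inf| wins
        return a if len(a.infeasible) <= len(b.infeasible) else b
    return a if a.feasible else b          # case 3: the feasible one wins
```

Comparing particles pairwise avoids having to put delays and infeasible-set sizes on a common numeric scale.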
particle update policy:
PSO (particle swarm optimization) uses a particle to represent each solution in the search space. The velocity of a particle determines the direction and distance of its flight, and the optimal solution is obtained by iteratively updating the velocity and position of the particles:
Figure BDA0004101636750000162
Figure BDA0004101636750000163
In this embodiment, the NPSO-GA used is an improvement of the PSO algorithm. The update strategy of the i-th particle at the t-th iteration in NPSO-GA is:
Figure BDA0004101636750000164
where C_g and C_p are crossover operators, M_u is the mutation operator,
Figure BDA0004101636750000165
is the individual historical best of particle i at the t-th iteration, g_t is the population historical best at the t-th iteration, and α_1, α_2 and w, each between 0 and 1, are the acceleration factors and the inertia weight factor;
replacing an inertia part in the particle swarm algorithm by adopting a mutation operator of the genetic algorithm:
Figure BDA0004101636750000166
A random number r_w between 0 and 1 is generated; if it is smaller than the inertia weight factor w, the particle undergoes the mutation process M_u shown in Algorithm 1:
Figure BDA0004101636750000167
Figure BDA0004101636750000171
In Algorithm 1, the infeasible data set D_inf of particle X_i is first acquired, and the mutation position is obtained from the infeasible data set D_inf and the private data set D_fix:
if D_inf is empty, a position corresponding to data not in D_fix is selected; if D_inf is not empty, the position of a public data item in D_inf is selected;
the number of copies of the data at the mutation position of particle X_i is counted; if
X_i[muIndex][j] = 1
then the data at the mutation position of particle X_i has a copy on data center j;
the copy count is updated by probabilistically increasing or decreasing it from its original value, ensuring that at least one copy exists and that the copy count does not exceed the number of data centers;
a data copy layout scheme with the updated copy count is generated at the mutation position of particle X_i.
The overall mutation process not only results in a change in the layout position of the data, but also changes the number of copies, and fig. 5 is an example of a mutation process.
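A compact sketch of the mutation step follows, assuming a particle stored as 0/1 rows and at least one public data item. The plus-or-minus-one step with equal probability and the uniformly random placement are assumptions; the source only says the copy count is raised or lowered "according to probability" and kept between one and the number of data centers.

```python
import random


def mutate(particle, private_rows, infeasible_rows, rng=random):
    """Return a mutated copy of a 0/1 particle (rows = data, cols = centers)."""
    n_centers = len(particle[0])
    public_rows = [i for i in range(len(particle)) if i not in private_rows]
    # Prefer public data that made the particle infeasible; otherwise pick
    # any public data. Private data is never mutated.
    candidates = [i for i in infeasible_rows if i not in private_rows]
    mu_index = rng.choice(candidates or public_rows)

    # Raise or lower the copy count by one, clamped to [1, n_centers].
    copy_count = sum(particle[mu_index]) + rng.choice((-1, 1))
    copy_count = max(1, min(n_centers, copy_count))

    # Lay the copies out on randomly chosen, distinct data centers.
    chosen = set(rng.sample(range(n_centers), copy_count))
    mutated = [row[:] for row in particle]
    mutated[mu_index] = [1 if k in chosen else 0 for k in range(n_centers)]
    return mutated
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which helps when comparing parameter settings.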
The individual cognition and social cognition parts in the particle swarm algorithm are replaced by adopting a crossover operator of the genetic algorithm:
Figure BDA0004101636750000172
Figure BDA0004101636750000181
where
Figure BDA0004101636750000182
denotes the crossover of the particle with its individual historical best, and
Figure BDA0004101636750000183
denotes the crossover of the particle with the population historical best.
Crossover of a particle with the individual historical best (or the population historical best) proceeds as follows: after the mutation operation, a random number r_1 (r_2) between 0 and 1 is generated; if it is less than or equal to the acceleration factor α_1 (α_2), two positions of the particle are randomly selected, the segment between them is taken as the crossover interval, and the segment in the crossover interval is replaced by the corresponding segment of p (or g). FIG. 6 shows an example of the crossover process.
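The crossover step can be sketched as follows, assuming a particle with at least two rows and treating each row (one data item) as one position. Whether the interval includes both end positions is not stated in the source; the sketch assumes it does.

```python
import random


def crossover(particle, best, rate, rng=random):
    """Cross a particle with a historical best (p or g) with probability rate."""
    child = [row[:] for row in particle]
    if rng.random() > rate:          # r > alpha: leave the particle unchanged
        return child
    # Pick two positions (data indices); the rows between them, inclusive,
    # form the crossover interval and are overwritten by the best's rows.
    lo, hi = sorted(rng.sample(range(len(particle)), 2))
    for i in range(lo, hi + 1):
        child[i] = best[i][:]
    return child
```

The same function serves both operators: call it with the individual best and α_1 for C_p, then with the population best and α_2 for C_g.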
Parameter updating:
A larger inertia weight factor w favors global search and helps escape local extrema, while a smaller w favors local search, so that the algorithm converges quickly to the optimal solution. To balance search speed and search precision, the invention adjusts the inertia weight w nonlinearly, based on the degree of difference between the current particle and the global best particle:
Figure BDA0004101636750000184
Figure BDA0004101636750000185
Figure BDA0004101636750000186
representing the difference between the particles and the population optimal particles. When the value is larger, the difference between the current particle and the population optimum is larger, the inertia weight should be increased to perform global search, otherwise, the inertia weight should be reduced to perform local search, so that the algorithm can be quickly converged to the optimum solution.
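The qualitative behavior described here (larger difference, larger inertia weight) can be illustrated with a hypothetical mapping. Both the difference measure (fraction of data items whose copy layout differs) and the exponential curve with bounds 0.4 and 0.9 are assumptions chosen for illustration; they are not the formulas of the source, which are given only as figures.

```python
import math


def difference(particle, global_best):
    """Fraction of data items whose copy layout differs from the global best."""
    differing = sum(1 for a, b in zip(particle, global_best) if a != b)
    return differing / len(particle)


def inertia_weight(d, w_min=0.4, w_max=0.9):
    """Hypothetical nonlinear map from a difference d in [0, 1] to w.

    w grows from w_min at d = 0 to w_max at d = 1 along a convex curve; the
    exponential form and the bounds are assumptions, not the source formula.
    """
    return w_min + (w_max - w_min) * math.expm1(d) / math.expm1(1.0)
```

Any monotonically increasing map would reproduce the stated behavior; a convex curve keeps w low until the particle is far from the population best.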
The acceleration factors α_1 and α_2 are adjusted using a linear variation strategy:
Figure BDA0004101636750000187
Figure BDA0004101636750000188
As the number of iterations increases, α_1 decreases continuously and α_2 increases continuously. In the early stage of iteration, the larger acceleration factor α_1 and the smaller acceleration factor α_2 make particles search for local optima within a smaller range, so that the particle search is finer; in the later stage of iteration, the smaller α_1 and the larger α_2 improve global cooperation among particles and help particles escape local optima.
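A linear schedule of this kind can be sketched as plain interpolation over the iteration count. The start and end values are hypothetical; the source states only that α_1 falls and α_2 rises linearly.

```python
def acceleration_factors(t, max_iter,
                         a1_start=0.9, a1_end=0.2,
                         a2_start=0.4, a2_end=0.9):
    """Linearly interpolate alpha1 (decreasing) and alpha2 (increasing).

    t is the current iteration and max_iter the total number of iterations.
    All four bounds are illustrative assumptions.
    """
    progress = t / max_iter
    alpha1 = a1_start - (a1_start - a1_end) * progress
    alpha2 = a2_start + (a2_end - a2_start) * progress
    return alpha1, alpha2
```

Since both factors are used as crossover probabilities, the bounds should stay within the interval from 0 to 1.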
Data copy layout strategy overview:
Algorithm 2 gives the overall flow of the data copy layout strategy, which follows the flow of the traditional PSO algorithm:
Figure BDA0004101636750000191
Algorithm 2 begins with system initialization (lines 1-5). The scientific workflow is parsed and the tasks are topologically sorted to obtain a task queue that can be executed in order (line 1);
the maximum capacity of each data center is initialized (line 2), and an initial population is generated according to the private data set, in which private data is laid out on its corresponding data center and public data is laid out randomly without generating additional copies (line 3);
the data layout process is simulated by the DataPlacement() function, which determines whether each particle is a feasible solution; if so, the total delay is calculated, and if not, the infeasible data set is recorded (line 4);
after the fitness of the particles has been calculated, each individual in the initial population is set as its own individual historical best, and the particle with the best fitness in the initial population is set as the population historical best (line 5);
the population is then iterated (lines 6-12): the population is mutated according to the inertia weight factor w (line 8), crossed with the individual historical best according to the acceleration factor α_1 and with the population historical best according to the acceleration factor α_2 (line 9), and the fitness of the new population is calculated and the global information updated (lines 10-11);
at the end of the iteration, the total delay of the population historical best is output (line 12).
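The first initialization step, turning task dependencies into an executable queue, can be sketched with the standard library's `graphlib` module (Python 3.9 or later). The dependency edges below are hypothetical; in practice they come from the dependency set E of the workflow.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each task maps to the set of tasks it must
# wait for.
depends_on = {
    "v1": set(),
    "v2": {"v1"},
    "v3": {"v1"},
    "v4": {"v2", "v3"},
}

# static_order() yields every task after all of its predecessors.
task_queue = list(TopologicalSorter(depends_on).static_order())
```

`TopologicalSorter` raises `graphlib.CycleError` if the dependencies contain a cycle, which doubles as a validity check on the parsed workflow.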
The data layout process comprises the following steps:
algorithm 3 gives the data layout process of the encoded particles and records the fitness of the particles.
Figure BDA0004101636750000201
Figure BDA0004101636750000211
In this embodiment, the data layout process function DataPlacement() returns the fitness information of the population: for a feasible particle it records the total delay, and for an infeasible particle it records the infeasible data set D_inf.
The data layout process comprises the steps of:
a task location list (taskLocList), which records the execution location of every task, and an overflow flag (flagOverflow), which records whether any data center exceeds its capacity limit during task execution, are initialized (lines 1-4);
the capacity usage of the data centers after the initial data set is laid out is calculated (lines 5-7); the task queue is then traversed to calculate the capacity usage of the data centers during task execution (lines 8-17), and the execution location of each task is calculated and recorded in the task location list (lines 9-13);
when a task generates an output data set, the input and output data sets of the task are temporarily stored on the data center, and it is determined whether the data center exceeds its capacity limit at that moment (lines 14-15); the output data of the task is then laid out on the data centers designated for it, and the data center capacities are updated (line 16);
if a data center exceeds its capacity limit during task execution, the data laid out on that data center is recorded in the infeasible data set D_inf (lines 18-19); otherwise, the total delay, comprising the data replication delay and the data transmission delay, is calculated and recorded (lines 20-22).
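The capacity check at the heart of this process can be sketched in a simplified, static form. Unlike Algorithm 3, the sketch ignores task order and the temporary storage of inputs and outputs during execution; it only checks a finished layout. The function name and the dictionary-based inputs are illustrative assumptions.

```python
def check_capacity(layout, sizes, capacity):
    """Find data laid out on data centers whose capacity is exceeded.

    layout:   {data name: set of data centers holding a copy}
    sizes:    {data name: size in GB}
    capacity: {data center: capacity in GB}; a missing key means unlimited
    Returns the infeasible data set D_inf (empty if the layout fits).
    """
    used = {}
    for name, centers in layout.items():
        for center in centers:
            used[center] = used.get(center, 0) + sizes[name]
    overflowed = {c for c, gb in used.items()
                  if c in capacity and gb > capacity[c]}
    return {name for name, centers in layout.items() if centers & overflowed}
```

An empty result means the particle passes the capacity test, after which its total delay would be computed; a non-empty result becomes the D_inf used by the fitness comparison.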
And S3, carrying out workflow data layout according to the solving result.
The second embodiment of the invention is as follows:
a storage medium having stored thereon a computer program for workflow data layout in a cloud-edge environment, characterized in that the computer program when executed performs the steps of a method for workflow data layout in a cloud-edge environment according to any of the preceding claims 1-9.
In summary, according to the workflow data layout method and the storage medium in the cloud environment provided by the invention, under the premise of considering the factors such as transmission bandwidth, data copy generation cost, data center capacity, privacy data and the like, the data copy is adaptively generated to optimize the transmission delay in the operation of the scientific workflow. And modeling the data copy layout into a 0-1 integer programming problem with the aim of minimizing the total time delay, generating copies for data used by high frequency according to the topological structure of the scientific workflow, and exchanging the cost of generating the copies for the transmission cost, thereby reducing the total time delay. The nonlinear inertial weight discrete particle swarm optimization algorithm based on the genetic algorithm operator is provided for solving the problem of data layout. The crossover operator and the mutation operator of the genetic algorithm are introduced into the particle swarm algorithm, so that the searching capability of the particle swarm algorithm is enhanced, premature convergence is avoided, and the inertia weight is adaptively adjusted according to the difference between the current particle and the global particle, so that the optimizing process is more efficient.
The core purpose of the invention is to minimize the time delay while meeting the storage capacity limit of the data privacy and the data center in the process of executing the scientific workflow.
The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent changes made by the specification and drawings of the present invention, or direct or indirect application in the relevant art, are included in the scope of the present invention.

Claims (10)

1. The method for arranging the workflow data in the cloud side environment is characterized by comprising the following steps:
s1, carrying out mathematical representation on a cloud edge environment, and modeling a data layout problem as a 0-1 integer programming problem based on copy generation cost and data transmission cost with the aim of minimizing total time delay to obtain a mathematical problem model;
s2, adopting a nonlinear inertial weight discrete particle swarm optimization algorithm based on a genetic algorithm operator, introducing a crossover operator and a mutation operator of the genetic algorithm into the particle swarm algorithm, and adaptively adjusting the inertial weight according to the difference between particles and global particles so as to solve the mathematical problem model;
and S3, carrying out workflow data layout according to the solving result.
2. The method for workflow data layout in a cloud-edge environment according to claim 1, wherein the mathematical representation of the cloud-edge environment in step S1 is specifically:
The cloud-edge environment is expressed as:
S={S cld ,S edg };
wherein, cloud computing S cld Comprising j data centers, denoted as:
S cld ={s 1 ,s 2 ,…,s j };
edge computation S edg Comprising k data centers, denoted as:
S edg ={s j+1 ,s j+2 ,…,s j+k };
each data center s i Expressed as:
s i =<c i ,γ i ,a i >;
where c_i is its storage capacity and γ_i ∈ {0, 1} is the data center type: γ_i = 0 indicates a cloud data center, which can store only public data, and γ_i = 1 indicates an edge data center, which can store public data and private data whose storage location is fixed; a_i is the speed at which the data center replicates data;
the network bandwidth between data centers is expressed as:
Figure FDA0004101636740000011
where b_ij is the bandwidth between data center s_i and data center s_j;
the scientific workflow is expressed as:
G=(V,E,D);
wherein V represents a set of tasks in a scientific workflow:
V={v 1 ,v 2 ,…,v w };
e represents a set of task dependencies in a scientific workflow:
Figure FDA0004101636740000021
d represents a set of data replicas:
D={d 1 ,d 2 ,…,d m };
the data sets related to each task v_i are denoted <D_i, D_o>, where D_i is its input data set and D_o is its output data set, each consisting of one or more data items; a dependency e_ij ∈ E indicates that task v_j is a successor of task v_i and can be executed only after v_i completes, and otherwise task v_j does not depend on task v_i; each data copy set d_i contains the copies of the i-th data item, d_ij denotes the j-th copy of the i-th data item, and d_i1 denotes the original i-th data item; each data copy carries the attributes <z_i1, n_i1, f_i1, l_i1>, where z_i1 is the data size, n_i1 is the number of copies of the data, an integer greater than 0, with n_i1 = 1 meaning the data has no other copies, f_i1 is the task that generates data d_i1, set to 0 if the data is initial data, and l_i1 records the location of data d_ij: if d_ij is private data, l_i1 records the data center it belongs to, and if it is public data, l_i1 is 0.
3. The method for workflow data layout in a cloud-edge environment according to claim 2, wherein the modeling of the mathematical problem model in step S1 is specifically:
the copy overhead t_copy of data d_i1 at data center s_k is:
$$t_{copy} = \frac{z_{i1}}{a_k}$$
where z_i1 is the size of data d_ij and a_k is the speed at which data center s_k copies data;
the transmission overhead t_tran of sending data d_ij from data center s_k to data center s_l is:
$$t_{tran} = \frac{z_{i1}}{b_{kl}}$$
where b_kl is the bandwidth between data center s_k and data center s_l, and if the copy is laid out on the data center where it was generated, there is no transmission overhead;
the data layout is expressed as {S, D, Y, T_total}, where S is the set of data centers, D is the data set, and Y is the set of data layout positions, and every data item d_ij ∈ D corresponds to a unique data center:
Figure FDA0004101636740000031
T_total, the total delay corresponding to the data layout scheme, is the sum of the data replication time T_copy and the data transmission time T_tran:
$$T_{total} = T_{copy} + T_{tran}$$
the data replication time T_copy is expressed as:
Figure FDA0004101636740000032
where
Figure FDA0004101636740000033
denotes the layout position of data d_i1, and n_i1 is the number of copies of data d_i;
the data transmission time T_tran is expressed as:
Figure FDA0004101636740000034
where h(i, j, k, l) ∈ {0, 1}; h(i, j, k, l) = 1 indicates that the l-th copy d_kl of data k is transmitted from data center s_i to data center s_j, and h(i, j, k, l) = 0 otherwise;
the objective of the data layout strategy is expressed as:
Figure FDA0004101636740000035
where β (i, j, k) ∈ {0,1}, β (i, j, k) =1 represents that the kth copy of data j is stored on data center i.
4. The method for workflow data layout in a cloud-edge environment as recited in claim 1, wherein said step S2 comprises the steps of:
encoding the data layout strategy by adopting a two-dimensional array to construct candidate particles:
the data layout scheme
Figure FDA0004101636740000036
of particle i at the t-th iteration is as follows:
Figure FDA0004101636740000037
each position
Figure FDA0004101636740000038
represents the storage locations of the copy set of data j for the i-th particle in the t-th iteration:
Figure FDA0004101636740000039
where q_k ∈ {0, 1}; q_k = 1 indicates that a copy of data j is laid out on data center k, and q_k = 0 indicates that it is not; the number of entries with q_k = 1 in x_ij^t is the number of copies of data j.
5. The method of workflow data placement in a cloud-edge environment of claim 4, wherein solving the mathematical problem model according to the genetic algorithm operator based nonlinear inertial weight discrete particle swarm optimization algorithm comprises the steps of:
analyzing the scientific workflow, and performing topological ordering on the tasks to obtain a task queue capable of being sequentially executed;
initializing the maximum capacity of a data center, generating an initialization population according to a privacy data set, wherein privacy data in the initialization population can be laid out on the corresponding data center, and public data is randomly laid out without generating other copies;
simulating the data layout process and judging whether each particle is a feasible solution; if so, calculating the total delay, and if not, recording the infeasible data set;
setting all individuals in the initial population as the optimal individual history, setting the optimal population history as the particles with the best fitness in the initial population, and calculating the fitness of the particles;
iterating the population: mutating the population according to the inertia weight factor w, crossing the population with the individual historical best according to the acceleration factor α_1, crossing the population with the population historical best according to the acceleration factor α_2, calculating the fitness of the new population, and updating the global information;
and outputting the total time delay with optimal population history when the iteration is finished.
6. The method of workflow data placement in a cloud-edge environment of claim 5, wherein the fitness calculation comprises:
based on the comparison of the fitness values F of the two types of particles, a fitness function is established:
if both particles are feasible solutions, the particle with the lower total delay has the better fitness, and the fitness function is defined as:
$$F = T_{total}$$
if both particles are infeasible solutions, the particle with the smaller infeasible data set D_inf has the better fitness, since more of its data is laid out in feasible locations and it can more easily become a feasible particle in subsequent iterations, and the fitness function is:
$$F = |D_{inf}|$$
if a feasible particle is compared with an infeasible particle, the feasible particle is selected, and the fitness function is:
Figure FDA0004101636740000051
7. the method for workflow data placement in a cloud-edge environment of claim 5, wherein the data placement process comprises the steps of:
Initializing a task position list for recording the execution positions of all tasks and an overrun mark for recording whether a data center exceeds the capacity limit of the data center in the task execution process;
calculating the capacity condition of the data center after the initial data set is subjected to data layout, traversing the task queue, calculating the execution position of the task and recording the execution position of the task into a task position list;
when a task generates an output data set, temporarily storing the input data set and the output data set of the task on a data center, judging whether the data center exceeds capacity limit at the moment, then distributing the output data of the task on the data center designated by the task, and updating the capacity of the data center;
if a data center exceeds its capacity limit during task execution, recording the data laid out on that data center in the infeasible data set D_inf; otherwise, calculating and recording the total delay.
8. The method for workflow data layout in a cloud-edge environment as claimed in claim 7, wherein in the nonlinear inertial weight discrete particle swarm optimization algorithm based on genetic algorithm operators in step S2, introducing the crossover and mutation operators of the genetic algorithm into the particle swarm algorithm comprises the steps of:
Iterating the velocity and position of the particles:
Figure FDA0004101636740000052
Figure FDA0004101636740000053
the update strategy of the i-th particle at the t-th iteration is:
Figure FDA0004101636740000054
where C_g and C_p are crossover operators, M_u is the mutation operator,
Figure FDA0004101636740000055
is the individual historical best of particle i at the t-th iteration, g_t is the population historical best at the t-th iteration, and α_1, α_2 and w, each between 0 and 1, are the acceleration factors and the inertia weight factor;
replacing an inertia part in the particle swarm algorithm by adopting a mutation operator of the genetic algorithm:
Figure FDA0004101636740000061
generating a random number r_w between 0 and 1, and if it is smaller than the inertia weight factor w, subjecting the particle to mutation:
acquiring the infeasible data set D_inf of particle X_i, and obtaining the mutation position from the infeasible data set D_inf and the private data set D_fix:
if D_inf is empty, selecting a position corresponding to data not in D_fix, and if D_inf is not empty, selecting the position of a public data item in D_inf;
counting the number of copies of the data at the mutation position of particle X_i, wherein if
X_i[muIndex][j] = 1
then the data at the mutation position of particle X_i has a copy on data center j;
updating the copy count by probabilistically increasing or decreasing it from its original value, ensuring that at least one copy exists and that the copy count does not exceed the number of data centers;
generating, at the mutation position of particle X_i, a data copy layout scheme with the updated copy count;
the individual cognition and social cognition parts in the particle swarm algorithm are replaced by adopting a crossover operator of the genetic algorithm:
Figure FDA0004101636740000062
Figure FDA0004101636740000063
wherein
Figure FDA0004101636740000064
represents the crossover of the particle with its individual historical optimum, and
Figure FDA0004101636740000065
represents the crossover of the particle with the population historical optimum.
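The crossover step can be illustrated with a short Python sketch. The formulas themselves are given only as figures, so the details here are assumptions rather than the claimed method: the sketch assumes a two-point crossover that copies a contiguous block of rows from the historical optimum into the particle, and that each crossover is applied when a random number falls below the corresponding acceleration factor. The names `crossover`, `cross_step` and `rng` are hypothetical.

```python
import random


def crossover(particle, best, rng=random):
    """Two-point crossover (assumed form): copy rows [a, b) from `best`."""
    n = len(particle)
    # Two distinct cut points, so at least one row is taken from `best`.
    a, b = sorted(rng.sample(range(n + 1), 2))
    child = [row[:] for row in particle]
    child[a:b] = [row[:] for row in best[a:b]]
    return child


def cross_step(particle, p_best, g_best, alpha1, alpha2, rng=random):
    """Apply the individual and then the social crossover.

    Assumption: each crossover happens with probability alpha1 / alpha2.
    """
    x = [row[:] for row in particle]
    if rng.random() < alpha1:
        x = crossover(x, p_best, rng)  # individual cognition part (C_p)
    if rng.random() < alpha2:
        x = crossover(x, g_best, rng)  # social cognition part (C_g)
    return x
```

Because whole rows are exchanged, every data item in the child keeps a layout that already existed in one of the two parents.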
9. The method for workflow data layout in a cloud-edge environment according to claim 5, wherein, in the nonlinear inertial weight discrete particle swarm optimization algorithm based on genetic algorithm operators in step S3, adaptively adjusting the inertial weight according to the difference between the particle and the global optimal particle comprises the steps of:
adopting a strategy of nonlinearly adjusting the inertial weight, in which the inertial weight is adjusted based on the degree of difference between the current particle and the global optimal particle:
Figure FDA0004101636740000071
Figure FDA0004101636740000072
Figure FDA0004101636740000073
represents the difference between the particle and the population optimal particle;
adjusting the acceleration factors α_1 and α_2 using a linear variation strategy:
Figure FDA0004101636740000074
Figure FDA0004101636740000075
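The adaptive adjustment can be illustrated with the Python sketch below. The formulas of this claim appear only as figures and are not reproduced in the text, so every formula in the sketch is an assumption chosen to show the general idea: the difference is taken as the fraction of data items whose layout differs from the global optimum, the inertial weight grows nonlinearly with that difference through an exponential term, and an acceleration factor moves linearly from a start value to an end value over the iterations. The names `difference`, `inertia_weight`, `linear_factor` and the bounds `w_max`, `w_min` are hypothetical.

```python
import math


def difference(particle, g_best):
    """Fraction of data items whose layout differs from the global best."""
    differing = sum(1 for a, b in zip(particle, g_best) if a != b)
    return differing / len(particle)


def inertia_weight(d, w_max=0.9, w_min=0.4):
    """Nonlinear inertial weight for a difference d in [0, 1].

    Assumed form: equals w_min when d == 0 and approaches w_max as d
    approaches 1, so particles far from the global best mutate more.
    """
    return w_max - (w_max - w_min) * math.exp(d / (d - 1.01))


def linear_factor(start, end, t, t_max):
    """Linearly move an acceleration factor from start to end."""
    return start + (end - start) * t / t_max
```

With these assumed forms, a particle identical to the global optimum receives the smallest weight, which favours exploitation, while a very different particle receives a weight close to `w_max`.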
10. A storage medium having stored thereon a computer program for workflow data layout in a cloud-edge environment, characterized in that the computer program, when executed, performs the steps of the method for workflow data layout in a cloud-edge environment according to any one of claims 1 to 9.
CN202310176231.7A 2023-02-28 2023-02-28 Workflow data layout method under cloud side environment and storage medium Pending CN116050235A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310176231.7A CN116050235A (en) 2023-02-28 2023-02-28 Workflow data layout method under cloud side environment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310176231.7A CN116050235A (en) 2023-02-28 2023-02-28 Workflow data layout method under cloud side environment and storage medium

Publications (1)

Publication Number Publication Date
CN116050235A (en) 2023-05-02

Family

ID=86127427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310176231.7A Pending CN116050235A (en) 2023-02-28 2023-02-28 Workflow data layout method under cloud side environment and storage medium

Country Status (1)

Country Link
CN (1) CN116050235A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116955354A (en) * 2023-06-30 2023-10-27 国家电网有限公司大数据中心 Identification analysis method and device for energy digital networking

Similar Documents

Publication Publication Date Title
Guo et al. Cloud resource scheduling with deep reinforcement learning and imitation learning
Das et al. Recent advances in differential evolution–an updated survey
CN109948029B (en) Neural network self-adaptive depth Hash image searching method
CN113098714B (en) Low-delay network slicing method based on reinforcement learning
US7882052B2 (en) Evolutionary neural network and method of generating an evolutionary neural network
CN108875955A (en) Gradient based on parameter server promotes the implementation method and relevant device of decision tree
Chakraborty et al. Differential evolution and its applications in image processing problems: a comprehensive review
CN111541570A (en) Cloud service QoS prediction method based on multi-source feature learning
Tawhid et al. A hybrid social spider optimization and genetic algorithm for minimizing molecular potential energy function
CN115168281B (en) Neural network on-chip mapping method and device based on tabu search algorithm
CN116050235A (en) Workflow data layout method under cloud side environment and storage medium
Deb et al. Classifying metamodeling methods for evolutionary multi-objective optimization: first results
CN115293623A (en) Training method and device for production scheduling model, electronic equipment and medium
CN111414961A (en) Task parallel-based fine-grained distributed deep forest training method
AlSuwaidan et al. Swarm Intelligence Algorithms for Optimal Scheduling for Cloud‐Based Fuzzy Systems
Ming et al. Intelligent approaches to tolerance allocation and manufacturing operations selection in process planning
Wang et al. Particle swarm optimizer with adaptive tabu and mutation: A unified framework for efficient mutation operators
CN118012602A (en) Distributed cluster data equalization method based on balanced multi-way tree
TWI758223B (en) Computing method with dynamic minibatch sizes and computing system and computer-readable storage media for performing the same
CN110175172B (en) Extremely-large binary cluster parallel enumeration method based on sparse bipartite graph
CN108289115A (en) A kind of information processing method and system
Younis et al. Genetic algorithm for independent job scheduling in grid computing
Ho et al. Adaptive communication for distributed deep learning on commodity GPU cluster
JP6953376B2 (en) Neural networks, information addition devices, learning methods, information addition methods, and programs
Dou et al. A genetic algorithm with path-relinking for operation sequencing in CAPP

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination