CN112926262A

CN112926262A - Data separate storage method, system, medium and terminal under cloud edge collaborative environment

Info

Publication number: CN112926262A
Application number: CN202110190341.XA
Authority: CN
Inventors: 蒋昌俊; 闫春钢; 王鹏伟; 丁志军; 张亚英; 曹尔洋
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2021-02-18
Filing date: 2021-02-18
Publication date: 2021-06-08

Abstract

The invention provides a data split-storage method, system, medium and terminal in a cloud-edge collaborative environment. The method comprises the following steps: establishing a mathematical model related to the data objects operated on by a user; specifying a basic availability evaluation strategy for the feasible split-storage schemes in the mathematical model to determine the basic availability of a split-storage strategy; defining strong and weak availability and strong and weak latency for the feasible split-storage schemes in the mathematical model, so as to evaluate split-storage strategies under different user requirements; and taking cloud and edge service provider information and data object information as input, performing multi-objective optimization with an improved NSGA-II algorithm, and producing a split-storage strategy. The invention increases availability and reduces the risk of privacy disclosure; optimizes availability, response delay and cost; and provides reasonable storage strategies for users accessing data from different areas. Extensive experiments demonstrate the effectiveness of the method.

Description

Data separate storage method, system, medium and terminal under cloud edge collaborative environment

Technical Field

The invention relates to the technical field of multi-cloud storage and edge computing, in particular to an optimized data split-storage method combining multiple clouds and the edge, and specifically to a data split-storage method, system, medium and terminal in a cloud-edge collaborative environment.

Background

Many enterprises and organizations put their data in the cloud because of its convenience, low price and data reliability. As cloud storage becomes more widely used, users pay increasing attention to the integrity and security of their data; in particular, users need to be aware of the accessibility, vendor lock-in and privacy problems that arise in a single-cloud environment.

To address the risks of single-cloud storage, there has been extensive research on distributed data storage. When data is stored on resources provided by a third party, the user needs to consider data security factors including confidentiality, availability and integrity. Research on multi-cloud storage (MCS) security can be divided into two main parts: developing security mechanisms for cloud service providers (CSPs) to gain user trust, and formulating policies that reduce the user's requirements on the CSPs. For the latter, the two main redundant distributed storage methods are replication and erasure coding.

Research on MCS aims to generate an optimal solution for storing data objects, considering different objectives such as cost, availability, latency and security, subject to constraints such as budget and availability requirements. Related research includes an agent mechanism called RACS, proposed by Abu-Libdeh et al., which distributes user data across multiple cloud service providers to reduce the cost of switching providers; however, RACS does not propose a mathematical optimization model for optimizing a specific objective. In addition, a secure, cost-effective MCS scheme called SCMCS provides data storage with high availability and minimal cost. Other schemes include CHARM, a heuristic storage method with a storage-mode conversion strategy that improves data availability and reduces overhead, and CLRDS, which takes data redundancy, cost and latency into account and uses a heuristic method to find the best storage solution.

In recent years, researchers have repeatedly studied replica placement in edge or cloud-edge environments, but no general method exists. As in cloud computing, where storage is centralized and implemented as a complex multi-layer system consisting of commodity servers and groups of disk drives, some edge nodes are responsible for meeting storage requirements and balancing the storage requirements of different edge nodes. Related research includes an edge-side collaborative storage framework (ECS) that improves the cache hit rate of preferred data, and a data block placement method designed by jointly considering the data blocks together with the storage capacity, replacement rate and replacement cost of the edge servers that store them. Existing edge storage work rarely combines multi-cloud storage with edge storage, which may lead to the single-cloud problems mentioned above; in addition, such work uses the cloud platform only as a supplementary resource and therefore does not fully exploit the potential advantages of multi-cloud storage in availability and price.

It may be noted that existing research is limited to either multi-cloud or edge computing environments and lacks a combination of multi-cloud and edge service providers to meet users' data storage needs.

Disclosure of Invention

In view of the above disadvantages of the prior art, an object of the present invention is to provide a data split-storage method, system, medium and terminal in a cloud-edge collaborative environment, so as to solve the problem that the prior art cannot efficiently implement the existing multi-cloud storage and single-edge storage of data.

In order to achieve the above and other related objects, the present invention provides a data split-storage method in a cloud-edge collaborative environment, including the following steps: establishing a mathematical model related to the data object operated on by a user, wherein the mathematical model defines the concept of a data access area and defines service indexes over multiple user access areas, which describe the quality of service when the user stores data objects and accesses them from different areas, so that operations can be processed; specifying a basic availability evaluation strategy for a plurality of feasible split-storage schemes in the mathematical model to determine the basic availability of the split-storage strategy; defining strong and weak availability and strong and weak latency for the plurality of feasible split-storage schemes in the mathematical model, so as to evaluate split-storage strategies under different user requirements; and taking cloud and edge service provider information and data object information as input, performing multi-objective optimization with an improved NSGA-II algorithm, and producing a split-storage strategy.

In an embodiment of the present invention, the step of establishing the mathematical model associated with the user-operated data object comprises the steps of: determining a user access area according to the historical condition or the requirement of the user access data; determining the weight of each user access area according to the actual access condition; and on the premise of erasure-coding, defining the service index according to coding parameters and the weight, and establishing the mathematical model.

In an embodiment of the present invention, the step of specifying the basic availability evaluation strategy for the plurality of feasible split-storage schemes in the mathematical model comprises: determining, for each user access area, whether the split-storage strategy satisfies the user's basic data acquisition request, according to the user access areas and the erasure-coding conditions in the mathematical model.

In an embodiment of the present invention, the method further includes the following steps: defining the mathematical expectation of the availability over the user access areas as the total availability, so as to measure the availability of all split-storage schemes based on the total availability; and defining the weighted sum of the response delays in the user access areas as the total response delay of the split-storage strategy, wherein each weight is the probability that the user accesses data in the corresponding user access area.

In an embodiment of the present invention, the strong and weak availability includes strong availability and weak availability, and the strong and weak latency includes strong latency and weak latency; wherein the strong availability and the strong latency require that the availability and the latency of each user access area meet a threshold requirement, and the weak availability and the weak latency require that the overall availability and latency of the split-storage strategy meet the threshold requirement.

In an embodiment of the present invention, performing the multi-objective optimization and producing the split-storage strategy with the improved NSGA-II algorithm include the following steps: on the basis of the traditional NSGA-II algorithm, introducing a plurality of populations to improve the local search capability; and initializing the multiple populations, calculating the fitness function values, performing crossover and mutation, performing non-dominated sorting, calculating the crowding distance, and performing a feasibility check on each generated split-storage strategy to determine whether it meets the constraint conditions; the constraint conditions include availability, feasibility and latency.

In an embodiment of the present invention, performing the multi-objective optimization and producing the split-storage strategy with the improved NSGA-II algorithm further include the following steps: normalizing each service index; acquiring a Pareto set, which comprises at least one split-storage strategy; and calculating the Euclidean distance from each element in the Pareto set to an ideal point, and selecting the split-storage strategy with the minimum distance as the final split-storage strategy.

The invention provides a data split-storage system in a cloud-edge collaborative environment, comprising: a modeling module, a storage scheme basic availability detection module, a storage scheme solving module and a storage scheme determining module; the modeling module is used for establishing a mathematical model related to the data object operated on by a user; the mathematical model defines the concept of a data access area and defines service indexes over multiple user access areas, which describe the quality of service when the user stores data objects and accesses them from different areas, so that operations can be processed; the storage scheme basic availability detection module is used for specifying a basic availability evaluation strategy for a plurality of feasible split-storage schemes in the mathematical model to determine the basic availability of the split-storage strategy; the storage scheme solving module is used for specifying a basic availability evaluation strategy for a plurality of feasible storage schemes in the mathematical model so as to determine the basic availability of the storage strategy; the storage scheme determining module is used for taking cloud and edge service provider information and data object information as input, performing multi-objective optimization with an improved NSGA-II algorithm, and producing a storage strategy.

The invention provides a storage medium, on which a computer program is stored, which, when executed by a processor, implements the data split-storage method in the cloud-edge collaborative environment described above.

The present invention provides a terminal, including: a processor and a memory; the memory is used for storing a computer program; the processor is configured to execute the computer program stored in the memory, so that the terminal executes the data split-storage method in the cloud-edge collaborative environment.

As described above, the data separate storage method, system, medium and terminal in the cloud-edge collaborative environment according to the present invention have the following beneficial effects:

(1) Compared with the prior art, the invention provides a data split-storage method for multi-cloud and edge scenarios, together with a method for analyzing its reasonableness. Compared with single-cloud storage, multi-cloud storage increases availability and reduces the risk of privacy disclosure. In addition, taking into account the areas from which the user frequently accesses data, a two-stage storage model combining cloud and edge based on the user access areas is provided: the first stage optimizes availability, response delay and cost to generate the Pareto front; the second stage provides reasonable storage strategies, according to actual conditions, for users accessing data from different areas. Extensive experiments demonstrate the effectiveness of the method.

(2) The invention ensures fast response times and high availability, takes the user's access areas into account, defines a multi-objective mathematical problem over multi-cloud and edge storage, and provides a storage strategy based on an improved NSGA-II algorithm, thereby improving the accuracy of the solutions found with respect to the user's actual conditions.

Drawings

Fig. 1 is an application scenario architecture diagram of the data storage method in the cloud-edge collaborative environment according to an embodiment of the present invention.

Fig. 2 is a flowchart illustrating a data storage method in a cloud-edge collaborative environment according to an embodiment of the present invention.

Fig. 3 is a flowchart illustrating the process of establishing a mathematical model associated with a user-operated data object according to an embodiment of the present invention.

Fig. 4 is a diagram illustrating the availability and cost comparison of the data split-storage method under the cloud-edge collaborative environment and the CHARM algorithm in an embodiment under different data sizes.

Fig. 5 is a diagram showing a comparison between availability and overhead of the data split-storage method under the cloud edge collaborative environment and the CLRDS algorithm in an embodiment under different data sizes.

Fig. 6 is a graph showing a comparison between availability and overhead of the data split-storage method under the cloud-edge collaborative environment and the CHARM algorithm in an embodiment under different data access frequencies.

Fig. 7 is a graph showing a comparison between availability and overhead of the data split-storage method under the cloud edge collaborative environment and the CLRDS algorithm in an embodiment under different data access frequencies.

Fig. 8 is a schematic structural diagram of a data storage system in a cloud-edge collaborative environment according to an embodiment of the present invention.

Fig. 9 is a schematic structural diagram of a terminal according to an embodiment of the invention.

Description of the reference symbols

81 modeling module

82 storage scheme basic availability detection module

83 storage scheme solving module

84 storage scheme determination module

91 processor

92 memory

S1-S4 steps

S11-S13 steps

Detailed Description

The following description of the embodiments of the present invention is provided by way of specific examples, and other advantages and effects of the present invention will be readily apparent to those skilled in the art from the disclosure herein. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the drawings only show the components related to the present invention rather than the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.

Compared with the prior art, the invention provides a data split-storage method for multi-cloud and edge scenarios, together with a method for analyzing its reasonableness. Compared with single-cloud storage, multi-cloud storage increases availability and reduces the risk of privacy disclosure. In addition, taking into account the areas from which the user frequently accesses data, a two-stage storage model combining cloud and edge based on the user access areas is provided: the first stage optimizes availability, response delay and cost to generate the Pareto front; the second stage provides reasonable storage strategies, according to actual conditions, for users accessing data from different areas. Extensive experiments demonstrate the effectiveness of the method. The invention ensures fast response times and high availability, takes the user's access areas into account, defines a multi-objective mathematical problem over multi-cloud and edge storage, and provides a storage strategy based on an improved NSGA-II algorithm, thereby improving the accuracy of the solutions found with respect to the user's actual conditions.

Fig. 1 shows an application scenario architecture diagram of the data storage method under the cloud-edge collaborative environment in an embodiment of the present invention.

As shown in Fig. 1, once a user has hosted data on cloud or edge storage, all locations from which the user subsequently accesses the data constitute the user's access area.

Assume that there are NR regions in which users frequently access the data object, that is, regions in which the number of accesses by users is greater than or equal to a given threshold. The set of these regions is called the user access region set: $R_U = \{R_1, R_2, \ldots, R_{NR}\}$.

Depending on the particular access area, a list of edge service providers providing services in the respective area may be obtained, these service providers providing data access services only for users in their vicinity.

First, the areas in which the user frequently accesses data are determined. There are two main ways to obtain these areas: the areas from which data will frequently be accessed in the future are specified in advance; or the potential future user access areas are derived from patterns mined from historical records, or from predictions.

It should be noted that service providers are selected from the perspective of accessing data in each area (all identified user access areas), and thus the set of service providers is the union of the cloud service providers and the edge service providers of each access area.

Assume that there are NC cloud service providers and that, for user access region $R_r$, $r \in [1, NR]$, the number of edge servers is $NE_r$. Then $SP = CSP \cup ESP = \{CSP, ESP_1, ESP_2, \ldots, ESP_{NR}\}$, where $CSP = \{CSP_1, CSP_2, \ldots, CSP_{NC}\}$.

It is assumed that every user access area has access to all cloud service providers, which means that the data stored on the CSPs can be accessed and manipulated from any user access area. In user access area r, for an edge service provider $E \in ESP_r$, E can only be accessed when the user's location is within region r; in other words, an edge service provider is assumed to provide data access services only in its nearby region.

As shown in fig. 2, in an embodiment, the data split-storage method in the cloud-edge collaborative environment of the present invention includes the following steps:

Step S1: establishing a mathematical model related to the data object operated on by the user.

It should be noted that the mathematical model defines a concept of a data access area, and defines service indexes in a multi-user access area, so as to describe service quality indexes when a user performs data object storage and accesses data objects in different areas, so as to perform operation processing.

As shown in FIG. 3, in one embodiment, building a mathematical model associated with a user-manipulated data object includes the steps of:

Step S11: determining the user access areas according to the history of, or the requirements for, the user's data access.

Step S12: determining the weight of each user access area according to the actual access situation.

Step S13: on the premise of erasure coding, defining the service indexes according to the coding parameters and the weights, and establishing the mathematical model.
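Steps S11 and S12 can be illustrated with a minimal Python sketch. It assumes a hypothetical access log that records one region name per access; the function name `access_regions_and_weights` and the log format are illustrative assumptions, not part of the described method.

```python
from collections import Counter


def access_regions_and_weights(access_log, threshold):
    """Return {region: weight} for regions with at least `threshold` accesses.

    access_log is an iterable of region names, one entry per recorded access
    (a hypothetical format). The weight of a region is its access frequency,
    normalised over the retained regions so that the weights sum to 1.
    """
    counts = Counter(access_log)
    # Step S11: keep only regions whose access count reaches the threshold.
    frequent = {region: c for region, c in counts.items() if c >= threshold}
    total = sum(frequent.values())
    if total == 0:
        return {}
    # Step S12: use the relative access frequency as the region weight.
    return {region: c / total for region, c in frequent.items()}


log = ["R1"] * 6 + ["R2"] * 3 + ["R3"] * 1
weights = access_regions_and_weights(log, threshold=3)
```

Here region `R3` falls below the threshold and is excluded, so the weights are computed over `R1` and `R2` only.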

Step S2: specifying a basic availability evaluation strategy for a plurality of feasible split-storage schemes in the mathematical model to determine the basic availability of the split-storage strategy.

In one embodiment, specifying a basic availability evaluation strategy for the plurality of feasible split-storage schemes in the mathematical model comprises the following step:

determining, for each user access area, whether the split-storage strategy satisfies the user's basic data acquisition request, according to the user access areas and the erasure-coding conditions in the mathematical model.

In one embodiment, the method further comprises the following steps:

Defining the mathematical expectation of the availability over the user access areas as the total availability, so as to measure the availability of all split-storage schemes based on the total availability.

Defining the weighted sum of the response delays in the user access areas as the total response delay of the split-storage strategy.

Step S3: defining strong and weak availability and strong and weak latency for the plurality of feasible split-storage schemes in the mathematical model, so as to evaluate split-storage strategies under different user requirements.

In one embodiment, the strong and weak availability includes strong availability and weak availability; the strong and weak latency includes strong latency and weak latency.

Specifically, the strong availability and the strong latency require that the availability and the latency of each user access area meet the threshold requirements; the weak availability and the weak latency require that the overall availability and latency of the split-storage strategy meet the threshold requirements.

It should be noted that, since the availability of a storage scheme differs from area to area, the total availability is defined to measure the availability of the storage scheme over all areas; the total availability is the mathematical expectation of the availability over the user access areas.

First, the availability of each area is calculated using equation (1), whose service-set term denotes the j-th set of service providers in a given availability case. An equivalent transformation is then applied to equation (1) to calculate the area availability, thereby reducing the number of combinations.

It follows that, for the r-th region, the combinations of all available cases over the n service providers are determined by $X_r$. Since the number of available services in the r-th area ranges from m to $|X_r|$, the availability of each region is calculated using equation (2), whose service-set term denotes the j-th set of cloud and edge service providers in region r in the given case.

From equation (2), it can be seen that the availability in each region r is determined only by $X_r$.
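Equations (1) and (2) themselves are not reproduced in this text. As a hedged sketch only, a standard way to write the availability of an (m, n) erasure-coded object restricted to the providers reachable in region r is shown below. The symbols $a_i$ (availability of provider i, assumed independent of the other providers) and $S_j^k$ (the j-th subset of exactly k available providers in $X_r$) are assumptions introduced for illustration and may differ from the original notation.

```latex
A_r \;=\; \sum_{k=m}^{|X_r|} \; \sum_{j=1}^{\binom{|X_r|}{k}}
      \Bigl( \prod_{i \in S_j^k} a_i \Bigr)
      \Bigl( \prod_{i \in X_r \setminus S_j^k} (1 - a_i) \Bigr)
```

In this form the outer sum runs over the number k of simultaneously available providers, from m up to $|X_r|$, which is consistent with the statement that the region's availability depends only on $X_r$.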

It should be noted that the user access frequency is regarded as the probability that the user accesses data in the area; the overall availability is therefore the linear combination of the per-area availabilities weighted by the user access frequencies. For each user access area, the availability is calculated using formulas (1) and (3).
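The per-area and overall availability can be sketched in Python as follows. This is a minimal illustration that assumes provider failures are independent; it computes the probability that at least m providers are available with a dynamic program over the number of available providers rather than by enumerating service sets, which is a swapped-in technique and not necessarily the transformation used in equation (2). The function names are hypothetical.

```python
def region_availability(provider_avail, m):
    """Probability that at least m of the given providers are available.

    provider_avail lists the availabilities of the providers in X_r that
    hold a block and are reachable from the region (independence assumed).
    """
    # dist[k] = probability that exactly k of the providers seen so far are up
    dist = [1.0]
    for a in provider_avail:
        nxt = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            nxt[k] += p * (1.0 - a)   # this provider is down
            nxt[k + 1] += p * a       # this provider is up
        dist = nxt
    return sum(dist[m:])


def overall_availability(region_avail, weights):
    """Weighted sum of per-region availabilities (weights = access probabilities)."""
    return sum(weights[r] * region_avail[r] for r in weights)


a_r1 = region_availability([0.9, 0.9, 0.9], m=2)
total = overall_availability({"R1": a_r1, "R2": 0.9}, {"R1": 0.75, "R2": 0.25})
```

If fewer than m providers are reachable, `region_availability` returns 0, which matches the basic availability requirement that at least m blocks be reachable in every region.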

Further, prior to this computation, it is also necessary to ensure that the user data object is basically available in each region.

|X_r|≥m，r∈[1，NR] (4)

Min A_r≥A_req，r∈[1，NR] (5)

f₁(X)≥A_req (6)

Availability definition: under erasure coding with parameters (m, n), a storage policy $X = (x_1, x_2, \ldots, x_n)$ is basically available for a data object if at least m data blocks are available in each user access area r, as shown in equation (4).

Based on the availability definition above, two types of data availability are defined from the point of view of all user access areas: strong availability and weak availability. Equation (5) expresses strong availability of the storage scheme: the data availability in every user access area, calculated using formula (3), satisfies the minimum data availability required by the user. Equation (6) expresses weak availability of the storage scheme: the overall availability of the storage scheme satisfies the minimum data availability required by the user. Strong availability is therefore stricter than weak availability.

Max D_r≤D_req，r∈[1，NR] (9)

f₂(X)≤D_req (10)

The weighted sum of the response delays in the user access areas is defined as the total response delay of the storage strategy over the NR user access areas, where each weight is the probability that the user accesses data in the corresponding area. As with availability, two types of latency requirement are defined: strong latency and weak latency. Formula (9) expresses the strong latency requirement: the data response delay in every user access area satisfies the response delay threshold required by the user. Formula (10) expresses the weak latency requirement: the overall response delay of the storage scheme satisfies the response delay threshold required by the user.
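The difference between the strong and weak requirements can be illustrated with a short Python sketch. The function name `meets_requirements`, the dictionary layout and the numeric values are illustrative assumptions only.

```python
def meets_requirements(avail, delay, weights, a_req, d_req, strong):
    """Check the availability and latency thresholds for one storage strategy.

    avail, delay and weights map each user access region to its availability,
    response delay and access probability (hypothetical data layout).
    """
    if strong:
        # Strong: every region must meet both thresholds on its own.
        return (all(avail[r] >= a_req for r in weights)
                and all(delay[r] <= d_req for r in weights))
    # Weak: only the access-probability-weighted totals must meet the thresholds.
    total_avail = sum(weights[r] * avail[r] for r in weights)
    total_delay = sum(weights[r] * delay[r] for r in weights)
    return total_avail >= a_req and total_delay <= d_req


avail = {"R1": 0.999, "R2": 0.98}
delay = {"R1": 40.0, "R2": 120.0}
weights = {"R1": 0.8, "R2": 0.2}
weak_ok = meets_requirements(avail, delay, weights, 0.99, 100.0, strong=False)
strong_ok = meets_requirements(avail, delay, weights, 0.99, 100.0, strong=True)
```

In this made-up example the strategy satisfies the weak requirements, because the heavily used region `R1` dominates the weighted totals, but fails the strong requirements, because region `R2` misses both thresholds.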

It should be noted that the above weights are the probabilities that the user accesses data in each user access area, as described by equations (11)-(12), where $D_r$ is the response delay for accessing the data object in region r, obtained from equation (7). Likewise, since there are NR user access areas, the weighted sum of the user costs in the user access areas is the total cost of the storage scheme.

For each storage strategy, the corresponding objective functions can be described by equation (13), in which $f_1(X)$, $f_2(X)$ and $f_3(X)$ are the objective functions for availability, response delay and cost, respectively, subject to the constraints on basic availability and on the data availability and response delay thresholds.
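Equation (13) is not reproduced in this text. A hedged sketch of a multi-objective formulation consistent with the description above is given below; the symbols $p_r$ (probability that the user accesses data in region r) and $C_r$ (user cost in region r) are assumptions introduced for illustration.

```latex
\begin{aligned}
\max\; & f_1(X) = \sum_{r=1}^{NR} p_r A_r, \qquad
\min\; f_2(X) = \sum_{r=1}^{NR} p_r D_r, \qquad
\min\; f_3(X) = \sum_{r=1}^{NR} p_r C_r \\
\text{s.t.}\; & |X_r| \ge m, \quad r \in [1, NR] \\
& \text{strong: } A_r \ge A_{req},\; D_r \le D_{req} \;\; \forall r \in [1, NR]
  \quad\text{or}\quad
  \text{weak: } f_1(X) \ge A_{req},\; f_2(X) \le D_{req}
\end{aligned}
```

Whether the strong or the weak pair of constraints applies depends on the user's requirement.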

Step S4: taking the cloud and edge service provider information and the data object information as input, performing multi-objective optimization with an improved NSGA-II algorithm, and producing a storage strategy.

It should be noted that the data storage problem across cloud and edge is NP-complete and belongs to the category of combinatorial optimization. To solve it, the present invention uses the improved NSGA-II algorithm to obtain a series of storage strategies according to the user requirements, and then selects one scheme according to the actual situation.

In this embodiment, the basic steps of the improved NSGA-II algorithm are: initializing the populations, calculating the fitness function values, performing crossover and mutation, performing non-dominated sorting, calculating the crowding distance, and performing a feasibility check on the generated solutions to determine whether the constraint conditions are satisfied. After this process has been repeated a set number of times, the solution set at layer 0 (the Pareto solution set) is the result.
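Two of the steps named above, non-dominated sorting and the crowding distance, can be sketched in Python as follows. This is a simple illustrative version, not the bookkeeping-based fast sort of the original NSGA-II paper and not the patented implementation. It assumes every objective is minimized, so availability is negated; the sample points are made up.

```python
def dominates(u, v):
    """u dominates v if it is no worse in every objective and better in one."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))


def non_dominated_sort(points):
    """Return fronts as lists of indices; fronts[0] is layer 0 (the Pareto set)."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts


def crowding_distance(points, front):
    """Crowding distance of each index in one front (boundaries get infinity)."""
    dist = {i: 0.0 for i in front}
    for k in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda i: points[i][k])
        lo, hi = points[ordered[0]][k], points[ordered[-1]][k]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (points[nxt][k] - points[prev][k]) / (hi - lo)
    return dist


# (-availability, delay, cost) for four hypothetical storage strategies
pts = [(-0.99, 50, 10), (-0.95, 40, 12), (-0.90, 60, 15), (-0.99, 45, 11)]
fronts = non_dominated_sort(pts)
dist = crowding_distance(pts, fronts[0])
```

The third strategy is dominated by the first and therefore ends up in the second front; the other three form layer 0.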

It should be noted that NSGA-II is based on the genetic algorithm. A conventional genetic algorithm can search globally over a wide range, but its local search capability is weak and it is easily trapped in a local optimum. A conventional genetic algorithm usually has only one population; using multiple populations can improve the search accuracy without affecting the search time.

In one embodiment, performing the multi-objective optimization and producing the split-storage strategy with the improved NSGA-II algorithm comprise the following steps:

(41) on the basis of the traditional NSGA-II algorithm, a plurality of populations are introduced to improve the local searching capability.

In this embodiment, in the context of cloud-edge collaborative storage, a user located in different user access areas can access the data object through different service providers. Given this context, a multi-population strategy is used to enhance the global search capability of the algorithm. During initialization, each population is generated by random sampling from the CSPs and the ESPs of each area; the parameters of an individual include SP and X, and the erasure-coding parameter m of the individual is guaranteed.

According to the availability definition above, with the coding fixed, the storage strategy corresponding to each individual should be basically available to the users of every area. The storage scheme basic availability detection algorithm checks the basic availability of an individual, and its output is a Boolean judgment for one strategy. The algorithm first initializes the SP count of each area and then counts the SPs of each individual area; it then initializes a return flag and, based on formula (4) above, sets the result according to the individual's erasure-coding parameter m.
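The basic availability check described above can be sketched in Python as follows. The sketch assumes that an individual is represented as a list naming the provider that holds each of the n blocks, that every region can reach all cloud providers, and that edge providers are reachable only from their own region; the function and variable names are hypothetical.

```python
def is_basically_available(placement, cloud_providers, edge_by_region, m):
    """Return True if every region can reach at least m blocks (formula (4)).

    placement      -- provider holding each block (hypothetical encoding of X)
    edge_by_region -- maps each user access region to its edge providers
    """
    cloud = set(cloud_providers)
    for region, edges in edge_by_region.items():
        # Providers reachable from this region: all clouds plus local edges.
        reachable = cloud | set(edges)
        blocks_in_reach = sum(1 for sp in placement if sp in reachable)
        if blocks_in_reach < m:
            return False
    return True


clouds = ["C1", "C2"]
edges = {"R1": ["E1a", "E1b"], "R2": ["E2a"]}
ok = is_basically_available(["C1", "C2", "E1a", "E2a"], clouds, edges, m=3)
bad = is_basically_available(["C1", "C2", "E1a", "E1b"], clouds, edges, m=3)
```

The second placement puts two blocks on edge providers of `R1`, so users in `R2` can reach only the two cloud-hosted blocks and cannot decode with m = 3.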

(42) Initializing the multiple populations, calculating the fitness function values, performing crossover and mutation, performing non-dominated sorting, calculating the crowding distance, and performing a feasibility check on each generated split-storage strategy to determine whether it meets the constraint conditions; the constraint conditions include availability, feasibility and latency.

In this embodiment, because the local search capability of a conventional genetic algorithm is weak, a local search algorithm is used when solving for the storage scheme. When offspring are generated, a Pareto neighborhood search is performed on randomly chosen individuals on the Pareto front; the search process is inspired by the migration and adjusting operators of the monarch butterfly optimization algorithm.

The migration operation proceeds as follows: for each position of the chromosome, a random number is first generated; if the random number is less than the migration rate, the position takes the value at the same position of a random individual in the same population; otherwise, it takes the value at the same position of a random individual in another population.

The adjusting operation proceeds as follows: for each position of the chromosome, a random number is first generated; if the random number is less than the adjustment rate, the position takes the value at the same position of the best individual of the population; otherwise, it takes the value at the same position of a random individual in the population.

Even if the newly generated individual does not dominate the original individual, it still has a certain probability of surviving.
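The migration and adjusting operations can be sketched in Python as follows. The sketch assumes chromosomes are equal-length lists and that populations are lists of such chromosomes; the function names, parameter names and sample values are illustrative assumptions, and the survival rule for non-dominating offspring is not covered.

```python
import random


def migrate(length, own_pop, other_pop, migration_rate, rng=random):
    """Build a child gene by gene from the own population or another one."""
    child = []
    for pos in range(length):
        # Below the migration rate: copy from a random individual of the same
        # population; otherwise copy from a random individual of another one.
        source = own_pop if rng.random() < migration_rate else other_pop
        child.append(rng.choice(source)[pos])
    return child


def adjust(length, own_pop, best, adjust_rate, rng=random):
    """Build a child gene by gene from the best individual or a random one."""
    child = []
    for pos in range(length):
        if rng.random() < adjust_rate:
            child.append(best[pos])                   # follow the best individual
        else:
            child.append(rng.choice(own_pop)[pos])    # follow a random individual
    return child


rng = random.Random(0)
pop_a = [[1, 1, 1], [1, 1, 1]]
pop_b = [[2, 2, 2]]
from_own = migrate(3, pop_a, pop_b, migration_rate=1.0, rng=rng)
from_other = migrate(3, pop_a, pop_b, migration_rate=0.0, rng=rng)
toward_best = adjust(3, pop_a, best=[9, 9, 9], adjust_rate=1.0, rng=rng)
```

With a rate of 1.0 every gene comes from the first source, and with a rate of 0.0 every gene comes from the second, which makes the two branches of each operator easy to see.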

In this embodiment, after the iterations are completed, a Pareto solution set is obtained. The set contains one or more individuals, namely storage strategies; each individual corresponds to a data storage scheme, and its objective function values correspond to the availability, cost and delay of that storage strategy.

In one embodiment, performing multi-objective optimization by using the improved NSGA-II algorithm and making the separate storage strategy further comprises the following steps:

(43) Acquiring a Pareto set.

It should be noted that the Pareto set includes at least one separate storage strategy.

(44) Normalizing each service index.

It should be noted that the storage scheme determination algorithm selects a solution on the Pareto front based on the ideal point; first, it initializes the parameters and finds the maximum availability, the minimum delay and the minimum cost; the components of the Pareto front are then normalized to yield an ideal optimal solution with availability bestA, delay bestD and cost bestC, denoted (bestA, bestD, bestC).

(45) Calculating the Euclidean distance from each element in the Pareto set to the ideal point, and determining the separate storage strategy with the minimum distance as the final separate storage strategy.

In general, the ideal optimal solution cannot be achieved in practice, so in order to choose one solution from the complete Pareto set, the Euclidean distance from each element in the Pareto set to the ideal point (bestA, bestD, bestC) needs to be calculated; specifically, all dimensions are first normalized to the range [0, 1], and the distances are then computed; finally, the solution with the smallest distance is taken as the final storage strategy, together with its objective function values.
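The ideal-point selection of steps (43) to (45) can be sketched as follows. The sketch assumes min-max normalization over the Pareto set, which is one common way of mapping each dimension to [0, 1] and is not stated in the text, and it represents each element as an (availability, delay, cost) tuple; the function name `select_by_ideal_point` is hypothetical.

```python
import math


def select_by_ideal_point(front):
    """Pick the element of the Pareto set closest to the normalized ideal point.

    front is a list of (availability, delay, cost) tuples.
    """
    columns = list(zip(*front))
    lo = [min(c) for c in columns]
    hi = [max(c) for c in columns]
    # After min-max normalization the ideal point is: maximum availability (1),
    # minimum delay (0) and minimum cost (0).
    ideal = (1.0, 0.0, 0.0)

    def norm(value, k):
        span = hi[k] - lo[k]
        # If all elements share the same value, the dimension does not discriminate.
        return ideal[k] if span == 0 else (value - lo[k]) / span

    def distance(p):
        return math.sqrt(sum((norm(p[k], k) - ideal[k]) ** 2 for k in range(3)))

    return min(front, key=distance)
```

In a set where one strategy is cheapest, another is the most available and fastest, and a third is moderate in all three indexes, the moderate strategy tends to be the one nearest to the ideal point.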

In the experiments of the present invention, information on 24 cloud service providers and 5 edge service providers is used for each user access area, and their attributes include the storage price, the outgoing bandwidth price, the operation price, the response delay, and the like. The latest price information is collected from the official websites of the main cloud service providers; 24 CSP storage packages are collected in total, covering most types of storage solutions. The price of edge storage is calculated according to the Nash equilibrium price of the corresponding cloud service provider. The average response delay between each area and each service provider over a period of time is obtained by repeated measurement; in addition, the availability of the cloud services is simulated with values from 0.95 to 0.99, and the availability of the edge services is obtained by measuring their available time. Finally, 29 service providers are retained for each user access area for the experiments.

It should be noted that the experiment was performed on a Windows 10 operating system with 1.8GHz Intel Core i5-8300H and 4GB RAM, using the environments of Anaconda 2 and Python 2.7 to implement the algorithm.

It should be noted that the protection scope of the data storage method in the cloud edge collaborative environment according to the present invention is not limited to the execution sequence of the steps listed in this embodiment, and all schemes obtained by adding, removing or replacing steps according to the prior art and the principle of the present invention are included in the protection scope of the present invention.

As shown in fig. 4-7, in one embodiment, other similar multi-cloud storage works, including the CLRDS and CHARM methods, are adapted into solutions suitable for cloud-edge combined storage with multiple user access areas, with their objectives modified accordingly, and are compared with the method proposed by the present invention.

First, the effect of varying the data size from 200GB to 4000GB (with a step size of 100GB) is discussed; in one embodiment, the number of populations of the proposed method is set to 3, the local search frequency is set to 5, τ is set to 0.5, and γ is set to [0.90, 0.05, 0.05].

FIGS. 4 and 5 show the comparison with the results generated by CHARM and CLRDS; as can be seen from the experimental results considering different data object sizes, the algorithm proposed by the invention is better than CHARM in both availability and cost; compared with CLRDS, the response delay and the cost are greatly reduced. Finally, these methods are compared by varying τ from 0.1 to 1.2 with a step size of 0.01; the number of populations of the proposed method is set to 3, the local search frequency is set to 5, S is set to 200, and γ is set to [0.90, 0.05, 0.05], and the results are shown in fig. 6 and 7.

From the experimental results considering different access frequencies, the proposed method is significantly better than CHARM in terms of availability and cost at most frequencies; as the frequency increases, the price advantage becomes more obvious; likewise, compared with CLRDS, the proposed method achieves better results in terms of latency and cost.

As shown in fig. 8, in an embodiment, the data storage system under the cloud-edge collaborative environment of the present invention includes a modeling module 81, a storage scheme basic availability detection module 82, a storage scheme solving module 83, and a storage scheme determination module 84.

The modeling module 81 is used to build a mathematical model related to the user manipulation of the data object.

The storage scheme basic availability detection module 82 is configured to assign a basic availability evaluation strategy to a plurality of feasible separate storage schemes in the mathematical model to determine the basic availability of the separate storage strategy.

The storage solution solving module 83 is configured to assign a basic availability evaluation policy to a plurality of feasible storage solutions existing in the mathematical model to determine the basic availability of the storage policy.

The storage scheme determining module 84 is configured to take the cloud and edge service provider information and the data object information as input, perform multi-objective optimization by using an improved NSGA-II algorithm, and make a separate storage strategy.

It should be noted that the structures and principles of the modeling module 81, the storage scheme basic availability detection module 82, the storage scheme solving module 83, and the storage scheme determining module 84 correspond to the steps in the data storage method under the cloud-edge collaborative environment one by one, and therefore are not described herein again.

It should be noted that the division of the modules of the above system is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these modules can be realized in the form of software called by processing element; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the x module may be a processing element that is set up separately, or may be implemented by being integrated in a chip of the system, or may be stored in a memory of the system in the form of program code, and the function of the x module may be called and executed by a processing element of the system. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.

For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as: one or more Application Specific Integrated Circuits (ASICs), or one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), etc. For another example, when one of the above modules is implemented in the form of program code invoked by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or other processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a System-On-a-Chip (SOC).

The storage medium of the present invention stores thereon a computer program that, when executed by a processor, implements the data split-storage method in the cloud-edge collaborative environment described above. The storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.

As shown in fig. 9, the terminal of the present invention includes a processor 91 and a memory 92.

The memory 92 is used for storing computer programs; preferably, the memory 92 comprises: various media that can store program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.

The processor 91 is connected to the memory 92, and is configured to execute the computer program stored in the memory 92, so that the terminal executes the data split-storage method in the cloud-edge collaborative environment.

Preferably, the processor 91 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components.

It should be noted that the data separate-storage system under the cloud-edge collaborative environment of the present invention can implement the data separate-storage method under the cloud-edge collaborative environment of the present invention, but the implementation apparatus of the data separate-storage method under the cloud-edge collaborative environment of the present invention includes but is not limited to the structure of the data separate-storage system under the cloud-edge collaborative environment, which is exemplified in this embodiment, and all the structural modifications and substitutions in the prior art made according to the principle of the present invention are included in the protection scope of the present invention.

In summary, compared with the prior art, the data separate storage method under the cloud-edge collaborative environment and the rationality analysis method thereof provided by the invention have the following advantages. Compared with single-cloud storage, multi-cloud storage increases availability and reduces the risk of privacy disclosure. In addition, considering the areas from which the user frequently accesses data, a two-stage storage model combining cloud and edge based on the user access areas is provided: the first stage optimizes availability, response delay and cost to generate the Pareto front; the second stage provides reasonable storage strategies for users accessing data in different areas according to actual conditions, and the effectiveness of the method is demonstrated by a large number of experiments. The invention ensures fast response and high availability, takes the user access areas into account, defines a multi-objective mathematical problem on multi-cloud and edge storage, and provides a storage strategy based on an improved NSGA-II algorithm, which improves the accuracy of the searched solution based on the actual situation of the user. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims

1. A data separate storage method under a cloud edge collaborative environment is characterized by comprising the following steps:

establishing a mathematical model related to a user operation data object; the mathematical model defines a data access area concept, defines service indexes under a multi-user access area and is used for describing service quality indexes when a user stores data objects and accesses the data objects in different areas so as to carry out operation processing;

assigning a basic availability evaluation strategy to a plurality of feasible separate storage schemes in the mathematical model to determine the basic availability of the separate storage strategy;

defining strong and weak availability and strong and weak latency of the plurality of feasible separate storage schemes in the mathematical model so as to evaluate the separate storage strategy under different user requirements;

and taking the cloud-side service provider and the data object information as input, performing multi-objective optimization by using an improved NSGA-II algorithm, and making a separate storage strategy.

2. The data storage method in the cloud-edge collaborative environment according to claim 1, wherein the step of establishing a mathematical model related to the user operation data object comprises the steps of:

determining a user access area according to the historical condition or the requirement of the user access data;

determining the weight of each user access area according to the actual access condition;

and on the premise of erasure-coding, defining the service index according to coding parameters and the weight, and establishing the mathematical model.

3. The data storage method in the cloud-edge collaborative environment according to claim 1, wherein the step of assigning a basic availability evaluation policy to a plurality of feasible storage schemes in the mathematical model comprises:

4. The data storage method in the cloud-edge collaborative environment according to claim 1, further comprising the steps of:

defining the mathematical expectation of the availability of each user access area as a total availability, so as to measure the availability of all of the separate storage schemes based on the total availability;

defining the weighted sum of the response delays in each user access area as the total response delay of the separate storage strategy; wherein each weight is the probability that the user accesses data in the corresponding user access area.

5. The data storage method in the cloud-edge collaborative environment according to claim 1, wherein the strong and weak availability includes: strong availability and weak availability; the strong and weak latency includes: strong latency and weak latency; wherein,

the strong availability and the strong latency require that the availability and the latency of each user access area meet threshold requirements;

the weak availability and the weak latency require that the overall availability and latency of the separate storage strategy meet the threshold requirements.

6. The data separate storage method under the cloud-edge collaborative environment according to claim 1, wherein performing multi-objective optimization by using the improved NSGA-II algorithm and making the separate storage strategy comprises the following steps:

on the basis of the traditional NSGA-II algorithm, introducing a plurality of populations to improve the local search capability;

initializing the plurality of populations, calculating the value of the fitness function, performing crossover and mutation, performing non-dominated sorting and calculating the crowding distance, and performing a feasibility check on each generated separate storage strategy to determine whether it meets the constraint conditions; the constraint conditions include: availability, feasibility and latency.

7. The data storage method under the cloud-edge collaborative environment according to claim 6, wherein the multi-objective optimization and the storage strategy formulation by using the improved NSGA-II algorithm further comprise the following steps:

normalizing each service index;

acquiring a Pareto set; the Pareto set comprises at least one separate storage strategy;

and calculating the Euclidean distance from each element in the Pareto set to an ideal point, and determining the separate storage strategy with the minimum distance as the final separate storage strategy.

8. A data storage system under a cloud edge collaborative environment is characterized by comprising: the system comprises a modeling module, a storage scheme basic availability detection module, a storage scheme solving module and a storage scheme determining module;

the modeling module is used for establishing a mathematical model related to a user operation data object; the mathematical model defines a data access area concept, defines service indexes under a multi-user access area and is used for describing service quality indexes when a user stores data objects and accesses the data objects in different areas so as to carry out operation processing;

the storage scheme basic availability detection module is used for assigning a basic availability evaluation strategy to a plurality of feasible separate storage schemes in the mathematical model to determine the basic availability of the separate storage strategy;

the storage scheme solving module is used for assigning a basic availability evaluation strategy to a plurality of feasible storage schemes in the mathematical model so as to determine the basic availability of the storage strategy;

the storage scheme determining module is used for inputting cloud-side service provider and data object information, performing multi-objective optimization by using an improved NSGA-II algorithm and making a storage strategy.

9. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the data storage method in the cloud-edge collaborative environment according to any one of claims 1 to 7.

10. A terminal, comprising: a processor and a memory;

the memory is used for storing a computer program;

the processor is configured to execute the computer program stored in the memory to enable the terminal to execute the data split-storage method in the cloud-edge collaborative environment according to any one of claims 1 to 7.