CN114048186A - Data migration method and system based on mass data - Google Patents

Info

Publication number
CN114048186A
CN114048186A
Authority
CN
China
Prior art keywords
data
hotspot
edge
data warehouse
distributed data
Prior art date
Legal status
Withdrawn
Application number
CN202111209912.6A
Other languages
Chinese (zh)
Inventor
彭亮 (Peng Liang)
Current Assignee
Guizhou Anhe Shengda Enterprise Management Co ltd
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual
Priority to CN202111209912.6A
Publication of CN114048186A
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The invention discloses a data migration method based on mass data, which comprises the following steps: the central cloud predicts a hotspot event and outputs the corresponding hotspot data set; the central cloud imports the hotspot data set into an HDFS storage engine; the HDFS storage engine migrates the hotspot data set into a distributed data warehouse; and the distributed data warehouse distributes the hotspot data set to an edge cloud cluster, so that the edge cloud cluster can provide the hotspot data set to user equipment.

Description

Data migration method and system based on mass data
Technical Field
The invention belongs to the technical field of information, and particularly relates to a data migration method and system based on mass data.
Background
Big data refers to data sets whose scale far exceeds the capabilities of traditional database software tools (such as MySQL, Oracle, and PostgreSQL) for acquisition, storage, and analysis. It has four characteristics: massive scale, rapid circulation, diverse types, and low value density. It is a massive, high-growth, diversified information asset that yields strong decision-making power, insight, and process-optimization capability only when handled with new processing modes.
In recent years, research on big-data storage, migration, and reading has entered a period of rapid development, but many technical problems remain. In the course of this research, the inventor found that the instantaneous traffic surge caused by a sudden hotspot event can lead to short-term congestion in big-data reading and migration, and that an effective early-warning mechanism to mitigate this phenomenon is currently lacking.
Disclosure of Invention
The invention provides a data migration method and system based on mass data, which solve the prior-art problem of data reading and transmission congestion caused by hotspot events in mass-data scenarios, effectively reduce congestion, and improve the user experience.
In order to achieve the above object, the present invention provides a mass data based data migration method, which is applied to a data migration system, wherein the system includes a central cloud, an HDFS storage engine, a distributed data warehouse, and an edge cloud cluster, and the method includes:
In a first period:
Step 1: the central cloud predicts a hotspot event and outputs the corresponding hotspot data set;
Step 2: the central cloud imports the hotspot data set into an HDFS storage engine;
Step 3: the HDFS storage engine migrates the hotspot data set into the distributed data warehouse;
Step 4: the distributed data warehouse distributes the hotspot data set to the edge cloud cluster, so that the edge cloud cluster provides the hotspot data set to user equipment.
In a second period:
steps 1 to 3 are repeated so that the distributed data warehouse distributes the hotspot data set of the second period to the edge cloud cluster, where the hotspot data set of the second period differs from that of the first period.
Optionally, if the distributed data warehouse stores the hotspot data set in advance, the migrating the hotspot data set to the distributed data warehouse by the HDFS storage engine includes:
the HDFS storage engine acquires the ID and the priority parameter of the hotspot data group;
and the HDFS storage engine adjusts the priority parameter of the hotspot data set and imports the adjusted priority parameter and the ID of the hotspot data set into the distributed data warehouse through an instruction, so that the distributed data warehouse looks up the hotspot data set by its ID and updates its priority parameter.
Optionally, if the distributed data warehouse is divided into a range partition and a hash partition, the importing the adjusted priority parameter and the ID of the hotspot data group into the distributed data warehouse through an instruction includes:
writing the ID of the hotspot data group into a range partition of the distributed data warehouse, and writing the adjusted priority parameter into a hash partition of the distributed data warehouse.
Optionally, if the distributed data warehouse does not store the hotspot data set in advance, the migration of the hotspot data set to the distributed data warehouse by the HDFS storage engine includes:
the distributed data warehouse is provided with a monitoring program, and the monitoring program is used for monitoring a file record table of the HDFS storage engine;
after monitoring that the hotspot data group is recorded in the file record table, the distributed data warehouse sends a data acquisition request to the HDFS storage engine;
and after receiving the data acquisition request, the HDFS storage engine packages the hot spot data group into a message queue and migrates the message queue to the distributed data warehouse.
Optionally, the migrating the message queue to the distributed data warehouse includes:
dividing the message queue into a plurality of sectors through a kafka connect tool, writing the plurality of sectors into the distributed data warehouse in parallel, and merging in the distributed data warehouse.
Optionally, the edge cloud cluster includes a plurality of edge clouds and an edge cloud manager, and after the edge cloud cluster provides the hotspot data set for the user equipment, the method further includes:
the edge cloud manager predicts KPI performance indexes of the edge clouds, and sets a load balancing strategy after predicting that the KPI performance index of a first edge cloud exceeds a first preset threshold;
the edge cloud manager sends the storage content of the first edge cloud to a temporary partition of the distributed data warehouse based on the load balancing strategy, wherein the storage content of the first edge cloud does not include the hotspot data group;
the distributed data warehouse migrates the storage content in the temporary partition to one or more edge clouds nearest to the first edge cloud, wherein a KPI performance index of the one or more edge clouds is lower than a second preset threshold, and the second preset threshold is smaller than the first preset threshold.
Optionally, the predicting, by the edge cloud manager, KPI performance indicators of the plurality of edge clouds includes:
predicting KPI performance indicators for the plurality of edge clouds using a Hidden Markov Model (HMM).
Optionally, the predicting the hotspot event by the central cloud, and outputting a corresponding hotspot data set, includes:
the central cloud predicts the hotspot events using a time series model ARIMA.
Optionally, the edge cloud cluster includes a plurality of edge clouds and a plurality of edge nodes, where one edge cloud corresponds to the plurality of edge nodes, and the providing, by the edge cloud cluster, the hotspot data set for the user equipment includes:
the plurality of edge clouds respectively distribute the hotspot data sets to the corresponding plurality of edge nodes;
and the plurality of edge nodes send the hotspot data groups to the user equipment.
The embodiment of the present invention further provides a system, which includes a memory and a processor, where the memory stores computer-executable instructions, and the processor implements the method when running the computer-executable instructions on the memory.
The method and the system of the embodiment of the invention have the following advantages:
In the embodiment of the invention, the hotspot event of a period is predicted to obtain the corresponding hotspot data set, this data is migrated in advance, and it is written to the edge cloud ahead of time, so that when user equipment requests the hotspot event, the edge cloud can respond quickly. This improves network transmission efficiency and greatly reduces data congestion caused by sudden hotspot events. In addition, because of the positioning of big-data storage, the HDFS storage engine suits write-once-read-many scenarios and does not support random modification of stored data; since hotspot events differ from period to period, so do their hotspot data sets, which greatly increases the need for random modification of data. The original HDFS storage engine alone is therefore unsuitable for this scenario, which is why the distributed data warehouse is introduced.
Drawings
FIG. 1 is a block diagram of a data migration system based on mass data in one embodiment;
FIG. 2 is a flow diagram of a method for mass data based data migration in one embodiment;
FIG. 3 is a logical view of inventory data migration in one embodiment;
FIG. 4 is a logical representation of incremental data migration in one embodiment;
FIG. 5 is a logic diagram of edge cloud data distribution in one embodiment;
FIG. 6 is a logic diagram of edge cloud load balancing in one embodiment;
FIG. 7 is a diagram illustrating the hardware components of the system in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 1 is a network architecture diagram according to an embodiment of the present invention, and as shown in fig. 1, the embodiment of the present invention includes a central cloud 11, an HDFS storage engine 12, a distributed data warehouse 13, an edge cloud cluster 14, and a user device 15, where the central cloud 11 is located at a core layer of a network, and is configured to acquire mass data, process and analyze the mass data, and control storage, migration, and distribution of various types of data.
The HDFS storage engine 12 (HDFS, the Hadoop Distributed File System) is the mainstream storage engine of the big-data framework Hadoop. It is an offline analysis tool suited to high-throughput, write-once-read-many scenarios, and it does not support random modification of files.
The distributed data warehouse 13 is distinguished from HDFS in that it supports random modification of files, so it is well suited to scenarios where data must be updated and modified at random. A data warehouse is a strategic collection that provides data support of all types for decision making at every level of an enterprise. Its main function is to systematically analyze and organize the large volumes of data accumulated over years by the information system's online transaction processing (OLTP), using the storage structures specific to data-warehouse theory, so as to facilitate analysis methods such as online analytical processing (OLAP) and data mining, and in turn to support the construction of decision support systems (DSS) and executive information systems (EIS), helping decision makers quickly and effectively extract valuable information from massive data and build business intelligence (BI).
The edge cloud cluster 14 is a cloud server cluster located at the edge layer, closer to users, and is a lightweight, miniaturized cloud service cluster. The edge cloud cluster 14 may further include an edge cloud manager 141 (not shown in the figure) and edge nodes 142 (not shown in the figure). The edge cloud manager is a functional device that acts both as an edge cloud server and as the device controlling routine operation and maintenance of the edge cloud, such as load balancing and data migration. An edge node sits one level below an edge cloud and is even closer to users: edge clouds are generally deployed at the city and district level, while edge nodes are deployed at the county and township level.
The user equipment 15 covers all devices that can be networked and have intelligent analysis and processing capabilities, such as mobile phones, tablet computers, PCs, VR and AR devices, industrial computers, smart cars, and Internet-of-Things devices. Such a device can exchange data with the edge cloud cluster over existing wireless communication protocols to acquire a variety of data.
To achieve the above object, as shown in fig. 2, the present invention provides a method for data migration based on mass data, which is applied to the network architecture shown in fig. 1, and the method includes the following steps:
during the first period T1, the following steps 1-4 are performed:
step 1: the central cloud predicts the hot event and outputs a corresponding hot data group;
the hot events are a group of events with short time period and high concurrence, such as major news, major topics and the like, and are characterized in that the search volume is high in a short time period, and keywords are concentrated. The hot spot events are classified into two types, one type is a fixed hot spot event, such as a high-concurrency strong-association hot spot event which occurs in fixed time of the type of "mid-autumn" or "moon cake", and the other type is non-fixed, such as sudden news, therefore, for the embodiment of the invention, an event sequence prediction model can be adopted for the fixed hot spot event to carry out hot spot data prediction.
Time series analysis accounts for the fact that data points collected over time may have an internal structure (such as autocorrelation, trend, or seasonal variation) that should be considered, and time series prediction uses a regression model to predict future values from previously observed ones. In the embodiment of the present invention, a time series model such as ARIMA (autoregressive integrated moving average) or VAR (vector autoregression) may be used to predict a hotspot event and obtain the hotspot data set corresponding to it. The hotspot data set is the data file of the hotspot event's content; specifically, it may be a binary data file, a video file, an audio file, and so on. For example, if the hotspot event is "Mid-Autumn Festival", the hotspot data set may include Mid-Autumn gala videos, moon-cake pictures, and the like; that is, the hotspot event and the hotspot data set are strongly associated. The ARIMA and VAR models are prior art and are not described here.
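As a rough illustration of step 1, the sketch below replaces a full ARIMA fit with a toy AR(1) least-squares forecast in plain Python; the function names, the threshold, and the sample search-volume series are all invented for this example. An event whose forecast search volume exceeds the threshold is flagged as a hotspot.

```python
# Toy stand-in for the hotspot-prediction step (step 1).
# A real system would fit ARIMA/VAR; here a hand-rolled AR(1)
# least-squares fit illustrates the forecast-then-threshold flow.

def ar1_forecast(series):
    """Fit x[t] = a*x[t-1] + b by least squares and forecast one step ahead."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var if var else 0.0
    b = my - a * mx
    return a * series[-1] + b

def predict_hotspot(event_volumes, threshold):
    """Return the events whose forecast search volume exceeds the threshold."""
    return [name for name, vol in event_volumes.items()
            if ar1_forecast(vol) > threshold]

volumes = {
    "mid-autumn": [10, 20, 40, 80, 160],   # rapidly growing searches
    "weather":    [30, 31, 29, 30, 30],    # flat baseline, no hotspot
}
print(predict_hotspot(volumes, threshold=100))
```

In a production system the forecast would come from a library implementation (e.g. statsmodels' ARIMA) rather than a hand-rolled AR(1); only the flow is shown here.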
Step 2: the central cloud imports the hotspot data set into an HDFS storage engine;
HDFS is one of the most prominent distributed storage systems used in Hadoop applications. An HDFS cluster consists mainly of one NameNode and many DataNodes: the NameNode manages the file system's metadata, while the DataNodes store the actual data. Hadoop (including HDFS) is well suited to distributed storage and computation on commodity hardware because it is fault tolerant and highly scalable. The Map-Reduce framework, known for its simplicity and availability in large distributed system applications, has been integrated into Hadoop.
After acquiring the hotspot data set, the central cloud imports it into the HDFS storage engine. However, because of the positioning of big-data storage, HDFS is suited to write-once-read-many scenarios and does not support random modification of files; random updates and deletions of data during migration are therefore not permitted.
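The write-once constraint can be made concrete with a toy append-only store; the class and its behaviour are illustrative, not the actual HDFS API. New files and reads succeed, but an in-place rewrite is rejected, which is exactly what makes per-period priority updates awkward on HDFS alone.

```python
# Toy append-only store mirroring the write-once-read-many constraint
# described above (illustrative only, not the HDFS client API).
class AppendOnlyStore:
    def __init__(self):
        self._files = {}

    def put(self, name, data):
        if name in self._files:
            # in-place modification is not supported, as in HDFS
            raise PermissionError("random modification not supported")
        self._files[name] = data

    def get(self, name):
        return self._files[name]

store = AppendOnlyStore()
store.put("hot-42", b"v1")      # first write succeeds
try:
    store.put("hot-42", b"v2")  # attempted in-place update
except PermissionError as e:
    print(e)
```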
And step 3: the HDFS storage engine migrates the hotspot data set into the distributed data warehouse;
for the above reasons, the embodiment of the present invention provides a distributed data warehouse, which can support batch storage and random read-write in a short time. Data storage, migration and distribution are performed using a distributed data warehouse to "replace" the HDFS.
For example, the original hotspot data group is only a data group with a lower priority in the data warehouse, and at this time, the priority of the hotspot data group needs to be increased, so that the time of migration of the hotspot data group is preferentially ensured in the transmission or migration process.
Specifically, if the distributed data warehouse stores the hotspot data group in advance, step 3 is actually inventory migration, and the method includes:
the HDFS storage engine acquires the ID and the priority parameter of the hotspot data group;
and the HDFS storage engine adjusts the priority parameter of the hot spot data group, and imports the adjusted priority parameter and the ID of the hot spot data group into the distributed data warehouse through an instruction, so that the distributed data warehouse searches and updates the priority parameter of the hot spot data group based on the ID and the priority parameter of the hot spot data group.
Specifically, the distributed data warehouse may be divided into range partitions and hash partitions. Range partitioning is the most widely applied partitioning mode; it uses a range of column values as the partition condition and has the advantages of easy horizontal expansion and high sequential-read throughput. The principle is that a record is stored in the range partition its column value falls into, so the base columns and range bounds must be specified at creation time. If the range of some records cannot be predicted in advance, a maxvalue partition can be created to hold every record outside the specified ranges, and multiple columns are supported as partition columns. Each partition stores records whose column values are less than the partition's bound, and for every partition except the first, the minimum value equals the bound of the preceding partition.
A hash partition applies a hash algorithm to the key, so the data can be distributed uniformly and the write speed is high. The consistent-hashing idea behind it assigns each node in the system a token in the range 0 to 2³² − 1; these tokens form a hash ring. When a read or write needs to locate a node, the hash value of the key is computed and the first token node encountered clockwise is chosen. Compared with node redundancy, this approach has the advantage that adding or deleting a node affects only its neighbors on the hash ring and has no effect on other nodes.
Therefore, as shown in fig. 3, based on the advantages of different partitions, the HDFS may write the ID of the hot data group into the range partition of the distributed data warehouse (the range partition has a fast reading speed), write the adjusted priority parameter into the hash partition of the distributed data warehouse (the hash partition has a fast writing speed), and finally merge the two.
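A minimal sketch of this stock-migration path, assuming an invented in-memory warehouse with a few range partitions for IDs and hash partitions for priorities; the bounds, partition counts, and the use of Python's built-in hash are all illustrative choices.

```python
# Sketch of the stock-migration path (step 3, data already in the
# warehouse): the hotspot group's ID goes to a range partition and
# the adjusted priority to a hash partition; the two are then merged.
import bisect

RANGE_BOUNDS = [1000, 2000, 3000]   # upper bound per range partition
N_HASH = 4                          # number of hash partitions

range_parts = [[] for _ in RANGE_BOUNDS]
hash_parts = [{} for _ in range(N_HASH)]

def write_id(group_id):
    """Route an ID to the range partition whose bound it falls under.
    IDs are assumed to fall under the last bound."""
    idx = bisect.bisect_left(RANGE_BOUNDS, group_id)
    range_parts[idx].append(group_id)

def write_priority(group_id, priority):
    """Route (id, priority) to a hash partition for fast parallel writes."""
    hash_parts[hash(group_id) % N_HASH][group_id] = priority

def merged_view():
    """Join the two partition families back into id -> priority."""
    ids = [g for part in range_parts for g in part]
    return {g: next(p[g] for p in hash_parts if g in p) for g in ids}

write_id(1501)
write_priority(1501, priority=9)    # hotspot group's priority raised
print(merged_view())
```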
If the distributed data warehouse does not store the hotspot data set in advance, step 3 is actually incremental migration, and the method includes:
a monitoring program is set in the distributed data warehouse and is used for monitoring a file record table of the HDFS storage engine; for example, a BinLog file of MySQL may be monitored, and when the BinLog file is changed, the monitoring program may know whether the data is modified or updated.
When it detects that the hotspot data set has been recorded in the file record table, the distributed data warehouse sends a data acquisition request to the HDFS storage engine; that is, whenever data is modified or updated, the distributed data warehouse synchronizes the data from the storage engine, so modifications and updates are reflected in the data warehouse in real time.
And after receiving the data acquisition request, the HDFS storage engine packages the hot spot data group into a message queue and migrates the message queue to the distributed data warehouse.
In the embodiment of the invention, a message queue tool is adopted to serve as a buffer pool of data so as to reduce the peak value brought by data synchronization.
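The incremental path above (monitor the record table, request the data, hand it over as a queue) can be sketched as follows; the record-table list, the polling function, and the byte payload are stand-ins for a real binlog and RPC.

```python
# Sketch of the incremental path: a monitor watches the engine's file
# record table; when a new hotspot group appears, it is packaged into
# a message queue (here a deque) and migrated into the warehouse.
from collections import deque

file_record_table = []      # stands in for e.g. a MySQL BinLog
warehouse = {}

def record(engine_store, group_id, payload):
    """Write into the storage engine; the change shows up in the table."""
    engine_store[group_id] = payload
    file_record_table.append(group_id)

def monitor_and_sync(engine_store, seen):
    """Poll the record table; fetch and migrate anything not yet synced."""
    queue = deque()
    for gid in file_record_table:
        if gid not in seen:
            queue.append((gid, engine_store[gid]))  # package into queue
            seen.add(gid)
    while queue:                                    # drain into warehouse
        gid, payload = queue.popleft()
        warehouse[gid] = payload
    return warehouse

engine = {}
record(engine, "hot-42", b"video-bytes")
print(monitor_and_sync(engine, seen=set()))
```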
The step of migrating the message queue to the distributed data warehouse may specifically be:
The message queue is divided into a plurality of sectors by a Kafka Connect tool; the sectors are written into the distributed data warehouse in parallel and merged there. Kafka Connect is a tool for reliably transmitting data between Kafka and other systems.
As shown in fig. 4, when incremental data needs to be synchronized, the distributed data warehouse observes the change through a listener, imports the incremental data into its own database in parallel through a synchronization mechanism, and finally merges it. This process fully exploits the distributed data warehouse's high parallel write speed: different sectors of data are transmitted to the database in parallel and finally combined and stored in one container space or class.
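The split-write-merge step can be sketched like this; the thread pool stands in for Kafka Connect sink tasks running in parallel, and the sector count is an arbitrary choice.

```python
# Sketch of the queue-to-warehouse step: the message queue is split
# into "sectors", written in parallel, and merged afterwards.
from concurrent.futures import ThreadPoolExecutor

def split_into_sectors(messages, n_sectors):
    """Round-robin the queue into n_sectors sub-queues."""
    return [messages[i::n_sectors] for i in range(n_sectors)]

def write_sector(sector):
    # in a real deployment this would be one Kafka Connect sink task
    return list(sector)

def migrate(messages, n_sectors=3):
    sectors = split_into_sectors(messages, n_sectors)
    with ThreadPoolExecutor(max_workers=n_sectors) as pool:
        written = list(pool.map(write_sector, sectors))  # parallel writes
    merged = [m for sector in written for m in sector]   # final merge
    return sorted(merged)       # restore global order after the merge

print(migrate(list(range(10))))
```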
And 4, step 4: the distributed data warehouse distributes the hotspot data set to the edge cloud cluster so that the edge cloud cluster provides the hotspot data set for user equipment;
after receiving the hotspot data set, the distributed data warehouse distributes the hotspot data set to each edge cloud server, so that when a user requests the hotspot data set in the first period, the edge cloud cluster can quickly respond to the user requirement and send the hotspot data set corresponding to a hotspot event (namely, content corresponding to the hotspot event) to each User Equipment (UE).
In this embodiment of the present invention, an edge cloud cluster includes a plurality of edge clouds and a plurality of edge nodes, where one edge cloud corresponds to a plurality of edge nodes, and the edge cloud cluster provides a hot data group for a user equipment, which may specifically be:
the plurality of edge clouds respectively distribute the hotspot data sets to a plurality of corresponding edge nodes;
and the plurality of edge nodes send the hotspot data set to the user equipment. As shown in fig. 5, a distributed data warehouse-an edge cloud-an edge node may form a tree structure, and the distributed data warehouse serves as a root node, so that a hotspot data set is distributed to the edge cloud, the edge cloud is distributed to the edge node, and finally the hotspot data set is sent to the UE, thereby forming an effective data distribution method.
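The tree-shaped fan-out (warehouse as root, edge clouds as interior nodes, edge nodes as leaves) can be sketched as follows; the topology and names are invented for the example.

```python
# Sketch of the fan-out in step 4: each edge cloud forwards the hotspot
# data set to its edge nodes, which hold the copy that UEs will fetch.
topology = {
    "edge-cloud-A": ["node-A1", "node-A2"],
    "edge-cloud-B": ["node-B1"],
}

def distribute(hot_group, topology):
    """Warehouse -> edge clouds -> edge nodes; returns per-node copies."""
    node_store = {}
    for cloud, nodes in topology.items():   # warehouse to edge clouds
        for node in nodes:                  # edge cloud to edge nodes
            node_store[node] = hot_group
    return node_store

print(distribute("hot-42", topology))
```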
In the second period,
steps 1 to 3 are repeated so that the distributed data warehouse distributes the hotspot data set of the second period to the edge cloud cluster, where the hotspot data set of the second period differs from that of the first period. In the second period, a new hotspot event is predicted by the central cloud; its data is acquired and imported into HDFS, from HDFS into the distributed data warehouse, and from the distributed data warehouse to the edge cloud cluster, completing one cycle. The third and fourth periods, and so on, proceed likewise.
In addition, improving data transmission efficiency requires load balancing that accounts for the load on each edge cloud. As shown in fig. 6, in the embodiment of the present invention, the edge cloud manager predicts the KPI (Key Performance Indicator) values of the multiple edge clouds, and sets a load balancing policy after predicting that the KPI of a first edge cloud exceeds a first preset threshold.
KPI performance indicators are a general term for a series of indicator parameters, including but not limited to resource occupancy, transmission efficiency, and storage space. In cloud computing operation and maintenance, KPIs are generally used to indicate whether a server's performance is good or poor; if any indicator exceeds its limit, for example storage usage passing an upper bound of 85%, the cloud server risks degraded storage response, and a timely warning is needed so that storage usage can be reduced.
KPI performance indicators can also be predicted. For example, the KPIs of multiple edge clouds may be predicted with a hidden Markov model (HMM), a powerful algorithm for time series. An HMM trains its parameters with an EM algorithm that maximizes the likelihood of the historical training data. For example, unsupervised learning can be applied to time series analysis, using temporal signatures to determine and predict KPI anomalies. This technique belongs to the prior art and is not described in detail here.
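The patent proposes an HMM for this step; purely to show the shape of the computation, the toy below uses a plain two-state Markov chain over normal/overloaded with invented transition probabilities, flagging clouds whose predicted overload probability crosses a threshold.

```python
# Toy stand-in for the KPI-prediction step: a two-state Markov chain
# (not a full HMM) over {normal, overloaded} edge-cloud states.
NORMAL, OVERLOADED = 0, 1
T = [
    [0.9, 0.1],   # P(next state | current = normal)
    [0.3, 0.7],   # P(next state | current = overloaded)
]

def p_overloaded_next(current_state):
    """Predicted probability the cloud is overloaded in the next step."""
    return T[current_state][OVERLOADED]

def flag_clouds(states, threshold=0.5):
    """Return clouds whose predicted overload probability exceeds threshold."""
    return [c for c, s in states.items() if p_overloaded_next(s) > threshold]

states = {"edge-cloud-A": OVERLOADED, "edge-cloud-B": NORMAL}
print(flag_clouds(states))
```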
The method comprises the steps that an edge cloud manager sends storage content of a first edge cloud to a temporary partition of a distributed data warehouse based on a load balancing strategy, wherein the storage content of the first edge cloud does not comprise a hotspot data group;
the idea of the load balancing strategy is to equally divide resources, maximize efficiency improvement, and if a KPI anomaly occurs in a first edge cloud (any one of the edge clouds), it is necessary to migrate non-important data (i.e., other data of a non-hotspot data group) in the current first edge cloud to reduce resource occupancy rate thereof, thereby achieving load balancing. Therefore, the edge cloud manager needs to migrate data in the first edge cloud, and in the migration process, a temporary partition can be created for temporarily transferring the data by taking the distributed data warehouse as a transfer station, and the temporary partition can be destroyed after the data transfer is finished.
And the distributed data warehouse migrates the storage content in the temporary partition to one or more edge clouds closest to the first edge cloud, wherein the KPI performance index of the one or more edge clouds is lower than a second preset threshold, and the second preset threshold is smaller than the first preset threshold.
The data is migrated to target edge clouds whose KPI indicators will not exceed the limit and remain below a lower threshold; therefore a second preset threshold can be set manually (for example, storage usage not exceeding 60%), and an edge cloud satisfying this condition has a low load and is suitable for receiving migrated redundant data backups.
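Putting the two thresholds together, here is a minimal sketch of the rebalancing flow; all KPI values, the thresholds, and the round-robin placement are illustrative, and the "nearest cloud" distance selection is omitted. Non-hotspot content of the overloaded cloud is staged in a temporary partition, then placed on clouds below the second threshold.

```python
# Sketch of the load-balancing flow: non-hotspot content of the
# overloaded cloud is staged in a temporary warehouse partition,
# then moved to clouds whose KPI is below the second threshold.
FIRST_THRESHOLD, SECOND_THRESHOLD = 0.85, 0.60

def rebalance(clouds, overloaded, hot_ids):
    # stage everything except the hotspot data set in a temp partition
    temp_partition = [d for d in clouds[overloaded]["data"]
                      if d not in hot_ids]
    clouds[overloaded]["data"] = [d for d in clouds[overloaded]["data"]
                                  if d in hot_ids]
    # candidate targets: other clouds under the second threshold
    targets = [c for c, info in clouds.items()
               if c != overloaded and info["kpi"] < SECOND_THRESHOLD]
    for i, item in enumerate(temp_partition):   # round-robin placement
        clouds[targets[i % len(targets)]]["data"].append(item)
    return clouds   # the temp partition is destroyed after the transfer

clouds = {
    "A": {"kpi": 0.90, "data": ["hot-42", "old-1", "old-2"]},  # over limit
    "B": {"kpi": 0.40, "data": []},                            # lightly loaded
}
print(rebalance(clouds, overloaded="A", hot_ids={"hot-42"}))
```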
The method of the embodiment of the invention has the following advantages:
In the embodiment of the invention, the hotspot event of a period is predicted to obtain the corresponding hotspot data set, this data is migrated in advance, and it is written to the edge cloud ahead of time, so that when user equipment requests the hotspot event, the edge cloud can respond quickly. This improves network transmission efficiency and greatly reduces data congestion caused by sudden hotspot events. In addition, because of the positioning of big-data storage, the HDFS storage engine suits write-once-read-many scenarios and does not support random modification of stored data; since hotspot events differ from period to period, so do their hotspot data sets, which greatly increases the need for random modification of data. The original HDFS storage engine alone is therefore unsuitable for this scenario, which is why the distributed data warehouse is introduced.
An embodiment of the present invention further provides a system comprising a memory and a processor, where the memory stores computer-executable instructions and the processor implements the foregoing method when executing those instructions.
Embodiments of the present invention also provide a computer-readable storage medium having stored thereon computer-executable instructions for performing the method in the foregoing embodiments.
FIG. 7 is a diagram illustrating the hardware components of the system in one embodiment. It will be appreciated that FIG. 7 shows only a simplified design of the system. In practical applications, the system may also include other necessary elements, including but not limited to any number of input/output systems, processors, controllers, and memories; any system that can implement the big data management method of the embodiments of the present application falls within the protection scope of the present application.
The memory includes, but is not limited to, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a portable read-only memory (CD-ROM), which is used for storing instructions and data.
The input system is for inputting data and/or signals and the output system is for outputting data and/or signals. The output system and the input system may be separate devices or may be an integral device.
The processor may include one or more processors, for example one or more central processing units (CPUs); a CPU may be single-core or multi-core. The processor may also include one or more special-purpose processors, such as GPUs or FPGAs, for accelerated processing.
The memory is used to store program codes and data of the network device.
The processor is used for calling the program codes and data in the memory and executing the steps in the method embodiment. Specifically, reference may be made to the description of the method embodiment, which is not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the division of the unit is only one logical function division, and other division may be implemented in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. The shown or discussed mutual coupling, direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, systems or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable system. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)), or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more of the available media. The usable medium may be a read-only memory (ROM), or a Random Access Memory (RAM), or a magnetic medium, such as a floppy disk, a hard disk, a magnetic tape, a magnetic disk, or an optical medium, such as a Digital Versatile Disk (DVD), or a semiconductor medium, such as a Solid State Disk (SSD).
The above is only a specific embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A data migration method based on mass data is characterized by being applied to a data migration system, wherein the system comprises a central cloud, an HDFS storage engine, a distributed data warehouse and an edge cloud cluster, and the method comprises the following steps:
in a first period:
Step 1: the central cloud predicts a hotspot event and outputs a corresponding hotspot data set;
Step 2: the central cloud imports the hotspot data set into an HDFS storage engine;
Step 3: the HDFS storage engine migrates the hotspot data set into the distributed data warehouse;
Step 4: the distributed data warehouse distributes the hotspot data set to the edge cloud cluster so that the edge cloud cluster provides the hotspot data set for user equipment;
in a second period:
repeating steps 1 to 3 so that the distributed data warehouse distributes the hotspot data set of the second period to the edge cloud cluster, wherein the hotspot data set of the second period is different from the hotspot data set of the first period.
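The per-period flow of claim 1 can be sketched as below; every class and method name is an illustrative assumption, not the claimed implementation:

```python
# Illustrative sketch of the claim-1 pipeline; all class/method names are assumed.

class CentralCloud:
    def predict_hotspot(self, period):
        # Step 1: predict the hotspot event and output its data set
        # (a stand-in for the ARIMA-based prediction of claim 8).
        return {"period": period, "items": [f"video-{period}"]}

class HDFSEngine:
    def __init__(self):
        self.files = []
    def ingest(self, data_set):            # Step 2: central cloud -> HDFS
        self.files.append(data_set)
        return data_set

class DataWarehouse:
    def __init__(self):
        self.store = {}
    def migrate_in(self, data_set):        # Step 3: HDFS -> distributed warehouse
        self.store[data_set["period"]] = data_set
    def distribute(self, period, edges):   # Step 4: warehouse -> edge clouds
        for e in edges:
            e.cache = self.store[period]

class Edge:
    cache = None
    def serve(self):                       # the edge cloud answers user equipment
        return self.cache

def run_period(period, central, hdfs, warehouse, edges):
    """One period: predict, ingest, migrate, distribute."""
    data_set = central.predict_hotspot(period)
    warehouse.migrate_in(hdfs.ingest(data_set))
    warehouse.distribute(period, edges)
```

Running `run_period` once per period reproduces the claim's point that consecutive periods push different hotspot data sets to the same edge cloud cluster.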
2. The method of claim 1, wherein if the distributed data warehouse pre-stores the hotspot data set, the HDFS storage engine migrating the hotspot data set to the distributed data warehouse, comprising:
the HDFS storage engine acquires the ID and the priority parameter of the hotspot data group;
The HDFS storage engine adjusts the priority parameter of the hotspot data group and imports the adjusted priority parameter together with the ID of the hotspot data group into the distributed data warehouse via an instruction, so that the distributed data warehouse locates the hotspot data group by its ID and updates its priority parameter.
3. The method of claim 2, wherein the distributed data warehouse is divided into a range partition and a hash partition, and the importing the adjusted priority parameter and the ID of the hotspot data group into the distributed data warehouse via an instruction comprises:
writing the ID of the hotspot data group into a range partition of the distributed data warehouse, and writing the adjusted priority parameter into a hash partition of the distributed data warehouse.
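A minimal sketch of the range/hash split in claims 2-3; the partition bounds, bucket count, and class name are invented for illustration:

```python
# Assumed layout: IDs land in range partitions, priorities in hash partitions.

RANGE_BOUNDS = [1000, 2000, 3000]  # upper bounds of the range partitions (invented)

class PartitionedWarehouse:
    def __init__(self):
        self.range_parts = [set() for _ in range(len(RANGE_BOUNDS) + 1)]
        self.hash_parts = [dict() for _ in range(4)]

    def upsert_priority(self, group_id, priority):
        """Write the group ID to a range partition and its priority to a hash partition."""
        # Range partitioning: the ID goes into the first bucket whose bound exceeds it.
        idx = next((i for i, b in enumerate(RANGE_BOUNDS) if group_id < b),
                   len(RANGE_BOUNDS))
        self.range_parts[idx].add(group_id)
        # Hash partitioning: bucket by hash of the ID, then update in place --
        # the "search and update" path of claim 2.
        self.hash_parts[hash(group_id) % 4][group_id] = priority

    def priority_of(self, group_id):
        return self.hash_parts[hash(group_id) % 4].get(group_id)
```

Splitting the key from the mutable priority value lets the priority be rewritten without touching the range-partitioned ID index, which matches the update-only semantics of claim 2.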
4. The method of claim 1, wherein if the distributed data warehouse does not store the hotspot data set in advance, the migrating the hotspot data set to the distributed data warehouse by the HDFS storage engine comprises:
the distributed data warehouse is provided with a monitoring program, and the monitoring program is used for monitoring a file record table of the HDFS storage engine;
after monitoring that the hotspot data group is recorded in the file record table, the distributed data warehouse sends a data acquisition request to the HDFS storage engine;
and after receiving the data acquisition request, the HDFS storage engine packages the hot spot data group into a message queue and migrates the message queue to the distributed data warehouse.
5. The method of claim 4, wherein said migrating said message queue into said distributed data warehouse comprises:
dividing the message queue into a plurality of partitions through a Kafka Connect tool, writing the plurality of partitions into the distributed data warehouse in parallel, and merging them in the distributed data warehouse.
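The split/parallel-write/merge step of claim 5 can be sketched with the standard library alone; a real deployment would use Kafka Connect, so the round-robin split and the `PartitionSink` class below are stand-ins, not the Kafka API:

```python
from concurrent.futures import ThreadPoolExecutor
import threading

NUM_PARTS = 4

def split(queue, n):
    """Divide a message queue into n partitions (round-robin, order kept per partition)."""
    parts = [[] for _ in range(n)]
    for i, msg in enumerate(queue):
        parts[i % n].append(msg)
    return parts

class PartitionSink:
    """Stand-in for the distributed data warehouse's parallel-ingest side."""
    def __init__(self):
        self._lock = threading.Lock()
        self._staged = {}

    def write_partition(self, idx, part):
        with self._lock:                   # each partition lands independently
            self._staged[idx] = part

    def merge(self):
        """Interleave staged partitions back into one ordered record set."""
        merged = []
        parts = [list(self._staged[i]) for i in sorted(self._staged)]
        while any(parts):
            for p in parts:
                if p:
                    merged.append(p.pop(0))
        return merged

def migrate(queue, sink):
    parts = split(queue, NUM_PARTS)
    with ThreadPoolExecutor(max_workers=NUM_PARTS) as pool:
        for i, part in enumerate(parts):
            pool.submit(sink.write_partition, i, part)
    # The executor's context exit waits for all writes before merging.
    return sink.merge()
```

Because the split is deterministic round-robin, the merge can reconstruct the original queue order, illustrating why the claim merges inside the warehouse after the parallel writes.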
6. The method of claim 1, wherein the edge cloud cluster comprises a plurality of edge clouds and an edge cloud manager, and wherein after the edge cloud cluster provides the hotspot data set to a user device, the method further comprises:
the edge cloud manager predicts KPI performance indexes of the edge clouds, and sets a load balancing strategy after predicting that the KPI performance index of a first edge cloud exceeds a first preset threshold;
the edge cloud manager sends the storage content of the first edge cloud to a temporary partition of the distributed data warehouse based on the load balancing strategy, wherein the storage content of the first edge cloud does not include the hotspot data group;
the distributed data warehouse migrates the storage content in the temporary partition to one or more edge clouds nearest to the first edge cloud, wherein a KPI performance index of the one or more edge clouds is lower than a second preset threshold, and the second preset threshold is smaller than the first preset threshold.
7. The method of claim 6, wherein the edge cloud manager predicts KPI performance indicators for the plurality of edge clouds, comprising:
predicting KPI performance indicators for the plurality of edge clouds using a Hidden Markov Model (HMM).
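A toy version of the HMM-based prediction in claim 7, with invented transition and emission probabilities; a real deployment would fit these from historical KPI traces (for example with a library such as hmmlearn):

```python
# Two-state HMM over bucketed KPI readings; all probabilities are illustrative.

STATES = ["normal", "overloaded"]
START = {"normal": 0.8, "overloaded": 0.2}
TRANS = {"normal": {"normal": 0.9, "overloaded": 0.1},
         "overloaded": {"normal": 0.3, "overloaded": 0.7}}
# Observations: each KPI reading bucketed into a "low" or "high" load symbol.
EMIT = {"normal": {"low": 0.85, "high": 0.15},
        "overloaded": {"low": 0.2, "high": 0.8}}

def viterbi(observations):
    """Most likely hidden-state sequence for a sequence of KPI readings."""
    probs = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    paths = {s: [s] for s in STATES}
    for obs in observations[1:]:
        new_probs, new_paths = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: probs[p] * TRANS[p][s])
            new_probs[s] = probs[prev] * TRANS[prev][s] * EMIT[s][obs]
            new_paths[s] = paths[prev] + [s]
        probs, paths = new_probs, new_paths
    best = max(STATES, key=lambda s: probs[s])
    return paths[best]

def predict_exceeds_threshold(observations):
    """Flag an edge cloud when its inferred current state is 'overloaded'."""
    return viterbi(observations)[-1] == "overloaded"
```

The edge cloud manager would run this per edge cloud and trigger the load-balancing strategy of claim 6 for any cloud whose inferred state crosses the first preset threshold.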
8. The method of claim 1, wherein the central cloud predicts hotspot events and outputs corresponding hotspot data sets, comprising:
the central cloud predicts the hotspot events using a time series model ARIMA.
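Claim 8's ARIMA prediction would typically use a statistics library; as a minimal pure-Python stand-in, an AR(1) one-step forecast (ARIMA(1,0,0), with no differencing or moving-average terms) already shows the shape of the prediction step. The request-count histories below are invented:

```python
def ar1_forecast(series):
    """One-step forecast from an AR(1) model fit by least squares.
    A simplified stand-in for the ARIMA model of claim 8."""
    x = series[:-1]          # lagged values
    y = series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    if var == 0:
        return float(series[-1])         # flat history: forecast the last value
    phi = cov / var                      # autoregressive coefficient
    c = my - phi * mx                    # intercept
    return c + phi * series[-1]

def predict_hotspots(request_counts, top_k=1):
    """Return the topics with the highest forecast request volume."""
    forecasts = {topic: ar1_forecast(hist) for topic, hist in request_counts.items()}
    return sorted(forecasts, key=forecasts.get, reverse=True)[:top_k]
```

The topics that `predict_hotspots` returns would correspond to the hotspot events whose data sets the central cloud pushes into the HDFS storage engine in Step 2.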
9. The method of claim 1, wherein the edge cloud cluster comprises a plurality of edge clouds and a plurality of edge nodes, and wherein if an edge cloud corresponds to a plurality of edge nodes, the edge cloud cluster provides the hotspot data set for a user device, comprising:
the plurality of edge clouds respectively distribute the hotspot data sets to the corresponding plurality of edge nodes;
and the plurality of edge nodes send the hotspot data groups to the user equipment.
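The two-level fan-out of claim 9 can be sketched as follows; the class names are illustrative:

```python
# Sketch of claim 9's fan-out: each edge cloud pushes the hotspot data set to
# all of its edge nodes, so any node can answer user equipment directly.

class EdgeNode:
    def __init__(self):
        self.cache = None
    def send_to_user(self):
        return self.cache

class EdgeCloud:
    def __init__(self, nodes):
        self.nodes = nodes
    def distribute(self, hotspot_set):
        for node in self.nodes:
            node.cache = hotspot_set

def fan_out(clouds, hotspot_set):
    """Warehouse-side entry point: push one hotspot data set to every cloud's nodes."""
    for cloud in clouds:
        cloud.distribute(hotspot_set)
```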
10. A mass data based data migration system comprising a memory having stored thereon computer-executable instructions and a processor that, when executing the computer-executable instructions on the memory, implements the method of any of claims 1 to 9.
CN202111209912.6A 2021-10-18 2021-10-18 Data migration method and system based on mass data Withdrawn CN114048186A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111209912.6A CN114048186A (en) 2021-10-18 2021-10-18 Data migration method and system based on mass data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111209912.6A CN114048186A (en) 2021-10-18 2021-10-18 Data migration method and system based on mass data

Publications (1)

Publication Number Publication Date
CN114048186A true CN114048186A (en) 2022-02-15

Family

ID=80205321

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111209912.6A Withdrawn CN114048186A (en) 2021-10-18 2021-10-18 Data migration method and system based on mass data

Country Status (1)

Country Link
CN (1) CN114048186A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115022313A (en) * 2022-04-19 2022-09-06 湖南宝马文化传播有限公司 Data migration method and system under cloud architecture
US11792262B1 (en) 2022-07-20 2023-10-17 The Toronto-Dominion Bank System and method for data movement

Similar Documents

Publication Publication Date Title
US10459898B2 (en) Configurable-capacity time-series tables
US9740706B2 (en) Management of intermediate data spills during the shuffle phase of a map-reduce job
US8886781B2 (en) Load balancing in cluster storage systems
JP5765416B2 (en) Distributed storage system and method
US10922316B2 (en) Using computing resources to perform database queries according to a dynamically determined query size
CN109726174A (en) Data archiving method, system, equipment and storage medium
US11137926B1 (en) Systems and methods for automatic storage tiering
CN114048186A (en) Data migration method and system based on mass data
US20170344546A1 (en) Code dispersion hash table-based map-reduce system and method
CN103139302A (en) Real-time copy scheduling method considering load balancing
CN107169009B (en) Data splitting method and device of distributed storage system
CN103631894A (en) Dynamic copy management method based on HDFS
CN112947860B (en) Hierarchical storage and scheduling method for distributed data copies
CN105827678B (en) Communication means and node under a kind of framework based on High Availabitity
CN106130960A (en) Judgement system, load dispatching method and the device of steal-number behavior
Salehian et al. Comparison of spark resource managers and distributed file systems
Irie et al. A novel automated tiered storage architecture for achieving both cost saving and qoe
Fazul et al. Improving data availability in HDFS through replica balancing
CN112000703A (en) Data warehousing processing method and device, computer equipment and storage medium
CN111858656A (en) Static data query method and device based on distributed architecture
CN106549983B (en) Database access method, terminal and server
Tatarnikova et al. Algorithms for placing files in tiered storage using Kohonen map
US11537616B1 (en) Predicting query performance for prioritizing query execution
CN113760822A (en) HDFS-based distributed intelligent campus file management system optimization method and device
CN114153395A (en) Object storage data life cycle management method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220315

Address after: 561000 room 17011, unit 1, building C, Jianbo International Plaza, No. 188, Huangguoshu street, Huaxi street, Xixiu District, Anshun City, Guizhou Province

Applicant after: Guizhou Anhe Shengda Enterprise Management Co.,Ltd.

Address before: 518129 Bantian shangpinya garden, Longgang District, Shenzhen City, Guangdong Province

Applicant before: Peng Liang

WW01 Invention patent application withdrawn after publication

Application publication date: 20220215
