WO2019037093A1

WO2019037093A1 - Spark distributed computing data processing method and system

Info

Publication number: WO2019037093A1
Application number: PCT/CN2017/099083
Authority: WO
Inventors: 毛睿; 陆敏华; 陆克中; 朱金彬; 隋秀峰
Original assignee: 深圳大学
Priority date: 2017-08-25
Filing date: 2017-08-25
Publication date: 2019-02-28

Abstract

The present invention relates to the field of computers and provides a Spark distributed computing data processing method. The method comprises: scheduling a sub-task by means of a task scheduler, executing an RDD partition data storage task, and applying for space in a storage area; calculating the size of the evictable space in the storage area, and setting a migration address in a hybrid storage system according to partition data access popularity (S102); and reading the cached data in the specified storage area, releasing the corresponding memory space, migrating the partition data to the specified address, modifying the persistence level of the migrated data, and feeding back an eviction success signal and information on the evicted space (S103). Also provided is a Spark distributed computing system. By introducing the hybrid storage system and designing an eviction logic unit and a cached data migration unit, data is migrated to an SSD or an HDD according to partition data popularity, rather than being migrated directly to a magnetic disk or deleted from the cache, so that the pressure of memory space shortage can be effectively reduced and Spark performance is improved.

Description

Spark distributed computing data processing method and system

Technical field

The present invention relates to the field of computers, and in particular, to a Spark distributed computing data processing method and system.

Background technique

As science and technology advance, the requirements for large-scale data processing keep rising. Big data applications in particular depend heavily on memory: ample memory is the precondition for fast big data computation.

As a general-purpose, fast, large-scale data processing engine, Spark has become a popular computing framework for big data applications, especially in iterative computing fields such as graph computing and machine learning. As data sets continue to grow, a lack of memory space means that some partition data cannot be cached in memory, or that data already cached in memory must be migrated to disk, which degrades Spark performance. To address this problem, Spark designed a unified memory management model: when a task that caches partition data cannot obtain enough storage space, the model actively migrates cached data in the storage area to disk or evicts it outright. By migrating or evicting cached data, the unified memory management model can flexibly and effectively relieve the pressure that Spark's demand for cached data places on insufficient storage space.

However, because the cached intermediate data is evicted or migrated to disk, when that data is needed again the corresponding task must be re-executed to regenerate it, or the disk must be read to retrieve it. The Spark unified memory management model therefore causes some Spark tasks to be recomputed or to read from disk, which hurts Spark performance.

Technical problem

The main purpose of the present invention is to provide a Spark distributed computing data processing method and system, which aims to solve the technical problem of repeated Spark task calculation or disk reading in the Spark unified memory management model in the prior art.

Technical solution

In order to achieve the above object, a first aspect of the present invention provides a Spark distributed computing data processing method, the method comprising:

When performing a storage task on resilient distributed dataset (RDD) partition data that the user has marked for caching, if the request for space in the Spark memory storage area fails, sending to the eviction logic unit a command to evict evictable cached data from the memory storage area;

Calculating the size of the evictable space in the memory storage area, and, if the space available after eviction meets the storage task's requirement for memory storage area space, setting a migration address in a hybrid storage system based on SSD and HDD according to the access heat of the evictable cached data in the memory storage area;

Reading and releasing the evictable cached data in the memory storage area, migrating that data to the migration address, modifying the persistence level of that data, and feeding back an eviction success signal and eviction information.

In order to achieve the above object, a second aspect of the present invention further provides a Spark distributed computing data processing system, the system comprising:

An application storage module, configured to, when performing a storage task on resilient distributed dataset (RDD) partition data that the user has marked for caching, send to the eviction logic unit a command to evict evictable cached data from the memory storage area if the request for space in the Spark memory storage area fails;

A calculation address module, configured to calculate the size of the evictable space in the memory storage area and, if the space available after eviction meets the storage task's requirement for memory storage area space, set a migration address in a hybrid storage system based on SSD and HDD according to the access heat of the evictable cached data in the memory storage area;

A data migration module, configured to read and release the evictable cached data in the memory storage area, migrate that data to the migration address, modify the persistence level of that data, and feed back an eviction success signal and eviction information.

Beneficial effect

By introducing an SSD and an HDD to build a hybrid storage system, and by designing the eviction logic unit and the cache data migration unit, partition data can be flexibly migrated to the SSD or the HDD according to its access heat, instead of the cached intermediate data being migrated directly to disk or evicted. This effectively relieves the pressure that Spark's partition data caching, with its large storage demand, places on insufficient memory space. In addition, when the partition data is read again, the high-speed read and write performance of the hybrid storage system and the separate storage of partition data by heat allow partition data of different access heats to be read quickly from the hybrid storage system, improving Spark performance.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a Spark distributed computing data processing method according to an embodiment of the present invention;

FIG. 2 is a schematic flowchart of the refinement steps of step S101 of a Spark distributed computing data processing method according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of the refinement steps of step S102 of a Spark distributed computing data processing method according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart of the refinement steps of step S304 of a Spark distributed computing data processing method according to an embodiment of the present invention;

FIG. 5 is a schematic flowchart of the refinement steps for migrating data in step S103 of a Spark distributed computing data processing method according to an embodiment of the present invention;

FIG. 6 is a schematic flowchart of the refinement steps for modifying the data persistence level in step S103 of a Spark distributed computing data processing method according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of the functional modules of a Spark distributed computing data processing system according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of the refinement function modules of the application storage module 601 of the Spark distributed computing data processing system according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of the refinement function modules of the calculation address module 602 of the Spark distributed computing data processing system according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of the refinement function modules of the data migration module 603 of the Spark distributed computing data processing system according to an embodiment of the present invention.

Embodiments of the invention

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art on the basis of these embodiments without creative effort fall within the scope of the present invention.

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a Spark distributed computing data processing method according to an embodiment of the present invention, where the processing method includes:

S101. When performing a storage task on resilient distributed dataset (RDD) partition data that the user has marked for caching, if the request for space in the Spark memory storage area fails, send to the eviction logic unit a command to evict evictable cached data from the memory storage area.

S102. Calculate the size of the evictable space in the memory storage area. If the space available after eviction meets the storage task's space requirement, set a migration address in the hybrid storage system based on SSD and HDD according to the access heat of the evictable cached data in the memory storage area.

S103. Read and release the evictable cached data in the memory storage area, migrate that data to the migration address, modify its persistence level, and feed back the eviction success signal and the eviction information.

In the embodiment of the present invention, a hybrid storage system is built by introducing an SSD and an HDD, and the eviction logic unit and the cache data migration unit are designed so that partition data is flexibly migrated to the SSD or the HDD according to its access heat, instead of the cached intermediate data being migrated directly to disk or evicted. This effectively relieves the pressure that Spark's partition data caching, with its large storage demand, places on insufficient memory space. In addition, when the partition data is read again, the high-speed read and write performance of the hybrid storage system and the separate storage of partition data by heat allow partition data of different access heats to be read quickly from the hybrid storage system, improving Spark performance.
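
For illustration only, the flow of steps S101 to S103 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the invention and not Spark code: the names `Block`, `StorageRegion` and `store_partition`, the integer heat counter, the single `hot_threshold`, the oldest-first choice of blocks to evict, and the reading that free space after eviction must cover the request are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    block_id: str
    size: int
    heat: int                      # hypothetical access counter
    level: str = "MEMORY_ONLY"     # persistence level label


@dataclass
class StorageRegion:
    capacity: int
    blocks: dict = field(default_factory=dict)  # insertion order: oldest first

    def free(self) -> int:
        return self.capacity - sum(b.size for b in self.blocks.values())


def store_partition(region, new_block, hot_threshold, tiers):
    """Return True if new_block ends up cached in the memory region."""
    # S101: request space; on success the block is stored directly.
    if new_block.size <= region.free():
        region.blocks[new_block.block_id] = new_block
        return True

    # S102: work out whether evicting cached blocks frees enough space.
    shortfall = new_block.size - region.free()
    victims, reclaimed = [], 0
    for block in region.blocks.values():
        if reclaimed >= shortfall:
            break
        victims.append(block)
        reclaimed += block.size
    if reclaimed < shortfall:
        return False               # eviction fails, nothing is migrated

    # S103: release memory, migrate by heat, and relabel each evicted block.
    for block in victims:
        del region.blocks[block.block_id]
        tier = "SSD" if block.heat >= hot_threshold else "HDD"
        tiers[tier][block.block_id] = block
        block.level = f"{tier}_ONLY"
    region.blocks[new_block.block_id] = new_block
    return True
```

In this sketch `tiers` is a plain dictionary with the keys `"SSD"` and `"HDD"` that stands in for the hybrid storage system; a real system would write the block contents to the corresponding device.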

Referring to FIG. 2, FIG. 2 is a schematic flowchart of the refinement steps of step S101 of a Spark distributed computing data processing method according to an embodiment of the present invention, where the refinement steps include:

S201. Calculate the size of the memory storage area space required to perform the storage task on the RDD partition data, request space from the Spark memory storage area, and compare the required size with the unoccupied space of the memory storage area;

Specifically, the Spark execution engine schedules the subtask through the task scheduler, performs the storage task on the RDD partition data that the user has marked for caching within the subtask's runtime space, and then attempts to request space from the Spark memory storage area. If the request succeeds, the RDD partition data is stored directly.

S202. If the memory storage area space required by the storage task is larger than the unoccupied space of the memory storage area, the request for space in the Spark memory storage area fails; a command to evict evictable cached data from the memory storage area is then sent to the eviction logic unit, together with the size of the memory storage area space that the storage task requires.
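
The comparison in steps S201 and S202 can be illustrated with a short Python sketch. It is given under assumptions only: `EvictCommand` and `request_storage` are hypothetical names, sizes are plain byte counts, and the command object simply carries the required size that S202 says is sent to the eviction logic unit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvictCommand:
    """Hypothetical message for the eviction logic unit (S202)."""
    required_bytes: int


def request_storage(required_bytes: int, capacity: int, used: int) -> Optional[EvictCommand]:
    """S201: compare the required space with the unoccupied space.

    Returns None when the request succeeds and the partition can be stored
    directly, or an EvictCommand when the request fails (S202).
    """
    unoccupied = capacity - used
    if required_bytes <= unoccupied:
        return None
    return EvictCommand(required_bytes=required_bytes)
```

Returning the command instead of raising an error keeps the failure path explicit: the caller forwards the returned object to whatever plays the role of the eviction logic unit.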

Referring to FIG. 3, FIG. 3 is a schematic flowchart of the refinement steps of step S102 of a Spark distributed computing data processing method according to an embodiment of the present invention, where the refinement steps include:

S301. The eviction logic unit receives the eviction command and, because the storage space required to perform the storage task on the RDD partition data is insufficient, sends the memory storage area a request to evict memory storage space.

Further, after receiving the request sent by the eviction logic unit, the memory storage area determines whether it has evictable space and reports the result to the eviction logic unit.

S302. If the request succeeds, calculate the size of the evictable space in the memory storage area according to the least recently used (LRU) policy;

The LRU policy evicts data according to the historical access record of the data in the memory storage area. Its core idea is that data accessed recently is more likely to be accessed again in the future; the size of the evictable space in the memory storage area is determined according to this access likelihood.

S303. If the size of the evictable space in the memory storage area is greater than or equal to the memory storage space required to perform the storage task on the RDD partition data, proceed to step S304.

S304. Set a migration address in the hybrid storage system based on SSD and HDD according to the access heat of the evictable cached data in the memory storage area, and send the migration information and the migration command for that data to the cache data migration unit.

S305. If the size of the evictable space in the memory storage area is smaller than the memory storage space required to perform the storage task on the RDD partition data, proceed to step S306.

S306. Terminate the migration task for the evictable cached data in the memory storage area, and feed back a signal indicating that eviction of the cached data from the memory storage area failed.
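
Steps S302 to S306 can be illustrated with a small LRU bookkeeping sketch in Python. It rests on assumptions that the text does not fix: every cached block is treated as evictable, recency is tracked with `collections.OrderedDict`, and the size check is read as "free space after eviction covers the request". The class and method names are hypothetical.

```python
from collections import OrderedDict


class LruStorageRegion:
    """Toy memory storage area that remembers access order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block_id -> size, least recently used first

    def put(self, block_id, size):
        self.blocks[block_id] = size
        self.blocks.move_to_end(block_id)

    def touch(self, block_id):
        # An access makes the block the most recently used one.
        self.blocks.move_to_end(block_id)

    def free(self):
        return self.capacity - sum(self.blocks.values())

    def plan_eviction(self, required):
        """Return block ids to evict, LRU first, or None on failure.

        A list means the S303/S304 branch (enough space can be reclaimed);
        None means the S305/S306 branch (eviction is abandoned).
        """
        shortfall = required - self.free()
        victims, reclaimed = [], 0
        for block_id, size in self.blocks.items():
            if reclaimed >= shortfall:
                break
            victims.append(block_id)
            reclaimed += size
        return victims if reclaimed >= shortfall else None
```

For example, with blocks `a` and `b` of 40 units each in a region of 100 units, touching `a` and then planning for 50 units selects `b`, because `b` is now the least recently used block.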

Referring to FIG. 4, FIG. 4 is a schematic flowchart of the refinement steps of step S304 of a Spark distributed computing data processing method according to an embodiment of the present invention, where the refinement steps include:

S3041. Determine the access heat of the evictable cached data in the memory storage area.

S3042. If the access heat of the evictable cached data in the memory storage area is within the first preset heat value range, read the SSD address and set the read SSD address as the migration address;

The first preset heat value range corresponds to evictable cached data with high access heat; the specific range can be set freely by the user;

In particular, the first preset heat value is greater than the second preset heat value.

S3043. If the access heat of the evictable cached data in the memory storage area is within the second preset heat value range, read the HDD address and set the read HDD address as the migration address.

The second preset heat value range corresponds to evictable cached data with low access heat; the specific range can be set freely by the user.
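
The heat-based choice of migration address in steps S3041 to S3043 can be sketched as below. The concrete ranges, the integer heat scale, and the two directory paths are assumptions chosen for the example; the text only states that the ranges are user-configurable and that the first (hot) range lies above the second (cold) one.

```python
from typing import Optional

# Hypothetical user-configured ranges: inclusive lower bound, exclusive upper bound.
HOT_RANGE = (10, float("inf"))   # first preset heat value range -> SSD
COLD_RANGE = (0, 10)             # second preset heat value range -> HDD

# Hypothetical mount points standing in for the SSD and HDD addresses.
SSD_ADDRESS = "/mnt/ssd/spark-evicted"
HDD_ADDRESS = "/mnt/hdd/spark-evicted"


def migration_address(heat: int) -> Optional[str]:
    """Map the access heat of an evictable block to a migration address."""
    if HOT_RANGE[0] <= heat < HOT_RANGE[1]:
        return SSD_ADDRESS       # S3042: hot data goes to the SSD
    if COLD_RANGE[0] <= heat < COLD_RANGE[1]:
        return HDD_ADDRESS       # S3043: cold data goes to the HDD
    return None                  # heat outside both configured ranges
```

Keeping the two ranges as separate settings, rather than a single threshold, mirrors the wording of the steps and leaves room for a gap or a third tier in between.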

Referring to FIG. 5, FIG. 5 is a schematic flowchart of the refinement steps for migrating data in step S103 of a Spark distributed computing data processing method according to an embodiment of the present invention. The refinement steps include:

S401. The cache data migration unit receives the migration information and the migration command for the evictable cached data in the memory storage area, and stores that data to the SSD or the HDD according to the migration information;

Further, after receiving the migration information and the migration command, the cache data migration unit first reads the cached data in the specified memory storage area and releases the corresponding memory space, and then stores the data to the SSD or the HDD according to the migration address;

The migration information for the evictable data in the memory storage area includes: the address of the evictable cached data in the memory storage area, the size of the space that data occupies, and the migration address.

S402. Send the eviction logic unit a signal that migration of the evictable cached data in the memory storage area is complete.
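
The behaviour of the cache data migration unit in steps S401 and S402 can be sketched as follows. This is an illustrative sketch under assumptions: the memory storage area is modelled as a dictionary of byte strings, a block id stands in for the address of the cached data, the migration address is a directory path, and the completion signal is a plain string. None of these names come from the source.

```python
import os
from dataclasses import dataclass


@dataclass
class MigrationInfo:
    block_id: str     # stands in for the address of the evictable cached data
    size: int         # space the cached data occupies, in bytes
    target_dir: str   # migration address (directory on the SSD or the HDD)


def migrate(memory_store: dict, info: MigrationInfo) -> str:
    """S401: read and release the cached data, then store it at the target.

    Returns a completion signal for the eviction logic unit (S402).
    """
    # Read the cached data and release its memory in one step.
    data = memory_store.pop(info.block_id)

    # Store the data under the migration address.
    os.makedirs(info.target_dir, exist_ok=True)
    path = os.path.join(info.target_dir, info.block_id)
    with open(path, "wb") as handle:
        handle.write(data)

    return "MIGRATION_COMPLETE"
```

The sketch releases memory before the write finishes, which follows the order given in the text; a production design would need to decide how to recover if the write fails after the release.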

Referring to FIG. 6, FIG. 6 is a schematic flowchart of the refinement steps for modifying the data persistence level in step S103 of a Spark distributed computing data processing method according to an embodiment of the present invention. The refinement steps include:

S501. Determine the type of the migration address of the evictable cached data in the memory storage area.

S502. If the migration address of the evictable cached data in the memory storage area is on the SSD, modify the persistence level of that data to SSD_ONLY.

S503. If the migration address of the evictable cached data in the memory storage area is on the HDD, modify the persistence level of that data to HDD_ONLY.

S504. After the modification is completed, feed back the eviction success signal for the evictable cached data in the memory storage area together with the migration information for the evicted data, so that the RDD partition data can enter the memory storage area and complete the storage task.
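
The relabelling in steps S501 to S504 can be illustrated with the sketch below. The levels `SSD_ONLY` and `HDD_ONLY` are the ones named in the steps above; the `PersistenceLevel` enum merely stands in for them and is not Spark's own `StorageLevel` class. The function names and the shape of the feedback record are assumptions.

```python
from enum import Enum


class PersistenceLevel(Enum):
    MEMORY_ONLY = "MEMORY_ONLY"
    SSD_ONLY = "SSD_ONLY"
    HDD_ONLY = "HDD_ONLY"


def level_after_migration(device: str) -> PersistenceLevel:
    """S501 to S503: choose the new level from the type of migration address."""
    if device == "SSD":
        return PersistenceLevel.SSD_ONLY
    if device == "HDD":
        return PersistenceLevel.HDD_ONLY
    raise ValueError(f"unknown device type: {device!r}")


def finish_eviction(block_levels: dict, block_id: str, device: str, size: int) -> dict:
    """S504: record the new level and build the feedback for the requester."""
    block_levels[block_id] = level_after_migration(device)
    return {
        "signal": "EVICTION_SUCCESS",
        "block_id": block_id,
        "device": device,
        "freed_bytes": size,
    }
```

Recording the level per block is what later lets a reader of the partition know which device to fetch it from without probing both.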

Referring to FIG. 7, FIG. 7 is a schematic diagram of the functional modules of a Spark distributed computing data processing system according to an embodiment of the present invention. The functional modules include:

The application storage module 601 is configured to, when performing a storage task on resilient distributed dataset (RDD) partition data that the user has marked for caching, send to the eviction logic unit a command to evict evictable cached data from the memory storage area if the request for space in the Spark memory storage area fails;

The calculation address module 602 is configured to calculate the size of the evictable space in the memory storage area and, if the space available after eviction meets the storage task's requirement for memory storage area space, set a migration address in the hybrid storage system based on SSD and HDD according to the access heat of the evictable cached data in the memory storage area;

The data migration module 603 is configured to read and release the evictable cached data in the memory storage area, migrate that data to the migration address, modify its persistence level, and feed back the eviction success signal and the eviction information.

Referring to FIG. 8, FIG. 8 is a schematic diagram of the refinement function modules of the application storage module 601 of a Spark distributed computing data processing system according to an embodiment of the present invention, where the refinement function modules include:

The first application module 6011 is configured to calculate the size of the memory storage area space required to perform the storage task on the RDD partition data, request space from the Spark memory storage area, and compare the required size with the unoccupied space of the memory storage area;

The first feedback module 6012 is configured to, if the memory storage area space required by the storage task is larger than the unoccupied space of the memory storage area, so that the request for space in the Spark memory storage area fails, send to the eviction logic unit a command to evict evictable cached data from the memory storage area, together with the size of the memory storage area space that the storage task requires.

Referring to FIG. 9, FIG. 9 is a schematic diagram of the refinement function modules of the calculation address module 602 of a Spark distributed computing data processing system according to an embodiment of the present invention, where the refinement function modules include:

The second application module 6021 is configured so that the eviction logic unit receives the eviction command and, because the storage space required to perform the storage task on the RDD partition data is insufficient, sends the memory storage area a request to evict memory storage space; if the request succeeds, the size of the evictable space in the memory storage area is calculated according to the least recently used (LRU) policy;

The migration address module 6022 is configured to, if the unoccupied space of the memory storage area after eviction is greater than or equal to the space required to perform the storage task on the RDD partition data, set a migration address in the hybrid storage system based on SSD and HDD according to the access heat of the evictable cached data in the memory storage area, and send the migration information and the migration command for that data to the cache data migration unit;

The second feedback module 6023 is configured to, if the unoccupied space of the memory storage area after eviction is smaller than the space required to perform the storage task on the RDD partition data, terminate the migration task for the evictable cached data in the memory storage area and feed back a signal indicating that eviction of the cached data from the memory storage area failed;

The SSD migration address module 6024 is configured to, if the access heat of the evictable cached data in the memory storage area is within the first preset heat value range, read the SSD address and set the read SSD address as the migration address;

The HDD migration address module 6025 is configured to, if the access heat of the evictable cached data in the memory storage area is within the second preset heat value range, read the HDD address and set the read HDD address as the migration address.

Referring to FIG. 10, FIG. 10 is a schematic diagram of the refinement function modules of the data migration module 603 of a Spark distributed computing data processing system according to an embodiment of the present invention. The refinement function modules include:

The third feedback module 6031 is configured to send the eviction logic unit a signal that migration of the evictable cached data in the memory storage area is complete;

The SSD persistence level module 6032 is configured to, if the migration address of the evictable cached data in the memory storage area is on the SSD, modify the persistence level of that data to SSD_ONLY;

The HDD persistence level module 6033 is configured to, if the migration address of the evictable cached data in the memory storage area is on the HDD, modify the persistence level of that data to HDD_ONLY;

The fourth feedback module 6034 is configured to feed back the eviction success signal for the evictable cached data in the memory storage area together with the migration information for the evicted data, so that the RDD partition data can enter the memory storage area and complete the storage task.

In the several embodiments provided by the present application, it should be understood that the disclosed methods and systems may be implemented in other manners. For example, the system embodiments described above are merely illustrative: the division into modules is only a division by logical function, and in actual implementation other divisions are possible. For example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or module, and may be electrical, mechanical, or of another form.

The modules described as separate components may or may not be physically separate. The components displayed as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated modules can be implemented in the form of hardware or in the form of software functional modules.

An integrated module, if implemented as a software functional module and sold or used as a standalone product, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present invention. The foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

It should be noted that, for the sake of brevity, the foregoing method embodiments are each described as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because certain steps may be performed in other orders or concurrently in accordance with the present invention. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of the other embodiments.

The foregoing is a description of the Spark distributed computing data processing method and system provided by the present invention. Those skilled in the art may, following the idea of the embodiments of the present invention, make changes to the specific implementation and the scope of application. In conclusion, the contents of this specification are not to be construed as limiting the invention.

Claims

A Spark distributed computing data processing method, the method comprising:

Resilient Distributed Data Set (RDD, Resilient Distributed) on the user's identified cache Datasets) When the partition data performs a storage task, if the space request to the Spark memory storage area fails, the eviction logic unit sends a command to evict the cache data from the memory storage area;

Calculating a size of the eviction space in the memory storage area, and if the space size after the eviction meets the storage task space requirement of the storage task, setting the cache data access heat according to the memory storage area to be based on the solid state hard disk ( SSD, Solid Migration address of the hybrid storage system of State Drives and HDD (Hard Disk Drive);

Reading and releasing the eviction cache data in the memory storage area, migrating the memory storage area to evict the cache data to the migration address, modifying the eviction cache data persistence level in the memory storage area, and feedback eviction success Signal and expulsion information.
The method of claim 1, wherein the request to evict the buffered data by expelling the memory storage area from the eviction logic unit if the application space fails to the Spark memory storage area comprises:

Calculating a size of the memory storage area occupied by the storage task for the RDD partition data, applying for a space to the memory storage area of the Spark, and occupying the size of the memory storage area occupied by the storage task If the size of the memory storage area occupied by the storage task is larger than the unoccupied space of the memory storage area, the space for applying to the Spark memory storage area fails, and The eviction logic unit sends a command to evict the cache memory data by eviction of the memory storage area and a size of the memory storage area space required to send the storage task.
The method according to claim 1, wherein the calculating the size of the eviction space in the memory storage area, and if the size of the eviction space satisfies the requirement of the storage task for the memory storage area space, The memory storage area can be used to evict the cache data access heat setting. The migration address of the hybrid storage system based on SSD and HDD specifically includes:

The eviction logic unit receives the eviction command, and the eviction logic unit sends an application to the memory storage area that the storage space required for performing the storage task is insufficient due to the RDD partition data, and if the application is successful, Calculating the size of the expellable space in the memory storage area according to the least recently used algorithm LRU policy;

If the size of the eviction space in the memory storage area is greater than or equal to the size of the RDD partition data to perform the storage task, the migration of the hybrid storage system based on the SSD and the HDD is set according to the access heat of the memory storage area to evict the cached data. An address, and the memory storage area eviction cache data migration information and the memory storage area eviction cache data migration command are sent to the cache data migration unit;

If the size of the eviction space in the memory storage area is smaller than the size of the RDD partition data to perform the storage task, terminating the memory storage area may evict the cache data migration task, and feedback eviction of the memory storage area to evict the cache Data failure signal.
The method according to claim 3, wherein the setting the migration address of the hybrid storage system based on the SSD and the HDD according to the access heat of the cacheable data of the memory storage area comprises:

If the memory storage area eviction cache data access heat is within a first preset heat value range, reading the SSD address and setting the read SSD address to the migration address;

If the memory storage area eviction cache data access heat is within a second preset heat value range, reading the HDD address and setting the read HDD address to the migration address;

The first preset heat value is greater than the second preset heat value.
The method according to claim 1, wherein the reading and releasing the eviction cache data in the memory storage area, and migrating the eviction cache data to the migration address in the memory storage area comprises:

After the cache data migration unit receives the migration information and the migration command for the evictable cached data in the memory storage area, storing the evictable cached data into the SSD or the HDD according to the migration information, and sending to the eviction logic unit a signal indicating that migration of the evictable cached data is complete;

The migration information of the evictable cached data includes: the address of the evictable cached data in the memory storage area, the size of the space occupied by the evictable cached data, and the migration address.
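
The migration step can be pictured with the following Python sketch. It models the memory storage area and the two storage tiers as plain dictionaries, which is purely an assumption for illustration; `MigrationInfo`, `migrate` and the signal string are hypothetical names, with the three record fields mirroring the three items of migration information listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationInfo:
    block_id: str    # stands in for the address of the evictable cached data
    size_bytes: int  # space the data occupies in the memory storage area
    target: str      # migration address, modelled here as "SSD" or "HDD"


def migrate(memory, tiers, info):
    """Read the block, release its memory, store it on the target tier."""
    data = memory.pop(info.block_id)          # read and release
    tiers[info.target][info.block_id] = data  # store to the SSD or the HDD
    # Completion signal returned to the (hypothetical) eviction logic unit.
    return {"signal": "MIGRATION_COMPLETE", "freed_bytes": info.size_bytes}


memory = {"rdd_0_0": b"abcd"}
tiers = {"SSD": {}, "HDD": {}}
print(migrate(memory, tiers, MigrationInfo("rdd_0_0", 4, "SSD")))
```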
The method according to claim 1, wherein modifying the persistence level of the evictable cached data in the memory storage area and feeding back the eviction success signal and the eviction information specifically comprises:

If the migration address of the evictable cached data in the memory storage area is the SSD, modifying the persistence level of the evictable cached data to SSD_ONLY;

If the migration address of the evictable cached data in the memory storage area is the HDD, modifying the persistence level of the evictable cached data to HDD_ONLY;

After the modification is completed, feeding back the eviction success signal for the evictable cached data in the memory storage area and the migration information of the evicted data, so that the RDD partition data enters the memory storage area to complete the storage task.
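
The persistence level update can be sketched in Python as follows. SSD_ONLY and HDD_ONLY are the levels named in this claim and are not storage levels of stock Spark; the enum, the `finish_eviction` function and the signal string are hypothetical stand-ins used only to show the mapping from migration target to level.

```python
from enum import Enum


class PersistenceLevel(Enum):
    MEMORY_ONLY = "MEMORY_ONLY"
    SSD_ONLY = "SSD_ONLY"  # level named in the claims, not in stock Spark
    HDD_ONLY = "HDD_ONLY"  # level named in the claims, not in stock Spark


def finish_eviction(levels, block_id, target, freed_bytes):
    """Record the new persistence level and build the success feedback."""
    new_level = {
        "SSD": PersistenceLevel.SSD_ONLY,
        "HDD": PersistenceLevel.HDD_ONLY,
    }[target]
    levels[block_id] = new_level
    return {
        "signal": "EVICTION_SUCCESS",
        "block_id": block_id,
        "target": target,
        "freed_bytes": freed_bytes,
    }


levels = {"rdd_0_0": PersistenceLevel.MEMORY_ONLY}
print(finish_eviction(levels, "rdd_0_0", "SSD", 4))
print(levels["rdd_0_0"])
```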
A Spark distributed computing data processing system, characterized in that the system comprises:

an application storage module, configured to, when a storage task is performed on resilient distributed dataset (RDD) partition data that has been identified by the user, send to the eviction logic unit a command to evict the evictable cached data in the memory storage area if the space request to the Spark memory storage area fails;

a computing address module, configured to calculate the size of the evictable space in the memory storage area and, if the space available after eviction meets the space requirement of the storage task, set the migration address in the hybrid storage system based on the SSD and the HDD according to the access heat of the evictable cached data in the memory storage area;

a data migration module, configured to read and release the evictable cached data in the memory storage area, migrate the evictable cached data to the migration address, modify the persistence level of the evictable cached data, and feed back an eviction success signal and eviction information.
The system of claim 7, wherein the application storage module comprises:

a first application module, configured to calculate the size of the memory storage area space that the RDD partition data needs to occupy to perform the storage task, apply to the Spark memory storage area for space, and compare the required size with the unoccupied space of the memory storage area;

a first feedback module, configured to, if the size of the memory storage area space required by the storage task is larger than the unoccupied space of the memory storage area, so that the space request to the Spark memory storage area fails, send to the eviction logic unit the command to evict the evictable cached data in the memory storage area together with the size of the memory storage area space required by the storage task.
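
The comparison performed by these two sub-modules can be sketched in Python as below. The function name `request_storage`, the command string and the byte counts are hypothetical and are chosen only to show the success and failure branches.

```python
def request_storage(required_bytes, capacity_bytes, used_bytes):
    """Compare the required space with the unoccupied space.

    Returns a grant when the data fits; otherwise returns the eviction
    command together with the required size, as would be sent to the
    (hypothetical) eviction logic unit.
    """
    free_bytes = capacity_bytes - used_bytes
    if required_bytes <= free_bytes:
        return {"granted": True}
    return {
        "granted": False,
        "command": "EVICT_CACHED_DATA",
        "required_bytes": required_bytes,
    }


print(request_storage(100, 1000, 500))
print(request_storage(100, 1000, 950))
```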
The system of claim 7, wherein the computing address module comprises:

a second application module, configured so that the eviction logic unit receives the eviction command and applies to the memory storage area for eviction on the ground that the storage space required for the RDD partition data to perform the storage task is insufficient, and, if the application succeeds, calculates the size of the evictable space in the memory storage area according to the least recently used (LRU) policy;

a migration address module, configured to, if the unoccupied space of the memory storage area after eviction is greater than or equal to the size of the space that the RDD partition data needs to occupy to perform the storage task, set the migration address in the hybrid storage system based on the SSD and the HDD according to the access heat of the evictable cached data in the memory storage area, and send the migration information and the migration command for the evictable cached data to the cache data migration unit;

a second feedback module, configured to, if the unoccupied space of the memory storage area after eviction is smaller than the size of the space that the RDD partition data needs to occupy to perform the storage task, terminate the task of migrating the evictable cached data in the memory storage area and feed back a signal indicating failure to evict the evictable cached data;

an SSD migration address module, configured to, if the access heat of the evictable cached data in the memory storage area is within the first preset heat value range, read the SSD address and set the read SSD address as the migration address;

an HDD migration address module, configured to, if the access heat of the evictable cached data in the memory storage area is within the second preset heat value range, read the HDD address and set the read HDD address as the migration address.
The system of claim 7, wherein the data migration module comprises:

a data migration module, configured to, after the cache data migration unit receives the migration information and the migration command for the evictable cached data in the memory storage area, store the evictable cached data into the SSD or the HDD according to the migration information;

a third feedback module, configured to send to the eviction logic unit a signal indicating that migration of the evictable cached data in the memory storage area is complete;

an SSD persistence level module, configured to, if the migration address of the evictable cached data in the memory storage area is the SSD, modify the persistence level of the evictable cached data to SSD_ONLY;

an HDD persistence level module, configured to, if the migration address of the evictable cached data in the memory storage area is the HDD, modify the persistence level of the evictable cached data to HDD_ONLY;

a fourth feedback module, configured to feed back the eviction success signal for the evictable cached data in the memory storage area and the migration information of the evicted data, so that the RDD partition data enters the memory storage area to complete the storage task.