CN107193494B - RDD (resilient distributed dataset) persistence method based on SSD (solid-state drive) and HDD (hard disk drive) hybrid storage system - Google Patents

RDD (resilient distributed dataset) persistence method based on SSD (solid-state drive) and HDD (hard disk drive) hybrid storage system

Info

Publication number
CN107193494B
Authority
CN
China
Prior art keywords
rdd
data
module
preset
block manager
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710358093.9A
Other languages
Chinese (zh)
Other versions
CN107193494A (en
Inventor
陆克中
黄泽成
毛睿
廖好
朱金彬
隋秀峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baode Network Security System Shenzhen Co ltd
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN201710358093.9A priority Critical patent/CN107193494B/en
Publication of CN107193494A publication Critical patent/CN107193494A/en
Application granted granted Critical
Publication of CN107193494B publication Critical patent/CN107193494B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/325 Display of status information by lamps or LED's
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/068 Hybrid storage device

Abstract

The invention provides an RDD (resilient distributed dataset) persistence method based on an SSD (solid-state drive) and HDD (hard disk drive) hybrid storage system, which comprises the following steps: the RDD module transmits the block identifier in the RDD module and the preset persistence level of the data in the RDD module to the block manager; the disk block manager transmits the preset persistence level to a device adapter; the device adapter receives the preset persistence level of the data, reads two directory management variables from a configuration file, matches the preset persistence level to the temporary file directory in the corresponding directory management variable, and returns the matched temporary file directory to the disk block manager; the disk block manager obtains a file name from the block identifier, obtains a data storage address from the matched temporary file directory and the file name, and returns the data storage address to the block manager; and the block manager stores the data in the RDD module on the SSD or the HDD according to the data storage address.

Description

RDD (resilient distributed dataset) persistence method based on SSD (solid-state drive) and HDD (hard disk drive) hybrid storage system
Technical Field
The invention relates to the technical field of data processing, in particular to an RDD (resilient distributed dataset) persistence method based on an SSD (solid-state drive) and HDD (hard disk drive) hybrid storage system.
Background
In the current big data era, the urgent problem when facing massive data is how to manage and analyze it and extract valuable information within a useful time. However, big data challenges people's ability to manage data in its scale, its variety, and its structure.
Spark is an efficient big data computing framework that is widely used in industry, and a general-purpose, fast, large-scale data processing engine. First, Spark provides a unified solution that can be used for complex tasks such as interactive query, real-time stream processing, and machine learning. Second, Spark divides stages and tasks by means of the resilient distributed dataset (RDD), optimizes the execution order of subtasks through an efficient directed acyclic graph (DAG) execution engine, and greatly improves data processing efficiency through memory-based computation. Third, Spark data management relies on multiple data sources such as HDFS and Hive, and Spark in cluster mode scales horizontally and supports large-scale data processing. The RDD is the most important concept distinguishing Spark from other big data computing frameworks: it is a read-only distributed dataset with a highly fault-tolerant mechanism. In a Spark application, each RDD is divided into a plurality of partitions, and Spark performs operations on the RDD in units of partitions. Persisting (Persist) an RDD caches the data of its partitions in memory or on a hard disk, so that subsequent iterative tasks can read the intermediate results of a computation directly, which avoids repeated computation and greatly improves data processing efficiency. In addition, persisting data to the hard disk removes the limit that insufficient memory capacity places on the size of the dataset, which enables Spark to process large data.
However, at present the initial RDD dataset is divided according to a random proportion, and the persistence framework provided by Spark persists data to different storage media according to that proportion, so on-demand persistence cannot be realized.
Disclosure of Invention
The technical problem to be solved by the invention is that on-demand persistence cannot be realized in the prior art; the invention provides an RDD persistence method based on an SSD and HDD hybrid storage system that can realize on-demand persistence.
The embodiment of the invention provides an RDD (resilient distributed dataset) persistence method based on an SSD (solid-state drive) and HDD (hard disk drive) hybrid storage system, which comprises the following steps:
the RDD module transmits the block identifier in the RDD module and the preset persistence level of the data in the RDD module to the block manager;
the block manager transmits the block identifier and a preset persistence level to a disk block manager;
the disk block manager transmits the preset persistence level to a device adapter;
the device adapter receives a preset persistence level of data and reads two directory management variables in a configuration file, matches the preset persistence level with a temporary file directory in a corresponding directory management variable according to the preset persistence level of the data, and returns the temporary file directory obtained by matching to the disk block manager;
the disk block manager obtains a file name according to the block identifier, obtains a data storage address according to the temporary file directory and the file name obtained by matching, and returns the data storage address to the block manager;
and the block manager stores the data in the RDD module in the SSD or the HDD according to the data storage address.
The invention also provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
Compared with the prior art, the technical scheme of the invention has the following beneficial effect: the data in the RDD module is stored on the SSD or the HDD at the data storage address determined by the preset persistence level, which realizes on-demand persistence for Spark applications.
Drawings
FIG. 1 is a block diagram of one embodiment of a distributed computing system according to the present invention.
FIG. 2 is a flow chart of one embodiment of a data processing method of the distributed computing system of the present invention.
FIG. 3 is a flow chart of one embodiment of the RDD persistence method based on the SSD and HDD hybrid storage system of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
Specifically, the emergence of the solid-state drive (SSD) brings a new opportunity for improving the performance of storage systems: the SSD has advantages such as low power consumption, low latency, and small size. Unlike the conventional hard disk drive (HDD), which addresses data by moving a mechanical arm, the SSD is built entirely on semiconductor chips and therefore has good random access performance. However, because SSDs are costly and have a limited lifespan, completely replacing HDDs with SSDs would significantly increase costs for the industry. In order to make reasonable use of the high performance of SSDs and the low price of HDDs, heterogeneous data centers based on hybrid SSD and HDD storage are widely researched and applied.
As shown in fig. 1, the distributed computing system according to an embodiment of the present invention includes a Spark platform module 1 and a hybrid storage module 2, where the hybrid storage module 2 includes an SSD unit 21 and an HDD unit 22, and the Spark platform module 1 is connected to the SSD unit 21 and the HDD unit 22 respectively;
the Spark platform module 1 uses the big data processing framework Spark as its computing engine and sends the processed data to the SSD unit 21 or the HDD unit 22 for storage; the Spark platform module 1 is further configured to receive a query instruction, and to fetch and output the data corresponding to the query instruction from the SSD unit 21 or the HDD unit 22.
The Spark platform module is respectively connected with the SSD unit and the HDD unit, so that the processed data is sent to the SSD unit or the HDD unit for storage, and accurate mapping and storage of the data can be realized.
In a specific implementation, the Spark platform module 1 includes a first API (application programming interface) corresponding to the SSD unit 21 and a second API corresponding to the HDD unit, the Spark platform module 1 is connected to the SSD unit 21 through the first API, and the Spark platform module 1 is connected to the HDD unit 22 through the second API, so as to perform data transmission. The Spark platform module 1 may expose the structural features of the hybrid storage system to the user through the first API and the second API. The selection of the storage medium is realized by calling the first API or the second API interface, that is, the selection of the storage in the SSD unit 21 or the HDD unit 22 is realized by calling the first API or the second API interface.
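The idea of exposing the storage choice through two medium-specific interfaces can be illustrated with a minimal sketch. It is written in Python for brevity rather than in Spark's own Scala, and every name in it (`HybridStore`, `persist_to_ssd`, `persist_to_hdd`, the directory arguments) is hypothetical and is not taken from the patent or from Spark.

```python
import os
import tempfile
from typing import Tuple


class HybridStore:
    """Hypothetical facade with one write method per storage medium."""

    def __init__(self, ssd_dir: str, hdd_dir: str) -> None:
        # Assumption: each medium is reachable through its own directory.
        self.ssd_dir = ssd_dir
        self.hdd_dir = hdd_dir

    def _write(self, base_dir: str, name: str, payload: bytes) -> str:
        # Create the target directory if needed and write the payload into it.
        os.makedirs(base_dir, exist_ok=True)
        path = os.path.join(base_dir, name)
        with open(path, "wb") as fh:
            fh.write(payload)
        return path

    def persist_to_ssd(self, name: str, payload: bytes) -> str:
        """Stand-in for the 'first API': always targets the SSD directory."""
        return self._write(self.ssd_dir, name, payload)

    def persist_to_hdd(self, name: str, payload: bytes) -> str:
        """Stand-in for the 'second API': always targets the HDD directory."""
        return self._write(self.hdd_dir, name, payload)


def demo() -> Tuple[str, str]:
    # Temporary directories stand in for real SSD and HDD mount points.
    root = tempfile.mkdtemp()
    store = HybridStore(os.path.join(root, "ssd"), os.path.join(root, "hdd"))
    hot_path = store.persist_to_ssd("hot_partition", b"hot")
    cold_path = store.persist_to_hdd("cold_partition", b"cold")
    return hot_path, cold_path
```

In this sketch the caller selects the medium simply by choosing which of the two methods to call, which mirrors the selection by interface described above.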
In a specific implementation, the SSD unit 21 and the HDD unit 22 are persistent storage units in the same layer. The processed data specifically includes RDD partition data. The Spark platform module is further used for persisting the RDD partition data into the SSD unit or the HDD unit according to a preset partition proportion value.
In a specific implementation, the Spark platform module 1 is further configured to persist the RDD partition data into the SSD unit or the HDD unit according to the heat of the RDD partition data. The SSD effectively increases I/O bandwidth and reduces access latency, while the HDD still provides substantial storage efficiency for data with lower storage performance requirements. A large amount of the data collected and captured by a data center is not accessed often afterwards; this is called cold data and accounts for about 90% of global data. The remaining 10% of the data is accessed frequently after it is collected and captured and is called hot data. Clearly, storing all of the data on a high-performance, low-latency storage device is not reasonable, because the cost is prohibitive. Therefore, the SSD unit 21 and the HDD unit 22 are combined according to the heat of the RDD partition data; constructing a hybrid storage system in this way can greatly improve performance while keeping the cost controllable.
In a specific implementation, the distributed computing system further includes a capacity monitoring module connected to the hybrid storage module 2. The capacity monitoring module is configured to monitor the remaining capacity of the hybrid storage module 2 and to output alarm information when the remaining capacity is smaller than a preset threshold. The specific value of the preset threshold can be determined according to the capacity of the hybrid storage module 2, and the alarm information can be output by, for example, controlling a loudspeaker to sound or an alarm lamp to flash. When the remaining capacity of the hybrid storage module 2 is too low, the alarm reminds a worker to transfer the stored data or replace a storage hard disk in time, which improves the reliability of data storage.
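The threshold check performed by such a capacity monitoring module might look like the following sketch. The function name `check_capacity`, the mount-point dictionary, and the injectable `free_bytes` probe are all assumptions made for illustration; only `shutil.disk_usage` is a real standard-library call. The sketch returns text messages, whereas the patent describes a loudspeaker or alarm lamp as the output.

```python
import shutil
from typing import Callable, Dict, List


def _default_free_bytes(path: str) -> int:
    # shutil.disk_usage returns (total, used, free) in bytes for the given path.
    return shutil.disk_usage(path).free


def check_capacity(
    mount_points: Dict[str, str],
    threshold_bytes: int,
    free_bytes: Callable[[str], int] = _default_free_bytes,
) -> List[str]:
    """Return one alarm message per device whose free space is below the threshold."""
    alarms = []
    for device, path in mount_points.items():
        remaining = free_bytes(path)
        if remaining < threshold_bytes:
            alarms.append(
                f"{device}: {remaining} bytes free, below threshold {threshold_bytes}"
            )
    return alarms
```

Passing the probe as a parameter keeps the check testable without real disks; a deployment would leave the default in place and route the returned messages to whatever alarm device is available.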
The present invention also provides a data processing method of a distributed computing system according to an embodiment, as shown in fig. 2, the data processing method includes the following steps:
step S21, the Spark platform module, using the big data processing framework Spark as its computing engine, sends the processed data to the SSD unit or the HDD unit for storage;
step S22, the Spark platform module receives the query instruction, and acquires data corresponding to the query instruction from the SSD unit or the HDD unit and outputs the data.
The Spark platform module is respectively connected with the SSD unit and the HDD unit, so that the processed data is sent to the SSD unit or the HDD unit for storage, and accurate mapping and storage of the data can be realized.
In specific implementation, the data processing method further includes the following steps of monitoring the remaining capacity of the hybrid storage module through a capacity monitoring module, and outputting alarm information when the remaining capacity is smaller than a preset threshold value. The specific value of the preset threshold can be determined according to the capacity of the hybrid storage module 2, and the output alarm information can be the control of the loudspeaker to sound or the control of the alarm lamp to flash and the like. When the residual capacity of the hybrid storage module 2 is too low, an alarm is given to remind a worker to transfer the stored data or replace a storage hard disk and the like in time so as to improve the reliability of data storage.
In a specific implementation, the Spark platform module 1 includes a first API (application programming interface) corresponding to the SSD unit 21 and a second API corresponding to the HDD unit, the Spark platform module 1 is connected to the SSD unit 21 through the first API, and the Spark platform module 1 is connected to the HDD unit 22 through the second API, so as to perform data transmission. The Spark platform module 1 may expose the structural features of the hybrid storage system to the user through the first API and the second API. The selection of the storage medium is realized by calling the first API or the second API interface, that is, the selection of the storage in the SSD unit 21 or the HDD unit 22 is realized by calling the first API or the second API interface.
In a specific implementation, the SSD unit 21 and the HDD unit 22 are persistent storage units in the same layer. The processed data specifically includes RDD partition data. The Spark platform module is further used for persisting the RDD partition data into the SSD unit or the HDD unit according to a preset partition proportion value.
In a specific implementation, the Spark platform module 1 is further configured to persist the RDD partition data into the SSD unit or the HDD unit according to the heat of the RDD partition data. The SSD effectively increases I/O bandwidth and reduces access latency, while the HDD still provides substantial storage efficiency for data with lower storage performance requirements. A large amount of the data collected and captured by a data center is not accessed often afterwards; this is called cold data and accounts for about 90% of global data. The remaining 10% of the data is accessed frequently after it is collected and captured and is called hot data. Clearly, storing all of the data on a high-performance, low-latency storage device is not reasonable, because the cost is prohibitive. Therefore, the SSD unit 21 and the HDD unit 22 are combined according to the heat of the RDD partition data; constructing a hybrid storage system in this way can greatly improve performance while keeping the cost controllable.
In a specific implementation, the RDD partition data is persisted by calling a method on the RDD. The persistence operation is started by the iterator method of the RDD, and fig. 3 shows the persistence flow of RDD data. In addition, persisting RDD partition data requires two things: the partition data and an address. The partition data is stored in the RDD module, whereas the address must be computed. The address has the form path/file name: the path is stored in a configuration file and is obtained by mapping the preset persistence level of the partition data onto the configuration file, and the file name is generated from the block identifier.
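The address computation described above (path from the configuration, file name from the block identifier) can be sketched in a few lines of Python. The configuration keys, the example directories, and the choice to use the block identifier unchanged as the file name are assumptions for illustration only; the patent does not give concrete paths or a naming rule.

```python
import posixpath
from typing import Dict

# Hypothetical configuration: one directory per persistence level.
CONFIG: Dict[str, str] = {
    "SSD_ONLY": "/mnt/ssd/spark-tmp",
    "HDD_ONLY": "/mnt/hdd/spark-tmp",
}


def file_name_for(block_id: str) -> str:
    # Assumption: the block identifier is used directly as the file name.
    return block_id


def storage_address(block_id: str, level: str, config: Dict[str, str] = CONFIG) -> str:
    """Build 'path/file name' from the persistence level and the block identifier."""
    if level not in config:
        raise ValueError(f"unknown persistence level: {level}")
    return posixpath.join(config[level], file_name_for(block_id))
```

For example, `storage_address("rdd_3_0", "SSD_ONLY")` joins the SSD directory with the file name, while the same block identifier with `"HDD_ONLY"` resolves under the HDD directory.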
The invention provides an embodiment of an RDD persistence method based on a SSD and HDD hybrid storage system, which is based on an optimized Spark framework to realize the persistence of RDD partition data, and comprises the following steps:
the RDD module transmits the block identifier in the RDD module and the preset persistence level of the data in the RDD module to the block manager;
the block manager transmits the block identifier and a preset persistence level to a disk block manager;
the disk block manager transmits the preset persistence level to a device adapter;
the device adapter receives a preset persistence level of data and reads two directory management variables in a configuration file, matches the preset persistence level with a temporary file directory in a corresponding directory management variable according to the preset persistence level of the data, and returns the temporary file directory obtained by matching to the disk block manager;
the disk block manager obtains a file name according to the block identifier, obtains a data storage address according to the temporary file directory and the file name obtained by matching, and returns the data storage address to the block manager;
and the block manager stores the data in the RDD module in the SSD or the HDD according to the data storage address.
Because the data storage address is determined by the preset persistence level, the data in the RDD module is stored on the SSD or the HDD according to that level, which realizes on-demand persistence for Spark applications. That is, when the preset persistence level is SSD_ONLY, the data in the RDD module is stored on the SSD, and when the preset persistence level is HDD_ONLY, the data in the RDD module is stored on the HDD.
Specifically, as shown in fig. 3, the steps of the persistence method are as follows:
step 1, the RDD module, through its iterator method, calls the doPutIterator method of the block manager BlockManager to transmit the block identifier blockId in the RDD module and the preset persistence level of the data in the RDD module to the block manager BlockManager;
step 2, the doPutIterator method of the block manager BlockManager calls the getFile method of the disk block manager DiskBlockManager, and transmits the block identifier blockId in the RDD module and the preset persistence level of the data in the RDD module to the DiskBlockManager;
step 3, the getFile method of the disk block manager DiskBlockManager calls the getAccurateDir method of the device adapter DeviceAdapter to pass the preset persistence level to the device adapter DeviceAdapter;
step 4, the device adapter DeviceAdapter reads two directory management variables in the configuration file, namely an SSD directory management variable and an HDD directory management variable;
step 5, the device adapter DeviceAdapter matches the preset persistence level with the temporary file directory in the corresponding directory management variable. The DeviceAdapter obtains the preset persistence level from the upper layer and the configuration file, with its SSD directory management variable and HDD directory management variable, from the lower layer, and can therefore complete the mapping between the preset persistence level and the temporary file directory: the getAccurateDir method reads the configuration file and then selects between the two variables according to the received preset persistence level. If the preset persistence level is SSD_ONLY, the SSD directory management variable is matched; if the preset persistence level is HDD_ONLY, the HDD directory management variable is matched. This yields the specific storage directory for persisting the RDD data;
step 6, the device adapter DeviceAdapter returns the temporary file directory obtained by matching, which contains the specific storage location, to the disk block manager DiskBlockManager;
step 7, the disk block manager DiskBlockManager obtains a fileName according to the block identifier blockId, and obtains the data storage address from the matched temporary file directory and the fileName. That is, the temporary file directory is the storage path and the complete data storage address is directory/fileName; in the fileName, RDD_ and Index are numeric indexes that increment sequentially;
step 8, the disk block manager DiskBlockManager returns the data storage address to the block manager BlockManager;
step 9, after the block manager BlockManager obtains the data storage address of the RDD, it calls the writeFunc method of the block storage module DiskStore to complete the data storage task.
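The delegation chain in steps 1 to 9 can be sketched as three small cooperating classes. This is an illustrative Python model, not Spark's Scala code: the class names follow the patent, but the snake_case method names, the configuration keys `ssd_dirs` and `hdd_dirs`, and the use of the block identifier as the file name are assumptions, and the final write stands in for the DiskStore call in step 9.

```python
import os
import tempfile
from typing import Dict, Iterable


class DeviceAdapter:
    """Maps a persistence level to a directory (steps 4 to 6)."""

    def __init__(self, config: Dict[str, str]) -> None:
        # Hypothetical keys standing in for the two directory management variables.
        self._config = config

    def get_accurate_dir(self, level: str) -> str:
        if level == "SSD_ONLY":
            return self._config["ssd_dirs"]
        if level == "HDD_ONLY":
            return self._config["hdd_dirs"]
        raise ValueError(f"unsupported persistence level: {level}")


class DiskBlockManager:
    """Combines the matched directory with a file name (steps 3 and 7)."""

    def __init__(self, adapter: DeviceAdapter) -> None:
        self._adapter = adapter

    def get_file(self, block_id: str, level: str) -> str:
        directory = self._adapter.get_accurate_dir(level)
        os.makedirs(directory, exist_ok=True)
        # Assumption: the file name equals the block identifier.
        return os.path.join(directory, block_id)


class BlockManager:
    """Requests an address and writes the data there (steps 2, 8 and 9)."""

    def __init__(self, disk_block_manager: DiskBlockManager) -> None:
        self._disk_block_manager = disk_block_manager

    def do_put_iterator(self, block_id: str, level: str, records: Iterable[bytes]) -> str:
        path = self._disk_block_manager.get_file(block_id, level)
        with open(path, "wb") as fh:  # stand-in for the DiskStore write
            for record in records:
                fh.write(record)
        return path
```

The point of the sketch is the separation of concerns: only the adapter knows which directory belongs to which medium, so the two managers need no device-specific logic.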
In a specific implementation, the RDD persistence method further comprises the following steps:
judging whether the heat of the data in the RDD module is greater than a first preset value;
if so, the preset persistence level of the data in the RDD module is SSD_ONLY;
if not, the preset persistence level of the data in the RDD module is HDD_ONLY.
That is, according to the heat of the data in the RDD partition, the preset persistence level of the data is set to realize the combination of the SSD unit 21 and the HDD unit 22 in a reasonable manner, and the performance can be greatly improved by constructing the hybrid storage system, while ensuring the controllability of the cost.
That is, by means of the optimized Spark persistence framework, on-demand persistence of Spark data is achieved. Furthermore, the user can call an SSD persistence-oriented API provided by the optimized Spark framework to persist the partition data of the high-heat RDD into the SSD, so that the Spark performance is effectively improved.
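The heat test described above reduces to a single comparison, sketched here in Python. How heat is measured is not specified in the text, so the sketch simply takes a number per partition; the function names and the example partition identifiers are hypothetical.

```python
from typing import Dict


def choose_persistence_level(heat: float, first_preset_value: float) -> str:
    """Return SSD_ONLY when heat exceeds the threshold, otherwise HDD_ONLY."""
    return "SSD_ONLY" if heat > first_preset_value else "HDD_ONLY"


def assign_levels(partition_heat: Dict[str, float], first_preset_value: float) -> Dict[str, str]:
    """Apply the threshold rule to every partition in the mapping."""
    return {
        partition_id: choose_persistence_level(heat, first_preset_value)
        for partition_id, heat in partition_heat.items()
    }
```

Note that the comparison is strict, matching "greater than a first preset value": a partition whose heat equals the threshold is placed on the HDD.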
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of fig. 3 described above.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. An RDD persistence method based on an SSD and HDD hybrid storage system, characterized in that the method comprises the following steps:
the RDD module transmits a block identifier in the RDD module and a preset persistence level of data in the RDD module to the block manager, wherein the preset persistence level is SSD_ONLY or HDD_ONLY;
the block manager transmits the block identifier and a preset persistence level to a disk block manager;
the disk block manager transmits the preset persistence level to a device adapter;
the device adapter receives a preset persistence level of data and reads two directory management variables in a configuration file, wherein the two directory management variables comprise an SSD directory management variable and an HDD directory management variable, matches the preset persistence level with a temporary file directory in a corresponding directory management variable according to the preset persistence level of the data, and returns the temporary file directory obtained by matching to the disk block manager;
the disk block manager obtains a file name according to the block identifier, obtains a data storage address according to the temporary file directory and the file name obtained by matching, and returns the data storage address to the block manager;
and the block manager stores the data in the RDD module in the SSD or the HDD according to the data storage address.
2. The RDD persistence method of claim 1, wherein: the step in which the RDD module transmits a block identifier in the RDD module and a preset persistence level of data in the RDD module to the block manager specifically comprises:
the RDD module, through the iterator method, calls the doPutIterator method of the block manager to transmit the block identifier in the RDD module and the preset persistence level of the data in the RDD module to the block manager.
3. The RDD persistence method of claim 1, wherein: the step of the block manager transmitting the block identifier and the preset persistence level to the disk block manager specifically comprises:
and the block manager calls a getFile method of the disk block manager, and transmits the block identifier in the RDD module and the preset persistence level of the data in the RDD module to the disk block manager.
4. The RDD persistence method of claim 1, wherein the step of the disk block manager obtaining a file name according to the block identifier and transmitting the preset persistence level to the device adapter specifically comprises:
the disk block manager obtains a file name from the block identifier by the getFile method;
the disk block manager calls the getAccurateDir method of the device adapter to pass the preset persistence level to the device adapter.
5. The RDD persistence method of claim 1, wherein the step of the device adapter receiving the preset persistence level of the data, reading the two directory management variables in the configuration file, matching the preset persistence level with a temporary file directory in the corresponding directory management variable, and returning the matched temporary file directory to the disk block manager specifically comprises:
the device adapter matches, by the getAccurateDir method, the preset persistence level with a temporary file directory in the corresponding directory management variable according to the preset persistence level of the data;
and the device adapter returns, by the getAccurateDir method, the matched temporary file directory to the disk block manager.
6. The RDD persistence method of claim 5, wherein: the two directory management variables include an SSD directory management variable and an HDD directory management variable.
7. The RDD persistence method of claim 6, wherein the step of the device adapter matching, by the getAccurateDir method, the preset persistence level with a temporary file directory in the corresponding directory management variable according to the preset persistence level of the data specifically comprises:
when the preset persistence level of the data is SSD_ONLY, mapping the preset persistence level of the data to a temporary file directory in the SSD directory management variable;
and when the preset persistence level of the data is HDD_ONLY, mapping the preset persistence level of the data to a temporary file directory in the HDD directory management variable.
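The matching described in claims 5 to 7 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed code: the configuration keys `ssd.local.dir` and `hdd.local.dir`, the `key=value` file format, and the snake-case function names are hypothetical, since the claims state only that the configuration file holds one SSD and one HDD directory management variable.

```python
from enum import Enum


class PersistenceLevel(Enum):
    SSD_ONLY = "SSD_ONLY"
    HDD_ONLY = "HDD_ONLY"


# Hypothetical configuration keys for the two directory management variables.
SSD_DIR_KEY = "ssd.local.dir"
HDD_DIR_KEY = "hdd.local.dir"


def read_directory_variables(config_text: str) -> dict:
    """Parse 'key=value' lines into a dict, skipping blanks and comments."""
    variables = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        variables[key.strip()] = value.strip()
    return variables


def get_accurate_dir(level: PersistenceLevel, variables: dict) -> str:
    """Return the temporary file directory that matches the persistence level."""
    if level is PersistenceLevel.SSD_ONLY:
        return variables[SSD_DIR_KEY]
    if level is PersistenceLevel.HDD_ONLY:
        return variables[HDD_DIR_KEY]
    raise ValueError(f"unsupported persistence level: {level}")


if __name__ == "__main__":
    config = "ssd.local.dir=/mnt/ssd/tmp\nhdd.local.dir=/mnt/hdd/tmp\n"
    dirs = read_directory_variables(config)
    print(get_accurate_dir(PersistenceLevel.SSD_ONLY, dirs))
```

Keeping the level-to-directory mapping in a single function means the disk block manager does not need to know which device a directory lives on; it only receives a path.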
8. The RDD persistence method of claim 1, wherein the step of the block manager storing the data in the RDD module in the SSD or the HDD according to the data storage address specifically comprises:
after obtaining the data storage address of the RDD, the block manager calls the writeFunc method of the block storage module to store the data of the RDD module on the SSD or the HDD.
9. The RDD persistence method of claim 1, wherein the RDD persistence method further comprises the steps of:
judging whether the heat of the data in the RDD module is greater than a first preset value;
if so, setting the preset persistence level of the data in the RDD module to SSD_ONLY;
and if not, setting the preset persistence level of the data in the RDD module to HDD_ONLY.
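The selection rule of claim 9 reduces to a single threshold comparison, sketched below in Python. The claim does not say how heat is measured, so the numeric `heat` argument (for example, an access count) and the function name are assumptions made for illustration.

```python
def choose_persistence_level(heat: float, first_preset_value: float) -> str:
    """Pick a persistence level from the data's heat.

    Data strictly hotter than the threshold goes to the SSD; everything
    else, including data exactly at the threshold, goes to the HDD.
    """
    if heat > first_preset_value:
        return "SSD_ONLY"
    return "HDD_ONLY"


if __name__ == "__main__":
    # Hypothetical access counts compared against a threshold of 10.
    for accesses in (3, 10, 42):
        print(accesses, choose_persistence_level(accesses, 10))
```

Because the comparison is strict, data whose heat equals the first preset value is stored on the HDD, which matches the "greater than" wording of the claim.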
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 9.
CN201710358093.9A 2017-05-19 2017-05-19 RDD (remote data description) persistence method based on SSD (solid State disk) and HDD (hard disk drive) hybrid storage system Active CN107193494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710358093.9A CN107193494B (en) 2017-05-19 2017-05-19 RDD (remote data description) persistence method based on SSD (solid State disk) and HDD (hard disk drive) hybrid storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710358093.9A CN107193494B (en) 2017-05-19 2017-05-19 RDD (remote data description) persistence method based on SSD (solid State disk) and HDD (hard disk drive) hybrid storage system

Publications (2)

Publication Number Publication Date
CN107193494A CN107193494A (en) 2017-09-22
CN107193494B true CN107193494B (en) 2020-05-12

Family

ID=59875380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710358093.9A Active CN107193494B (en) 2017-05-19 2017-05-19 RDD (remote data description) persistence method based on SSD (solid State disk) and HDD (hard disk drive) hybrid storage system

Country Status (1)

Country Link
CN (1) CN107193494B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018209693A1 (en) * 2017-05-19 2018-11-22 深圳大学 Rdd persistence method based on ssd and hdd hybrid storage system
CN107590003B (en) * 2017-09-28 2020-10-23 深圳大学 Spark task allocation method and system
CN109375868B (en) * 2018-09-14 2022-07-08 深圳爱捷云科技有限公司 Data storage method, scheduling device, system, equipment and storage medium
CN112799597A (en) * 2021-02-08 2021-05-14 东北大学 Hierarchical storage fault-tolerant method for stream data processing
CN113590536B (en) * 2021-05-20 2023-12-29 济南浪潮数据技术有限公司 Data storage method, system, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216988A (en) * 2014-09-04 2014-12-17 天津大学 SSD (Solid State Disk) and HDD(Hard Driver Disk)hybrid storage method for distributed big data
CN105893541A (en) * 2016-03-31 2016-08-24 中国科学院软件研究所 Streaming data self-adaption persistence method and system based on mixed storage

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216988A (en) * 2014-09-04 2014-12-17 天津大学 SSD (Solid State Disk) and HDD(Hard Driver Disk)hybrid storage method for distributed big data
CN105893541A (en) * 2016-03-31 2016-08-24 中国科学院软件研究所 Streaming data self-adaption persistence method and system based on mixed storage

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
hStorage-DB: Heterogeneity-aware Data Management to;Luo Tian;《August 27th - 31st 2012, Istanbul, Turkey.》;20120831;Vol. 10 (No. 5);full text *
Hybrid HBase: Leveraging Flash SSDs to Improve Cost per;Awasthi A;《The 18th International Conference on Management of Data (COMAD)》;20121216;full text *

Also Published As

Publication number Publication date
CN107193494A (en) 2017-09-22

Similar Documents

Publication Publication Date Title
CN107193494B (en) RDD (remote data description) persistence method based on SSD (solid State disk) and HDD (hard disk drive) hybrid storage system
US10114749B2 (en) Cache memory system and method for accessing cache line
CN103106158B (en) Accumulator system including key-value storage
US9021189B2 (en) System and method for performing efficient processing of data stored in a storage node
US9092321B2 (en) System and method for performing efficient searches and queries in a storage node
US8332367B2 (en) Parallel data redundancy removal
CN112632069B (en) Hash table data storage management method, device, medium and electronic equipment
US9569381B2 (en) Scheduler for memory
CN104462225A (en) Data reading method, device and system
CN107179883B (en) Spark architecture optimization method of hybrid storage system based on SSD and HDD
US10146783B2 (en) Using file element accesses to select file elements in a file system to defragment
US20240004852A1 (en) Confidence-based database management systems and methods for use therewith
CN110781159B (en) Ceph directory file information reading method and device, server and storage medium
CN112346647A (en) Data storage method, device, equipment and medium
US11061676B2 (en) Scatter gather using key-value store
CN113031857B (en) Data writing method, device, server and storage medium
WO2014003707A2 (en) Hardware-based accelerator for managing copy-on-write
CN110162395B (en) Memory allocation method and device
CN112650577A (en) Memory management method and device
CN112804003A (en) Optical module communication-based storage method, system and terminal
US10061725B2 (en) Scanning memory for de-duplication using RDMA
CN112711564A (en) Merging processing method and related equipment
CN109002255B (en) Memory system and method of operating the same
US20190042443A1 (en) Data acquisition with zero copy persistent buffering
US11016666B2 (en) Memory system and operating method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220523

Address after: 518000 East side, 4th floor, Plant 1 (Building 1), Baode Technology R&D and Production Base, Gaoxinyuan, Guanlan Street, Longhua New Area, Shenzhen, Guangdong

Patentee after: Baode network security system (Shenzhen) Co.,Ltd.

Address before: 518000 No. 3688 Nanhai Road, Nanshan District, Shenzhen, Guangdong

Patentee before: SHENZHEN University