CN107179883A - Spark architecture optimization method of hybrid storage system based on SSD and HDD - Google Patents
Spark architecture optimization method of hybrid storage system based on SSD and HDD
- Publication number
- CN107179883A (application number CN201710358537.9A)
- Authority
- CN
- China
- Prior art keywords
- persistence
- ssd
- hdd
- data
- directory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0685—Hybrid storage combining heterogeneous device types, e.g. hierarchical storage, hybrid arrays
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a Spark architecture optimization method for a hybrid storage system based on an SSD and an HDD, which comprises the following steps: setting an SSD directory management variable and an HDD directory management variable; setting a device adapter to match each data persistence level with its corresponding temporary file directory; setting two persistence levels, SSD_ONLY and HDD_ONLY, to generate two persistence interfaces; and extending the scope of the two persistence levels to the device adapter.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a Spark architecture optimization method for a hybrid storage system based on SSD and HDD.
Background art
In the current era of big data, how to manage, analyze, and extract valuable information from massive data within an effective time has become a problem in urgent need of a solution. However, in scale, variety, and structure alike, big data poses a huge challenge to people's ability to handle data.
Spark is an efficient big data computing architecture that is currently widely used in industry; it is a general-purpose, fast engine for large-scale data processing. First, Spark provides a unified solution that can be used for complex tasks such as interactive queries, real-time stream processing, and machine learning. Second, Spark divides stages and tasks by means of resilient distributed datasets (Resilient Distributed Dataset, RDD for short), optimizes the execution order of subtasks through an efficient directed acyclic graph (Directed Acyclic Graph, DAG for short) execution engine, and greatly improves data-processing efficiency through in-memory computing. Third, Spark's data management relies on multiple data sources such as HDFS and Hive, and Spark in cluster mode scales horizontally to support the processing of large-scale data. The RDD is the most important concept that distinguishes Spark from other big data computing architectures: it is a read-only, fault-tolerant distributed dataset. In a Spark application, each RDD can be divided into multiple partitions, and Spark performs its operations on an RDD partition by partition. Persisting (Persist) RDD partition data to memory or hard disk caches the intermediate results of computing tasks, so that subsequent iterative tasks read the intermediate results directly and avoid recomputation, which greatly improves data-processing efficiency. In addition, persisting data to hard disk removes the limit that insufficient memory capacity places on dataset size, so that Spark handles big data with ease.
However, the current Spark framework cannot perceive the composition of the underlying storage devices in a hybrid storage system, nor can it perceive the presence of an SSD.
Summary of the invention
The present invention aims to solve the technical problem that, in the prior art, the Spark framework cannot perceive the composition of the underlying storage devices in a hybrid storage system, and provides a Spark architecture optimization method for a hybrid storage system based on SSD and HDD.
An embodiment of the present invention provides a Spark architecture optimization method for a hybrid storage system based on SSD and HDD, the method comprising:
setting an SSD directory management variable and an HDD directory management variable;
setting a device adapter to match each data persistence level with its corresponding temporary file directory;
setting two persistence levels, SSD_ONLY and HDD_ONLY, to generate two persistence interfaces;
extending the scope of the two persistence levels to the device adapter.
The present invention also provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the technical solution of the present invention has the following beneficial effect: by setting the two persistence levels SSD_ONLY and HDD_ONLY to generate two persistence interfaces, two persistence APIs, SSD_ONLY and HDD_ONLY, are provided to the user, so that the composition of the underlying storage devices is exposed and can therefore be perceived.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of an embodiment of the distributed computing system of the present invention.
Fig. 2 is a flow chart of an embodiment of the data processing method of the distributed computing system of the present invention.
Fig. 3 is a schematic structural diagram of an embodiment of the Spark persistence framework.
Fig. 4 is a schematic structural diagram of an embodiment of the Spark persistence framework after optimization by the present invention.
Fig. 5 is a flow chart of an embodiment of the Spark architecture optimization method for a hybrid storage system based on SSD and HDD of the present invention.
Fig. 6 is a flow chart of an embodiment of the RDD persistence method for a hybrid storage system based on SSD and HDD of the present invention.
Detailed description
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and are not to be construed as limiting it.
Specifically, the emergence of the solid-state drive (Solid-State Drive, SSD for short) has brought new opportunities for improving storage system performance; an SSD offers low power consumption, low latency, and small size. Unlike a traditional enterprise hard disk drive (Hard Disk Drive, HDD for short), which addresses data by moving a mechanical arm, an SSD is built entirely on semiconductor chips and therefore performs well on random access. However, because of drawbacks such as high cost per unit of capacity and limited lifetime, replacing HDDs entirely with SSDs would greatly raise costs for the industry. To make reasonable use of both the high performance of SSDs and the low price of HDDs, heterogeneous data centers based on hybrid SSD and HDD storage have been widely studied and applied.
The distributed computing system of an embodiment of the present invention, as shown in Fig. 1, includes a Spark platform module 1 and a hybrid storage module 2. The hybrid storage module 2 includes an SSD unit 21 and an HDD unit 22, and the Spark platform module 1 is connected to both the SSD unit 21 and the HDD unit 22.
The Spark platform module 1 uses the big data processing framework Spark as its computing engine and sends the data obtained by processing to the SSD unit 21 or the HDD unit 22 for storage. The Spark platform module 1 is also used to receive a query instruction, retrieve the data corresponding to the query instruction from the SSD unit 21 or the HDD unit 22, and output it.
Because the Spark platform module is connected to both the SSD unit and the HDD unit, the data obtained by processing can be sent to either unit for storage, which enables precise mapping and storage of the data.
In a specific implementation, the Spark platform module 1 includes a first API (Application Programming Interface) corresponding to the SSD unit 21 and a second API corresponding to the HDD unit 22. The Spark platform module 1 is connected to the SSD unit 21 through the first API and to the HDD unit 22 through the second API for data transmission. Through the first API and the second API, the Spark platform module 1 can expose the structural characteristics of the hybrid storage system to the user. The storage medium is selected by calling the first API or the second API; that is, whether data is stored in the SSD unit 21 or in the HDD unit 22 is decided by which of the two APIs is called.
In a specific implementation, the SSD unit 21 and the HDD unit 22 serve as persistent storage units at the same layer. The data obtained by processing specifically includes RDD partition data. The Spark platform module is also used to persist RDD partition data to the SSD unit or the HDD unit according to a preset partition ratio.
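The ratio-based placement described above could be sketched as follows. This is only an illustration in Python: the function name `assign_levels_by_ratio`, the parameter `ssd_ratio`, and the "first N partitions go to the SSD" policy are assumptions, since the text does not say how the preset ratio is applied.

```python
from typing import List


def assign_levels_by_ratio(num_partitions: int, ssd_ratio: float) -> List[str]:
    """Return one persistence level name per partition index.

    Assumed policy (not taken from the patent): the first
    round(num_partitions * ssd_ratio) partitions go to the SSD unit,
    the remainder to the HDD unit.
    """
    if num_partitions < 0:
        raise ValueError("num_partitions must be non-negative")
    if not 0.0 <= ssd_ratio <= 1.0:
        raise ValueError("ssd_ratio must be within [0, 1]")
    ssd_count = round(num_partitions * ssd_ratio)
    return ["SSD_ONLY" if i < ssd_count else "HDD_ONLY"
            for i in range(num_partitions)]


# Four partitions with a quarter of them placed on the SSD unit.
levels = assign_levels_by_ratio(4, 0.25)
```

Any other split rule (for example by partition size) would fit the same description; the sketch only shows that a single ratio is enough to derive a level for every partition.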
In a specific implementation, the Spark platform module 1 is also used to persist RDD partition data to the SSD unit or the HDD unit according to the temperature of the RDD partition data, that is, how frequently the data is accessed. An SSD can effectively increase I/O bandwidth and reduce access latency, while an HDD can still provide a large amount of storage for data with lower storage-performance requirements. In addition, a large amount of data is rarely accessed after being collected and captured by the data center; this is called cold data and accounts for 90% of global data. The remaining 10% of data is accessed frequently after being collected and captured, and is called hot data. Clearly, storing all data on high-performance, low-latency storage devices is unreasonable, because the cost is prohibitive. Therefore, combining the SSD unit 21 and the HDD unit 22 in a reasonable manner according to the temperature of the RDD partition data, by building a hybrid storage system, can bring a substantial performance improvement while keeping cost under control.
In a specific implementation, the distributed computing system also includes a capacity monitoring module connected to the hybrid storage module 2. The capacity monitoring module monitors the remaining capacity of the hybrid storage module 2 and outputs alarm information when the remaining capacity falls below a preset threshold. The specific value of the preset threshold can be determined according to the capacity of the hybrid storage module 2, and outputting alarm information can consist of driving a loudspeaker, flashing an alarm lamp, or the like. Raising an alarm when the remaining capacity of the hybrid storage module 2 is too low reminds staff to migrate the stored data or replace storage disks in time, which improves the reliability of data storage.
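A capacity check of this kind could look like the sketch below. It assumes the threshold is expressed as a fraction of total capacity and that the alarm is a printed message; the names `is_capacity_low` and `check_directory` are illustrative and do not come from the patent.

```python
import shutil


def is_capacity_low(free_bytes: int, total_bytes: int, min_free_fraction: float) -> bool:
    """True when the free share of the device is below the preset threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def check_directory(path: str, min_free_fraction: float = 0.1) -> bool:
    """Check the file system holding `path` and print an alarm message if it is low."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    low = is_capacity_low(usage.free, usage.total, min_free_fraction)
    if low:
        # Stand-in for the loudspeaker or alarm lamp mentioned in the text.
        print(f"ALARM: {path} has {usage.free} of {usage.total} bytes free")
    return low
```

In a real deployment `check_directory` would be called periodically for each SSD and HDD temporary directory.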
The present invention also provides an embodiment of a data processing method for the distributed computing system. As shown in Fig. 2, the data processing method includes the following steps:
Step S21: the Spark platform module uses the big data processing framework Spark as its computing engine and sends the data obtained by processing to the SSD unit or the HDD unit for storage;
Step S22: the Spark platform module receives a query instruction, retrieves the data corresponding to the query instruction from the SSD unit or the HDD unit, and outputs it.
Because the Spark platform module is connected to both the SSD unit and the HDD unit, the data obtained by processing can be sent to either unit for storage, which enables precise mapping and storage of the data.
In a specific implementation, the data processing method further includes the following step: monitoring the remaining capacity of the hybrid storage module through the capacity monitoring module, and outputting alarm information when the remaining capacity falls below a preset threshold. The specific value of the preset threshold can be determined according to the capacity of the hybrid storage module 2, and outputting alarm information can consist of driving a loudspeaker, flashing an alarm lamp, or the like. Raising an alarm when the remaining capacity of the hybrid storage module 2 is too low reminds staff to migrate the stored data or replace storage disks in time, which improves the reliability of data storage.
In a specific implementation of the method, the Spark platform module 1 includes a first API (Application Programming Interface) corresponding to the SSD unit 21 and a second API corresponding to the HDD unit 22. The Spark platform module 1 is connected to the SSD unit 21 through the first API and to the HDD unit 22 through the second API for data transmission, and through these two APIs it can expose the structural characteristics of the hybrid storage system to the user. The storage medium is selected by calling the first API or the second API.
In a specific implementation, the SSD unit 21 and the HDD unit 22 serve as persistent storage units at the same layer. The data obtained by processing specifically includes RDD partition data. The Spark platform module is also used to persist RDD partition data to the SSD unit or the HDD unit according to a preset partition ratio.
In a specific implementation, the Spark platform module 1 is also used to persist RDD partition data to the SSD unit or the HDD unit according to the temperature of the RDD partition data. An SSD can effectively increase I/O bandwidth and reduce access latency, while an HDD can still provide a large amount of storage for data with lower storage-performance requirements. Cold data, which is rarely accessed after being collected and captured by the data center, accounts for 90% of global data; the remaining 10%, which is accessed frequently, is called hot data. Storing all data on high-performance, low-latency storage devices is therefore unreasonable, because the cost is prohibitive. Combining the SSD unit 21 and the HDD unit 22 in a reasonable manner according to the temperature of the RDD partition data, by building a hybrid storage system, can bring a substantial performance improvement while keeping cost under control.
As shown in Fig. 3, the root causes of the Spark data persistence framework's inability to perceive an SSD can be summarized as follows:
(1) the Spark configuration file uses a single parameter to hold multiple temporary file directories, so directories pointing to the SSD and to the HDD are managed together;
(2) the nonNegativeHash method does not distinguish the differences in data access performance between the storage media on which the different temporary file directories reside, and selects a directory with equal probability;
(3) a single persistence interface, DISK_ONLY, is provided to upper-layer applications for all storage media, and this interface is exposed to the user through StorageLevel.
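Cause (2) can be illustrated with a small Python sketch. It is not Spark's code: `non_negative_hash` is a stand-in built on CRC32, and the two directory paths are hypothetical. The point is only that a hash-modulo choice looks at the file name and never at the device behind the directory.

```python
import zlib
from typing import List


def non_negative_hash(name: str) -> int:
    """Stand-in for a non-negative hash; CRC32 keeps the result stable across runs."""
    return zlib.crc32(name.encode("utf-8"))


def pick_local_dir(filename: str, local_dirs: List[str]) -> str:
    """Choose a directory purely from the hash, ignoring the device behind it."""
    return local_dirs[non_negative_hash(filename) % len(local_dirs)]


# Hypothetical mixed list: one directory on an SSD, one on an HDD.
mixed_dirs = ["/mnt/ssd/tmp", "/mnt/hdd/tmp"]
chosen = {pick_local_dir(f"rdd_{i}", mixed_dirs) for i in range(100)}
```

With this selection rule, blocks end up on both devices regardless of what the application would have preferred.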
The present invention provides an embodiment of a Spark architecture optimization method for a hybrid storage system based on SSD and HDD, which yields the optimized Spark data persistence framework shown in Fig. 4. As shown in Fig. 5, the optimization method includes:
Step S51: setting an SSD directory management variable and an HDD directory management variable;
Step S52: setting a device adapter to match each data persistence level with its corresponding temporary file directory;
Step S53: setting two persistence levels, SSD_ONLY and HDD_ONLY, to generate two persistence interfaces;
Step S54: extending the scope of the two persistence levels to the device adapter.
In a specific implementation, step S51 includes:
adding an SSD directory management variable and an HDD directory management variable;
pointing the SSD directory management variable to the SSD temporary file directory, and pointing the HDD directory management variable to the HDD temporary file directory.
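Step S51 might be sketched as below, assuming the two variables are read from a key-value configuration. The keys `hybrid.local.dir.ssd` and `hybrid.local.dir.hdd` are hypothetical names invented for this illustration; the patent does not name the configuration parameters.

```python
from typing import Dict, List, Tuple

# Hypothetical configuration keys; the patent does not name them.
SSD_DIR_KEY = "hybrid.local.dir.ssd"
HDD_DIR_KEY = "hybrid.local.dir.hdd"


def _split_dirs(value: str) -> List[str]:
    """Split a comma-separated directory list, dropping blanks."""
    return [d.strip() for d in value.split(",") if d.strip()]


def load_dir_variables(conf: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (SSD directories, HDD directories) as two separate variables."""
    ssd_dirs = _split_dirs(conf.get(SSD_DIR_KEY, ""))
    hdd_dirs = _split_dirs(conf.get(HDD_DIR_KEY, ""))
    return ssd_dirs, hdd_dirs


ssd_dirs, hdd_dirs = load_dir_variables({
    SSD_DIR_KEY: "/mnt/ssd0/tmp, /mnt/ssd1/tmp",
    HDD_DIR_KEY: "/mnt/hdd0/tmp",
})
```

Keeping the two lists apart is what later allows a persistence level to be resolved to one device type.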
In a specific implementation, step S52 includes:
adding a device adapter;
receiving the preset persistence level of the data through the device adapter, and reading, according to that preset persistence level, the temporary file directory held in the corresponding directory management variable;
matching the data persistence level with its corresponding temporary file directory through the device adapter.
In a specific implementation, the two persistence interfaces include an SSD interface and an HDD interface.
In a specific implementation, step S54 includes:
extending the scope of the two persistence levels to the device adapter;
or includes: extending the scope of the two persistence levels from the block manager in the Spark framework, through the disk block manager in the Spark framework, to the device adapter.
Specifically, the optimization scheme for the Spark persistence framework is as follows:
(1) add an SSD temporary file directory management variable and an HDD temporary file directory management variable, and replace the mixed management of temporary file directories so that the two variables point one-to-one to the SSD and HDD temporary file directories;
(2) add a device adapter, DeviceAdapter, which receives the data persistence level set by the user and reads the temporary file directories configured by the user, so as to map the persistence level parameter precisely to the SSD or the HDD;
(3) add two persistence levels, SSD_ONLY and HDD_ONLY, to expose the characteristics of the hybrid storage system to the user.
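Items (2) and (3) could be modelled as in the following sketch. It is a Python illustration rather than the patent's implementation: the `StorageLevel` enum here contains only the two new levels, the directory paths are hypothetical, and the method name `get_accurate_dir` mirrors the getAccurateDir method that the description mentions for the device adapter.

```python
from enum import Enum
from typing import Dict


class StorageLevel(Enum):
    """Only the two levels added by the optimization are modelled here."""
    SSD_ONLY = "SSD_ONLY"
    HDD_ONLY = "HDD_ONLY"


class DeviceAdapter:
    """Maps a persistence level to the temporary file directory of one device."""

    def __init__(self, ssd_dir: str, hdd_dir: str) -> None:
        self._dir_by_level: Dict[StorageLevel, str] = {
            StorageLevel.SSD_ONLY: ssd_dir,
            StorageLevel.HDD_ONLY: hdd_dir,
        }

    def get_accurate_dir(self, level: StorageLevel) -> str:
        """Return the directory configured for the given persistence level."""
        return self._dir_by_level[level]


adapter = DeviceAdapter("/mnt/ssd/tmp", "/mnt/hdd/tmp")
```

A dictionary lookup is used so that an unknown level fails loudly with a `KeyError` instead of silently falling back to some directory.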
At the same time, the scope of StorageLevel is extended, as shown in Fig. 4. Originally, StorageLevel acts only on the block manager BlockManager, providing the data persistence level for the user and the BlockManager. In the present invention, the scope of StorageLevel is further extended to the device adapter DeviceAdapter, which is how the SSD unit and the HDD unit are distinguished.
By setting the two persistence levels SSD_ONLY and HDD_ONLY to generate two persistence interfaces, the Spark persistence framework is optimized: the hybrid storage system provides the user with two persistence APIs, SSD_ONLY and HDD_ONLY, so that the composition of the underlying storage devices is exposed to the user. This breaks the masking effect of DISK_ONLY, gives the user more precise persistence APIs, and enables on-demand persistence for Spark applications.
The present invention also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the method of Fig. 5 above are implemented.
In a specific implementation, the RDD partition data is persisted by calling RDD.persist(StorageLevel.SSD_ONLY), which also sets the preset persistence level of the partition data to SSD_ONLY. The RDD persistence operation is started by the RDD.iterator method, and Fig. 3 shows the persistence flow of RDD data. In addition, persisting RDD partition data requires two things: the partition data and an address. The partition data is already held in the RDD module, whereas the address must be computed: address = path/filename. The path is stored in the configuration file and is obtained by mapping the preset persistence level of the partition data onto the configuration file; the file name is generated from the block identifier.
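The address computation can be written out as a short Python sketch. The function name `storage_address` and the example directory are assumptions; the "rdd_" plus index file name follows the naming given in the step-by-step description of the persistence method.

```python
import posixpath


def storage_address(directory: str, block_index: int) -> str:
    """address = path/filename, with the file name derived from the block index."""
    filename = f"rdd_{block_index}"
    # posixpath keeps the "/" separator regardless of the host platform.
    return posixpath.join(directory, filename)


address = storage_address("/mnt/ssd/tmp", 7)
```

Here the directory would be the one obtained for the preset persistence level, so the same block index yields different addresses for SSD_ONLY and HDD_ONLY.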
The present invention provides an embodiment of an RDD persistence method for a hybrid storage system based on SSD and HDD. The persistence method persists RDD partition data on the basis of the optimized Spark framework and includes the following steps:
the RDD module passes the block identifier in the RDD module and the preset persistence level of the data in the RDD module to the block manager;
the block manager passes the block identifier and the preset persistence level to the disk block manager;
the disk block manager passes the preset persistence level to the device adapter;
the device adapter receives the preset persistence level of the data, reads the two directory management variables from the configuration file, matches the preset persistence level against the temporary file directory in the corresponding directory management variable, and returns the matched temporary file directory to the disk block manager;
the disk block manager obtains a file name from the block identifier, obtains the data storage address from the matched temporary file directory and the file name, and returns the data storage address to the block manager;
the block manager stores the data in the RDD module on the SSD or the HDD according to the data storage address.
Specifically, as shown in Fig. 6, the steps of the persistence method are as follows:
Step 1: the RDD module, through its iterator method, calls the doPutIterator method of the block manager BlockManager, passing the block identifier blockId in the RDD module and the preset persistence level of the data in the RDD module to the BlockManager;
Step 2: the doPutIterator method of the BlockManager calls the getFile method of the disk block manager, passing the blockId and the preset persistence level to the disk block manager DiskBlockManager;
Step 3: the getFile method of the DiskBlockManager calls the getAccurateDir method of the device adapter, passing the preset persistence level to the device adapter DeviceAdapter;
Step 4: the DeviceAdapter reads the two directory management variables from the configuration file, namely the SSD directory management variable and the HDD directory management variable;
Step 5: the DeviceAdapter matches the preset persistence level against the temporary file directory in the corresponding directory management variable. That is, the DeviceAdapter obtains the preset persistence level from the layer above and the configuration (the SSD and HDD directory management variables) from the layer below, and can therefore match the preset persistence level to a temporary file directory. In other words, the getAccurateDir method reads the configuration file, which contains the two variables, and matches them against the received preset persistence level: if the preset persistence level is SSD_ONLY, the SSD directory management variable is matched; if it is HDD_ONLY, the HDD directory management variable is matched. At this point the specific storage location for persisting the RDD data has been obtained;
Step 6: the matched temporary file directory, which contains the specific storage location, is returned to the DiskBlockManager;
Step 7: the DiskBlockManager obtains the file name fileName from the blockId, and obtains the data storage address from the matched temporary file directory and the file name. That is, the specific location plus fileName is the full address at which the RDD data is stored on disk, i.e. the data storage address, where fileName = "rdd_" + Index, Index is a sequentially increasing numeric index, data storage address = directory/file name, and the temporary file directory is the storage path;
Step 8: the DiskBlockManager returns the data storage address to the BlockManager;
Step 9: after obtaining the data storage address of the RDD, the BlockManager calls the writeFunc method of the block storage module DiskStore to complete the data storage task.
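The nine steps above can be traced end to end with the following Python sketch. It is a simplified model under stated assumptions, not Spark source code: the class names follow the components named in the text, the method names are snake_case renderings of doPutIterator, getFile and getAccurateDir, `DiskStore.write` stands in for writeFunc, and the "SSD" and "HDD" locations are ordinary temporary directories created for the demonstration.

```python
import os
import tempfile
from typing import Dict, Iterable


class DeviceAdapter:
    """Steps 3 to 6: map a persistence level to its temporary file directory."""

    def __init__(self, dir_by_level: Dict[str, str]) -> None:
        self._dir_by_level = dict(dir_by_level)

    def get_accurate_dir(self, level: str) -> str:
        return self._dir_by_level[level]


class DiskBlockManager:
    """Steps 2, 7 and 8: build the data storage address for a block."""

    def __init__(self, adapter: DeviceAdapter) -> None:
        self._adapter = adapter

    def get_file(self, block_id: str, level: str) -> str:
        directory = self._adapter.get_accurate_dir(level)
        return os.path.join(directory, block_id)  # address = directory/file name


class DiskStore:
    """Step 9: write the bytes to the resolved address."""

    def write(self, path: str, records: Iterable[bytes]) -> None:
        with open(path, "wb") as handle:
            for record in records:
                handle.write(record)


class BlockManager:
    """Steps 1, 2 and 9: entry point that receives the block id and level."""

    def __init__(self, disk_block_manager: DiskBlockManager, store: DiskStore) -> None:
        self._disk_block_manager = disk_block_manager
        self._store = store

    def do_put_iterator(self, block_id: str, level: str, records: Iterable[bytes]) -> str:
        path = self._disk_block_manager.get_file(block_id, level)
        self._store.write(path, records)
        return path


def demo() -> str:
    """Persist one block to a stand-in 'SSD' directory and return its path."""
    base = tempfile.mkdtemp()
    dirs = {"SSD_ONLY": os.path.join(base, "ssd"),
            "HDD_ONLY": os.path.join(base, "hdd")}
    for directory in dirs.values():
        os.makedirs(directory)
    manager = BlockManager(DiskBlockManager(DeviceAdapter(dirs)), DiskStore())
    return manager.do_put_iterator("rdd_0", "SSD_ONLY", [b"a", b"b"])
```

Calling `demo()` returns a path whose parent directory is the stand-in SSD directory, which shows how the persistence level alone decides the target device.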
In a specific implementation, the RDD persistence method further includes the following steps:
judging whether the temperature of the data in the RDD module is greater than a first preset value;
if so, the preset persistence level of the data in the RDD module is SSD_ONLY;
if not, the preset persistence level of the data in the RDD module is HDD_ONLY.
That is, the preset persistence level is set according to the temperature of the data in the RDD partition, so that the SSD unit 21 and the HDD unit 22 are combined in a reasonable manner; building a hybrid storage system in this way can bring a substantial performance improvement while keeping cost under control.
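The threshold rule reduces to a single comparison, sketched here in Python. How temperature is measured is not specified in the text; the example assumes it is an access count per partition, and the names `level_for_temperature` and `access_counts` are illustrative.

```python
from typing import Dict


def level_for_temperature(temperature: float, first_preset_value: float) -> str:
    """Hot data (temperature above the preset value) goes to SSD, the rest to HDD."""
    return "SSD_ONLY" if temperature > first_preset_value else "HDD_ONLY"


# Assumed temperature metric: number of accesses observed per RDD partition.
access_counts: Dict[str, int] = {"rdd_0": 120, "rdd_1": 3, "rdd_2": 50}
levels = {block: level_for_temperature(count, 50)
          for block, count in access_counts.items()}
```

Note that a partition whose temperature equals the preset value is treated as cold, because the rule in the text requires the temperature to be greater than the first preset value.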
In other words, the optimized Spark persistence framework enables on-demand persistence of Spark data. The user can then call the SSD-oriented persistence API provided by the optimized Spark framework to persist high-temperature RDD partition data to the SSD, thereby effectively improving Spark performance.
The present invention also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the method of Fig. 6 above are implemented.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.
Claims (7)
1. a kind of Spark framework optimization methods of the mixing storage system based on SSD and HDD, it is characterised in that:Methods described bag
Include:
SSD directory managements variable and HDD directory management variables are set;
Device adapter is set to realize the matching between data persistence rank and correspondence temporary file directory;
Two persistence rank SSD_ONLY are set and with HDD_ONLY to generate two persistence interfaces;
Expand the scope of action scope of two persistence ranks to the device adapter.
2. Spark frameworks optimization method as claimed in claim 1, it is characterised in that:The setting SSD directory managements variable and
The step of HDD directory management variables, specifically include:
Increase SSD directory managements variable and HDD directory management variables;
SSD directory managements variable is pointed into SSD temporary file directories, and HDD directory managements variable is pointed into HDD temporary files
Catalogue.
3. Spark frameworks optimization method as claimed in claim 1, it is characterised in that:It is described to set device adapter to realize
Data persistence rank and correspondence temporary file directory between matching the step of, specifically include:
Increase device adapter;
The default persistence rank of data is received by device adapter, and data are read according to the default persistence rank of data
Default persistence rank corresponding to temporary file directory in directory management variable;
Matching between data persistence rank and correspondence temporary file directory is realized by device adapter.
4. The Spark architecture optimization method of claim 1, characterized in that the step of extending the scope of the two persistence
levels is specifically:
extending the scope of the two persistence levels to the device adapter.
5. The Spark architecture optimization method of claim 1, characterized in that the step of extending the scope of the two persistence
levels is specifically:
extending the scope of the two persistence levels from the block manager in the Spark framework, through the disk block
manager in the Spark framework, to the device adapter.
6. The Spark architecture optimization method of claim 1, characterized in that the two persistence interfaces comprise an SSD interface
and an HDD interface.
7. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor,
implements the steps of the method of any one of claims 1 to 6.
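The claimed steps can be illustrated with a minimal Python sketch. This is not Spark code and not the patentee's implementation: the class names only echo the components named in the claims (block manager, disk block manager, device adapter), and every identifier below (DirectoryConfig, directory_for, persist_ssd_only, persist_hdd_only, the /mnt/... paths) is a hypothetical name chosen for illustration. The sketch performs no disk I/O; it only resolves where a block would be written.

```python
import posixpath
from dataclasses import dataclass
from enum import Enum


class PersistenceLevel(Enum):
    """The two persistence levels named in claim 1."""
    SSD_ONLY = "SSD_ONLY"
    HDD_ONLY = "HDD_ONLY"


@dataclass(frozen=True)
class DirectoryConfig:
    """Directory management variables (claim 2): one per device type.
    The field names are assumptions, not configuration keys from Spark."""
    ssd_dir: str
    hdd_dir: str


class DeviceAdapter:
    """Matches a persistence level to its temporary file directory (claim 3)."""

    def __init__(self, config: DirectoryConfig) -> None:
        self._dirs = {
            PersistenceLevel.SSD_ONLY: config.ssd_dir,
            PersistenceLevel.HDD_ONLY: config.hdd_dir,
        }

    def directory_for(self, level: PersistenceLevel) -> str:
        return self._dirs[level]


class DiskBlockManager:
    """Forwards the level to the adapter and builds the target file path."""

    def __init__(self, adapter: DeviceAdapter) -> None:
        self._adapter = adapter

    def file_path(self, block_id: str, level: PersistenceLevel) -> str:
        return posixpath.join(self._adapter.directory_for(level), block_id)


class BlockManager:
    """Exposes the two persistence interfaces (claim 6) and passes the level
    down through the disk block manager to the adapter (claim 5)."""

    def __init__(self, disk_block_manager: DiskBlockManager) -> None:
        self._disk_block_manager = disk_block_manager

    def persist_ssd_only(self, block_id: str) -> str:
        return self._disk_block_manager.file_path(block_id, PersistenceLevel.SSD_ONLY)

    def persist_hdd_only(self, block_id: str) -> str:
        return self._disk_block_manager.file_path(block_id, PersistenceLevel.HDD_ONLY)


if __name__ == "__main__":
    # Hypothetical mount points; a real deployment would supply its own.
    config = DirectoryConfig(ssd_dir="/mnt/ssd/spark-tmp", hdd_dir="/mnt/hdd/spark-tmp")
    manager = BlockManager(DiskBlockManager(DeviceAdapter(config)))
    print(manager.persist_ssd_only("rdd_0_1"))
    print(manager.persist_hdd_only("rdd_0_1"))
```

Keeping the level-to-directory mapping inside the adapter means the upper layers only pass the level along, which is one plausible reading of "extending the scope of the two persistence levels to the device adapter".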
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710358537.9A CN107179883B (en) | 2017-05-19 | 2017-05-19 | Spark architecture optimization method of hybrid storage system based on SSD and HDD |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710358537.9A CN107179883B (en) | 2017-05-19 | 2017-05-19 | Spark architecture optimization method of hybrid storage system based on SSD and HDD |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107179883A (en) | 2017-09-19 |
CN107179883B (en) | 2020-07-17 |
Family
ID=59831444
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710358537.9A Active CN107179883B (en) | 2017-05-19 | 2017-05-19 | Spark architecture optimization method of hybrid storage system based on SSD and HDD |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107179883B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104216988A (en) * | 2014-09-04 | 2014-12-17 | 天津大学 | SSD (Solid State Disk) and HDD (Hard Disk Drive) hybrid storage method for distributed big data |
CN105426472A (en) * | 2015-11-16 | 2016-03-23 | 广州供电局有限公司 | Distributed computing system and data processing method thereof |
CN105893541A (en) * | 2016-03-31 | 2016-08-24 | 中国科学院软件研究所 | Streaming data self-adaption persistence method and system based on mixed storage |
Non-Patent Citations (1)
Title |
---|
陈丽: "一种基于SSD的高性能Hadoop系统的设计与应用", 《广东水利电力职业技术学院学报》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107590077A (en) * | 2017-09-22 | 2018-01-16 | 深圳大学 | Spark load memory access behavior tracking method and device |
WO2019056305A1 (en) * | 2017-09-22 | 2019-03-28 | 深圳大学 | Method and apparatus for tracking spark load memory access behavior |
CN107590077B (en) * | 2017-09-22 | 2020-09-11 | 深圳大学 | Spark load memory access behavior tracking method and device |
CN107590003A (en) * | 2017-09-28 | 2018-01-16 | 深圳大学 | Spark task allocation method and system |
CN107590003B (en) * | 2017-09-28 | 2020-10-23 | 深圳大学 | Spark task allocation method and system |
Also Published As
Publication number | Publication date |
---|---|
CN107179883B (en) | 2020-07-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107193494A (en) | RDD (Resilient Distributed Dataset) persistence method based on SSD (solid state disk) and HDD (hard disk drive) hybrid storage system | |
US9367574B2 (en) | Efficient query processing in columnar databases using bloom filters | |
CN102521406B (en) | Distributed query method and system for complex task of querying massive structured data | |
US8782324B1 (en) | Techniques for managing placement of extents based on a history of active extents | |
CN110268394A (en) | KVS tree | |
CN110291518A (en) | Merge tree garbage index | |
CN103488704B (en) | Data storage method and device | |
CN110268399A (en) | Merge tree modifications for maintenance operations | |
US20150347492A1 (en) | Representing an outlier value in a non-nullable column as null in metadata | |
CN104850572A (en) | HBase non-primary key index building and inquiring method and system | |
CN105302840B (en) | A kind of buffer memory management method and equipment | |
US20130290665A1 (en) | Storing large objects on disk and not in main memory of an in-memory database system | |
CN103559300B (en) | Data query method and query device | |
CN104035925B (en) | Data storage method, device and storage system | |
TW201415262A (en) | Construction of inverted index system, data processing method and device based on Lucene | |
CN102968464B (en) | A kind of search method of the local resource quick retrieval system based on index | |
CN109542907A (en) | Database caches construction method, device, computer equipment and storage medium | |
CN107179883A (en) | Spark architecture optimization method of hybrid storage system based on SSD and HDD | |
CN104270412A (en) | Three-level caching method based on Hadoop distributed file system | |
CN106649828A (en) | Data query method and system | |
CN102857560A (en) | Multi-service application orientated cloud storage data distribution method | |
CN102779138A (en) | Hard disk access method of real time data | |
CN111061802B (en) | Power data management processing method, device and storage medium | |
CN109902101A (en) | Transparent partition method and device based on SparkSQL | |
WO2015168988A1 (en) | Data index creation method and device, and computer storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 2022-05-17. Address after: 518000 east of the fourth floor of plant 1 (Building 1) of Baode technology R&D and production base, Gaoxinyuan, Guanlan Street, Longhua New Area, Shenzhen, Guangdong. Patentee after: Baode network security system (Shenzhen) Co., Ltd. Address before: 518000 No. 3688 Nanhai Road, Nanshan District, Shenzhen, Guangdong. Patentee before: SHENZHEN University. |