CN105095352B - Data processing method and device applied to distributed system - Google Patents

Publication number
CN105095352B
Authority
CN
China
Prior art keywords
memory
data
access
file data
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510344249.9A
Other languages
Chinese (zh)
Other versions
CN105095352A (en
Inventor
郭照斌
李博
苗艳超
季旻
姜国梁
杨鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Co Ltd
Original Assignee
Dawning Information Industry Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/18: File system types
    • G06F16/182: Distributed file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions
    • G06F16/172: Caching, prefetching or hoarding of files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data processing method and device applied to a distributed system. The method comprises: performing erasure coding on written file data to generate redundant data for the file data; storing the file data and the redundant data in a first memory; transferring to a second memory the redundant data of file data whose access temperature is below a preset value; and, when the data information corresponding to the file data needs to be read, reading the file data directly from the first memory to obtain the data information. The data processing method of the invention effectively guarantees the response speed of data access and greatly reduces storage cost and operation-and-maintenance cost.

Description

Data processing method and device applied to distributed system
Technical field
The present invention relates to the computer field and, in particular, to a data processing method and device applied to a distributed system.
Background
A distributed file system generally comprises clients, a metadata server, and data servers. The client provides the access interface to file data, the metadata server manages the layout and attributes of files, and the data servers store the data content of files.
The most important features of a distributed file system are that it can store massive amounts of data and that it is highly reliable. Storing a large number of files in the system requires a large amount of disk storage, but disks cost far more than a tape library, so it becomes necessary to tier the stored data between a tape library and disks.
The traditional approach is to store all the data of files that were accessed infrequently over a period of time directly on the tape library, and to store frequently accessed files on SATA disks. The drawback is that when a file to be accessed resides on the tape library, access is much slower than from a SATA disk and the user experience is poor. Moreover, repeated access accelerates wear of the tape library, which can leave files unrecoverable.
No effective solution to these problems in the related art has yet been proposed.
Summary of the invention
In view of the problems in the related art, the present invention proposes a data processing method and device applied to a distributed system.
The technical solution of the present invention is realized as follows:
According to one aspect of the invention, a data processing method applied to a distributed system is provided.
The method comprises:
performing erasure coding on written file data to generate redundant data for the file data;
storing the file data and the redundant data in a first memory;
transferring to a second memory the redundant data of file data whose access temperature is below a preset value;
when the data information corresponding to the file data needs to be read, reading the file data directly from the first memory to obtain the data information.
When the first memory is damaged, erasure code decoding is performed on the redundant data stored in the second memory to regenerate the file data corresponding to that redundant data, and the file data is stored in a first memory that is functioning normally.
When the second memory is damaged, erasure coding is performed on the file data that is stored in the first memory and corresponds to the redundant data stored in the second memory, and the regenerated redundant data is stored in a second memory that is functioning normally.
The first memory is any one of the following:
SATA disk, SSD disk, SAS disk; and
the second memory is any one of the following:
tape library, SATA disk, SSD disk, SAS disk;
provided that the first memory and the second memory are not the same type.
In addition, the access temperature includes at least one of the following:
access count, access frequency, access time, and the time elapsed between the most recent access and the current time.
According to another aspect of the invention, a data processing device applied to a distributed system is also provided, comprising:
a generation module, configured to perform erasure coding on written file data to generate redundant data for the file data;
a first storage module, configured to store the file data and the redundant data in a first memory;
a second storage module, configured to transfer to a second memory the redundant data of file data whose access temperature is below a preset value;
a read module, configured to read the file data directly from the first memory to obtain the data information when the data information corresponding to the file data needs to be read.
In addition, the device may further comprise:
a first transfer module, configured to, when the first memory is damaged, perform erasure code decoding on the redundant data stored in the second memory to regenerate the file data corresponding to that redundant data, and to store the file data in a first memory that is functioning normally;
a second transfer module, configured to, when the second memory is damaged, perform erasure coding on the file data that is stored in the first memory and corresponds to the redundant data stored in the second memory, and to store the regenerated redundant data in a second memory that is functioning normally.
The first memory is any one of the following:
SATA disk, SSD disk, SAS disk; and
the second memory is any one of the following:
tape library, SATA disk, SSD disk, SAS disk;
provided that the first memory and the second memory are not the same type.
In addition, the access temperature includes at least one of the following:
access count, access frequency, access time, and the time elapsed between the most recent access and the current time.
The data processing method of the invention effectively guarantees the response speed of data access and greatly reduces storage cost and operation-and-maintenance cost.
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the data processing method applied to a distributed system according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the data processing method applied to a distributed system according to an embodiment of the invention;
Fig. 3 is a block diagram of the data processing device applied to a distributed system according to an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments fall within the scope of protection of the invention.
According to an embodiment of the invention, a data processing method applied to a distributed system is provided.
As shown in Fig. 1, the data processing method applied to a distributed system according to an embodiment of the invention comprises:
Step S101: perform erasure coding on written file data to generate redundant data for the file data;
Step S103: store the file data and the redundant data in a first memory;
Step S105: transfer to a second memory the redundant data of file data whose access temperature is below a preset value;
Step S107: when the data information corresponding to the file data needs to be read, read the file data directly from the first memory to obtain the data information.
In addition, when the first memory is damaged, erasure code decoding is performed on the redundant data stored in the second memory to regenerate the file data corresponding to that redundant data, and the recovered file data is stored in a first memory that is functioning normally.
When the second memory is damaged, erasure coding is performed on the file data that is stored in the first memory and corresponds to the redundant data stored in the second memory, and the regenerated redundant data is stored in a second memory that is functioning normally.
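The two recovery paths can be illustrated with single XOR parity as a stand-in erasure code. This sketch rests on assumptions the patent does not state: the file is striped into blocks, exactly one data block is lost when a first memory fails, and the surviving blocks remain readable. The function names are hypothetical. Rebuilding a lost data block is the decoding direction; regenerating lost redundancy is the encoding direction.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_data_block(blocks: list, parity: bytes, lost_index: int) -> bytes:
    """First memory damaged: decode the lost block from the parity held in the
    second memory plus the surviving data blocks."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return xor_blocks(survivors + [parity])


def rebuild_parity(blocks: list[bytes]) -> bytes:
    """Second memory damaged: re-encode the intact data blocks to regenerate
    the redundant data."""
    return xor_blocks(blocks)
```

A code with more redundancy (for example Reed-Solomon) would tolerate more simultaneous losses; the choice here is only for brevity.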
The first memory may be any one of the following:
SATA disk, SSD disk, SAS disk; and
the second memory may be any one of the following:
tape library, SATA disk, SSD disk, SAS disk;
provided that the type chosen for the first memory is not also chosen for the second memory.
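A configuration check for this constraint might look like the following. The media identifiers and the function name `validate_tiers` are illustrative assumptions; only the two candidate lists and the no-repeat rule come from the text.

```python
# Candidate media types from the text; the lowercase identifiers are assumed.
FIRST_MEMORY_TYPES = {"sata", "ssd", "sas"}
SECOND_MEMORY_TYPES = {"tape", "sata", "ssd", "sas"}


def validate_tiers(first: str, second: str) -> tuple[str, str]:
    """Return the normalized (first, second) pair or raise ValueError."""
    first, second = first.lower(), second.lower()
    if first not in FIRST_MEMORY_TYPES:
        raise ValueError(f"unsupported first memory type: {first}")
    if second not in SECOND_MEMORY_TYPES:
        raise ValueError(f"unsupported second memory type: {second}")
    if first == second:
        raise ValueError("first and second memory must be different types")
    return first, second
```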
In addition, the access temperature includes at least one of the following:
access count, access frequency, access time, and the time elapsed between the most recent access and the current time.
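As a sketch of how a "below the preset value" test could work, the following uses only the last of the listed measures, the time since the most recent access. The `AccessStats` record, the `is_cold` name, and the 3-day default are assumptions for illustration (the 3-day figure mirrors the example period given in the specific embodiment); access count or access frequency could be substituted.

```python
from dataclasses import dataclass

THREE_DAYS = 3 * 24 * 3600  # seconds; example threshold, not mandated by the text


@dataclass
class AccessStats:
    access_count: int   # total number of accesses (unused in this sketch)
    last_access: float  # Unix timestamp of the most recent access


def is_cold(stats: AccessStats, now: float, max_idle: float = THREE_DAYS) -> bool:
    """Treat a file as cold when it has been idle longer than max_idle seconds."""
    return now - stats.last_access > max_idle
```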
To better show how the technical solution of the invention differs from and improves on the prior art, in one embodiment the first memory is a SATA disk and the second memory is a tape library. When a file is written to the distributed file system, it is first erasure coded to generate the corresponding redundancy check data, and the original data and the redundant data are stored together on the SATA disks of the data storage server. Then, based on each file's last access time, the redundant data of infrequently accessed files is dumped to the tape library and the redundant copy on the SATA disk is deleted. When such a file is accessed again, no data needs to be read from the tape library; only the original data is read from the SATA disk. If the data on the disk is damaged, the redundant data can be read from the tape library to recover the original data; if the data on the tape library is damaged, the damaged data can be rebuilt from the data on the SATA disk. This guarantees the response speed of data access, makes full use of the tape library to reduce storage cost, and still meets the requirement for high reliability.
To make the technical solution of the invention clearer, in a specific embodiment, shown in Fig. 2, the technical solution is implemented in the following steps:
1. When a file is written, the client requests the file layout information from the metadata server;
2. After erasure coding the file, the client writes the redundant data and the original data to the SATA disks specified in the layout returned by the request;
3. The metadata server periodically scans the file directory and, according to the configured file access policy, selects the files whose last access time is more than a certain period (for example, 3 days) ago;
4. The redundant data of the selected files is moved to the tape library, and the file layout information is updated;
5. When a SATA disk is damaged, the data on the tape library is read according to the file layout information, the original data is computed by erasure code decoding, and it is repaired onto a healthy SATA disk;
6. When data in the tape library is damaged, the data on the SATA disks is read according to the file layout information, the redundant data is computed by erasure coding, and it is rewritten to the tape library.
When a user reads data, only the original data on the ordinary disks needs to be read; the data on the tape library is not accessed.
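Steps 3 and 4 can be sketched as a periodic scan on the metadata server. The `FileLayout` record, its fields, and the `move_redundancy` callback are hypothetical; the patent does not describe the layout format or the transfer mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Optional

THREE_DAYS = 3 * 24 * 3600  # seconds; the example period from step 3


@dataclass
class FileLayout:
    name: str
    last_access: float           # Unix timestamp of the most recent access
    redundancy_on: str = "sata"  # where the redundant data currently lives


def scan_and_migrate(
    layouts: list[FileLayout],
    now: float,
    max_idle: float = THREE_DAYS,
    move_redundancy: Optional[Callable[[str], None]] = None,
) -> list[str]:
    """Select cold files (step 3), move their redundancy and update the
    layout information (step 4). Returns the names of migrated files."""
    migrated = []
    for layout in layouts:
        if layout.redundancy_on != "sata":
            continue  # redundancy already lives on the tape library
        if now - layout.last_access > max_idle:
            if move_redundancy is not None:
                move_redundancy(layout.name)  # hypothetical transfer hook
            layout.redundancy_on = "tape"
            migrated.append(layout.name)
    return migrated
```

Because only `redundancy_on` changes, the location of the original data in the layout is untouched and reads continue to be served from the SATA disks.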
According to an embodiment of the invention, a data processing device applied to a distributed system is also provided.
As shown in Fig. 3, the data processing device applied to a distributed system according to an embodiment of the invention comprises:
a generation module 31, configured to perform erasure coding on written file data to generate redundant data for the file data;
a first storage module 32, configured to store the file data and the redundant data in a first memory;
a second storage module 33, configured to transfer to a second memory the redundant data of file data whose access temperature is below a preset value;
a read module 34, configured to read the file data directly from the first memory to obtain the data information when the data information corresponding to the file data needs to be read.
In addition, the device may further comprise:
a first transfer module (not shown), configured to, when the first memory is damaged, perform erasure code decoding on the redundant data stored in the second memory to regenerate the file data corresponding to that redundant data, and to store the file data in a first memory that is functioning normally;
a second transfer module (not shown), configured to, when the second memory is damaged, perform erasure coding on the file data that is stored in the first memory and corresponds to the redundant data stored in the second memory, and to store the regenerated redundant data in a second memory that is functioning normally.
The first memory may be any one of the following:
SATA disk, SSD disk, SAS disk; and
the second memory may be any one of the following:
tape library, SATA disk, SSD disk, SAS disk;
provided that the first memory and the second memory are not the same type.
In addition, the access temperature includes at least one of the following:
access count, access frequency, access time, and the time elapsed between the most recent access and the current time.
The data processing method of the invention effectively guarantees the response speed of data access and greatly reduces storage cost and operation-and-maintenance cost.
The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (6)

1. A data processing method applied to a distributed system, characterized by comprising:
performing erasure coding on written file data to generate redundant data for the file data;
storing the file data and the redundant data in a first memory;
transferring to a second memory the redundant data of file data whose access temperature is below a preset value, wherein the access speed of the second memory is lower than the access speed of the first memory;
when the data information corresponding to the file data needs to be read, reading the file data directly from the first memory to obtain the data information;
when the first memory is damaged, performing erasure code decoding on the redundant data stored in the second memory to regenerate the file data corresponding to the redundant data, and storing the file data in a first memory that is functioning normally;
when the second memory is damaged, performing erasure coding on the file data that is stored in the first memory and corresponds to the redundant data stored in the second memory, and storing the regenerated redundant data in a second memory that is functioning normally.
2. The method according to claim 1, characterized in that:
the first memory comprises:
SATA disk, SSD disk, SAS disk; and
the second memory comprises:
tape library, SATA disk, SSD disk, SAS disk;
wherein the first memory and the second memory are not the same type.
3. The method according to claim 1, characterized in that the access temperature includes at least one of the following:
access count, access frequency, access time, and the time elapsed between the most recent access and the current time.
4. A data processing device applied to a distributed system, characterized by comprising:
a generation module, configured to perform erasure coding on written file data to generate redundant data for the file data;
a first storage module, configured to store the file data and the redundant data in a first memory;
a second storage module, configured to transfer to a second memory the redundant data of file data whose access temperature is below a preset value, wherein the access speed of the second memory is lower than the access speed of the first memory;
a read module, configured to read the file data directly from the first memory to obtain the data information when the data information corresponding to the file data needs to be read;
a first transfer module, configured to, when the first memory is damaged, perform erasure code decoding on the redundant data stored in the second memory to regenerate the file data corresponding to the redundant data, and to store the file data in a first memory that is functioning normally;
a second transfer module, configured to, when the second memory is damaged, perform erasure coding on the file data that is stored in the first memory and corresponds to the redundant data stored in the second memory, and to store the regenerated redundant data in a second memory that is functioning normally.
5. The device according to claim 4, characterized in that:
the first memory comprises:
SATA disk, SSD disk, SAS disk; and
the second memory comprises:
tape library, SATA disk, SSD disk, SAS disk;
wherein the first memory and the second memory are not the same type.
6. The device according to claim 4, characterized in that the access temperature includes at least one of the following:
access count, access frequency, access time, and the time elapsed between the most recent access and the current time.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510344249.9A CN105095352B (en) 2015-06-19 2015-06-19 Data processing method and device applied to distributed system

Publications (2)

Publication Number Publication Date
CN105095352A CN105095352A (en) 2015-11-25
CN105095352B true CN105095352B (en) 2019-03-05

Family

ID=54575789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510344249.9A Active CN105095352B (en) 2015-06-19 2015-06-19 Data processing method and device applied to distributed system

Country Status (1)

Country Link
CN (1) CN105095352B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant