CN109241023A - Distributed storage system data storage method, device, system and storage medium - Google Patents

Distributed storage system data storage method, device, system and storage medium

Info

Publication number
CN109241023A
Authority
CN
China
Prior art keywords
stored
blocks
file
files
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811108494.XA
Other languages
Chinese (zh)
Inventor
徐晓阳
赵万里
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd filed Critical Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201811108494.XA priority Critical patent/CN109241023A/en
Publication of CN109241023A publication Critical patent/CN109241023A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data storage method for a distributed storage system, comprising: partitioning a file to be stored into blocks to obtain several to-be-stored file blocks; comparing the content of each to-be-stored file block with pre-stored file blocks to determine whether the system contains a file block whose content matches the to-be-stored file block; if so, obtaining the data storage location of the content-matching file block in the system; and establishing a data index for the matched to-be-stored file block according to that data storage location. By partitioning the file to be stored and identifying redundant data through block-level comparison, the method improves the probability of detecting partially duplicated data within files and achieves precise deduplication. The invention also provides a data storage device, system and readable storage medium for a distributed storage system, which have the same beneficial effects.

Description

Distributed storage system data storage method, device, system and storage medium
Technical field
The present invention relates to the computer field, and in particular to a data storage method, device, system and readable storage medium for a distributed storage system.
Background
With the rapid development of data and information technology, data interactions in cloud computing, big data, the Internet of Things and similar fields are causing stored data to grow rapidly, and managing the data held on storage devices has become a critical problem.
Data held on storage devices is generally highly redundant: a large amount of data is duplicated across different files, especially in backup storage systems and in operating systems of various types. Reducing duplicate data effectively improves storage space utilization and is an important research topic in storage systems.
At present, duplicate data is usually reduced by comparing whole files directly and deleting duplicate files. However, most duplicated files today are large files such as disaster recovery backups, which hold a great deal of content. Such files are seldom complete duplicates of one another; the redundancy lies mainly inside the files. Whole-file comparison therefore detects few duplicate files, a large amount of duplicate data remains in the system, and the excessive use of system storage is barely relieved.
Therefore, how to achieve precise deduplication and relieve the pressure on system storage is a technical problem that those skilled in the art need to solve.
Summary of the invention
The object of the present invention is to provide a data storage method for a distributed storage system. The method partitions the file to be stored into blocks and identifies redundant data by comparing the resulting file blocks, which improves the probability of detecting partially duplicated data within files and achieves precise deduplication. A further object of the present invention is to provide a data storage device, system and readable storage medium for a distributed storage system that have the same beneficial effects.
To solve the above technical problem, the present invention provides a data storage method for a distributed storage system, comprising:
partitioning a file to be stored into blocks to obtain several to-be-stored file blocks;
comparing the content of the to-be-stored file block with pre-stored file blocks, and determining whether the system contains a file block whose content matches the to-be-stored file block;
if so, obtaining the data storage location of the content-matching file block in the system;
establishing a data index for the matched to-be-stored file block according to the data storage location.
Preferably, comparing the content of the to-be-stored file block with pre-stored file blocks and determining whether the system contains a file block whose content matches the to-be-stored file block comprises:
calculating the hash value of the to-be-stored file block to obtain a to-be-stored hash value;
comparing the to-be-stored hash value with the file block hash values in an index table, and determining whether the index table contains a file block hash value identical to the to-be-stored hash value; wherein the index table stores the file block hash values of the files already stored in the system and the corresponding file block storage locations.
Preferably, the data storage method for a distributed storage system further comprises:
if the index table contains no file block hash value identical to the to-be-stored hash value, storing the data to be stored, and adding the to-be-stored hash value and the corresponding data storage location to the index table.
Preferably, the data storage method for a distributed storage system further comprises:
publishing the updated index table to each node in the system in real time.
Preferably, calculating the hash value of the to-be-stored file block comprises:
calculating the hash value of the to-be-stored file block with the SHA-1 hash function.
Preferably, the method of generating the index table comprises:
determining whether pre-stored files exist in the system;
if so, calculating the file block hash values of the stored files according to the file blocks that the stored files occupy;
generating the index table from each calculated file block hash value of the stored files and the corresponding data storage location.
Preferably, the data storage method for a distributed storage system further comprises:
comparing the file block hash values in the index table pairwise, and determining whether the index table contains file blocks with identical hash values;
if so, determining a retained file block and a non-retained file block;
replacing the stored data of the non-retained file block with the data storage location of the retained file block.
The present invention discloses a data storage device for a distributed storage system, comprising:
a blocking unit, configured to partition a file to be stored into blocks to obtain several to-be-stored file blocks;
a comparing unit, configured to compare the content of the to-be-stored file block with pre-stored file blocks and determine whether the system contains a file block whose content matches the to-be-stored file block;
a data information acquiring unit, configured to obtain, if so, the data storage location of the content-matching file block in the system;
an index establishing unit, configured to establish a data index for the matched to-be-stored file block according to the data storage location.
The present invention discloses data storage equipment for a distributed storage system, comprising:
a memory, configured to store a program;
a processor, configured to implement the steps of the data storage method for a distributed storage system when executing the program.
The present invention discloses a readable storage medium on which a program is stored; when executed by a processor, the program implements the steps of the data storage method for a distributed storage system.
The data storage method for a distributed storage system provided by the present invention partitions the file to be stored into blocks, analyses the data at the level of those blocks, and compares the content of each to-be-stored file block with pre-stored file blocks. This improves the probability of detecting partially duplicated data within files and greatly increases the number of redundant file blocks that can be found. Identifying redundant data precisely through block-level comparison detects far more redundancy than analysing entire files as a whole. If the system contains a file block whose content matches a to-be-stored file block, the content of the block currently being stored already exists in the system, that is, the to-be-stored file block is a redundant file block. A data index is established for the redundant file block according to the data storage location of the content-matching file block in the system, and the data of the redundant file block is not stored. Pointing the current file block at the matching pre-stored data both meets the storage needs of the file to be stored and greatly reduces the storage occupied by redundant data.
In addition, another embodiment of the present invention discloses the technical feature of comparing file block content through file block hash values. The hash value of a file block captures the uniqueness of the file block's content in a simple feature value, which not only greatly reduces the system resources and load consumed in comparing file content but also improves deduplication efficiency.
The present invention also provides a data storage device, system and readable storage medium for a distributed storage system, which have the above beneficial effects and are not described again here.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below are merely embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flow chart of a data storage method for a distributed storage system provided in an embodiment of the present invention;
Fig. 2 is a structural block diagram of a data storage device for a distributed storage system provided in an embodiment of the present invention;
Fig. 3 is a structural block diagram of data storage equipment for a distributed storage system provided in an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of data storage equipment for a distributed storage system provided in an embodiment of the present invention.
Detailed description of the embodiments
The core of the present invention is to provide a data storage method for a distributed storage system. The method partitions the file to be stored into blocks and identifies redundant data by comparing the resulting file blocks, which improves the probability of detecting partially duplicated data within files and achieves precise deduplication. Another core of the present invention is to provide a data storage device, system and readable storage medium for a distributed storage system.
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, Fig. 1 is a flow chart of the data storage method for a distributed storage system provided in this embodiment. The method may mainly include:
Step s110: partition the file to be stored into blocks to obtain several to-be-stored file blocks.
After the client receives a data storage request, the system first partitions the file to be stored into blocks. This embodiment does not limit the partitioning method: a uniform partitioning rule may be specified so that all files to be stored are divided into file blocks of a fixed size, or the partitioning rule for the data to be stored may be customized according to the file content and the storage nodes.
Specifically, under a customized rule the block size can be determined by the data size and the number of storage nodes (deduplication nodes). If there are many storage nodes and the data file is small, the block size can be set to a low value such as 1 KB; if the data volume is large and the number of storage nodes is small, the block size can be set to a higher value such as 512 KB.
Partitioning the file to be stored yields several file blocks corresponding to that file, each of which holds file data.
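Step s110 can be illustrated with a minimal Python sketch of fixed-size partitioning. The function names, the node-count threshold of 16 and the 1 MiB "small file" threshold are illustrative assumptions; the description only gives 1 KB and 512 KB as example block sizes and does not say how other cases should be handled.

```python
def choose_block_size(file_size: int, node_count: int,
                      many_nodes: int = 16, small_file: int = 1 << 20) -> int:
    """Pick a block size in bytes.

    The thresholds many_nodes and small_file are assumptions made for
    this sketch; they do not come from the patent text.
    """
    if node_count >= many_nodes and file_size <= small_file:
        return 1024            # 1 KB: many storage nodes, small file
    return 512 * 1024          # 512 KB: fallback for all other cases (assumption)


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split data into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

Concatenating the returned blocks in order reproduces the original data, which is what allows a file to be rebuilt later from its blocks.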
Step s120: compare the content of the to-be-stored file block with pre-stored file blocks, and determine whether the system contains a file block whose content matches the to-be-stored file block.
Each file block obtained by partitioning is compared with the file blocks pre-stored in the system to determine whether the pre-stored data contains a file block whose content matches it. The matching rule can be set according to the file content: two file blocks may be deemed to match when their content similarity reaches 98% or more, or only when it reaches 100%. The specific matching rule is not limited here; when a file is more important and requires greater content precision, a higher matching degree can be set.
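A configurable matching rule of this kind might look like the sketch below. The description does not define how content similarity is measured, so the use of Python's difflib.SequenceMatcher ratio as the similarity measure, and the function name blocks_match, are assumptions made purely for illustration.

```python
from difflib import SequenceMatcher


def blocks_match(candidate: bytes, stored: bytes, threshold: float = 1.0) -> bool:
    """Return True if two blocks match under the given similarity threshold.

    threshold=1.0 demands byte-for-byte equality; a lower value such as
    0.98 accepts near-duplicates. The similarity measure is an assumed
    stand-in, not something specified by the patent.
    """
    if threshold >= 1.0:
        return candidate == stored
    matcher = SequenceMatcher(None, candidate, stored, autojunk=False)
    return matcher.ratio() >= threshold
```

With a threshold below 100%, the stored block that stands in for the incoming block is not byte-identical to it, which is presumably why the description suggests a higher matching degree for important files.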
A content match means the file data is duplicated: if a file block's content matches, the file block is a duplicate, and the to-be-stored file block can be deemed a redundant block.
This embodiment does not limit the specific method of comparing file block content. File block content can be compared item by item following existing whole-file matching methods, or feature values can be extracted from the file block content and the feature values of the to-be-stored and pre-stored file blocks compared. Comparing extracted feature values greatly simplifies the comparison and reduces the resources it consumes.
If the system contains a file block whose content matches the to-be-stored file block, step s130 is executed.
Step s130: obtain the data storage location of the content-matching file block in the system.
Step s140: establish a data index for the matched to-be-stored file block according to the data storage location.
If the system contains a file block whose content matches the to-be-stored file block, the file block currently to be stored is a redundant block: the file blocks pre-stored in the system already include one with matching content. To reduce the storage occupied by redundant file blocks, the redundant data is not stored; the matching data pre-stored in the system serves as that data instead. The data index of the matched to-be-stored file block is pointed at the corresponding pre-stored file block, that is, a data index is established for the matched to-be-stored file block according to the storage location of the matched data.
If no match is found, the to-be-stored data block is non-redundant data. The data storage rule for this case is not limited here; data can be stored according to existing data storage rules.
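Steps s120 to s140 can be sketched as a toy in-memory store in which a storage location is simply a list position and a file's data index is the list of locations of its blocks. The class BlockStore and its methods are hypothetical names; appending unmatched blocks to the list is an assumed storage rule, since the description leaves that case open.

```python
class BlockStore:
    """Toy block store: a location is the position of a block in self.blocks."""

    def __init__(self):
        self.blocks: list[bytes] = []

    def find_match(self, block: bytes):
        """Step s120/s130: return the location of a content-matching block, or None."""
        for location, stored in enumerate(self.blocks):
            if stored == block:
                return location
        return None

    def store_file(self, blocks: list[bytes]) -> list[int]:
        """Step s140: build the file's data index, storing only unmatched blocks."""
        data_index = []
        for block in blocks:
            location = self.find_match(block)
            if location is None:          # non-redundant block: store it (assumed rule)
                self.blocks.append(block)
                location = len(self.blocks) - 1
            data_index.append(location)   # redundant block: reference only
        return data_index

    def read_file(self, data_index: list[int]) -> bytes:
        """Rebuild a file by following its data index."""
        return b"".join(self.blocks[location] for location in data_index)
```

The linear scan in find_match corresponds to item-by-item content comparison and becomes slow as the store grows, which is the cost that feature-value comparison is meant to avoid.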
Based on the above, the data storage method for a distributed storage system disclosed in the embodiments of the present invention partitions the file to be stored into blocks, analyses the data at the level of those blocks, and compares the content of each to-be-stored file block with pre-stored file blocks. This improves the probability of detecting partially duplicated data within files and greatly increases the number of redundant file blocks that can be found. Identifying redundant data precisely through block-level comparison detects far more redundancy than analysing entire files as a whole. If the system contains a file block whose content matches a to-be-stored file block, the content of the block currently being stored already exists in the system, that is, the to-be-stored file block is a redundant file block. A data index is established for the redundant file block according to the data storage location of the content-matching file block in the system, and the data of the redundant file block is not stored. Pointing the current file block at the matching pre-stored data both meets the storage needs of the file to be stored and greatly reduces the storage occupied by redundant data.
The above embodiment does not limit the specific method of comparing file block content; feature values can be extracted from the file block content and the feature values of the to-be-stored and pre-stored file blocks compared, so as to reduce the resources consumed by the comparison. This embodiment describes the file block content comparison method in detail.
There are many methods for extracting file block features, and existing feature extraction methods can be used, for example the NMF algorithm, the FAST algorithm, the SURF algorithm and hash algorithms. Among these, a hash value is more compact, the computed feature value is relatively simple, and it is effectively unique, so it accurately reflects the content of a file block. Preferably, file block content is compared by computing and comparing file block hash values. Specifically, comparing the content of the to-be-stored file block with pre-stored file blocks and determining whether the system contains a file block whose content matches the to-be-stored file block may include the following steps:
calculating the hash value of the to-be-stored file block to obtain a to-be-stored hash value;
comparing the to-be-stored hash value with the file block hash values in the index table, and determining whether the index table contains a file block hash value identical to the to-be-stored hash value; wherein the index table stores the file block hash values of the files already stored in the system and the corresponding file block storage locations.
Comparing file block content through file block hash values not only greatly reduces the system resources and load consumed by the comparison, but also improves the efficiency of deduplication (deleting duplicate data). The specific steps for computing a file block hash value can follow the prior art and are not repeated here.
On the basis of the above embodiment, the handling of the case where the index table contains no file block hash value identical to the to-be-stored hash value is not limited; the data can simply be stored. Preferably, while the data is stored, the to-be-stored hash value and the corresponding data storage location are also added to the index table, so that the file block information in the index table is updated in real time.
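The hash-based comparison and the index table update on a miss can be sketched as follows, using SHA-1 from Python's hashlib as named in the preferred embodiment. The class HashIndexedStore, the method put_block and the use of a dict as the index table are hypothetical choices for this sketch.

```python
import hashlib


class HashIndexedStore:
    """Toy store whose index table maps a SHA-1 digest to a storage location."""

    def __init__(self):
        self.blocks: list[bytes] = []          # stand-in for physical storage
        self.index_table: dict[str, int] = {}  # file block hash value -> location

    def put_block(self, block: bytes) -> tuple[int, bool]:
        """Return (location, was_duplicate) for a to-be-stored file block."""
        digest = hashlib.sha1(block).hexdigest()   # the to-be-stored hash value
        location = self.index_table.get(digest)
        if location is not None:
            return location, True                  # hit: reference existing data
        self.blocks.append(block)                  # miss: store the data
        location = len(self.blocks) - 1
        self.index_table[digest] = location        # and add it to the index table
        return location, False
```

A dictionary lookup on the digest replaces the block-by-block content scan, so the cost of checking one block no longer grows with the number of stored blocks.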
In addition, a distributed storage system contains several nodes that perform different data functions, such as data storage and data management. The index table is generally stored on a management node, and to relieve the load on the management node, several ordinary nodes are generally selected to share its management tasks. After the index table is updated, other nodes in the system may receive the information late and continue to compare against the previous index table. To reduce such cases and avoid storing data repeatedly, the updated index table can be published to each node in the system in real time, so that the index information on every node is updated promptly.
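One way to picture the real-time publication of index updates is the sketch below, in which a publisher pushes every new entry to each node's local copy of the index table. The classes Node and IndexPublisher, the version counter and the synchronous in-process push are all assumptions; the description does not specify a transport or a consistency protocol.

```python
class Node:
    """A node holding a local copy of the index table."""

    def __init__(self, name: str):
        self.name = name
        self.index_table: dict[str, str] = {}
        self.version = 0

    def apply_update(self, version: int, entries: dict[str, str]) -> None:
        # Ignore stale updates so an older message cannot overwrite newer state.
        if version > self.version:
            self.index_table.update(entries)
            self.version = version


class IndexPublisher:
    """Holds the authoritative index table and pushes each change to all nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.index_table: dict[str, str] = {}
        self.version = 0

    def add_entry(self, digest: str, location: str) -> None:
        self.index_table[digest] = location
        self.version += 1
        for node in self.nodes:   # publish immediately after every update
            node.apply_update(self.version, {digest: location})
```

Pushing on every change keeps the window in which a node compares against an outdated table as short as the delivery time of one update.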
Hash algorithms include many kinds of functions, and the specific hash function is not limited here. Preferably, the hash value of the to-be-stored file block is calculated with the SHA-1 hash function. SHA-1 is fast to compute, which speeds up the comparison and improves data storage efficiency.
Furthermore, data may already have been stored on the system's disks before the data storage method provided by the present invention is applied. That data need not be compared; only files stored after the method is adopted are compared. In other words, the index table may exclude the pre-stored data, which reduces the use of system resources.
However, when the volume of such pre-stored data is large, newly stored file blocks may duplicate much of it. To further reduce the storage space used and raise the deduplication rate, it is preferable to determine whether pre-stored files exist in the system; if so, to calculate the file block hash values of the stored files according to the file blocks they occupy; and to generate the index table from each calculated file block hash value and the corresponding data storage location.
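Generating an index table from data that was stored before the method was adopted could be sketched like this. The function name build_index_table, the representation of stored blocks as a mapping from a location string to block bytes, and the list-of-pairs index format are assumptions for illustration.

```python
import hashlib


def build_index_table(stored_blocks: dict[str, bytes]) -> list[tuple[str, str]]:
    """Return one (file block hash value, storage location) entry per stored block.

    stored_blocks maps a storage location to the bytes of the block held there.
    An empty mapping means the system holds no pre-stored files.
    """
    if not stored_blocks:
        return []
    return [(hashlib.sha1(block).hexdigest(), location)
            for location, block in stored_blocks.items()]
```

Entries are kept even when two blocks share a hash value, so duplicates inside the pre-stored data stay visible for a later clean-up pass.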
In addition, the pre-stored data may itself contain duplicates. To lower the duplication rate of the pre-stored data and improve storage performance, the file block hash values in the index table are preferably compared pairwise to determine whether the index table contains file blocks with identical hash values; if so, a retained file block and a non-retained file block are determined, and the stored data of the non-retained file block is replaced with the data storage location of the retained file block. By comparing the hash values in the index table and replacing any duplicate data with a data index pointing at the retained copy, the pre-stored data is deduplicated and the storage occupied by system data is reduced.
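The clean-up of duplicates inside the index table can be sketched as below. Note that this sketch groups entries by hash value with a dictionary instead of literally comparing every pair; the result is the same set of duplicates at lower cost. Treating the first-seen block as the retained block is an assumption, as are the function name and the entry format.

```python
def deduplicate_index(entries: list[tuple[str, str]]):
    """Find blocks with identical hash values in an index table.

    entries is a list of (hash value, storage location) pairs.
    Returns (index_table, replacements): index_table maps each hash value
    to the location of its retained block, and replacements maps the
    location of every non-retained block to the retained location that
    should replace its stored data.
    """
    index_table: dict[str, str] = {}
    replacements: dict[str, str] = {}
    for digest, location in entries:
        retained = index_table.setdefault(digest, location)  # first seen is retained
        if retained != location:
            replacements[location] = retained
    return index_table, replacements
```

After the replacements are applied, the space held by each non-retained block can be reclaimed, since readers are redirected to the retained copy.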
To deepen understanding of the data storage method provided by the invention, the overall storage flow using file block hash comparison is described here; other implementations based on the data storage method for a distributed storage system provided by the present invention can refer to this description.
The client receives a data storage request and partitions the received file to be stored into blocks.
After partitioning, the file blocks are distributed to the storage nodes (deduplication nodes) of the cluster, so that multiple nodes perform the deduplication service in parallel.
The hash value of the to-be-stored data block is calculated with the SHA-1 hash function, and the data index is queried with it. If the hash value exists in the index table, the data block is already stored and the data block corresponding to the to-be-stored file block is redundant, so only the pointer position of the hash value found is recorded. If no match is found in the data index, the data block is non-redundant: the data is stored, and the hash value and the pointer position of the data block are written to the data index.
The index table is kept in a database, where each file block corresponds to one record in the index table and the hash value serves as the index key. Because the hash value identifies a block uniquely, it ensures the accuracy of the data blocks recorded in the data index, and it also speeds up index queries and improves deduplication efficiency.
Because a distributed mass storage system stores data on multiple nodes, the deduplication algorithm allows the nodes to run the deduplication service in parallel. After a data block is dispatched, the deduplication service checks it against the data index shared by all nodes to determine whether it is duplicate data. If it is redundant, only the pointer from the data index is stored; if it is non-redundant, the shared data index table is updated and published to every node in real time. With this distributed deduplication technique for mass storage (where the growth of stored data tends toward the unbounded), multiple nodes deduplicate data in parallel, the deduplication flow is optimized, and deduplication becomes more efficient.
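Parallel deduplication against a shared index can be sketched with threads standing in for storage nodes. Everything named here (SharedIndex, lookup_or_register, dedup_on_node, parallel_dedup, the round-robin distribution of blocks and the (node, slot) pointer format) is a hypothetical illustration; a real cluster would use networked nodes and a shared database.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedIndex:
    """Index table shared by all nodes; a lock makes check-and-insert atomic."""

    def __init__(self):
        self._lock = threading.Lock()
        self.table = {}     # hash value -> pointer (node_id, slot)
        self.stored = 0     # number of blocks actually stored

    def lookup_or_register(self, digest, node_id):
        with self._lock:
            if digest in self.table:
                return self.table[digest], True    # redundant: reuse the pointer
            pointer = (node_id, self.stored)
            self.table[digest] = pointer           # non-redundant: update the index
            self.stored += 1
            return pointer, False


def dedup_on_node(index, node_id, blocks):
    """Deduplication service of one node: return a pointer for each block."""
    return [index.lookup_or_register(hashlib.sha1(b).hexdigest(), node_id)[0]
            for b in blocks]


def parallel_dedup(blocks, node_count):
    """Distribute blocks round-robin over node_count workers and deduplicate."""
    index = SharedIndex()
    shares = [blocks[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        futures = [pool.submit(dedup_on_node, index, i, share)
                   for i, share in enumerate(shares)]
        results = [f.result() for f in futures]
    return index, results
```

Holding the lock across both the lookup and the insert prevents two nodes that receive identical blocks at the same moment from each storing a copy.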
This embodiment checks the file prepared for storage before storing it: it calculates the file block hash values with the SHA-1 hash function, establishes the data index according to pointers, and checks against existing data block hash values by determining whether the index table contains a hash value identical to that of the data block. If it does, the data is redundant and only the hash value pointer of the data block is stored, which reduces the capacity used on storage devices, raises space utilization, lowers system load, and reduces read/write latency. If it does not, the data block is not redundant and its hash value is recorded in the data index table. In addition, a mass storage system distributes and shares data across multiple nodes, so when different data is stored on different nodes, deduplication can be performed directly in parallel at block (KB) granularity, which effectively improves storage efficiency for data such as backup and disaster recovery data.
Referring to Fig. 2, Fig. 2 is a structural block diagram of the data storage device for a distributed storage system provided in an embodiment of the present invention. The device may include a blocking unit 210, a comparing unit 220, a data information acquiring unit 230 and an index establishing unit 240. The data storage device provided in this embodiment and the data storage method described above may be cross-referenced.
The blocking unit 210 is mainly configured to partition a file to be stored into blocks to obtain several to-be-stored file blocks;
the comparing unit 220 is mainly configured to compare the content of the to-be-stored file block with pre-stored file blocks and determine whether the system contains a file block whose content matches the to-be-stored file block;
the data information acquiring unit 230 is mainly configured to obtain, if the system contains a file block whose content matches the to-be-stored file block, the data storage location of the content-matching file block in the system;
the index establishing unit 240 is mainly configured to establish a data index for the matched to-be-stored file block according to the data storage location.
Preferably, comparing unit is specifically as follows hash comparing unit, comprising:
Hash value computation subunit obtains hash value to be stored for calculating the hash value of file to be stored block;
Hash value comparison subunit is sentenced for hash value to be stored to be compared with blocks of files hash value in concordance list Whether have and the identical blocks of files hash value of hash value to be stored in disconnected concordance list;Wherein, it is stored in concordance list in system The blocks of files hash value of storage file and corresponding blocks of files storage location.
Preferably, distributed memory system data storage device provided in this embodiment can be with further include: storage unit is deposited Storage unit is connect with hash value comparison subunit, if being mainly used in concordance list the not identical file with hash value to be stored Block hash value stores data to be stored, and hash value to be stored and corresponding data storage location is added in concordance list.
Preferably, distributed memory system data storage device provided in this embodiment can be with further include: updating unit, more New unit is connect with storage unit, is mainly used for updated concordance list real-time release into system each node.
Preferably, hash value computation subunit specifically can be used for: calculate file to be stored block by SHA-1 hash function Hash value.
Preferably, the index table generation unit in the distributed memory system data storage device may mainly include:
A judgment subunit, used to judge whether pre-stored files exist in the system;
A pre-stored computation subunit, used, if pre-stored files exist in the system, to calculate the file block hash values of the stored files according to the file block occupancy of the stored files;
The index table is then generated from the calculated file block hash values of the stored files and the corresponding data storage locations.
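Index table generation from pre-stored data can be sketched as follows. The input format (a mapping from storage location to block content) and the function name are assumptions; the "file block occupancy" mentioned in the text is not modeled here. The table is kept as a list of entries so that duplicate hashes remain visible for a later duplicate check.

```python
import hashlib
from typing import Dict, List, Tuple


def build_index_table(stored_blocks: Dict[str, bytes]) -> List[Tuple[str, str]]:
    """Build (hash, location) entries for every block already stored.

    Returns an empty list when the system holds no pre-stored blocks.
    """
    entries: List[Tuple[str, str]] = []
    for location, data in stored_blocks.items():
        entries.append((hashlib.sha1(data).hexdigest(), location))
    return entries
```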
Preferably, the index table generation unit may further include an index table duplicate comparing unit;
The index table duplicate comparing unit includes:
A pre-stored comparison subunit, used to compare the file block hash values in the index table pairwise and judge whether file blocks with identical hash values exist in the index table;
A file block determining subunit, used, if file blocks with identical hash values exist in the index table, to determine a retained file block and a non-retained file block;
A data replacing subunit, used to replace the stored data of the non-retained file block with the data storage location of the retained file block.
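The duplicate check on the index table can be sketched as below. Two points are assumptions rather than details from the text: the first block seen with a given hash is treated as the retained block, and hashes are grouped with a dictionary instead of being compared pairwise, which finds the same duplicates with less work.

```python
from typing import Dict, List, Tuple


def dedupe_index(entries: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map every block location to the location of the retained block.

    entries holds (hash, location) pairs. A retained block maps to itself;
    a non-retained block maps to the retained block with the same hash.
    """
    retained: Dict[str, str] = {}  # hash -> location of the retained block
    redirect: Dict[str, str] = {}  # location -> retained location
    for digest, location in entries:
        keep = retained.setdefault(digest, location)
        redirect[location] = keep
    return redirect
```

Every location that maps to a different location marks a non-retained block whose stored data could be replaced by a reference to the retained block.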
In the distributed memory system data storage device provided in this embodiment, the blocking unit divides the file to be stored into blocks, and the comparing unit determines redundant data by comparing the data block by block. This can improve the probability of detecting partially duplicated data within a file and achieves precise data deduplication.
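The overall flow of blocking, comparing, and indexing can be put together in a small sketch. Fixed-size blocks, the in-memory `storage` list, and all function names are assumptions for illustration; the text does not prescribe a block size or a blocking strategy here.

```python
import hashlib
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Split file content into fixed-size blocks (the last one may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_file(
    data: bytes, block_size: int, index_table: Dict[str, int], storage: List[bytes]
) -> List[int]:
    """Store a file block by block and return its data index.

    The data index lists one storage location per block, in file order.
    """
    data_index: List[int] = []
    for block in split_into_blocks(data, block_size):
        digest = hashlib.sha1(block).hexdigest()
        if digest not in index_table:
            # New content: store it and register it in the index table.
            storage.append(block)
            index_table[digest] = len(storage) - 1
        # Matching content: only a reference to the stored block is recorded.
        data_index.append(index_table[digest])
    return data_index
```

With this layout, a file whose first and last blocks are identical stores that content once and refers to it twice, and the original file can be rebuilt by reading the blocks listed in its data index.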
Referring to Fig. 3, Fig. 3 is a structural block diagram of the distributed memory system data storage equipment provided in this embodiment; the equipment may include a memory 300 and a processor 310. For the distributed memory system data storage equipment, reference may be made to the above description of the distributed memory system data storage method.
The memory 300 is mainly used to store a program;
The processor 310 is mainly used to implement the steps of the above distributed memory system data storage method when executing the program.
Referring to Fig. 4, which is a structural schematic diagram of the distributed memory system data storage equipment provided in this embodiment. The data storage equipment may vary considerably with configuration or performance, and may include one or more central processing units (CPU) 322 (for example, one or more processors), a memory 332, and one or more storage media 330 (for example, one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may provide transient or persistent storage. The program stored in the storage medium 330 may include one or more modules (not marked in the figure), and each module may include a series of instruction operations on the data processing equipment. Further, the central processing unit 322 may be configured to communicate with the storage medium 330 and to execute, on the data storage equipment 301, the series of instruction operations in the storage medium 330.
The data storage equipment 301 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps of the distributed memory system data storage method described above with reference to Fig. 1 can be implemented by the structure of the distributed memory system data storage equipment.
This embodiment discloses a readable storage medium on which a program is stored; when the program is executed by a processor, the steps of the distributed memory system data storage method are implemented. For the distributed memory system data storage method, reference may be made to the embodiment corresponding to Fig. 1, which is not repeated here.
The readable storage medium may specifically be any readable storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The distributed memory system data storage method, device, equipment and readable storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. It should be noted that those of ordinary skill in the art can make several improvements and modifications to the invention without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the invention.

Claims (10)

1. A distributed memory system data storage method, characterized by comprising:
Dividing a file to be stored into blocks to obtain several file blocks to be stored;
Comparing the content of the file blocks to be stored with pre-stored file blocks, and judging whether a file block whose content matches the file block to be stored exists in the system;
If so, acquiring the data storage location of the content-matching file block in the system;
Establishing a data index for the matched file block to be stored according to the data storage location.
2. The distributed memory system data storage method according to claim 1, characterized in that comparing the content of the file blocks to be stored with pre-stored file blocks and judging whether a file block whose content matches the file block to be stored exists in the system comprises:
Calculating the hash value of the file block to be stored to obtain a to-be-stored hash value;
Comparing the to-be-stored hash value with the file block hash values in an index table, and judging whether the index table contains a file block hash value identical to the to-be-stored hash value; wherein the index table stores the file block hash values of the files already stored in the system and the corresponding file block storage locations.
3. The distributed memory system data storage method according to claim 2, characterized by further comprising:
If the index table contains no file block hash value identical to the to-be-stored hash value, storing the data to be stored, and adding the to-be-stored hash value and the corresponding data storage location to the index table.
4. The distributed memory system data storage method according to claim 3, characterized by further comprising:
Publishing the updated index table to each node in the system in real time.
5. The distributed memory system data storage method according to claim 2, characterized in that calculating the hash value of the file block to be stored comprises:
Calculating the hash value of the file block to be stored by means of the SHA-1 hash function.
6. The distributed memory system data storage method according to claim 2, characterized in that the method of generating the index table comprises:
Judging whether pre-stored files exist in the system;
If so, calculating the file block hash values of the stored files according to the file block occupancy of the stored files;
Generating the index table from the calculated file block hash values of the stored files and the corresponding data storage locations.
7. The distributed memory system data storage method according to claim 6, characterized by further comprising:
Comparing the file block hash values in the index table pairwise, and judging whether file blocks with identical hash values exist in the index table;
If so, determining a retained file block and a non-retained file block;
Replacing the stored data of the non-retained file block with the data storage location of the retained file block.
8. A distributed memory system data storage device, characterized by comprising:
A blocking unit, used to divide a file to be stored into blocks to obtain several file blocks to be stored;
A comparing unit, used to compare the content of the file blocks to be stored with pre-stored file blocks and judge whether a file block whose content matches the file block to be stored exists in the system;
A data information acquiring unit, used, if so, to acquire the data storage location of the content-matching file block in the system;
An index establishing unit, used to establish a data index for the matched file block to be stored according to the data storage location.
9. A distributed memory system data storage equipment, characterized by comprising:
A memory, used to store a program;
A processor, used to implement the steps of the distributed memory system data storage method according to any one of claims 1 to 7 when executing the program.
10. A readable storage medium, characterized in that a program is stored on the readable storage medium, and when the program is executed by a processor, the steps of the distributed memory system data storage method according to any one of claims 1 to 7 are implemented.
CN201811108494.XA 2018-09-21 2018-09-21 Distributed memory system date storage method, device, system and storage medium Pending CN109241023A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811108494.XA CN109241023A (en) 2018-09-21 2018-09-21 Distributed memory system date storage method, device, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811108494.XA CN109241023A (en) 2018-09-21 2018-09-21 Distributed memory system date storage method, device, system and storage medium

Publications (1)

Publication Number Publication Date
CN109241023A true CN109241023A (en) 2019-01-18

Family

ID=65055957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811108494.XA Pending CN109241023A (en) 2018-09-21 2018-09-21 Distributed memory system date storage method, device, system and storage medium

Country Status (1)

Country Link
CN (1) CN109241023A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078219A1 (en) * 2006-08-25 2011-03-31 Qnx Software Systems Gmbh & Co. Kg Filesystem having a filename cache
CN103051676A (en) * 2012-11-26 2013-04-17 浪潮电子信息产业股份有限公司 Distributed data storage management method
CN103714123A (en) * 2013-12-06 2014-04-09 西安工程大学 Methods for deleting duplicated data and controlling reassembly versions of cloud storage segmented objects of enterprise
CN103916483A (en) * 2014-04-28 2014-07-09 中国科学院成都生物研究所 Self-adaptation data storage and reconstruction method for coding redundancy storage system
CN105354246A (en) * 2015-10-13 2016-02-24 华南理工大学 Distributed memory calculation based data deduplication method
CN106873919A (en) * 2017-03-20 2017-06-20 郑州云海信息技术有限公司 A kind of date storage method and device based on cloud storage system

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111475502A (en) * 2019-01-24 2020-07-31 中国电力科学研究院有限公司 Data management method and system for distributed renewable energy
CN109977121A (en) * 2019-03-27 2019-07-05 上海鸣鸾互联网科技有限公司 A kind of big data quick storage system
CN110413589A (en) * 2019-07-30 2019-11-05 中国联合网络通信集团有限公司 Approaches to IM and platform based on interspace file system
CN112799584A (en) * 2019-11-13 2021-05-14 杭州海康威视数字技术股份有限公司 Data storage method and device
CN112799584B (en) * 2019-11-13 2023-04-07 杭州海康威视数字技术股份有限公司 Data storage method and device
CN111158590A (en) * 2019-12-17 2020-05-15 苏州浪潮智能科技有限公司 Method and equipment for solving hash collision
CN111158590B (en) * 2019-12-17 2021-07-06 苏州浪潮智能科技有限公司 Method and equipment for solving hash collision
CN112667858A (en) * 2020-12-25 2021-04-16 深圳创新科技术有限公司 Method for storing data by adopting HASH chain and data writing and reading methods
CN113127421A (en) * 2021-04-01 2021-07-16 山东英信计算机技术有限公司 Method and equipment for searching file content in storage system
CN113064556A (en) * 2021-04-29 2021-07-02 山东英信计算机技术有限公司 BIOS data storage method, device, equipment and storage medium
CN114817230A (en) * 2022-06-29 2022-07-29 深圳市乐易网络股份有限公司 Data stream filtering method and system
CN118394802A (en) * 2024-06-27 2024-07-26 国网山东省电力公司滨州市沾化区供电公司 Power monitoring data storage management method, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190118