CN108011956A - Distributed storage method based on file content cryptographic Hash

Info

Publication number: CN108011956A
Authority: CN (China)
Legal status: Withdrawn
Application number: CN201711274018.0A
Other languages: Chinese (zh)
Inventor: 唐文建
Current assignee: E-House (China) Enterprise Group Ltd By Share Ltd
Original assignee: E-House (China) Enterprise Group Ltd By Share Ltd
Priority date: 2017-12-06
Filing date: 2017-12-06
Publication date: 2018-05-08

Note: the legal status, the assignee list, and the priority date are as listed by Google Patents, which states that they are assumptions and not legal conclusions and that it has performed no legal analysis of them.

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3236 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
    • H04L9/3239 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions involving non-keyed hash functions, e.g. modification detection codes [MDCs], MD5, SHA or RIPEMD

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A distributed storage method based on the hash value of file content, comprising: Step 1, the client computes a first hash value of the content of the file to be uploaded using a hash algorithm, and sends a write request carrying the first hash value to the file application server; Step 2, the file application server receives the write request carrying the first hash value from the client and looks up the mapping table in the file record database; Step 3, the file application server computes a second hash value of the file content using the hash algorithm and compares the second hash value with the first hash value; Step 4, the file application server looks up the mapping table in the file storage database; if a mapping record for the file already exists, it returns to the client the result that the file has already been uploaded; if no mapping record for the file exists, it writes the file into the file storage database. The invention has the following advantages: it saves storage space; it effectively reduces machine load and greatly improves the throughput of the file storage system.

Description

Distributed storage method based on file content hash value
Technical field
The present invention relates to the technical field of computer storage, and in particular to a distributed storage method based on the hash value of file content.
Background
In Internet applications, file storage is a commonly used service module: pictures, audio, video, Excel files, and PDF documents all rely on a file storage service, which shows how widespread and important this module is. Traditional file storage methods, however, create storage directories according to the upload date, so the locations where files are stored are relatively concentrated. This puts heavy read/write pressure on the disk and a high load on the file storage server, which results in low file storage throughput; at read or write peaks the system easily reaches a bottleneck.
Summary of the invention
In view of the above technical deficiencies, the object of the present invention is to provide a distributed storage method based on the hash value of file content that solves the above technical problems.
To solve the above technical problems, the distributed storage method based on the hash value of file content provided by the present invention comprises:
Step 1, the client computes a first hash value of the content of the file to be uploaded using a hash algorithm, and sends a write request carrying the first hash value to the file application server;
Step 2, the file application server receives the write request carrying the first hash value from the client and looks up the mapping table in the file record database; if a mapping record for the file already exists, it returns to the client the result that the file has already been uploaded, together with the access address of the file in the file storage database; if no mapping record for the file exists, it allows the client to upload the file to the file application server;
Step 3, the file application server computes a second hash value of the file content using the hash algorithm and compares the second hash value with the first hash value; if the second hash value differs from the first hash value, it returns the mismatch result to the client and the method returns to step 1; if the second hash value is identical to the first hash value, the method proceeds to step 4;
Step 4, the file application server looks up the mapping table in the file storage database; if a mapping record for the file already exists, it returns to the client the result that the file has already been uploaded; if no mapping record for the file exists, it writes the file into the file storage database.
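Steps 2 to 4 can be sketched as a small in-memory model. This is a minimal illustration under stated assumptions, not the patented implementation: SHA-1 is assumed as the hash algorithm, and the class `FileApplicationServer`, its method names, the status strings, and the `/files/<hash>` address format are all hypothetical. Two dictionaries stand in for the file record database and the file storage database.

```python
import hashlib


class FileApplicationServer:
    """Hypothetical in-memory stand-in for the file application server."""

    def __init__(self):
        self.records = {}  # file record database: hash -> access address
        self.storage = {}  # file storage database: hash -> file content

    def handle_write_request(self, first_hash):
        # Step 2: look up the mapping table using the client-supplied hash.
        if first_hash in self.records:
            return {"status": "already_uploaded",
                    "address": self.records[first_hash]}
        return {"status": "upload_allowed"}

    def handle_upload(self, first_hash, content):
        # Step 3: recompute the hash on the server and compare the two values.
        second_hash = hashlib.sha1(content).hexdigest()
        if second_hash != first_hash:
            return {"status": "hash_mismatch"}
        # Step 4: check the storage mapping again before writing.
        if second_hash in self.storage:
            return {"status": "already_uploaded",
                    "address": self.records.get(second_hash)}
        self.storage[second_hash] = content
        self.records[second_hash] = f"/files/{second_hash}"  # assumed format
        return {"status": "stored", "address": self.records[second_hash]}


if __name__ == "__main__":
    server = FileApplicationServer()
    data = b"example file content"
    digest = hashlib.sha1(data).hexdigest()  # Step 1, done by the client
    print(server.handle_write_request(digest)["status"])
    print(server.handle_upload(digest, data)["status"])
    print(server.handle_upload(digest, data)["status"])
```

The second check in step 4 matters when two clients are both allowed to upload the same new file at about the same time: only the first upload is written.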
Step 4 includes:
Step 4.1, the file application server computes a writable storage directory in the file storage database;
Step 4.2, the file application server writes the file to the storage directory and returns the result to the client.
Step 4.1 includes:
Step 4.1.1, the file application server computes a writable file storage server from the first 2 characters of the file's second hash value or first hash value, together with the IDs and storage space status of the file storage servers in the file storage database;
Step 4.1.2, the file application server computes, from the first 3 to 6 characters of the file's second hash value or first hash value, the storage directory on that file storage server into which the file is written.
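The text does not give the exact formulas for steps 4.1.1 and 4.1.2, so the following is only one plausible sketch. It assumes a hexadecimal hash string, picks a server by taking the first 2 characters modulo the number of servers that still have free space, and builds a two-level directory from characters 3 to 6 (one reading of the character range). The function names and the `id` / `free_bytes` fields are hypothetical.

```python
def pick_server(hash_hex, servers):
    """Step 4.1.1 sketch: choose a writable file storage server.

    servers is a list of dicts with 'id' and 'free_bytes' keys
    (an assumed shape, not taken from the source).
    """
    writable = sorted(
        (s for s in servers if s["free_bytes"] > 0),
        key=lambda s: s["id"],
    )
    if not writable:
        raise RuntimeError("no file storage server has free space")
    # Assumption: the first 2 hex characters, taken modulo the server count.
    index = int(hash_hex[:2], 16) % len(writable)
    return writable[index]["id"]


def storage_directory(hash_hex):
    """Step 4.1.2 sketch: derive the directory from characters 3 to 6."""
    # Assumption: two directory levels of two characters each.
    return f"{hash_hex[2:4]}/{hash_hex[4:6]}"


if __name__ == "__main__":
    servers = [
        {"id": 1, "free_bytes": 10_000},
        {"id": 2, "free_bytes": 0},
        {"id": 3, "free_bytes": 5_000},
    ]
    digest = "a1b2c3d4e5f60718293a4b5c6d7e8f9012345678"
    print(pick_server(digest, servers), storage_directory(digest))
```

Because the set of writable servers changes as disks fill up, recomputing the server from the hash alone may give a different answer later. Recording the chosen server ID and directory in the database avoids depending on recomputation at read time.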
In step 4.2, the file is written to the storage directory; if the write fails, a write-failure result is returned to the client, and if the write succeeds, a write-success result is returned to the client.
In step 4.2, the file is written to the storage directory with its hash value as the file name.
In step 4.2, the file is written to the storage directory with a maximum of three write attempts.
In step 4.2, the write-success result includes the hash value of the file content, the ID of the file storage server, the storage directory, the save-success result, and the access address of the file.
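The write behaviour of step 4.2 can be sketched as follows. This is an illustration under assumptions rather than the actual implementation: the local file system stands in for a file storage server, and the function name `write_file`, the result keys, and the `/<server>/<directory>/<hash>` address format are hypothetical. Only the hash-as-file-name rule, the limit of three attempts, and the fields of the success result come from the text.

```python
import os
import tempfile


def write_file(root, server_id, directory, hash_hex, content, max_attempts=3):
    """Write content under root/directory, named by its hash, with retries."""
    target_dir = os.path.join(root, directory)
    target = os.path.join(target_dir, hash_hex)  # the hash is the file name
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            os.makedirs(target_dir, exist_ok=True)
            with open(target, "wb") as fh:
                fh.write(content)
            return {
                "success": True,
                "hash": hash_hex,
                "server_id": server_id,
                "directory": directory,
                # Assumed address format, for illustration only.
                "address": f"/{server_id}/{directory}/{hash_hex}",
                "attempts": attempt,
            }
        except OSError as exc:
            last_error = exc  # remember the error and try again
    return {"success": False, "error": str(last_error),
            "attempts": max_attempts}


if __name__ == "__main__":
    demo_root = tempfile.mkdtemp()
    print(write_file(demo_root, "fs-01", "b2/c3", "a1b2c3d4", b"demo"))
```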
The client uploads the file to the file application server through an HTTP interface.
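The text says only that the upload goes through an HTTP interface and does not describe the request format. The sketch below shows one way a client might build such a request with Python's standard library. The URL, the use of POST, and the `X-Content-SHA1` header name are assumptions made for illustration. The request is constructed but not sent.

```python
import hashlib
import urllib.request


def build_upload_request(url, content):
    """Build (but do not send) an HTTP request carrying the file and its hash."""
    sha1_hex = hashlib.sha1(content).hexdigest()
    return urllib.request.Request(
        url,
        data=content,
        method="POST",
        headers={
            "Content-Type": "application/octet-stream",
            "X-Content-SHA1": sha1_hex,  # hypothetical header name
        },
    )


if __name__ == "__main__":
    # Placeholder URL; passing the request to urllib.request.urlopen sends it.
    request = build_upload_request("http://files.example.invalid/upload", b"demo")
    print(request.get_method(), request.full_url)
```

Sending the client-side hash alongside the body lets the server perform the comparison of step 3 without a second round trip.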
The distributed storage method based on the hash value of file content according to the present invention has the following advantages:
1) Because storage is keyed by the hash value of the file content, an identical file uploaded multiple times is stored only once, which saves storage space;
2) Because the server and storage directory are computed from the hash value of the file content, the servers and locations used for storage are more dispersed; file storage is therefore efficient and read pressure is also spread out, which effectively reduces machine load and greatly improves the throughput of the file storage system;
3) The storage server and storage directory computed from the hash value are both saved in the database, so the file storage system is easy to extend;
4) Clients access files by their hash value, which helps the system quickly locate the server and storage directory and speeds up file reads;
5) The hash value of the file content is computed in advance on the client, so an identical large file does not need to be transmitted again, which saves bandwidth and transmission cost.
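The dispersion claimed in advantage 2 can be illustrated with a short experiment. It compares a date-based layout, where every file uploaded on the same day lands in one directory, with a hash-based layout. The directory formats and the synthetic file contents are assumptions made for the demonstration, and SHA-1 is assumed as the hash algorithm.

```python
import hashlib
from collections import Counter


def date_directory(upload_date):
    """Date-based layout: one directory per upload day (assumed format)."""
    return upload_date.replace("-", "/")


def hash_directory(content):
    """Hash-based layout: directory taken from characters 3 to 6 of the hash."""
    digest = hashlib.sha1(content).hexdigest()
    return f"{digest[2:4]}/{digest[4:6]}"


# 1000 synthetic files, all "uploaded" on the same day.
files = [f"file-{i}".encode() for i in range(1000)]
by_date = Counter(date_directory("2017-12-06") for _ in files)
by_hash = Counter(hash_directory(content) for content in files)

print("date-based directories used:", len(by_date))
print("hash-based directories used:", len(by_hash))
print("largest hash-based directory:", max(by_hash.values()))
```

With four hexadecimal characters there are 65,536 possible directories, so a day's uploads spread across many of them instead of piling into one.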
Detailed description
The distributed storage method based on the hash value of file content according to the present invention includes the following steps:
1. The client selects a file to upload and computes the SHA-1 value (hash value) of the content of the file to be uploaded;
2. The client queries the file application server with the SHA-1 value of the file content; the file application server uses this SHA-1 value to check, in the file record database that records uploaded files, whether the file has already been uploaded;
3. If the file has already been uploaded, the file application server directly returns the already-uploaded result and the access address of the file;
4. If the file has not been uploaded, the file application server tells the client so, and the client transfers the SHA-1 value of the file and the file content to the file application server through the HTTP interface;
5. The file application server computes the SHA-1 value of the uploaded file content and compares it with the SHA-1 value sent by the client;
6. If the SHA-1 values are inconsistent, the result that the file SHA-1 values do not match is returned directly to the client;
7. If the SHA-1 values are consistent, the file application server queries the file storage database to check whether a file with this SHA-1 value has already been uploaded; if so, it directly returns the already-uploaded result to the client;
8. If not, the file storage server on which the file should be stored is computed from the first 2 characters of the SHA-1 value of the file content, together with the list of file storage servers in the file storage database and their storage space sizes;
9. The storage directory for the file on that file storage server is computed from characters 3 to 6 of the SHA-1 value of the file content;
10. Once the storage directory is obtained, the file is saved under that directory with its SHA-1 value as the file name;
11. If saving the file fails, up to three attempts are made; if all attempts fail, a failure result is returned to the client;
12. If saving the file succeeds, the file application server saves the file's SHA-1 value, the file storage server ID, and the storage directory to the database, and returns the save-success result together with the file's access address to the client.
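The client side of the steps above (compute the SHA-1 value, query, and upload only when needed) can be sketched as follows. This is a minimal illustration, not the actual client: the function names are hypothetical, and `query` and `upload` are stand-in callables for the two requests to the file application server. The hash is computed in chunks so that a large file does not have to be held in memory just to be hashed.

```python
import hashlib
import io


def sha1_of_stream(stream, chunk_size=64 * 1024):
    """Step 1: compute the SHA-1 hex digest of a binary stream in chunks."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def upload_if_needed(stream, query, upload):
    """Steps 2 to 4: query by hash first, transfer content only if unknown.

    query(sha1_hex) returns an access address or None;
    upload(sha1_hex, content) returns an access address.
    Both callables are hypothetical stand-ins for server requests.
    """
    sha1_hex = sha1_of_stream(stream)
    address = query(sha1_hex)
    if address is not None:
        return address, False  # already uploaded, nothing transferred
    stream.seek(0)
    return upload(sha1_hex, stream.read()), True


if __name__ == "__main__":
    known = {}  # stand-in for the server's file record database

    def demo_upload(sha1_hex, content):
        known[sha1_hex] = f"/files/{sha1_hex}"  # assumed address format
        return known[sha1_hex]

    print(upload_if_needed(io.BytesIO(b"report"), known.get, demo_upload))
    print(upload_if_needed(io.BytesIO(b"report"), known.get, demo_upload))
```

In the demo the second call returns the same address without transferring the content again, which is the bandwidth saving the method aims for.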
The preferred embodiments of the invention have been described above, but the invention is not limited to these embodiments. Those skilled in the art may make various equivalent variations or substitutions without departing from the spirit of the invention, and all such equivalent variations or substitutions fall within the scope of the present application.

Claims (8)

1. A distributed storage method based on the hash value of file content, characterized in that it comprises the following steps:
Step 1, the client computes a first hash value of the content of the file to be uploaded using a hash algorithm, and sends a write request carrying the first hash value to the file application server;
Step 2, the file application server receives the write request carrying the first hash value from the client and looks up the mapping table in the file record database; if a mapping record for the file already exists, it returns to the client the result that the file has already been uploaded, together with the access address of the file in the file storage database; if no mapping record for the file exists, the client uploads the file to the file application server;
Step 3, the file application server computes a second hash value of the file content using the hash algorithm and compares the second hash value with the first hash value; if the second hash value differs from the first hash value, it returns the mismatch result to the client and the method returns to step 1; if the second hash value is identical to the first hash value, the method proceeds to step 4;
Step 4, the file application server looks up the mapping table in the file storage database; if a mapping record for the file already exists, it returns to the client the result that the file has already been uploaded; if no mapping record for the file exists, it writes the file into the file storage database.
2. The distributed storage method based on the hash value of file content according to claim 1, characterized in that step 4 includes:
Step 4.1, the file application server computes a writable storage directory in the file storage database;
Step 4.2, the file application server writes the file to the storage directory and returns the result to the client.
3. The distributed storage method based on the hash value of file content according to claim 2, characterized in that step 4.1 includes:
Step 4.1.1, the file application server computes a writable file storage server from the first 2 characters of the file's second hash value or first hash value, together with the IDs and storage space status of the file storage servers in the file storage database;
Step 4.1.2, the file application server computes, from the first 3 to 6 characters of the file's second hash value or first hash value, the storage directory on that file storage server into which the file is written.
4. The distributed storage method based on the hash value of file content according to claim 2, characterized in that in step 4.2 the file is written to the storage directory; if the write fails, a write-failure result is returned to the client, and if the write succeeds, a write-success result is returned to the client.
5. The distributed storage method based on the hash value of file content according to claim 4, characterized in that in step 4.2 the file is written to the storage directory with its hash value as the file name.
6. The distributed storage method based on the hash value of file content according to claim 4, characterized in that in step 4.2 the file is written to the storage directory with a maximum of three write attempts.
7. The distributed storage method based on the hash value of file content according to claim 4, characterized in that in step 4.2 the write-success result includes the hash value of the file content, the ID of the file storage server, the storage directory, the save-success result, and the access address of the file.
8. The distributed storage method based on the hash value of file content according to claim 1, characterized in that in step 2 the client uploads the file to the file application server through an HTTP interface.
CN201711274018.0A 2017-12-06 2017-12-06 Distributed storage method based on file content cryptographic Hash Withdrawn CN108011956A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711274018.0A CN108011956A (en) 2017-12-06 2017-12-06 Distributed storage method based on file content cryptographic Hash

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711274018.0A CN108011956A (en) 2017-12-06 2017-12-06 Distributed storage method based on file content cryptographic Hash

Publications (1)

Publication Number Publication Date
CN108011956A true CN108011956A (en) 2018-05-08

Family

ID=62056839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711274018.0A Withdrawn CN108011956A (en) 2017-12-06 2017-12-06 Distributed storage method based on file content cryptographic Hash

Country Status (1)

Country Link
CN (1) CN108011956A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407372A (en) * 2023-10-18 2024-01-16 北京安证通信息科技股份有限公司 Method and system for removing duplicate of uploaded file

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101534322A (en) * 2009-04-13 2009-09-16 腾讯科技(深圳)有限公司 File upload system and file upload method
US7680998B1 (en) * 2007-06-01 2010-03-16 Emc Corporation Journaled data backup during server quiescence or unavailability
CN102622366A (en) * 2011-01-28 2012-08-01 阿里巴巴集团控股有限公司 Similar picture identification method and similar picture identification device
CN104067259A (en) * 2012-04-16 2014-09-24 惠普发展公司,有限责任合伙企业 File upload based on hash value comparison
CN106446001A (en) * 2016-07-29 2017-02-22 北京北信源软件股份有限公司 Method and system for storing files in computer storage mediums

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
微软公司: "《面向.NET的Web应用程序设计》", 29 February 2004, 北京:高等教育出版社 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20180508