CN103501319A - Low-delay distributed storage system for small files - Google Patents

Low-delay distributed storage system for small files

Info

Publication number
CN103501319A
CN103501319A (application CN201310429804.9A)
Authority
CN
China
Prior art keywords
cluster
client
dataserver
node
write
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310429804.9A
Other languages
Chinese (zh)
Inventor
王鲁俊
龙翔
王雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN201310429804.9A priority Critical patent/CN103501319A/en
Publication of CN103501319A publication Critical patent/CN103501319A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a low-latency distributed storage system for small files. All DataServers in the system are logically organized into a ring under a consistent hashing scheme: each DataServer's ID is hashed with a specific hash algorithm, and the DataServers are placed on a ring spanning the whole hash value range according to their hash values. The cluster contains a central CV node whose cluster topology information comprises the list of all active DataServers in the cluster and a version number for the current topology. The client caches this topology information locally: on its first access to the cluster it contacts the CV node, obtains the topology information, and caches it locally, using the cached copy for subsequent reads and writes. On each read or write, the client hashes the filename under the consistent hashing scheme to determine the DataServer holding the small file; the version number of the topology information stored on that DataServer is then compared with the version number held by the client, and when the two match, the actual read or write operation is performed on that DataServer.

Description

A low-latency distributed storage system for small files
Technical field
The present invention relates to the fields of distributed storage and massive-small-file storage, and in particular to a low-latency distributed storage system for small files.
Background technology
Small files usually refer to files smaller than the HDFS default block size (64 MB). In current applications, photo files, music files, e-mail bodies, microblog posts, and the like can all be regarded as small files.
The small-file problem has gradually attracted attention in both academia and industry. The well-known social network Facebook stores 260 billion pictures, totaling more than 20 PB, the overwhelming majority of which are smaller than 64 MB. In the supercomputing field, for example, applications on ORNL's Cray XT5 cluster (18,688 nodes, 12 processors per node) periodically write application state to files, causing the system to produce a large number of small files. A 2007 research report from the U.S. Pacific Northwest National Laboratory shows that its laboratory systems held 12 million files, of which 94% were smaller than 64 MB and 58% smaller than 64 KB. Concrete scientific computing environments behave similarly; for example, some biology computations may produce 30 million files with a mean size of only 190 KB. The music site 巨鲸网 has indexed 3.6 million MP3 music files. Other documents likewise show that most data accessed on the Internet consists of small files with high access frequency.
Sean Quinlan, the GFS tech lead, mentioned in an interview on GFS that one of BigTable's application scenarios is small files. A report on the Small File Problem published by Cloudera, a well-known Hadoop company, likewise points out that Hadoop has problems handling massive numbers of small files.
Hadoop itself provides Hadoop Archive (HAR) for merging small files into large ones. A HAR file works by building a layered file system on top of HDFS; it is created with Hadoop's archive command, which actually runs a MapReduce task that packs the small files into the HAR file.
GIGA+ studied the case of massive small files under a single directory and proposed a highly scalable directory design. By distributing the index across different server nodes in the cluster and avoiding synchronization and serialization, GIGA+ achieves an asynchronous, eventually consistent directory design that tolerates stale index state. The design complements existing cluster file systems without requiring changes to existing cluster applications.
Facebook designed the Haystack storage system for its photo-storage application. Haystack Store packs pictures into large volumes (about 100 GB each) and keeps in memory a mapping from picture ID to the picture's retrieval information (offset and size) within the volume. The system uses the Haystack Cache component to cache newly added pictures, and the Haystack Directory handles volume mapping and load balancing. Index files are used to accelerate rebuilding the in-memory mapping.
TFS is an open-source storage system for massive small files, widely used at companies such as Taobao and Renren. TFS comprises three parts: the TFS cluster, the Meta service cluster, and the client library.
The TFS cluster mainly comprises a NameServer and multiple DataServers. In the same way that Facebook's Haystack manages blocks, TFS merges a large number of small files into a large file called a block (Block); each block has a unique ID, and blocks are distributed across the DataServers. The NameServer is responsible for DataServer state management and maintains the block-to-DataServer mapping. The NameServer does not handle actual data reads and writes, which are performed by the DataServers. The TFS cluster stores files under TFS filenames, which are strings encoding the block number, offset, and file size.
The Meta service cluster consists of a master node, the RootServer, and several service nodes, the MetaServers. The RootServer manages all MetaServers, while the MetaServers manage the mapping between user-defined filenames and TFS filenames. TFS currently uses a MySQL database as the back-end persistent store.
TFS can be deployed without the Meta service cluster, in which case it supports only TFS filenames and not user-defined filenames; denote this configuration TFS-noname. Denote a cluster with the Meta service cluster configured TFS-name.
TFS can access small files by user-defined filename, but it still has three problems.
First, reading or writing a file in TFS requires multiple network connections. When a client writes a small file, the client library first contacts the NameServer of the TFS cluster, which designates a writable block for the file; the client then contacts the DataServer holding that block, performs the actual write, and receives a TFS filename; finally, the client contacts the Meta service cluster to record the mapping between the user-defined filename and the TFS filename just obtained. When a client reads a small file, it first queries the Meta service cluster for the TFS filename corresponding to the user-defined filename; it then contacts the NameServer to parse the TFS filename, extract the block number, and resolve, via the block-to-DataServer mapping maintained by the NameServer, which DataServer to access; the client then contacts that DataServer and performs the actual file read. Furthermore, if the client library has not cached MetaServer information, the client must first contact the RootServer to obtain the currently active MetaServer before proceeding. TFS therefore needs at least three network connections to complete one read or write, and four on first access. This is one reason TFS file I/O is inefficient.
Second, TFS uses the heavyweight MySQL as the back-end store for the mapping between TFS filenames and user-defined filenames; compared with a lightweight NoSQL database, its latency overhead is larger.
Third, the TFS NameServer records the information of all blocks and maintains the block-to-DataServer mapping. If the NameServer recovers after a failure, it must rebuild the block information and the mapping. As the TFS architecture and read/write flow show, the NameServer is a single point of failure of the TFS cluster: while it is down, the whole cluster can neither read nor write. TFS availability therefore still has room for improvement.
Summary of the invention
To address these problems with TFS, the present invention designs a new distributed storage system for small files. The system architecture is shown in Fig. 1: all DataServers are logically organized into a ring (nodes S1-S8 in the figure). The system adopts a consistent hashing scheme: each DataServer's ID is hashed with a specific hash algorithm, and the DataServers are placed on a ring spanning the whole hash value range according to their hash values.
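As an illustration of this scheme, the ring placement and filename routing can be sketched as follows. The node names S1-S8 come from Fig. 1, but the choice of MD5 as the hash function is an assumption of the example: the patent says only "a specific hash algorithm".

```python
import hashlib
from bisect import bisect_right

def ring_hash(key: str) -> int:
    # Stand-in hash function; the patent does not fix a concrete algorithm.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Consistent hashing: DataServer IDs are placed on a ring over the
    whole hash value range; a filename is served by the first server
    clockwise from the filename's own hash value."""
    def __init__(self, server_ids):
        self.ring = sorted((ring_hash(s), s) for s in server_ids)
        self.keys = [k for k, _ in self.ring]

    def lookup(self, filename: str) -> str:
        i = bisect_right(self.keys, ring_hash(filename)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["S%d" % i for i in range(1, 9)])
```

A useful property of this layout is that adding or removing one DataServer only remaps the files that fell between that server and its predecessor on the ring.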
A central CV (Central Version) node is set up in the cluster. Each DataServer periodically sends heartbeat messages to the CV node, which receives them and manages the cluster topology information. The topology information managed by the CV node comprises the list of all active DataServers in the cluster and the version number of the current topology. The list records each active DataServer's ID together with the IP address and port that DataServer listens on. The topology version number is a monotonically increasing timestamp. Whenever a new DataServer joins the cluster or an existing one leaves, the CV node regenerates the topology information, sets its version number to the current timestamp, and sends the new topology to all currently active DataServers, so that every DataServer holds the same global view of the cluster.
The client caches the cluster topology information locally. On its first access to the cluster, the client contacts the CV node, obtains the topology information, and caches it locally; subsequent reads and writes use the locally cached copy.
When the client performs a read or write, it first hashes the filename under the consistent hashing scheme to determine which DataServer the small file falls on. It then compares the version number of the topology information held by that DataServer with the version number of its own cached copy; if the two match, the actual read or write is performed on that DataServer.
A DataServer has two main components, as shown in Fig. 2: a block management component and a retrieval-information management component. The block management component merges small files into large blocks: the system preallocates large block files, and newly written small files are appended into these blocks. Given a small file's retrieval information, namely its block number, its offset within the block, and its size, the file can be retrieved from the DataServer. The system manages the mapping from filename to retrieval information with a key-value store, that is:
Key: filename → Value: (BlockId, Offset, Size)
The system also implements a key-value store similar to Redis with persistence support, and uses this key-value store to manage the retrieval information.
Brief description of the drawings
Fig. 1 is the system architecture diagram.
Fig. 2 is the structure diagram of a DataServer in the system.
Embodiment
Step 1: design the block management component.
Small files are stored in large blocks, each of which is preallocated. Newly written small files are appended sequentially into a block. The block management component provides interfaces for writing a small file into a block and for reading a small file from a block. After a small file is written, the component returns its retrieval information, comprising the block number, offset, and size. Given the retrieval information, the component can read the small file back out.
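A minimal sketch of such a block management component follows. The single-block layout and file naming are illustrative; real preallocation and multi-block handling are omitted.

```python
import os

class BlockStore:
    """Appends small files into one large block file and reads them
    back by their retrieval info (block number, offset, size)."""
    def __init__(self, directory: str):
        self.directory = directory
        self.block_id = 0          # single block for brevity
        self.offset = 0            # next free position in the block
        open(self._path(self.block_id), "wb").close()  # a real system preallocates here

    def _path(self, block_id: int) -> str:
        return os.path.join(self.directory, "block_%d" % block_id)

    def write(self, data: bytes):
        with open(self._path(self.block_id), "r+b") as f:
            f.seek(self.offset)
            f.write(data)
        info = (self.block_id, self.offset, len(data))  # the retrieval info
        self.offset += len(data)
        return info

    def read(self, block_id: int, offset: int, size: int) -> bytes:
        with open(self._path(block_id), "rb") as f:
            f.seek(offset)
            return f.read(size)
```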
Step 2: design the key-value management component.
Implement an in-memory key-value store in which every inserted key-value pair is simultaneously written to disk. The key is the small file's filename and the value is its retrieval information. The in-memory key-value store is implemented as a hash table using the murmurhash algorithm as the hash function. Each newly inserted key-value pair is written to disk sequentially.
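A sketch of this component in Python. Two stand-ins are assumptions of the example: Python's built-in dict plays the role of the murmurhash table, and a JSON line log plays the role of the on-disk format.

```python
import json, os

class KVStore:
    """In-memory filename -> retrieval-info table; every insertion is
    also appended sequentially to a log so the table survives restarts."""
    def __init__(self, log_path: str):
        self.log_path = log_path
        self.table = {}
        if os.path.exists(log_path):         # replay the log on startup
            with open(log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.table[key] = tuple(value)

    def put(self, filename: str, info):
        self.table[filename] = tuple(info)
        with open(self.log_path, "a") as f:  # sequential append, as in step 2
            f.write(json.dumps([filename, list(info)]) + "\n")

    def get(self, filename: str):
        return self.table.get(filename)
```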
Step 3: design the CV node.
The CV node receives the DataServers' heartbeat messages and maintains the topology information of the whole cluster; when the topology changes, it sends the latest topology information to every node. A listening service is started on the CV node to receive heartbeat messages from each DataServer connected to it, and a std::vector is used to manage all active DataServers. For each new heartbeat message: if the sender DataServer is not in the vector, add it to the vector and mark the topology information as updated; if it is, remove it from the vector and append it to the end. Then check the first DataServer in the vector: if the time since its last heartbeat exceeds a threshold, remove it (also cleaning up the corresponding network connection when an access over that connection fails) and mark the topology information as updated. If the topology information has been updated, send the new topology to all active DataServers.
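The heartbeat bookkeeping of step 3 can be sketched as follows. A Python list plays the role of the std::vector, and the timeout value and simulated clock are illustrative assumptions.

```python
class CVNode:
    """Keeps active DataServers ordered by last heartbeat (oldest at the
    head) and bumps the topology version when membership changes."""
    def __init__(self, timeout: float):
        self.timeout = timeout
        self.servers = []      # list of (server_id, last_heartbeat_time)
        self.version = 0       # monotonically increasing "timestamp"

    def heartbeat(self, server_id: str, now: float):
        ids = [s for s, _ in self.servers]
        if server_id in ids:
            self.servers.pop(ids.index(server_id))  # move to the tail
            self.servers.append((server_id, now))
        else:
            self.servers.append((server_id, now))   # new node joined
            self.version = now                      # topology updated
        self._expire(now)

    def _expire(self, now: float):
        # The oldest heartbeat sits at the head; evict timed-out nodes.
        while self.servers and now - self.servers[0][1] > self.timeout:
            self.servers.pop(0)
            self.version = now                      # topology updated

    def topology(self):
        return [s for s, _ in self.servers], self.version
```

Because refreshed servers move to the tail, only the head of the list ever needs to be checked for timeouts, which keeps the per-heartbeat work small.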
Step 4: read/write flow.
The flow for writing a small file is as follows:
1. If this is the client's first access to the system, the client contacts the CV node, requests the cluster topology information, and records it locally. On subsequent accesses the client already has the topology information cached locally.
2. The client hashes the filename and uses the consistent hashing algorithm to determine which DataServer should handle the small file.
3. The client contacts the DataServer obtained in step 2, sending it the client's cached topology information, the small file's filename, and the file content buffer.
4. The DataServer first checks whether the client's cached topology information is out of date, by comparing its own topology version number with the one in the client's write request. If they match, go to step 5. If not, the DataServer inspects the topology information in the request and judges whether the difference affects this write; if it does not, it sets the NEED_UPDATE flag and goes to step 5; otherwise it reports the write failure to the client together with the new topology information, and the write ends.
5. The DataServer queries the retrieval-information management component to check whether the filename already exists; if so, it tells the client the filename exists. Otherwise go to step 6.
6. The DataServer writes the file content into a block via the block management component, and writes the retrieval information returned by the block management component, together with the filename, into the retrieval-information management component as a key-value pair. It then returns a write-success message to the client, attaching the new topology information if the NEED_UPDATE flag is set. The write ends.
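Steps 4-6 on the DataServer side can be sketched as follows. Two simplifications are assumptions of the sketch: the server is modeled as a plain dict with one in-memory block, and the "does the topology difference affect this write" judgment of step 4 is reduced to an always-reject.

```python
def make_server(version):
    # Minimal stand-ins for a DataServer's state: topology version,
    # key-value table, and one in-memory block.
    return {"version": version, "kv": {}, "block": bytearray(), "block_id": 0}

def handle_write(server, client_version, filename, content: bytes):
    if client_version != server["version"]:
        # Step 4 (simplified): reject and return the new topology version.
        return ("STALE_TOPOLOGY", server["version"])
    if filename in server["kv"]:
        return ("EXISTS", None)                     # step 5: name collision
    offset = len(server["block"])
    server["block"] += content                      # step 6: append into the block
    info = (server["block_id"], offset, len(content))
    server["kv"][filename] = info                   # record the retrieval info
    return ("OK", info)
```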
The flow process that system is read small documents is as follows:
1. if client is access system for the first time, client-access CV node, the topology information of request cluster, and be recorded to this locality.During connected reference, if not access system for the first time, client terminal local buffer memory the topology information of cluster.
2. client is carried out Hash to filename, and determines that according to the consistency hash algorithm which DataServer this small documents should be processed by.
3. the DataServer obtained in client-access 2.Judge that whether cluster topology information subsidiary in the client read request is consistent with the cluster topology information version number of local DataServer record.If consistent, turn 4.If inconsistent, mark NEED_UPDATE.
4.DataServer inquire about the filename of this small documents to the search information managing assembly, check whether this small documents filename exists.If exist, read out retrieving information, turn 5.If there is no, to the client Transmit message, do not have message, if 3 be provided with the NEED_UPDATE mark, new cluster topology information is attached in there is not message in file, the notice client is upgraded the cluster topology information in buffer memory, and read operation finishes.
5.DataServer the retrieving information by obtaining in 4 reads the small documents content, and sends to client from the piece Management Unit, if mark NEED_UPDATE, new cluster topology information is attached in this message, read operation finishes.
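The matching server-side read path (steps 3-5) can be sketched the same way; here a version mismatch only sets NEED_UPDATE, so the reply carries the new topology version instead of failing. Modeling the DataServer as a plain dict with a version, a key-value table, and one in-memory block is an assumption of the sketch.

```python
def handle_read(server, client_version, filename):
    # Step 3: a stale client version sets NEED_UPDATE; the reply then
    # carries the new topology version so the client can refresh its cache.
    new_topology = server["version"] if client_version != server["version"] else None
    info = server["kv"].get(filename)               # step 4: look up retrieval info
    if info is None:
        return ("NOT_FOUND", None, new_topology)
    _block_id, offset, size = info
    content = bytes(server["block"][offset:offset + size])  # step 5: read the block
    return ("OK", content, new_topology)
```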
The TFS system open-sourced by Taobao performs at least three network connections per read or write; if the TFS client has not cached MetaServer information, the client library must additionally connect to the TFS RootServer.
In the read/write flow of our system, by contrast, under repeated access the client reads the cluster topology information once, on its first access to the CV node, and caches it. As long as the cluster topology does not change, subsequent read and write operations can determine the target DataServer directly from the client's cached topology information, and a single access to that DataServer completes the read or write request.
When the cluster topology does change, a subsequent read or write request first determines the DataServer to access from the stale topology information cached earlier by the client.
If the connection succeeds and the read or write completes correctly, the DataServer has judged that the topology change (a node joining or leaving) does not affect this request; on completing the request, the client reads the up-to-date topology information attached to the response and refreshes its stale local cache, so that later requests access the cluster with the latest topology. If the connection fails, the target node has failed and left the cluster; the client must make one extra access to the CV node to obtain the current topology information and retry the read or write. If the connection succeeds but the DataServer judges that the topology change does affect this request, the DataServer replies with a failure, attaching the latest topology information, and the client retries the read or write.
The system of the present invention therefore simplifies the read/write flow and reduces the number of network connections per read or write. Experimental results show that this improvement effectively reduces latency.
In addition, the system uses a lighter-weight key-value store to manage the retrieval information. Tests show that, with many retrieval-information entries and continuous writes of retrieval information, a key-value store tends to exhibit lower latency and higher throughput than a traditional database such as MySQL.
Comparing TFS with the present system from three angles, namely central-node load, failure-recovery speed, and system robustness, demonstrates that the present system is superior in availability.
The central CV node of the present system is only responsible for monitoring whether the DataServers are still active. The CV node maintains a list whose entries each record a DataServer and the arrival time of its last heartbeat. When the CV node receives a heartbeat message from a DataServer, it adds the DataServer to the list if absent, and otherwise updates that DataServer's last-heartbeat time in the list; it also checks the list for DataServers whose heartbeats have timed out and removes them. If the list has gained or lost a DataServer, the topology version number maintained by the CV node is updated to the current timestamp, and the latest topology information is distributed to all active DataServers. Because the CV node merely receives the DataServers' heartbeats and maintains a list, the load on the CV node process is very low. The literature shows that more heavily loaded nodes in a cluster fail with higher probability, and that nodes performing more I/O fail more easily. Compared with the NameServer node in TFS, the load of the CV node in the present system is much lower, and the CV node performs only network I/O with no disk I/O; under the same running environment, the CV node of the present system is therefore less likely to fail than the NameServer in TFS.
Just as the NameServer is a single point of failure in TFS, the CV node is, in a certain sense, the single point of failure of the present system. Because the NameServer in TFS maintains the entire block-to-DataServer mapping, that mapping must be rebuilt after a NameServer failure, and the data structures maintained in this process are complex. The CV node, by contrast, maintains only whether the DataServers are still active, so after a CV node failure the process can be restarted and recover within seconds. Under the same failure conditions, the downtime of the present system is therefore shorter. According to the availability formula:
A = E[Uptime] / (E[Uptime] + E[Downtime])
where E[Uptime] and E[Downtime] are the expected uptime (the system can provide service) and downtime (the system cannot provide service), respectively. Under the same running environment, the present system's E[Downtime] is smaller, so its availability is higher.
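As a worked illustration of the formula, the uptime and downtime figures below are invented for the example, not measured:

```python
def availability(mean_uptime: float, mean_downtime: float) -> float:
    # A = E[Uptime] / (E[Uptime] + E[Downtime])
    return mean_uptime / (mean_uptime + mean_downtime)

# Same mean uptime, but second-level recovery vs. a slow mapping rebuild:
a_fast = availability(mean_uptime=10_000.0, mean_downtime=5.0)
a_slow = availability(mean_uptime=10_000.0, mean_downtime=300.0)
```

Holding E[Uptime] fixed, shrinking E[Downtime] from minutes to seconds raises A, which is the quantitative content of the argument above.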
In TFS, every read or write must pass through the NameServer; therefore, once the NameServer of TFS fails, none of the clients' requests can complete. In the present system, under repeated access to the cluster, even if the CV node or some DataServers fail, part of the clients' read and write requests can still complete correctly. The CV node is therefore not a single point of failure of the present system in the strict sense.

Claims (2)

1. The present invention designs a new system architecture, characterized in that, when the cluster topology has not changed and accesses are repeated, each access requires only one network connection. This makes access more efficient than in comparable systems such as TFS.
2. The CV node used by the system architecture of the present invention carries an extremely light load, characterized in that the CV node's failure probability is low and it can recover within seconds after a failure.
CN201310429804.9A 2013-09-18 2013-09-18 Low-delay distributed storage system for small files Pending CN103501319A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310429804.9A CN103501319A (en) 2013-09-18 2013-09-18 Low-delay distributed storage system for small files


Publications (1)

Publication Number Publication Date
CN103501319A (en) 2014-01-08

Family

ID=49866489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310429804.9A Pending CN103501319A (en) 2013-09-18 2013-09-18 Low-delay distributed storage system for small files

Country Status (1)

Country Link
CN (1) CN103501319A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104780201A (en) * 2015-03-02 2015-07-15 新浪网技术(中国)有限公司 Data packet processing method and device for use in IPVS (Internet Protocol Virtual Server) cluster
CN105162891A (en) * 2015-10-14 2015-12-16 四川中科腾信科技有限公司 Data storage method based on IP network
CN105187565A (en) * 2015-10-14 2015-12-23 四川携创信息技术服务有限公司 Method for utilizing network storage data
CN106210151A (en) * 2016-09-27 2016-12-07 深圳市彬讯科技有限公司 A kind of zedis distributed caching and server cluster monitoring method
CN106776891A (en) * 2016-11-30 2017-05-31 山东浪潮商用系统有限公司 A kind of method and apparatus of file storage
CN107357921A (en) * 2017-07-21 2017-11-17 北京奇艺世纪科技有限公司 A kind of small documents storage localization method and system
CN108345693A (en) * 2018-03-16 2018-07-31 中国银行股份有限公司 A kind of document handling method and device
CN109471834A (en) * 2018-11-15 2019-03-15 上海联影医疗科技有限公司 Synchronous ring structure, synchronous method, medical image system, equipment and storage medium
CN114422518A (en) * 2022-03-31 2022-04-29 北京奥星贝斯科技有限公司 Method and device for requesting service
CN114745281A (en) * 2022-04-11 2022-07-12 京东科技信息技术有限公司 Data processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020147815A1 (en) * 2001-04-09 2002-10-10 Alexander Tormasov Distributed network data storage system and method
CN101741731A (en) * 2009-12-03 2010-06-16 中兴通讯股份有限公司 Content metadata storing, inquiring method and managing system in content delivery network (CDN)
CN102664914A (en) * 2012-03-22 2012-09-12 北京英孚斯迈特信息技术有限公司 IS/DFS-Image distributed file storage query system
CN103176754A (en) * 2013-04-02 2013-06-26 浪潮电子信息产业股份有限公司 Reading and storing method for massive amounts of small files


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
付松龄 et al.: "FlatLFS: a lightweight file system optimized for massive small-file processing", Journal of National University of Defense Technology *
Issue 8: "A survey of research on file systems for massive small-file storage", Computer Applications and Software *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104780201A (en) * 2015-03-02 2015-07-15 新浪网技术(中国)有限公司 Data packet processing method and device for use in IPVS (Internet Protocol Virtual Server) cluster
CN105162891A (en) * 2015-10-14 2015-12-16 四川中科腾信科技有限公司 Data storage method based on IP network
CN105187565A (en) * 2015-10-14 2015-12-23 四川携创信息技术服务有限公司 Method for utilizing network storage data
CN106210151A (en) * 2016-09-27 2016-12-07 深圳市彬讯科技有限公司 A kind of zedis distributed caching and server cluster monitoring method
CN106776891A (en) * 2016-11-30 2017-05-31 山东浪潮商用系统有限公司 A kind of method and apparatus of file storage
CN107357921A (en) * 2017-07-21 2017-11-17 北京奇艺世纪科技有限公司 A kind of small documents storage localization method and system
CN108345693A (en) * 2018-03-16 2018-07-31 中国银行股份有限公司 A kind of document handling method and device
CN108345693B (en) * 2018-03-16 2022-01-28 中国银行股份有限公司 File processing method and device
CN109471834A (en) * 2018-11-15 2019-03-15 上海联影医疗科技有限公司 Synchronous ring structure, synchronous method, medical image system, equipment and storage medium
CN109471834B (en) * 2018-11-15 2022-04-15 上海联影医疗科技股份有限公司 Sync ring structure, synchronization method, medical imaging system, apparatus, and storage medium
CN114422518A (en) * 2022-03-31 2022-04-29 北京奥星贝斯科技有限公司 Method and device for requesting service
CN114745281A (en) * 2022-04-11 2022-07-12 京东科技信息技术有限公司 Data processing method and device
CN114745281B (en) * 2022-04-11 2023-12-05 京东科技信息技术有限公司 Data processing method and device

Similar Documents

Publication Publication Date Title
US11153380B2 (en) Continuous backup of data in a distributed data store
US20240045848A1 (en) Key-value store and file system integration
CN103501319A (en) Low-delay distributed storage system for small files
US11016943B2 (en) Garbage collection for objects within object store
US11144498B2 (en) Defragmentation for objects within object store
US9547706B2 (en) Using colocation hints to facilitate accessing a distributed data storage system
CN106775446B (en) Distributed file system small file access method based on solid state disk acceleration
US20160212203A1 (en) Multi-site heat map management
US11080253B1 (en) Dynamic splitting of contentious index data pages
CN104239575A (en) Virtual machine mirror image file storage and distribution method and device
EP4139781B1 (en) Persistent memory architecture
CN102662992A (en) Method and device for storing and accessing massive small files
CN104111804A (en) Distributed file system
US10909143B1 (en) Shared pages for database copies
JP2016513306A (en) Data storage method, data storage device, and storage device
US11080207B2 (en) Caching framework for big-data engines in the cloud
US10909091B1 (en) On-demand data schema modifications
CN103037004A (en) Implement method and device of cloud storage system operation
CN111984191A (en) Multi-client caching method and system supporting distributed storage
US11544007B2 (en) Forwarding operations to bypass persistent memory
US11822520B2 (en) Freeing pages within persistent memory
CN109726211B (en) Distributed time sequence database
CN102904917A (en) Mass image processing system and method thereof
CN105187565A (en) Method for utilizing network storage data
CN103491124A (en) Method for processing multimedia message data and distributed cache system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140108