CN104778270A - Storage method for multiple files - Google Patents

Storage method for multiple files

Info

Publication number
CN104778270A
CN104778270A (application CN201510200979.1A)
Authority
CN
China
Prior art keywords
file
user
index
resource
write
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510200979.1A
Other languages
Chinese (zh)
Inventor
刘颖 (Liu Ying)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Hui Zhi Distant View Science And Technology Ltd
Original Assignee
Chengdu Hui Zhi Distant View Science And Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Hui Zhi Distant View Science And Technology Ltd filed Critical Chengdu Hui Zhi Distant View Science And Technology Ltd
Priority to CN201510200979.1A priority Critical patent/CN104778270A/en
Publication of CN104778270A publication Critical patent/CN104778270A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a storage method for multiple files. The method is used for processing files on a cloud storage platform, the cloud storage platform comprising a user interface layer, a business logic layer and a storage layer. The business logic layer merges files smaller than a preset size and then builds a file index; the storage layer is built on a distributed file system and provides file read and write interfaces to users. The method maintains the response speed of the cloud storage platform and the overall performance of the distributed file system when large numbers of small files are read and written.

Description

Storage method for multiple files
Technical field
The present invention relates to data storage, and in particular to a multi-file storage method for big data.
Background art
With the rapid development of smart healthcare and the emergence of massive medical data, large databases are needed as carriers to store these data, yet storing such big data has itself become a major problem. The volume of document retrieval in the medical field grows exponentially with Internet resources; in particular, both the update rate and the cumulative amount of small files keep rising, which has become an urgent problem for medical cloud storage. Distributed file systems are widely used for large-scale data storage and analysis, and many organizations have adopted them to cope with rapidly growing data. However, existing distributed file systems are designed mainly for reading and writing large files; storing large numbers of small files degrades overall file system performance, so such systems cannot be applied well to medical retrieval and other applications in which small files dominate the stored data. No effective solution to the above problems has yet been proposed in the related art.
Summary of the invention
To solve the problems in the prior art described above, the present invention proposes a storage method for multiple files, used for processing files on a cloud storage platform. The cloud storage platform comprises a three-layer structure, namely a user interface layer, a business logic layer and a storage layer. The method is characterized in that the business logic layer first merges files smaller than a preset size and then builds a file index, and the storage layer is built on a distributed file system and provides file read and write interfaces to users.
Preferably, the distributed file system adopts a mode in which the Web server is separated from the server cluster, and the user interface layer is the user interface through which users send requests and receive feedback;
the file index comprises a resource-collection primary index and a resource-entry secondary index; primary index data are stored in a relational database, accessed through a relational database access interface, and held in a Java Map data structure; on the basis of the resource collections written to the database, a field whose value is generated by the system is added when a resource entry is added; primary index data adopt a key/value structure implemented with the Java Map; the Map object always exists, is initialized from the database contents when the service starts, and is updated whenever a resource collection is added or deleted; the secondary index is created with the open-source project Lucene and supports retrieval of small-file metadata; the index file is updated in real time whenever a user adds a resource entry, and concurrency control of file writes is implemented when multiple users add resource entries under the same resource collection at the same time;
merging the files further comprises: creating a SequenceFile object; merging, after filtering by a filter, the files that meet a preset condition; searching the primary index according to the resource collection to which the resource entry belongs; after the file path corresponding to the resource collection is found, creating the SequenceFile object, obtaining and configuring its Writer object, and preparing to write the file; opening a new thread while the file write is executed, and writing the file offset and length information corresponding to the resource entry into the resource-entry secondary index; and closing the output stream after the resource entry is written successfully and returning a submission success, otherwise returning a submission failure.
Preferably, the business logic layer further comprises a distributed file system client, which encapsulates the APIs for operating the file system and interacting with the outside. When the file system receives a file read request, the request first passes through the file filter for judgment; if the requested file belongs to a merged file, the file metadata is first looked up in the cache; if the metadata is not there, the index file is searched; if it is not found in the index file either, the client communicates with the namenode. After the file metadata is found, a SequenceFile object is built, its Reader object is obtained and a read request is sent to the data node, and the input stream is closed after the data have been transferred to the user.
Preferably, writing a file further comprises: when the Web server receives a resource submission request from a user, first judging whether small-file merging is needed; if no merging is needed, writing directly through the distributed file system write interface; after merging, preparing to write the file into the distributed file system through the distributed file system client; while the client writes the file, calling the small-file index update module to perform small-file indexing and updating, the write and the update being performed simultaneously by different threads; and, when the distributed file system write succeeds, the Web service returning a submission success message to the client;
reading a file further comprises: the user sending a file read request when the user needs to browse file content or download a file in the browser; the Web server receiving the read request, first performing retrieval through the search system according to the conditions submitted by the user, and returning the resource-entry result set that the user needs for browsing, while the entry set displayed on the user interface from the result set is sent to the cache module and an independent thread is opened to update the cache; when the user, having browsed the returned result set page, requests a download or a detailed view, the Web service calling the distributed file system client to prepare to read the file content; the client first looking up the file location information in the cache and, if it is not found, searching the file index; and, after the location information is found, reading the data directly from the data node and returning them to the user.
Preferably, after the file index is established, the method further comprises prefetching on the index, the prefetching further comprising:
after a user sends a retrieval request, the Web service queries, according to the user's retrieval conditions, the resource-entry result set that meets the user's needs and returns it to the user, while creating an asynchronous thread to update the cache; the cache content is updated in the time interval between returning the result set to the user and the user browsing the result set and deciding to click download or browse; when a cache-update request is received, the index module is called to perform retrieval and the metadata of the current result-set entries are loaded into the cache; when the user sends a download or browse request, the Web service calls the distributed file system client to look up the metadata in the cache and starts reading data and transmitting them to the client; the server maintains a thread pool with a fixed number of threads and calls one thread to handle each cache-update request received, and if there is no idle thread in the pool the caching task waits; a cache pool is established using a FIFO algorithm and its size is configured; key/value pairs are kept in the cache pool, the file name serving as the key and the combination of the data node ID, start position and length of the file serving as the value, and the oldest cache entries are evicted; the cache pool provides two operations, a put operation and a get operation: the put operation puts data into the cache pool and, if the existing data in the pool have reached the upper limit, replaces data according to the FIFO cache replacement algorithm; the get operation obtains the corresponding value according to the key.
Compared with the prior art, the present invention has the following advantages:
For the reading and writing of small files used for full-text search, the method improves retrieval efficiency through index prefetching and maintains the response speed of the cloud storage platform and the overall performance of the distributed file system when large numbers of small files are stored and read.
Brief description of the drawings
Fig. 1 is a flow chart of the storage method for multiple files according to an embodiment of the present invention.
Detailed description
A detailed description of one or more embodiments of the present invention is provided below together with the accompanying drawing that illustrates the principles of the invention. The invention is described in conjunction with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is defined only by the claims, and the invention covers many alternatives, modifications and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may also be practiced according to the claims without some or all of these details.
One aspect of the present invention provides a storage method for multiple files oriented to small-file reading and writing, offering a new solution for big-data access and utilization. Fig. 1 is a flow chart of the storage method for multiple files according to an embodiment of the present invention.
After a user submits a resource to the cloud storage platform, the resource first needs to be screened by a file filter, the filter conditions including file size, type and so on; files that pass the screening are referred to as small files. A merging strategy is then applied to these small files: a number of small files are merged to generate a new storage file, generally merging small files that belong to the same attribute. The index file is updated while the new storage file is written into the system. The index in the cloud storage platform comprises a primary index, which is the resource collection to which a file belongs, such as its type, and a secondary index, which is the concrete resource entry. When a file needs to be read, the primary index and the secondary index are queried in turn, narrowing the query scope and ensuring a fast read response.
The core of the storage layer design of the cloud storage platform system of the present invention is as follows: small files are first merged to generate storage files; after the merge, a two-level index is built on the files based on the storage characteristics of the database; and index prefetching improves the response speed of file reads. The details of the storage layer are described below.
1. Storage-file generation strategy based on small-file merging
The distributed file system divides a file into blocks, the default block size being 64 MB. The namespace of the distributed file system is persisted in an image file and loaded into memory by the namenode at startup. A large number of small files causes the namenode to run out of memory, and the oversized image file that is generated reduces file lookup efficiency when files are read. Every read or write of a file first queries the namespace to locate information such as the block addresses and file size, and then retrieves the data in the data node space. When the files being read are very small, most of the time in the read/write process is consumed by lookup and query rather than by the transfer of file data, which hurts the processing efficiency of the server cluster.
The cloud storage platform uses small-file merging to generate storage files. First, a filter is implemented to filter files by type and size, selecting document files suitable for full-text search; the file size threshold here is set to 10 MB, and a file larger than 10 MB is regarded as a large file and does not need to be merged. After filtering, the cloud storage platform merges the filtered small files into file blocks, taking the resource collection to which the file entries belong as the unit. A resource collection is a set of resource entries with a certain correlation, and a resource entry belongs to exactly one resource collection. Collections are usually divided by attribute range, time and so on, and files can be divided by attribute domain. Resource entries within a new file block are therefore highly related, so the file block can later be assigned to a single MapReduce task during data processing, avoiding the waste of task assignment and switching time caused by tasks whose computation is too small, reducing data movement within the cluster, and conforming to the Hadoop principle that moving computation is more efficient than moving data.
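To make the filtering step concrete, the following Java sketch shows one way such a filter could look; the class name, the accepted extensions and the method signature are illustrative assumptions, with only the 10 MB threshold taken from the description.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical filter for the merge step: only document types suitable for
// full-text search and smaller than the 10 MB threshold are passed on to the
// merger; anything else is treated as a large file and written directly.
public class SmallFileFilter {

    private static final long SMALL_FILE_THRESHOLD = 10L * 1024 * 1024; // 10 MB, per the description
    private static final Set<String> DOCUMENT_TYPES =
            new HashSet<>(Arrays.asList("txt", "doc", "docx", "pdf", "xml"));

    /** Returns true if the file should be merged into a collection block file. */
    public boolean accept(String fileName, long sizeInBytes) {
        if (sizeInBytes >= SMALL_FILE_THRESHOLD) {
            return false; // large file: written to the distributed file system as-is
        }
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase();
        return DOCUMENT_TYPES.contains(extension);
    }
}
```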
2. Building a two-level index to optimize file read speed
After small files are merged, the namenode memory is the performance bottleneck of the whole file system, because all file metadata must be stored in its memory. Merging small files reduces the number of files and saves a great deal of memory, but without further measures the efficiency of reading the merged files would be very low.
A preferred embodiment of the invention uses a hierarchical index for small-file metadata, partitioning a large index file into small index files by reasonable rules. The resource collection serves as the primary index, and the resource-entry content under each resource collection serves as the secondary index; a lookup first queries by the resource collection to which the entry belongs and then searches the corresponding secondary index file. Although this adds a lookup in the primary index, the number of resource collections is small, so this lookup takes very little time, and the partitioned secondary index files are much smaller than a single global index file, so overall search efficiency improves. Moreover, the secondary index files are not all loaded into memory; they are scheduled flexibly according to memory usage in combination with the caching strategy, which solves the problem of insufficient memory.
3. Optimizing file response speed through index prefetching
Index prefetching here means predicting, from the data the user is currently accessing, the data the user will access next, and loading their index into the cache. If the prediction is accurate, the data the user is about to access are loaded into the cache in advance, and the system responds faster when the user accesses them.
Before downloading or browsing a resource entry, a user usually first obtains an "intermediate result set" through retrieval or directory browsing, and then selects from it the resource entries to access further. There is an interval of several seconds between the user seeing the result set page and performing the download or browse. By caching the indexes of the resources in the intermediate result set in advance during this period, the series of file metadata queries need not be executed again when the user clicks download or browse; the file is transferred directly, which greatly improves the request response time for these files. This improvement does not require much memory: assuming 100,000 users performing retrieval at the same time, with 20 resource entries displayed per result page and 150 B needed to cache the metadata of one file, only about 0.3 GB of memory is needed.
The storage layer architecture of the cloud storage platform of the present invention is described in detail below. Besides employing the above strategies, the storage layer architecture is the foundation of the system at implementation time. The storage layer of the cloud storage platform is built on the distributed storage system on a Hadoop cluster and provides basic file save and read services.
The architecture of the cloud storage platform adopts a three-layer design: a user interface layer, a business logic layer and a storage layer, and separates the Web server from the server cluster to improve performance. The user interface layer is the user interface through which users send requests and receive feedback via the functions this layer provides. The business logic layer is the functional layer that implements small-file reading and writing, including file merging, index construction, cache construction and so on.
The business logic layer comprises functional modules such as file merging, the search system, the small-file index, the cache and the distributed file system client. Each module is implemented as follows:
(1) File merging
The file merging function comprises two stages: creating the SequenceFile object and merging the small files. After filtering by the filter, the files that meet the merging requirements are merged. The primary index is first searched according to the resource collection to which the resource entry belongs; after the file path corresponding to the resource collection is found, the SequenceFile object is created, its Writer object is obtained and configured, and the file write is prepared. A new thread is opened while the file write is executed, and metadata such as the file offset and length corresponding to the resource entry are written into the resource-entry secondary index. The output stream is closed after the resource entry is written successfully and a submission success is returned; otherwise a submission failure is returned.
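A minimal sketch of this merge stage, assuming Hadoop's SequenceFile API and a hypothetical index callback, is given below; the batch structure, class and method names, and the callback are illustrative, while SequenceFile.createWriter, Writer.append and Writer.getLength are existing Hadoop calls.

```java
import java.io.IOException;
import java.util.Map;
import java.util.function.BiConsumer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Sketch of the merge stage: the filtered small files of one resource
// collection are appended into a single SequenceFile, and the byte offset of
// each entry is handed to an index callback so the secondary index can record
// (offset, length) for later reads. The container path is assumed to come
// from the primary index lookup described above.
public class SmallFileMerger {

    public void mergeIntoCollection(Configuration conf, Path collectionFile,
                                    Map<String, byte[]> smallFiles,
                                    BiConsumer<String, Long> indexCallback) throws IOException {
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(collectionFile),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (Map.Entry<String, byte[]> e : smallFiles.entrySet()) {
                long offset = writer.getLength();   // byte position of this entry in the container
                writer.append(new Text(e.getKey()), new BytesWritable(e.getValue()));
                // The disclosure updates the secondary index in a separate thread;
                // here the (name, offset) pair is simply passed to a callback.
                indexCallback.accept(e.getKey(), offset);
            }
        }
    }
}
```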
(2) Search system
This module provides the document retrieval function; the optimization of reads from the distributed file system based on the "intermediate result set" relies on it.
(3) Small-file index
This module builds the small-file index, comprising the resource-collection primary index and the resource-entry secondary index, and provides functions such as creating index files and adding and deleting records.
Primary index data are stored in a relational database, accessed through a relational database access interface, and held in a Java Map data structure. Because the resource collections are stored in the database, adding a resource entry only requires adding a field whose value is generated by the system according to this index, so the data can be kept in the relational database without affecting processing efficiency. Primary index data adopt a key/value structure, and a Java Map data structure can be used to improve query efficiency. In addition, to guarantee retrieval efficiency, the Map object must always exist and is initialized from the database contents when the service starts; since the number of entries in the primary index is small, the Map object occupies very little memory and the system overhead is limited. When a resource collection is added or deleted, the Map object must be updated.
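A minimal sketch of such an in-memory primary index follows; the CollectionDao interface and all method names are hypothetical, and only the use of a Java Map keyed by resource collection mirrors the description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the primary (resource-collection) index: a key/value Map
// mirroring the relational table, initialized at service start and kept in
// step with the database when collections are added or removed.
public class PrimaryIndex {

    // resource collection name -> path of the merged container file for that collection
    private final Map<String, String> collectionToPath = new ConcurrentHashMap<>();

    /** Loads all resource collections from the relational database at startup. */
    public void initialize(CollectionDao dao) {
        collectionToPath.putAll(dao.loadAllCollections());
    }

    public void addCollection(String name, String filePath) {
        collectionToPath.put(name, filePath);   // keep the Map and the database in step
    }

    public void removeCollection(String name) {
        collectionToPath.remove(name);
    }

    public String lookupPath(String collection) {
        return collectionToPath.get(collection);
    }

    /** Hypothetical data-access interface over the relational database. */
    public interface CollectionDao {
        Map<String, String> loadAllCollections();
    }
}
```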
The secondary index is created with the open-source project Lucene and supports retrieval of small-file metadata. Lucene has a complete set of solutions for index construction, update and search; when the index file is smaller than 1 GB its search efficiency is very high, and it can be used to build commercial search engines. The index created by the cloud storage platform needs some special capabilities, such as updating the index file in real time whenever a user adds a resource entry, concurrency control of file writes when multiple users add resource entries under the same resource collection at the same time, and compressing the index file to reduce memory usage.
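The following sketch shows one way a Lucene-backed secondary index with these properties could be written; the field names and the commit-per-addition policy are assumptions, and index-file compression is omitted. A single shared IndexWriter is safe for concurrent additions, which is one way to handle simultaneous submissions under the same resource collection.

```java
import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

// Sketch of the Lucene-backed secondary index: each resource entry is stored
// as a document carrying the metadata the read path needs (container file,
// offset, length). Field names are illustrative.
public class SecondaryIndex implements AutoCloseable {

    private final IndexWriter writer;

    public SecondaryIndex(String indexDir) throws IOException {
        writer = new IndexWriter(FSDirectory.open(Paths.get(indexDir)),
                                 new IndexWriterConfig(new StandardAnalyzer()));
    }

    /** Adds one resource entry and makes its metadata searchable immediately. */
    public void addEntry(String entryName, String containerFile,
                         long offset, long length) throws IOException {
        Document doc = new Document();
        doc.add(new StringField("name", entryName, Field.Store.YES));
        doc.add(new StringField("container", containerFile, Field.Store.YES));
        doc.add(new StoredField("offset", offset));
        doc.add(new StoredField("length", length));
        writer.addDocument(doc);
        writer.commit();   // real-time update after every addition, as described
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }
}
```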
(4) Prefetching
To further improve response speed, cache management of the "intermediate result set" that the user is interested in is provided here, including functions such as cache space maintenance, cache update and update-algorithm maintenance.
After a user sends a retrieval request, the Web service queries, according to the user's retrieval conditions, the resource-entry result set that meets the user's needs and returns it to the user, while creating an asynchronous thread to update the cache; the cache content is updated in the time interval between returning the result set and the user browsing it and deciding to click download or browse. When the cache module receives a cache-update request, it calls the index module to perform retrieval and loads the metadata of the current result-set entries into the cache. When the user sends a download or browse request, the Web service calls the distributed file system client to look up the metadata in the cache and starts reading data and transmitting them to the client.
The system maintains a thread pool with a fixed number of threads and calls one thread to handle each cache-update request received; if there is no idle thread in the pool, the caching task waits. In this way the system resources occupied by cache-update tasks stay within a reasonable range and do not affect overall system performance. The present invention chooses the FIFO algorithm to implement the scheduling function of the cache module, evicting the oldest cache entries in the most efficient manner. The concrete implementation is as follows: a cache pool is established and its size configured, defaulting to 32 MB, which can hold the metadata of about 200,000 files. The cache pool stores key/value pairs, where the file name serves as the key and the combination of the data node ID, start position and length of the file serves as the value. The cache pool provides two operations, put and get. Put puts data into the cache pool; if the existing data in the pool have reached the upper limit, the corresponding data are replaced according to the cache replacement algorithm, and if there is still room the data are put in directly. Get obtains the corresponding value according to the key, returning empty if there is none.
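A sketch of this cache pool and its update thread pool is given below, using a LinkedHashMap in insertion order for FIFO eviction; the capacity is counted in entries rather than bytes, the thread count is illustrative, and the Location value type is hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the prefetch cache: a fixed-size FIFO pool mapping a file name to
// (data node ID, start position, length), plus a fixed thread pool that runs
// cache-update tasks.
public class MetadataCache {

    /** Value object describing where a merged entry lives (illustrative). */
    public static final class Location {
        public final String dataNodeId;
        public final long offset;
        public final long length;
        public Location(String dataNodeId, long offset, long length) {
            this.dataNodeId = dataNodeId;
            this.offset = offset;
            this.length = length;
        }
    }

    private final Map<String, Location> pool;
    private final ExecutorService updater = Executors.newFixedThreadPool(4);

    public MetadataCache(int capacity) {
        // A LinkedHashMap in insertion order evicts the oldest entry first (FIFO).
        pool = Collections.synchronizedMap(new LinkedHashMap<String, Location>(capacity, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Location> eldest) {
                return size() > capacity;   // replace the oldest entry once the pool is full
            }
        });
    }

    /** put: insert metadata, replacing the oldest entry when the pool is full. */
    public void put(String fileName, Location loc) {
        pool.put(fileName, loc);
    }

    /** get: return the cached location for a key, or null if absent. */
    public Location get(String fileName) {
        return pool.get(fileName);
    }

    /** Queue an asynchronous cache refresh; the task waits if no thread is idle. */
    public void scheduleUpdate(Runnable refreshTask) {
        updater.submit(refreshTask);
    }
}
```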
(5) Distributed file system client
The distributed file system client encapsulates the APIs for operating the file system and interacting with the outside, including reading and writing files and querying file positions. When the file system receives a file read request, the request first passes through the file filter for judgment. For a file that belongs to a merged file, the file metadata is first looked up in the cache; if it is not present there, the index file is searched; if it is still not found, the client communicates with the namenode. After the file metadata is found, a SequenceFile object is built, its Reader object is obtained and a read request is sent to the data node, and the input stream is closed and control returns after the data have been transferred to the user.
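The read path of the client can be sketched as follows; how the container path and offset are obtained from the cache or the secondary index is assumed, and only the SequenceFile.Reader calls are actual Hadoop API. The offset passed to seek is expected to be one recorded by the writer at merge time.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Sketch of the client read path for a merged file: after the cache or the
// secondary index has supplied (container, offset), seek directly into the
// merged SequenceFile and return the payload of the small file.
public class MergedFileReader {

    public byte[] read(Configuration conf, String container, long offset) throws IOException {
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                SequenceFile.Reader.file(new Path(container)))) {
            reader.seek(offset);                  // offset recorded by the writer at merge time
            Text key = new Text();
            BytesWritable value = new BytesWritable();
            if (reader.next(key, value)) {
                return value.copyBytes();         // payload of the small file
            }
            return null;                          // no record at this position
        }
    }
}
```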
A user has two kinds of requests: a write request that submits a file, and a read request that queries, browses or obtains resources.
When the Web server receives a resource submission request from a user, it first judges whether small-file merging is needed; if so, file merging is performed, and if not, the distributed file system write interface is used directly. After merging, the file is written into the distributed file system through the distributed file system client; while the client writes the file, the small-file index update module is called to perform small-file indexing and updating. Because the Web server host is separated from the server cluster, the write and the update can be performed simultaneously by different threads without affecting each other. When the distributed file system write succeeds, the Web service returns a submission success message to the client.
A user sends a file read request when the user needs to browse the detailed content of a file or download a file; this request occurs most frequently and consumes the most system resources. When the Web server receives a read request from a user, it first performs retrieval through the search system according to the conditions submitted by the user and returns the resource-entry result set that the user needs for browsing; at the same time, the entry set shown on the first page of the user interface (20 entries by default) is sent to the cache module, and an independent thread is opened to update the cache. When the user, after browsing the returned result set page, requests a download or a detailed view, the Web service calls the distributed file system client to prepare to read the file content; the client first looks up the file location information in the cache, and if it is not found, searches the small-file index; after the location information is found, the client reads the data directly from the data node and returns them to the user.
In summary, the present invention proposes a storage method for multiple files. For the reading and writing of small files used for full-text search, index prefetching improves retrieval efficiency and maintains the response speed of the cloud storage platform and the overall performance of the distributed file system when large numbers of small files are stored and read.
Obviously, those skilled in the art should appreciate that the modules and steps of the present invention described above can be implemented with a general-purpose computing system; they can be concentrated on a single computing system or distributed over a network formed by multiple computing systems, and optionally they can be implemented with program code executable by a computing system, so that they can be stored in a storage system and executed by the computing system. Thus, the present invention is not restricted to any specific combination of hardware and software.
It should be understood that the above embodiments of the present invention are only for exemplary illustration or explanation of the principles of the present invention and do not limit the present invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention shall be included within the protection scope of the present invention. In addition, the appended claims are intended to cover all changes and modifications that fall within the scope and boundary of the claims or the equivalents of such scope and boundary.

Claims (5)

1. A storage method for multiple files, used for processing files on a cloud storage platform, the cloud storage platform comprising a three-layer structure, namely a user interface layer, a business logic layer and a storage layer, characterized in that: the business logic layer first merges files smaller than a preset size;
a file index is then established, and the storage layer is built on a distributed file system and provides file read and write interfaces to users.
2. The method according to claim 1, characterized in that the distributed file system adopts a mode in which the Web server is separated from the server cluster, and the user interface layer is the user interface through which users send requests and receive feedback;
the file index comprises a resource-collection primary index and a resource-entry secondary index; primary index data are stored in a relational database, accessed through a relational database access interface, and held in a Java Map data structure; on the basis of the resource collections written to the database, a field whose value is generated by the system is added when a resource entry is added; primary index data adopt a key/value structure implemented with the Java Map; the Map object always exists, is initialized from the database contents when the service starts, and is updated whenever a resource collection is added or deleted; the secondary index is created with the open-source project Lucene and supports retrieval of small-file metadata; the index file is updated in real time whenever a user adds a resource entry, and concurrency control of file writes is implemented when multiple users add resource entries under the same resource collection at the same time;
merging the files further comprises: creating a SequenceFile object; merging, after filtering by a filter, the files that meet a preset condition; searching the primary index according to the resource collection to which the resource entry belongs; after the file path corresponding to the resource collection is found, creating the SequenceFile object, obtaining and configuring its Writer object, and preparing to write the file; opening a new thread while the file write is executed, and writing the file offset and length information corresponding to the resource entry into the resource-entry secondary index; and closing the output stream after the resource entry is written successfully and returning a submission success, otherwise returning a submission failure.
3. The method according to claim 1, characterized in that the business logic layer further comprises a distributed file system client, which encapsulates the APIs for operating the file system and interacting with the outside; when the file system receives a file read request, the request first passes through the file filter for judgment; if the requested file belongs to a merged file, the file metadata is first looked up in the cache; if the metadata is not there, the index file is searched; if it is not found in the index file either, the client communicates with the namenode; after the file metadata is found, a SequenceFile object is built, its Reader object is obtained and a read request is sent to the data node, and the input stream is closed after the data have been transferred to the user.
4. The method according to claim 1, characterized in that writing a file further comprises: when the Web server receives a resource submission request from a user, first judging whether small-file merging is needed; if no merging is needed, writing directly through the distributed file system write interface; after merging, preparing to write the file into the distributed file system through the distributed file system client; while the client writes the file, calling the small-file index update module to perform small-file indexing and updating, the write and the update being performed simultaneously by different threads; and, when the distributed file system write succeeds, the Web service returning a submission success message to the client;
reading a file further comprises: the user sending a file read request when the user needs to browse file content or download a file in the browser; the Web server receiving the read request, first performing retrieval through the search system according to the conditions submitted by the user, and returning the resource-entry result set that the user needs for browsing, while the entry set displayed on the user interface from the result set is sent to the cache module and an independent thread is opened to update the cache; when the user, having browsed the returned result set page, requests a download or a detailed view, the Web service calling the distributed file system client to prepare to read the file content; the client first looking up the file location information in the cache and, if it is not found, searching the file index; and, after the location information is found, reading the data directly from the data node and returning them to the user.
5. The method according to claim 1, characterized in that, after the file index is established, the method further comprises prefetching on the index, the prefetching further comprising:
after a user sends a retrieval request, the Web service queries, according to the user's retrieval conditions, the resource-entry result set that meets the user's needs and returns it to the user, while creating an asynchronous thread to update the cache; the cache content is updated in the time interval between returning the result set to the user and the user browsing the result set and deciding to click download or browse; when a cache-update request is received, the index module is called to perform retrieval and the metadata of the current result-set entries are loaded into the cache; when the user sends a download or browse request, the Web service calls the distributed file system client to look up the metadata in the cache and starts reading data and transmitting them to the client; the server maintains a thread pool with a fixed number of threads and calls one thread to handle each cache-update request received, and if there is no idle thread in the pool the caching task waits; a cache pool is established using a FIFO algorithm and its size is configured; key/value pairs are kept in the cache pool, the file name serving as the key and the combination of the data node ID, start position and length of the file serving as the value, and the oldest cache entries are evicted; the cache pool provides two operations, a put operation and a get operation: the put operation puts data into the cache pool and, if the existing data in the pool have reached the upper limit, replaces data according to the FIFO cache replacement algorithm; the get operation obtains the corresponding value according to the key.
CN201510200979.1A 2015-04-24 2015-04-24 Storage method for multiple files Pending CN104778270A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510200979.1A CN104778270A (en) 2015-04-24 2015-04-24 Storage method for multiple files

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510200979.1A CN104778270A (en) 2015-04-24 2015-04-24 Storage method for multiple files

Publications (1)

Publication Number Publication Date
CN104778270A true CN104778270A (en) 2015-07-15

Family

ID=53619734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510200979.1A Pending CN104778270A (en) 2015-04-24 2015-04-24 Storage method for multiple files

Country Status (1)

Country Link
CN (1) CN104778270A (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105117502A (en) * 2015-10-13 2015-12-02 四川中科腾信科技有限公司 Search method based on big data
CN106612330A (en) * 2017-01-05 2017-05-03 广州慧睿思通信息科技有限公司 System and method supporting distributed multi-file importing
CN106873920A (en) * 2017-03-22 2017-06-20 世纪恒通科技股份有限公司 A kind of call center for avoiding disk fragmentses records storage system and storage method
CN107402924A (en) * 2016-05-19 2017-11-28 普天信息技术有限公司 MR files apply the implementation method and device in HDFS
CN107480039A (en) * 2017-09-22 2017-12-15 郑州云海信息技术有限公司 The small documents readwrite performance method of testing and device of a kind of distributed memory system
CN107729432A (en) * 2017-09-29 2018-02-23 浪潮软件股份有限公司 A kind of storage of distributed small documents, read method, device and access system
CN108053863A (en) * 2017-12-22 2018-05-18 中国人民解放军第三军医大学第附属医院 It is suitble to the magnanimity medical data storage system and date storage method of big small documents
CN108475211A (en) * 2015-12-15 2018-08-31 微软技术许可有限责任公司 Long-term running storage manageability operational administrative
CN108763473A (en) * 2018-05-29 2018-11-06 郑州云海信息技术有限公司 A kind of the native object storage method and device of distributed storage
CN109213760A (en) * 2018-08-02 2019-01-15 南瑞集团有限公司 The storage of high load business and search method of non-relation data storage
CN109241021A (en) * 2018-09-04 2019-01-18 郑州云海信息技术有限公司 A kind of file polling method, apparatus, equipment and computer readable storage medium
CN109697020A (en) * 2017-10-23 2019-04-30 中移(苏州)软件技术有限公司 A kind of date storage method, server and system
WO2019166858A1 (en) * 2018-03-01 2019-09-06 Pratik Sharma File hosting service in cloud
CN110502472A (en) * 2019-08-09 2019-11-26 西藏宁算科技集团有限公司 A kind of the cloud storage optimization method and its system of large amount of small documents
CN111400247A (en) * 2020-04-13 2020-07-10 杭州九州方园科技有限公司 User behavior auditing method and file storage method
CN111966845A (en) * 2020-08-31 2020-11-20 重庆紫光华山智安科技有限公司 Picture management method and device, storage node and storage medium
CN112948330A (en) * 2021-02-26 2021-06-11 拉卡拉支付股份有限公司 Data merging method, device, electronic equipment, storage medium and program product
CN113821704A (en) * 2020-06-18 2021-12-21 华为技术有限公司 Method and device for constructing index, electronic equipment and storage medium
CN114143177A (en) * 2021-12-01 2022-03-04 云赛智联股份有限公司 Business service monitoring system and monitoring method based on data blood margin
CN114238417A (en) * 2021-12-27 2022-03-25 四川启睿克科技有限公司 Data caching method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101452465A (en) * 2007-12-05 2009-06-10 高德软件有限公司 Mass file data storing and reading method
US20100162230A1 (en) * 2008-12-24 2010-06-24 Yahoo! Inc. Distributed computing system for large-scale data handling
US20100179855A1 (en) * 2009-01-09 2010-07-15 Ye Chen Large-Scale Behavioral Targeting for Advertising over a Network
CN101854388A (en) * 2010-05-17 2010-10-06 浪潮(北京)电子信息产业有限公司 Method and system concurrently accessing a large amount of small documents in cluster storage
CN103577123A (en) * 2013-11-12 2014-02-12 河海大学 Small file optimization storage method based on HDFS

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Bian Yijie (卞艺杰) et al., "Small-file storage in the Hdspace distributed institutional repository system", Computer Systems & Applications (《计算机系统应用》) *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105117502A (en) * 2015-10-13 2015-12-02 四川中科腾信科技有限公司 Search method based on big data
CN108475211A (en) * 2015-12-15 2018-08-31 微软技术许可有限责任公司 Long-term running storage manageability operational administrative
CN108475211B (en) * 2015-12-15 2022-02-11 微软技术许可有限责任公司 Stateless system and system for obtaining resources
CN107402924A (en) * 2016-05-19 2017-11-28 普天信息技术有限公司 MR files apply the implementation method and device in HDFS
CN106612330A (en) * 2017-01-05 2017-05-03 广州慧睿思通信息科技有限公司 System and method supporting distributed multi-file importing
CN106873920B (en) * 2017-03-22 2023-07-28 世纪恒通科技股份有限公司 Call center recording storage system and method capable of avoiding disk fragments
CN106873920A (en) * 2017-03-22 2017-06-20 世纪恒通科技股份有限公司 A kind of call center for avoiding disk fragmentses records storage system and storage method
CN107480039A (en) * 2017-09-22 2017-12-15 郑州云海信息技术有限公司 The small documents readwrite performance method of testing and device of a kind of distributed memory system
CN107480039B (en) * 2017-09-22 2020-12-04 郑州云海信息技术有限公司 Small file read-write performance test method and device for distributed storage system
CN107729432A (en) * 2017-09-29 2018-02-23 浪潮软件股份有限公司 A kind of storage of distributed small documents, read method, device and access system
CN109697020A (en) * 2017-10-23 2019-04-30 中移(苏州)软件技术有限公司 A kind of date storage method, server and system
CN108053863A (en) * 2017-12-22 2018-05-18 中国人民解放军第三军医大学第附属医院 It is suitble to the magnanimity medical data storage system and date storage method of big small documents
CN108053863B (en) * 2017-12-22 2020-09-11 中国人民解放军第三军医大学第一附属医院 Mass medical data storage system and data storage method suitable for large and small files
WO2019166858A1 (en) * 2018-03-01 2019-09-06 Pratik Sharma File hosting service in cloud
CN108763473A (en) * 2018-05-29 2018-11-06 郑州云海信息技术有限公司 A kind of the native object storage method and device of distributed storage
CN109213760A (en) * 2018-08-02 2019-01-15 南瑞集团有限公司 The storage of high load business and search method of non-relation data storage
CN109213760B (en) * 2018-08-02 2021-10-22 南瑞集团有限公司 High-load service storage and retrieval method for non-relational data storage
CN109241021A (en) * 2018-09-04 2019-01-18 郑州云海信息技术有限公司 A kind of file polling method, apparatus, equipment and computer readable storage medium
CN110502472A (en) * 2019-08-09 2019-11-26 西藏宁算科技集团有限公司 A kind of the cloud storage optimization method and its system of large amount of small documents
CN111400247A (en) * 2020-04-13 2020-07-10 杭州九州方园科技有限公司 User behavior auditing method and file storage method
CN113821704A (en) * 2020-06-18 2021-12-21 华为技术有限公司 Method and device for constructing index, electronic equipment and storage medium
CN113821704B (en) * 2020-06-18 2024-01-16 华为云计算技术有限公司 Method, device, electronic equipment and storage medium for constructing index
CN111966845A (en) * 2020-08-31 2020-11-20 重庆紫光华山智安科技有限公司 Picture management method and device, storage node and storage medium
CN111966845B (en) * 2020-08-31 2023-11-17 重庆紫光华山智安科技有限公司 Picture management method, device, storage node and storage medium
CN112948330A (en) * 2021-02-26 2021-06-11 拉卡拉支付股份有限公司 Data merging method, device, electronic equipment, storage medium and program product
CN114143177A (en) * 2021-12-01 2022-03-04 云赛智联股份有限公司 Business service monitoring system and monitoring method based on data blood margin
CN114238417A (en) * 2021-12-27 2022-03-25 四川启睿克科技有限公司 Data caching method

Similar Documents

Publication Publication Date Title
CN104778270A (en) Storage method for multiple files
CN104679898A (en) Big data access method
CN107247808B (en) Distributed NewSQL database system and picture data query method
Marcu et al. Spark versus flink: Understanding performance in big data analytics frameworks
Cambazoglu et al. Scalability challenges in web search engines
CN102169507B (en) Implementation method of distributed real-time search engine
US8364751B2 (en) Automated client/server operation partitioning
US20160267132A1 (en) Abstraction layer between a database query engine and a distributed file system
US8356050B1 (en) Method or system for spilling in query environments
JP2003006036A (en) Clustered application server and web system having database structure
US9148329B1 (en) Resource constraints for request processing
CN103595797B (en) Caching method for distributed storage system
CN101184106A (en) Associated transaction processing method of mobile database
CN105303456A (en) Method for processing monitoring data of electric power transmission equipment
CN103023982A (en) Low-latency metadata access method of cloud storage client
US11080207B2 (en) Caching framework for big-data engines in the cloud
CN114138776A (en) Method, system, apparatus and medium for graph structure and graph attribute separation design
CN106709010A (en) High-efficient HDFS uploading method based on massive small files and system thereof
CN116108057B (en) Distributed database access method, device, equipment and storage medium
US11429629B1 (en) Data driven indexing in a spreadsheet based data store
Durner et al. Crystal: a unified cache storage system for analytical databases
US20230161792A1 (en) Scaling database query processing using additional processing clusters
Marcu KerA: A Unified Ingestion and Storage System for Scalable Big Data Processing
US20220342888A1 (en) Object tagging
US11256695B1 (en) Hybrid query execution engine using transaction and analytical engines

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150715

RJ01 Rejection of invention patent application after publication