CN104679898A - Big data access method - Google Patents

Big data access method

Info

Publication number
CN104679898A
CN104679898A (application CN201510118185.0A)
Authority
CN
China
Prior art keywords
file
resource
index
user
entries
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510118185.0A
Other languages
Chinese (zh)
Inventor
Liu Ying (刘颖)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Hui Zhi Distant View Science And Technology Ltd
Original Assignee
Chengdu Hui Zhi Distant View Science And Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Hui Zhi Distant View Science And Technology Ltd filed Critical Chengdu Hui Zhi Distant View Science And Technology Ltd
Priority to CN201510118185.0A priority Critical patent/CN104679898A/en
Publication of CN104679898A publication Critical patent/CN104679898A/en
Pending legal-status Critical Current


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a big data access method for accessing file resources in a cloud storage platform. The method comprises the following steps: merging files smaller than a preset size in a distributed file system to form new storage files; establishing a primary index and a secondary index for the merged files; and using a prefetch mechanism to cache the indexes before the user requests them. The method maintains the response speed of the cloud storage platform and the overall performance of the distributed file system when large numbers of small files are read and written.

Description

A big data access method
Technical field
The present invention relates to data storage, and in particular to a file access method for big data.
Background technology
With the rapid development of intelligent healthcare and the emergence of massive medical data, correspondingly large databases are needed as carriers to preserve these data, yet storing big data remains a major problem. The volume of document retrieval in the medical field grows exponentially with network resources; in particular, both the update rate and the accumulated volume of small files keep rising, which has become an urgent problem for medical cloud storage. Distributed file systems are widely used for large-scale data storage and analysis, and many institutions have adopted them to handle rapidly growing data. However, existing distributed file systems are designed mainly for reading and writing large files; storing large numbers of small files degrades overall file system performance, so such systems cannot be applied well to small-file-dominated workloads such as medical retrieval. No effective solution to the above problems has yet been proposed in the related art.
Summary of the invention
To solve the above problems in the prior art, the present invention proposes a big data access method, comprising:
merging files smaller than a preset size in the distributed file system to form new storage files;
establishing a primary index and a secondary index for the merged files; and
using a prefetch mechanism to cache the indexes before the user requests them.
Preferably, merging the files smaller than the preset size further comprises:
dividing each file into blocks, persisting the namespace of the distributed file system in an image file, and having the NameNode load this image file into memory at startup;
after a user submits a file resource to the cloud storage platform, first screening it with a file filter according to filter conditions; for qualifying files, merging a predetermined number of files with the same attribute to generate a new storage file, and updating the index file while writing the new storage file to the system; and
for each file read or write, first querying the namespace to locate the file's block address, file size and other information, and then retrieving the data in the DataNode space.
Preferably, in establishing the primary index and the secondary index for the merged files, the primary index is the resource collection to which a file belongs, and the secondary index is the concrete resource entry. A resource collection is a set of correlated resource entries; each resource entry belongs to exactly one resource collection, and files may be partitioned by attribute domain. When a file needs to be read, the primary index and then the secondary index are queried in turn. After filtering, the cloud storage platform merges the filtered files into block files, one block per resource collection of the file entries. During data processing, the resource entries in a new block file can then be assigned to the same MapReduce task.
Preferably, the prefetch mechanism further comprises: predicting, from the data a user is currently accessing, the data the user will access next, and loading its index into the cache so that the system responds faster when the user accesses it. Before downloading or browsing a resource entry, the user obtains an intermediate result set by retrieval or directory browsing, and then selects the needed resource entries from it for further access. In the interval between the user seeing the result set page and performing a download or browse, the indexes of the resource entries in the intermediate result set are cached in advance, so that when the user clicks download or browse, no file metadata query is executed and the file is transferred directly.
Preferably, caching the indexes further comprises: after a user issues a retrieval request, the Web service queries, according to the user's search conditions, the resource entry result set that meets the user's needs and returns it to the user, while simultaneously creating an asynchronous thread to update the cache; the cache contents are updated in the interval between returning the result set and the user deciding to click download or browse. On receiving a cache-update request, the index module is called to retrieve and load the metadata of the current result set entries into the cache. When the user issues a download or browse request, the Web service calls the distributed file system client, which finds the metadata in the cache and starts reading data and transmitting it to the client. The server maintains a thread pool with a fixed number of threads; each cache-update request is handled by one thread, and if no thread in the pool is idle the caching task waits. A cache pool is established with a FIFO algorithm and a configured pool size; key/value pairs are kept in the pool, with the file name as the key and the combination of the file's DataNode ID, start position and length as the value, and the oldest cache entries are evicted. The cache pool provides two operations, put and get: put places data into the pool and, if the existing data has reached the pool's upper limit, replaces data according to the FIFO replacement algorithm; get returns the value corresponding to a key.
Preferably, the primary index data are stored in a relational database, accessed through the relational database access interface, and held in a Java Map data structure. On the basis of writing resource collections to the database, a system-generated value field is added whenever a resource entry is added. The data in the primary index use a key/value structure held in a Java Map; this Map object is initialized from the database contents when the service starts and kept resident, and it is updated whenever a resource collection is added or deleted. The secondary index is built with the open-source project Lucene and supports small-file metadata retrieval; the index file is updated in real time whenever a user adds a resource entry, and concurrency control of file writes is applied when multiple users add resource entries under the same resource collection at the same time.
Merging the files further comprises: filtering with the filter and merging the files that meet the preset conditions; searching the primary index according to the resource collection of the resource entry; after finding the file path corresponding to the resource collection, creating a SequenceFile object, obtaining and configuring its Writer object, and preparing to write the file; opening a new thread while the file write executes to write the file position and length information corresponding to the resource entry into the resource entry secondary index; and, when the resource entry is written successfully, closing the output stream and returning success, and otherwise returning failure.
Compared with the prior art, the present invention has the following advantages:
for the reading and writing of small files used in full-text search, retrieval efficiency is improved through indexing and prefetching, maintaining the response speed of the cloud storage platform and the overall performance of the distributed file system when large numbers of small files are stored and read.
Accompanying drawing explanation
Fig. 1 is a flowchart of the big data access method according to an embodiment of the present invention.
Embodiment
A detailed description of one or more embodiments of the present invention is provided below together with the accompanying drawing that illustrates the principles of the invention. The invention is described in conjunction with such embodiments, but is not restricted to any embodiment; its scope is defined only by the claims, and the invention covers many alternatives, modifications and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention; these details are provided for exemplary purposes, and the invention may also be realized according to the claims without some or all of them.
One aspect of the present invention provides a big data access method for small-file reads and writes, offering a new solution for big data access and utilization. Fig. 1 is the flowchart of the big data access method according to an embodiment of the present invention.
After a user submits a resource to the cloud storage platform, the resource is first screened by a file filter; the filter conditions include file size, type and the like, and files that pass the screening are referred to as small files. A merging strategy is then applied to these small files: a number of small files are merged to generate a new storage file, generally merging small files belonging to the same attribute. The index file is updated while the new storage file is written to the system. The indexes in the cloud storage platform comprise a primary index, which is the resource collection to which a file belongs (for example, its type), and a secondary index, which is the concrete resource entry. When a file needs to be read, the primary index and the secondary index are queried in turn, narrowing the query range and ensuring a fast read response.
The core of the storage layer design of the cloud storage platform system of the present invention comprises: first merging small files to generate storage files, then establishing a two-level index for the merged files based on the storage characteristics of the database, and improving file read response speed through index prefetching. The details of the storage layer are introduced below.
1. Storage file generation strategy based on small-file merging
Files are divided into blocks, with a default block size of 64 MB. The namespace of the distributed file system is persisted in an image file, which the NameNode loads into memory at startup. Large numbers of small files cause the NameNode to run short of memory, and the oversized image file thus generated reduces lookup efficiency when files are read. For each file read or write, the namespace is queried first to locate the file's block address, file size and other information, and the data are then retrieved in the DataNode space. When the files being read are very small, most of the time in the read/write process is consumed by retrieval and lookup rather than by transferring file data, which hurts the processing efficiency of the server cluster.
The cloud storage platform merges small files to generate storage files. First, a filter is implemented to screen files by type and size, selecting the document files that can undergo full-text search; the file size threshold here is set to 10 MB, and a file larger than 10 MB is regarded as a large file and does not need merging. After filtering, the cloud storage platform merges the filtered small files into block files, one block per resource collection of the file entries. A resource collection is a set of resource entries with a certain correlation, and each resource entry belongs to exactly one resource collection. Collections are usually divided by attribute range, time and the like, and files may be divided by attribute domain. The resource entries in a new block file are highly related, so the block file can later be assigned to a single MapReduce task, avoiding time wasted on task assignment and switching when the computation per task is very small, reducing data movement in the cluster, and satisfying the Hadoop principle that moving computation is cheaper than moving data.
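The filter step above can be sketched in plain Java. The 10 MB threshold comes from the description; the set of document extensions and the method names are illustrative assumptions, since the text only says "document files":

```java
import java.util.Set;

public class SmallFileFilter {
    // Threshold from the description: files at or above 10 MB are treated
    // as large files and bypass merging.
    static final long THRESHOLD_BYTES = 10L * 1024 * 1024;

    // Assumed set of full-text-searchable document types (illustrative only).
    static final Set<String> DOC_TYPES = Set.of("txt", "doc", "pdf", "xml");

    // A file is a merge candidate when it is both small enough and a
    // document type eligible for full-text search.
    static boolean isMergeCandidate(String fileName, long sizeBytes) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase();
        return sizeBytes < THRESHOLD_BYTES && DOC_TYPES.contains(ext);
    }
}
```

For example, a 4 KB `report.txt` would pass the filter, while a 20 MB PDF or a small video file would not.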
2. Establishing a two-level index to optimize file read speed
After small files are merged, the NameNode memory is the performance bottleneck of the whole file system, because all file metadata must be stored in its memory. Merging small files reduces the number of files and saves considerable memory, but reading the merged files would otherwise be very inefficient.
A preferred embodiment of the invention uses a hierarchical index for the small-file metadata, partitioning the large index file into small index files by reasonable rules. The resource collection serves as the primary index, and the resource entry content under each resource collection serves as the secondary index; a lookup first queries by the collection that the resource entry belongs to, and then searches in the corresponding secondary index file. Although this adds a lookup in the primary index, the number of resource collections cannot be too large, so that lookup takes very little time, and the partitioned secondary index files are much smaller than a global index file, so lookup efficiency improves overall. Moreover, not all secondary index files are loaded into memory; they are scheduled flexibly according to memory usage and the caching strategy, which solves the problem of insufficient memory.
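A minimal in-memory model of the two-level lookup is sketched below; types and field names are assumptions for illustration, and in the real design the secondary index files live on disk and are loaded on demand rather than all held in one Map:

```java
import java.util.HashMap;
import java.util.Map;

public class HierarchicalIndex {
    // Metadata for one resource entry inside a merged block file
    // (field names are illustrative, not from the patent text).
    record EntryMeta(String blockFile, long offset, long length) {}

    // Primary index: resource collection -> secondary index for that collection.
    // Secondary index: entry name -> location metadata.
    private final Map<String, Map<String, EntryMeta>> primary = new HashMap<>();

    void add(String collection, String entry, EntryMeta meta) {
        primary.computeIfAbsent(collection, k -> new HashMap<>()).put(entry, meta);
    }

    // Two-stage lookup: first the (small) primary index, then only the
    // secondary index belonging to that collection.
    EntryMeta lookup(String collection, String entry) {
        Map<String, EntryMeta> secondary = primary.get(collection);
        return secondary == null ? null : secondary.get(entry);
    }
}
```

The point of the split is that the first stage touches only a handful of collection keys, so each lookup scans a small partitioned index instead of one global index file.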
3. Index-prefetch-based optimization of file response speed
The index prefetching proposed here means predicting, from the data a user is currently accessing, the data the user will access next, and loading its index into the cache. If the prediction is accurate, the data the user is about to access can be loaded into the cache in advance, and the system responds faster when the user accesses it.
Before downloading or browsing a resource entry, a user usually must obtain an "intermediate result set" by retrieval or directory browsing, and only then can the needed resource entries be selected from it for further access. An interval of several seconds exists between the user seeing the result set page and performing a download or browse; during this period the indexes of the resource entries in the intermediate result set are cached in advance, so that when the user clicks download or browse, the series of file metadata queries need not be executed again and the file is transferred directly, which greatly improves the response time of these file requests. This improvement does not require much memory: supposing 100,000 users perform retrieval simultaneously, each result set page shows 20 resource entries, and caching one file's metadata takes 150 B, only about 0.3 GB of memory is needed.
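The memory estimate above can be checked directly; the figures (100,000 concurrent users, 20 entries per result page, 150 B per cached metadata record) are the ones given in the text:

```java
public class CacheMemoryEstimate {
    // Estimated prefetch-cache footprint in bytes: each concurrent user
    // holds one result page of entriesPerPage entries, and each cached
    // file-metadata record costs bytesPerEntry.
    static long estimateBytes(long concurrentUsers, long entriesPerPage, long bytesPerEntry) {
        return concurrentUsers * entriesPerPage * bytesPerEntry;
    }
}
```

With the stated figures, `estimateBytes(100_000, 20, 150)` gives 300,000,000 bytes, i.e. the 0.3 GB claimed in the description.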
The storage layer architecture of the cloud storage platform of the present invention is described in detail below. Besides the strategies above, the storage layer architecture is the foundation of the system implementation. The storage layer is built on a distributed storage system on a Hadoop cluster and provides basic file storage and read services.
The cloud storage platform storage layer adopts a three-tier design: a user interface layer, a business logic layer and a storage layer; to improve performance, the Web server is separated from the server cluster. The user interface layer provides the user interface through which users send requests and receive feedback. The business logic layer implements the small-file read/write functions, including file merging, index construction and cache construction.
The business logic layer comprises functional modules such as file merging, the retrieval system, the small-file index, the cache and the distributed file system client. Each module is implemented as follows:
(1) File merging
The file merging function comprises two stages: creating a SequenceFile object, and merging the small files. After filtering by the filter, the files that meet the merging requirements are merged: the primary index is first searched by the resource collection of the resource entry; after the file path corresponding to the resource collection is found, a SequenceFile object is created, its Writer object is obtained and configured, and the write is prepared. While the file write executes, a new thread is opened to write metadata such as the file position and length corresponding to the resource entry into the resource entry secondary index. When the resource entry is written successfully, the output stream is closed and success is returned; otherwise failure is returned.
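A stdlib stand-in for this merge flow is sketched below, with an in-memory byte container in place of Hadoop's SequenceFile. The (offset, length) bookkeeping mirrors the secondary-index write described above, but the class and method names are invented for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class FileMerger {
    // Position of one small file inside the merged container.
    record Location(long offset, int length) {}

    private final ByteArrayOutputStream container = new ByteArrayOutputStream();
    private final Map<String, Location> secondaryIndex = new HashMap<>();

    // Append one small file; the index entry is written in the same step,
    // mirroring the "write file, then write offset/length to the
    // secondary index" flow of the description.
    void append(String name, byte[] data) {
        long offset = container.size();
        container.writeBytes(data);
        secondaryIndex.put(name, new Location(offset, data.length));
    }

    // Read one small file back via its (offset, length) index entry,
    // without ever splitting the merged container into separate files.
    byte[] read(String name) {
        Location loc = secondaryIndex.get(name);
        byte[] all = container.toByteArray();
        return Arrays.copyOfRange(all, (int) loc.offset(), (int) loc.offset() + loc.length());
    }
}
```

In the real system the container would be a SequenceFile on HDFS and the index write would happen on a separate thread, but the invariant is the same: every appended entry is recoverable from its recorded offset and length.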
(2) Retrieval system
Provides the document retrieval function; the "intermediate result set"-based read optimization of the distributed file system relies on this module.
(3) Small-file index
Builds the small-file indexes, comprising the resource collection primary index and the resource entry secondary index, and provides functions such as index file creation and record addition and deletion.
The primary index data are stored in a relational database, accessed through the relational database access interface, and held in a Java Map data structure. Because resource collections are stored in the database, only a system-generated value field needs to be added when a resource entry is added, so the index can be kept in the relational database without affecting processing efficiency. The data in the primary index use a key/value structure, and a Java Map can be used to improve query efficiency. In addition, to guarantee retrieval efficiency, this Map object is initialized from the database contents when the service starts and kept resident; because the primary index holds few records, the Map object occupies very little memory and the system overhead is limited. The Map object must be updated whenever a resource collection is added or deleted.
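A sketch of keeping the in-memory Map in step with the database follows; the database is simulated by rows passed to an init method, and the names are assumptions (real code would go through JDBC):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PrimaryIndexCache {
    // In-memory mirror of the primary index: collection name -> file path.
    // The authoritative copy lives in the relational database.
    private final Map<String, String> collectionToPath = new ConcurrentHashMap<>();

    // On service start, initialize the Map from the database contents
    // (each row is {collectionName, filePath} here).
    void initFrom(List<String[]> dbRows) {
        for (String[] row : dbRows) collectionToPath.put(row[0], row[1]);
    }

    // Keep the Map in step with the database whenever a resource
    // collection is added or deleted, as the description requires.
    void addCollection(String name, String path) { collectionToPath.put(name, path); }
    void removeCollection(String name) { collectionToPath.remove(name); }

    String pathOf(String collection) { return collectionToPath.get(collection); }
}
```

Because the number of collections is small, this resident Map costs little memory while letting every read skip a database round-trip.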
The secondary index is built with the open-source project Lucene and supports small-file metadata retrieval. Lucene has a complete set of index construction, update and search facilities; its search efficiency is very high when the index file is under 1 GB, and it can be used to build commercial search engines. The index the cloud storage platform creates needs some special functions, such as updating the index file in real time whenever a user adds a resource entry, concurrency control of file writes when multiple users add resource entries under the same resource collection simultaneously, and compressing the index file to reduce memory usage.
(4) Prefetching
To further improve response speed, cache management for the "intermediate result set" the user is interested in is provided here, including cache space maintenance, cache updates, and replacement algorithm maintenance.
After a user issues a retrieval request, the Web service queries, according to the user's search conditions, the resource entry result set that meets the user's needs and returns it to the user, while simultaneously creating an asynchronous thread to update the cache; the cache contents are updated in the interval between returning the result set and the user deciding to click download or browse. When the cache module receives a cache-update request, it calls the index module to retrieve and load the metadata of the current result set entries into the cache. When the user issues a download or browse request, the Web service calls the distributed file system client, which finds the metadata in the cache and starts reading data and transmitting it to the client.
The system maintains a thread pool with a fixed number of threads; each cache-update request is handled by one thread, and if no thread in the pool is idle the caching task waits. In this way the system resources occupied by cache-update tasks stay within a reasonable range without affecting overall system performance. The present invention selects a FIFO algorithm to implement the cache module's scheduling, evicting the oldest cache entries in the most efficient manner. Specifically: a cache pool is established whose size is configurable and defaults to 32 MB, which can hold 200,000 file metadata records. The pool stores key/value pairs, with the file name as the key and the combination of the file's DataNode ID, start position and length as the value. The cache pool provides two operations, put and get: put places data into the pool and, if the existing data has reached the pool's upper limit, replaces the corresponding data according to the cache replacement algorithm, or inserts directly if space remains; get returns the value corresponding to a key, or empty if the key is absent.
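The FIFO cache pool can be sketched with a `LinkedHashMap` in insertion order, whose `removeEldestEntry` hook evicts the oldest record once the configured capacity is exceeded. As a simplification, capacity here counts entries rather than the 32 MB byte budget in the text, and the type names are illustrative:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FifoCachePool {
    // Cached value: file name -> (DataNode ID, start position, length),
    // as described for the cache pool above.
    record Value(String dataNodeId, long offset, long length) {}

    private final int capacity;
    private final LinkedHashMap<String, Value> pool;

    FifoCachePool(int capacity) {
        this.capacity = capacity;
        // accessOrder = false keeps insertion order, so the eldest entry
        // is the first one inserted: FIFO replacement, not LRU.
        this.pool = new LinkedHashMap<>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Value> eldest) {
                return size() > FifoCachePool.this.capacity;
            }
        };
    }

    // put: insert; when the pool is full, the oldest entry is replaced.
    synchronized void put(String fileName, Value v) { pool.put(fileName, v); }

    // get: return the value for a key, or null when absent.
    synchronized Value get(String fileName) { return pool.get(fileName); }
}
```

With a capacity of two, inserting a third entry evicts the first one inserted, regardless of how recently it was read — the FIFO behavior the description calls for.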
(5) Distributed file system client
The distributed file system client encapsulates the APIs through which the outside world operates on the file system, including reading and writing files and querying file positions. When the file system receives a file read request, it first passes the request through the file filter; for a file that has been merged, the file's metadata is located first in the cache, and if absent, the index file is searched; if it is still not found, the client communicates with the NameNode. After the file metadata is found, a SequenceFile object is constructed, its Reader object is obtained, and a read request is sent to the DataNode; after the data are transferred to the user, the input stream is closed and the result is returned.
Users make two kinds of requests: write requests that submit files, and read requests that query, browse or obtain resources.
When the Web server receives a user's resource submission request, it first judges whether small-file merging is needed; if so, file merging is performed, and if not, the distributed file system write interface is used directly. After merging, the file is written to the distributed file system through the distributed file system client; while the client writes the file, the small-file index update module is called to update the small-file index. Because the Web server host is separated from the server cluster, the write and the update can be executed by different threads simultaneously without affecting each other. When the distributed file system write succeeds, the Web service returns a submission success message to the client.
A user sends a file read request to browse a document's details or to download a file; such requests are frequent and consume the most system resources. When the Web server receives a user's read request, the retrieval system first performs retrieval according to the conditions the user submitted, and the resource entry result set the user needs is returned for browsing; at the same time, the entry set shown on the first page of the user interface (20 by default) is sent to the cache module, and an independent thread is opened to update the cache. When the user, having browsed the returned result set page, requests a download or a detailed view, the Web service calls the distributed file system client to read the file content: the client first locates the file position information in the cache, and if it is not found, searches the small-file index; once the position information is found, it reads the data directly from the DataNode and returns it to the user.
In summary, the present invention proposes a big data access method in which, for the reading and writing of small files used in full-text search, retrieval efficiency is improved through indexing and prefetching, maintaining the response speed of the cloud storage platform and the overall performance of the distributed file system when large numbers of small files are stored and read.
Obviously, those skilled in the art should appreciate that the modules or steps of the present invention described above can be realized with a general-purpose computing system; they may be concentrated on a single computing system or distributed over a network formed by multiple computing systems; optionally, they may be realized with program code executable by a computing system, and thus may be stored in a storage system and executed by a computing system. The present invention is therefore not restricted to any specific combination of hardware and software.
It should be understood that the above embodiments of the present invention are only for exemplary illustration or explanation of the principles of the invention and do not limit it. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the invention shall be included within its protection scope. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundary of the claims, or the equivalents of such scope and boundary.

Claims (6)

1. A big data access method for accessing file resources in a cloud storage platform, characterized by comprising:
merging files smaller than a preset size in a distributed file system to form new storage files, and establishing a primary index and a secondary index for the merged files; and
using a prefetch mechanism to cache the indexes before a user requests a resource.
2. The method according to claim 1, characterized in that merging the files smaller than the preset size further comprises:
dividing each file into blocks, persisting the namespace of the distributed file system in an image file, and having the NameNode load this image file into memory at startup;
after a user submits a file resource to the cloud storage platform, first screening it with a file filter according to filter conditions; for qualifying files, merging a predetermined number of files with the same attribute to generate a new storage file, and updating the index file while writing the new storage file to the system; and
for each file read or write, first querying the namespace to locate the file's block address, file size and other information, and then retrieving the data in the DataNode space.
3. The method according to claim 2, characterized in that, in establishing the primary index and the secondary index for the merged files, the primary index is the resource collection to which a file belongs and the secondary index is the concrete resource entry; the resource collection is a set of correlated resource entries, each resource entry belongs to exactly one resource collection, and files may be partitioned by attribute domain; when a file needs to be read, the primary index and the secondary index are queried in turn; after filtering, the cloud storage platform merges the filtered files into block files, one block per resource collection of the file entries; and during data processing, the resource entries in a new block file can be assigned to the same MapReduce task.
4. The method according to claim 1, characterized in that the prefetch mechanism further comprises: predicting, from the data a user is currently accessing, the data the user will access next, and loading its index into the cache so that the system responds faster when the user accesses it; before downloading or browsing a resource entry, the user obtains an intermediate result set by retrieval or directory browsing, and then selects the needed resource entries from it for further access; in the interval between the user seeing the result set page and performing a download or browse, the indexes of the resource entries in the intermediate result set are cached in advance; and when the user clicks download or browse, no file metadata query is executed and the file is transferred directly.
5. The method according to claim 4, characterized in that caching the indexes further comprises: after a user issues a retrieval request, the Web service queries, according to the user's search conditions, the resource entry result set that meets the user's needs and returns it to the user, while simultaneously creating an asynchronous thread to update the cache, the cache contents being updated in the interval between returning the result set and the user deciding to click download or browse; on receiving a cache-update request, calling the index module to retrieve and load the metadata of the current result set entries into the cache; when the user issues a download or browse request, the Web service calling the distributed file system client, which finds the metadata in the cache and starts reading data and transmitting it to the client; the server maintaining a thread pool with a fixed number of threads, each cache-update request being handled by one thread, and the caching task waiting if no thread in the pool is idle; establishing a cache pool with a FIFO algorithm and configuring the pool size, keeping key/value pairs in the pool with the file name as the key and the combination of the file's DataNode ID, start position and length as the value, and evicting the oldest cache entries; the cache pool providing two operations, put and get, where put places data into the pool and, if the existing data has reached the pool's upper limit, replaces data according to the FIFO replacement algorithm, and get returns the value corresponding to a key.
6. The method according to claim 3, wherein the primary index data are stored in a relational database, accessed through the relational database access interface, and held in a Java Map data structure. In addition to writing resource collections into the database, a system-generated value field is added whenever a resource entry is added. The data in the primary index adopt a key/value structure using a Java Map; this Map object is initialized from the database contents when the service starts, and is updated whenever a resource collection is added or deleted. The secondary index is created with the open-source project Lucene and supports small-file metadata retrieval; the index file is updated in real time whenever a user adds a resource entry, and concurrency control of file writes is applied when multiple users add resource entries under the same resource collection simultaneously.

Merging the files further comprises: creating a SequenceFile object and merging the files that meet the preset condition after passing through a filter. The resource collection to which a resource entry belongs is looked up in the primary index; after the file path corresponding to the resource collection is found, a SequenceFile object is created, its Writer object is obtained and configured, and the file write is prepared. While the file write is executed, a new thread is opened to write the file position value and length information corresponding to the resource entry into the resource-entry secondary index. If the resource entry is written successfully, the output stream is closed and success is returned; otherwise failure is returned.
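The merge-and-index flow of claim 6 uses Hadoop's `SequenceFile`, which requires a running HDFS deployment; as a dependency-free illustration of the same idea — append files below a size threshold into one container and record each entry's (offset, length) for the secondary index — a simplified stand-in might look like the following. The class name, the in-memory container, and the read helper are assumptions for illustration, not the patented implementation.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

// Simplified, dependency-free stand-in for claim 6's SequenceFile merge: files
// below a size threshold pass the filter and are appended to one container,
// and each entry's (offset, length) is recorded in a secondary index.
public class SmallFileMerger {
    public static final class Extent {
        public final long offset;
        public final long length;

        public Extent(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    private final ByteArrayOutputStream container = new ByteArrayOutputStream();
    private final Map<String, Extent> secondaryIndex = new HashMap<>();
    private final int threshold; // only files smaller than this are merged

    public SmallFileMerger(int threshold) {
        this.threshold = threshold;
    }

    // Returns true when the entry is committed ("submit succeeded"),
    // false when the filter rejects the file.
    public boolean append(String name, byte[] data) {
        if (data.length >= threshold) {
            return false; // filter: only small files are merged
        }
        long offset = container.size();
        container.write(data, 0, data.length);
        secondaryIndex.put(name, new Extent(offset, data.length));
        return true;
    }

    // Reading back uses only the secondary index, mirroring how the merged
    // store is addressed by (offset, length) in the claims.
    public byte[] read(String name) {
        Extent e = secondaryIndex.get(name);
        byte[] all = container.toByteArray();
        byte[] out = new byte[(int) e.length];
        System.arraycopy(all, (int) e.offset, out, 0, (int) e.length);
        return out;
    }
}
```

In the claimed system the container would be an HDFS SequenceFile written through its Writer object, and the (offset, length) pairs would be written to the Lucene secondary index by a separate thread.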
CN201510118185.0A 2015-03-18 2015-03-18 Big data access method Pending CN104679898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510118185.0A CN104679898A (en) 2015-03-18 2015-03-18 Big data access method

Publications (1)

Publication Number Publication Date
CN104679898A true CN104679898A (en) 2015-06-03

Family

ID=53314940

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510118185.0A Pending CN104679898A (en) 2015-03-18 2015-03-18 Big data access method

Country Status (1)

Country Link
CN (1) CN104679898A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100162230A1 (en) * 2008-12-24 2010-06-24 Yahoo! Inc. Distributed computing system for large-scale data handling
US20100179855A1 (en) * 2009-01-09 2010-07-15 Ye Chen Large-Scale Behavioral Targeting for Advertising over a Network
CN103577339A (en) * 2012-07-27 2014-02-12 深圳市腾讯计算机系统有限公司 Method and system for storing data
CN103856567A (en) * 2014-03-26 2014-06-11 西安电子科技大学 Small file storage method based on Hadoop distributed file system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
卞艺杰: "Small File Storage in the Hdspace Distributed Institutional Repository System", Computer Systems & Applications *
张春明等: "A Method for Storing and Reading Small Files in Hadoop", Computer Applications and Software *
陈光景: "Research and Implementation of Hadoop Small File Processing Technology", China Masters' Theses Full-text Database, Information Science and Technology Series *

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005622B (en) * 2015-07-24 2018-12-07 肖华 Method for high-speed storage of high-fidelity continuous-frame queries and image output method thereof
CN105005622A (en) * 2015-07-24 2015-10-28 肖华 Method for high-speed storage of high-fidelity continuous-frame queries and image output method thereof
CN105677904A (en) * 2016-02-04 2016-06-15 杭州数梦工场科技有限公司 Distributed file system based small file storage method and device
CN105677904B (en) * 2016-02-04 2019-07-12 杭州数梦工场科技有限公司 Small file storage method and device based on distributed file system
CN106095832A (en) * 2016-06-01 2016-11-09 东软集团股份有限公司 Distributed parallel processing method and device
CN108460054A (en) * 2017-02-22 2018-08-28 北京京东尚科信息技术有限公司 Method, system and device for improving cloud storage system performance
CN107040596A (en) * 2017-04-17 2017-08-11 山东辰华科技信息有限公司 The construction method of science service ecosystem platform based on big data cloud computing
CN107103095A (en) * 2017-05-19 2017-08-29 成都四象联创科技有限公司 Method for computing data based on high performance network framework
CN107391555A (en) * 2017-06-07 2017-11-24 中国科学院信息工程研究所 A kind of metadata real time updating method towards Spark Sql retrievals
CN107391555B (en) * 2017-06-07 2020-08-04 中国科学院信息工程研究所 Spark-Sql retrieval-oriented metadata real-time updating method
CN108053863A (en) * 2017-12-22 2018-05-18 中国人民解放军第三军医大学第一附属医院 Mass medical data storage system and data storage method suitable for large and small files
CN108053863B (en) * 2017-12-22 2020-09-11 中国人民解放军第三军医大学第一附属医院 Mass medical data storage system and data storage method suitable for large and small files
CN108427590A (en) * 2018-02-09 2018-08-21 福建星网锐捷通讯股份有限公司 A kind of implementation method of UI Dynamic Distributions
CN108427590B (en) * 2018-02-09 2021-02-05 福建星网锐捷通讯股份有限公司 Method for realizing UI dynamic layout
CN108647193B (en) * 2018-04-20 2021-11-19 河南中烟工业有限责任公司 Unique identifier generation method and device applicable to distributed system
CN108647193A (en) * 2018-04-20 2018-10-12 河南中烟工业有限责任公司 A kind of unique identifier generation method can be applied to distributed system and device
CN109766318A (en) * 2018-12-17 2019-05-17 新华三大数据技术有限公司 File reading and device
CN109947718A (en) * 2019-02-25 2019-06-28 全球能源互联网研究院有限公司 A kind of date storage method, storage platform and storage device
CN110377562B (en) * 2019-07-23 2022-11-01 安徽朵朵云网络科技有限公司 Big data safe storage method based on Hadoop open source platform
CN110377562A (en) * 2019-07-23 2019-10-25 宿州星尘网络科技有限公司 Big data method for secure storing based on Hadoop Open Source Platform
CN110688361A (en) * 2019-08-16 2020-01-14 平安普惠企业管理有限公司 Data migration method, electronic device and computer equipment
CN110995799A (en) * 2019-11-22 2020-04-10 山东九州信泰信息科技股份有限公司 Data interaction method based on Fetch and springMVC
CN111400247A (en) * 2020-04-13 2020-07-10 杭州九州方园科技有限公司 User behavior auditing method and file storage method
CN111818021A (en) * 2020-06-20 2020-10-23 深圳市众创达企业咨询策划有限公司 Configuration information safety protection system and method based on new generation information technology
CN111818021B (en) * 2020-06-20 2021-02-09 深圳市众创达企业咨询策划有限公司 Configuration information safety protection system and method based on new generation information technology
CN112347044A (en) * 2020-11-10 2021-02-09 北京赛思信安技术股份有限公司 Object storage optimization method based on SPDK
CN112347044B (en) * 2020-11-10 2024-04-12 北京赛思信安技术股份有限公司 Object storage optimization method based on SPDK
WO2022141650A1 (en) * 2021-01-04 2022-07-07 Alibaba Group Holding Limited Memory-frugal index design in storage engine
CN114356230A (en) * 2021-12-22 2022-04-15 天津南大通用数据技术股份有限公司 Method and system for improving reading performance of column storage engine
CN114356230B (en) * 2021-12-22 2024-04-23 天津南大通用数据技术股份有限公司 Method and system for improving read performance of column storage engine
CN114461146A (en) * 2022-01-26 2022-05-10 北京百度网讯科技有限公司 Cloud storage data processing method, device, system, equipment, medium and product
CN114461146B (en) * 2022-01-26 2024-05-07 北京百度网讯科技有限公司 Cloud storage data processing method, device, system, equipment, medium and product
CN118069589A (en) * 2024-04-17 2024-05-24 济南浪潮数据技术有限公司 File access method, device, computer equipment and program product

Similar Documents

Publication Publication Date Title
CN104679898A (en) Big data access method
CN104778270A (en) Storage method for multiple files
JP7113040B2 (en) Versioned hierarchical data structure for distributed data stores
CN107247808B (en) Distributed NewSQL database system and picture data query method
Wei et al. Xstore: Fast rdma-based ordered key-value store using remote learned cache
EP2973018B1 (en) A method to accelerate queries using dynamically generated alternate data formats in flash cache
Cambazoglu et al. Scalability challenges in web search engines
CN102169507B (en) Implementation method of distributed real-time search engine
US8364751B2 (en) Automated client/server operation partitioning
US8356050B1 (en) Method or system for spilling in query environments
WO2015094179A1 (en) Abstraction layer between a database query engine and a distributed file system
JP2003006036A (en) Clustered application server and web system having database structure
US9148329B1 (en) Resource constraints for request processing
CN103595797B (en) Caching method for distributed storage system
CN103530387A (en) Improved method aimed at small files of HDFS
CN101184106A (en) Associated transaction processing method of mobile database
US20100274795A1 (en) Method and system for implementing a composite database
CN103023982A (en) Low-latency metadata access method of cloud storage client
US11080207B2 (en) Caching framework for big-data engines in the cloud
CN116108057B (en) Distributed database access method, device, equipment and storage medium
CN106709010A (en) High-efficient HDFS uploading method based on massive small files and system thereof
Durner et al. Crystal: a unified cache storage system for analytical databases
US7743333B2 (en) Suspending a result set and continuing from a suspended result set for scrollable cursors
Marcu KerA: A Unified Ingestion and Storage System for Scalable Big Data Processing
US10387384B1 (en) Method and system for semantic metadata compression in a two-tier storage system using copy-on-write

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150603