CN104679898A - Big data access method
- Publication number: CN104679898A (application CN201510118185.0A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Abstract
The invention provides a big data access method for accessing file resources in a cloud storage platform. The method comprises the following steps: merging files smaller than a preset size in a distributed file system to form new storage files; establishing a primary index and a secondary index for the merged files; and using a prefetch mechanism to cache the indexes before the user requests them. With this method, the response speed of the cloud storage platform and the overall performance of the distributed file system are maintained even when large numbers of small files are read and written.
Description
Technical field
The present invention relates to data storage, and in particular to a method for accessing big-data files.
Background technology
With the rapid development of smart healthcare and the emergence of massive medical data, correspondingly large databases are needed as carriers to store these data, yet managing big data remains a major problem. The volume of document retrieval in the medical field grows exponentially with network resources; in particular, both the update rate and the accumulated volume of small files keep rising, which has become an urgent problem for medical cloud storage. Although distributed file systems are widely used for large-scale data storage and analysis, and many institutions have adopted them to cope with rapidly growing data, existing distributed file systems are designed mainly for reading and writing large files. Storing large numbers of small files degrades overall file-system performance, so such systems cannot be applied well to scenarios such as medical retrieval, where small files dominate storage. No effective solution to the above problems has yet been proposed in the related art.
Summary of the invention
To solve the above problems in the prior art, the present invention proposes a big data access method, comprising:
merging files smaller than a preset size in a distributed file system to form new storage files;
establishing a primary index and a secondary index for the merged files; and
using a prefetch mechanism to cache the indexes before the user requests them.
Preferably, merging the files smaller than the preset size further comprises:
dividing files into blocks, persisting the namespace of the distributed file system in an image file, and having the namenode load this image file into memory at startup;
after a user submits a file resource to the cloud storage platform, first screening it with a file filter according to filter conditions; for qualifying files, merging a predetermined number of files with the same attribute to generate a new storage file, and updating the index file while writing the new storage file into the system;
for each file read or write, first querying the namespace to locate information such as the file's block address and size, and then retrieving the data in the data-node space.
Preferably, in establishing the primary index and the secondary index for the merged files, the primary index is the resource collection to which a file belongs and the secondary index is the concrete resource entry; a resource collection is a set of correlated resource entries, a resource entry belongs to exactly one resource collection, and files can be partitioned by attribute domain. When a file needs to be read, the primary index and then the secondary index are queried in turn. After filtering, the cloud storage platform merges the filtered files into file blocks, using the resource collection of each file entry as the unit, so that during data processing the resource entries in a new file block can be assigned to the same MapReduce task.
Preferably, the prefetch mechanism further comprises: predicting, from the data a user is currently accessing, the data the user will access next, and loading its index into the cache so that the system responds faster when the user accesses it. Before downloading or browsing a resource entry, the user obtains an intermediate result set through retrieval or directory browsing and then selects from it the resource entries to access further; in the interval between the user seeing the result-set page and performing a download or browse, the indexes of the resource entries in the intermediate result set are cached in advance, so that when the user clicks download or browse no file-metadata query is executed and the file is transferred directly.
Preferably, caching the indexes further comprises: after a user sends a retrieval request, the Web service queries, according to the user's search conditions, the result set of resource entries that meets the user's needs and returns it to the user, while creating an asynchronous thread to update the cache; the cache contents are updated in the interval between returning the result set and the user, having browsed it, deciding to click download or browse. On receiving a cache-update request, the index module is invoked to retrieve and load the metadata of the current result-set entries into the cache. When the user sends a download or browse request, the Web service calls the distributed-file-system client, which finds the metadata in the cache and starts reading data and transmitting it to the client. The server maintains a thread pool with a fixed number of threads; each cache-update request is handled by one thread, and if no thread in the pool is idle the cache task waits. A cache pool is built with a FIFO algorithm and its size is configured; key/value pairs are kept in the pool, with the file name as the key and the combination of the file's data-node ID, start position and length as the value, and the oldest cache entries are evicted. The cache pool provides two operations, put and get: put places data into the pool and, if the data in the pool have reached the limit, replaces data according to the FIFO replacement algorithm; get returns the value corresponding to a key.
Preferably, the primary-index data are stored in a relational database, accessed through the relational-database access interface, and held in a Java Map data structure; on the basis of writing resource collections into the database, a system-generated value field is added when a resource entry is added. The data in the primary index use a key/value structure held in a Java Map; this Map object is initialized from the database contents when the service starts and kept resident, and it is updated whenever a resource collection is added or deleted. The secondary index is created with the open-source project Lucene and supports small-file metadata retrieval; the index file is updated in real time whenever a user adds a resource entry, and concurrency control of file writes is implemented for the case where multiple users add resource entries under one resource collection simultaneously.
Merging the files further comprises: creating a SequenceFile object; filtering files with the filter and merging those that meet the preset conditions; searching the primary index according to the resource collection of the resource entry, and after finding the file path corresponding to that collection, creating the SequenceFile object, obtaining and configuring its Writer object, and preparing to write the file; opening a new thread while the file write executes, to write the entry's file position and length into the resource-entry secondary index; and closing the output stream when the resource entry is written successfully, returning submission success, or otherwise returning submission failure.
Compared with the prior art, the present invention has the following advantages:
For the reading and writing of small files used in full-text search, index prefetching improves retrieval efficiency and maintains the response speed of the cloud storage platform and the overall performance of the distributed file system when large numbers of small files are stored and read.
Brief description of the drawings
Fig. 1 is a flowchart of the big data access method according to an embodiment of the present invention.
Detailed description
A detailed description of one or more embodiments of the invention is provided below, together with drawings illustrating the principles of the invention. The invention is described in connection with such embodiments, but it is not limited to any particular embodiment. The scope of the invention is defined only by the claims, and the invention covers many alternatives, modifications and equivalents. Numerous specific details are set forth in the following description to provide a thorough understanding of the invention; they are provided for exemplary purposes, and the invention may be practiced according to the claims without some or all of these details.
One aspect of the present invention provides a big data access method for small-file reading and writing, offering a new solution for big-data access and utilization. Fig. 1 is a flowchart of the big data access method according to an embodiment of the present invention.
After a user submits a resource to the cloud storage platform, the resource is first screened by a file filter whose conditions include file size, type, and so on; files that pass the screening are referred to as small files. A merging strategy is then applied to these small files: a number of small files, generally ones sharing the same attribute, are merged to generate a new storage file. The index file is updated while the new storage file is written into the system. The indexes in the cloud storage platform comprise a primary index, which is the resource collection a file belongs to (for example, its type), and a secondary index, which is the concrete resource entry. When a file needs to be read, the primary index and then the secondary index are queried in turn, narrowing the query scope and ensuring a fast read response.
The core of the storage-layer design of the cloud storage platform comprises: first merging small files to generate storage files; then building a two-level index for the merged files based on the storage characteristics of the database; and improving file-read response speed through index prefetching. The details of the storage layer are described below.
1. Storage-file generation strategy based on small-file merging
Files are divided into blocks, with a default block size of 64 MB. The namespace of the distributed file system is persisted in an image file, which the namenode loads into memory at startup. A large number of small files exhausts namenode memory, and the oversized image file it generates reduces lookup efficiency when files are read. For each file read or write, the namespace is queried first to locate information such as the file's block address and size, and the data are then retrieved in the data-node space. When the files being read are very small, most of the time in the read/write path is spent on retrieval and lookup rather than on transferring file data, which hurts the processing efficiency of the server cluster.
The cloud storage platform merges small files to generate storage files. A filter is first implemented to screen files by type and size, selecting document files suitable for full-text search; the file-size threshold here is set to 10 MB, and files larger than 10 MB are treated as large files that need no merging. After filtering, the platform merges the filtered small files into file blocks, using the resource collection of each file entry as the unit. A resource collection is a set of resource entries with a certain correlation, and a resource entry belongs to exactly one collection. Collections are usually divided by attribute range, time, and so on, and files can be partitioned by attribute domain. Since the resource entries in a new file block are highly related, the block can later be assigned to a single MapReduce task, which avoids wasting time on task assignment and switching when a task's computation is too small, reduces data movement in the cluster, and matches Hadoop's principle that moving computation is more efficient than moving data.
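The filter step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names and the list of full-text-searchable types are assumptions; only the 10 MB threshold comes from the text.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the file filter: files at or below a 10 MB threshold whose type
// supports full-text search are candidates for merging; everything else is
// stored directly. Type list and class name are illustrative.
class SmallFileFilter {
    static final long THRESHOLD = 10L * 1024 * 1024; // 10 MB merge threshold
    static final List<String> TEXT_TYPES = Arrays.asList("txt", "doc", "pdf", "xml");

    // Returns true when the file should go through the merge path.
    static boolean shouldMerge(String fileName, long sizeBytes) {
        if (sizeBytes > THRESHOLD) return false;          // large file: store directly
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) return false;                        // no extension: cannot classify
        String ext = fileName.substring(dot + 1).toLowerCase();
        return TEXT_TYPES.contains(ext);                  // only full-text-searchable types
    }
}
```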
2. Two-level index to optimize file-read speed
After small files are merged, namenode memory is the performance bottleneck of the whole file system, because all file metadata must be stored in it. Merging small files reduces the number of files and saves a great deal of memory, but file-read efficiency after merging would otherwise be very low.
A preferred embodiment of the invention uses a hierarchical index for small-file metadata, partitioning a large index file into small index files by reasonable rules. The resource collection serves as the primary index, and the resource-entry contents under each collection serve as the secondary index; a lookup first searches by the collection a resource entry belongs to, and then searches the corresponding secondary-index file. Although this adds a search in the primary index, the number of resource collections is small, so that search costs very little, and the partitioned secondary-index files are much smaller than a global index file, so overall search efficiency improves. Moreover, not all secondary-index files are loaded into memory; they are scheduled flexibly according to memory usage combined with the caching strategy, which solves the problem of insufficient memory.
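The two-step lookup above can be sketched with plain maps. This is an in-memory toy, not the patent's on-disk index files: the class name and the metadata string format are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the two-level index: the primary index maps a resource
// collection to its secondary index (modelled here as an in-memory map from
// entry name to the entry's position metadata). Names are illustrative.
class HierarchicalIndex {
    // value = "blockFile:offset:length", kept as a string for simplicity
    final Map<String, Map<String, String>> primary = new HashMap<>();

    void addEntry(String collection, String entry, String metadata) {
        primary.computeIfAbsent(collection, k -> new HashMap<>()).put(entry, metadata);
    }

    // Two-step lookup: first the primary index (few collections, cheap),
    // then only that collection's secondary index.
    String lookup(String collection, String entry) {
        Map<String, String> secondary = primary.get(collection);
        return secondary == null ? null : secondary.get(entry);
    }
}
```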
3. Optimizing file response speed through index prefetching
Index prefetching here means predicting, from the data a user is currently accessing, the data the user will access next, and loading its index into the cache. If the prediction is accurate, the data the user is about to access are loaded into the cache in advance, and the system responds faster when the user accesses them.
Before downloading or browsing a resource entry, a user must usually obtain an "intermediate result set" through retrieval or directory browsing, and only then can select the needed resource entries from it for further access. There is an interval of several seconds between the user seeing the result-set page and performing a download or browse; during this time the indexes of the resource entries in the intermediate result set are cached in advance, so that when the user clicks download or browse, the series of file-metadata queries need not be executed again and the file is transferred directly, which greatly improves the request response for these files. This improvement does not require much memory: supposing 100,000 users perform retrieval simultaneously, each result-set page shows 20 resource entries, and caching the metadata of one file takes 150 bytes, only about 0.3 GB of memory is needed.
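The memory figure above is a simple product; as a sanity check (class and method names are illustrative):

```java
// Back-of-envelope check of the figure in the text: 100,000 concurrently
// searching users x 20 result entries per page x ~150 bytes of metadata
// per cached entry.
class PrefetchMemoryEstimate {
    static long bytesNeeded(long users, int entriesPerPage, int bytesPerEntry) {
        return users * entriesPerPage * bytesPerEntry;
    }
}
```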
The storage-layer architecture of the cloud storage platform is described in detail below. Beyond the strategies above, the storage-layer architecture is the foundation of the system implementation. The storage layer is built on a distributed storage system on a Hadoop cluster and provides basic file-save and file-read services.
The storage-layer framework adopts a three-tier design: a user interface layer, a business logic layer and a storage layer; to improve performance, the Web server is separated from the server cluster. The user interface layer provides the interface through which users send requests and receive feedback. The business logic layer implements small-file reading and writing, including file merging, index construction and cache construction.
The business logic layer comprises functional modules for file merging, the search system, the small-file index, the cache, and the distributed-system client. Each module is implemented as follows:
(1) File merging
The file-merging function comprises two stages: creating a SequenceFile object and merging the small files. Files that meet the merge requirements after filtering are merged: the primary index is first searched according to the resource collection of the resource entry; after the file path corresponding to that collection is found, a SequenceFile object is created, its Writer object is obtained and configured, and the write is prepared. While the file write executes, a new thread is opened to write metadata such as the entry's file position and length into the resource-entry secondary index. When the resource entry is written successfully, the output stream is closed and submission success is returned; otherwise submission failure is returned.
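The append-and-index step described above can be simulated without Hadoop. The real system appends each small file through a SequenceFile Writer; in this dependency-free sketch a byte buffer stands in for the storage file, and the secondary index records (offset, length) per entry. Class and field names are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Dependency-free simulation of the merge step: a byte buffer stands in for
// the merged storage file, and the secondary index is updated with the
// (offset, length) of each appended small file, as described above.
class MergeWriter {
    final ByteArrayOutputStream storageFile = new ByteArrayOutputStream();
    final Map<String, long[]> secondaryIndex = new HashMap<>(); // entry -> {offset, length}

    // Append one small file; on success the index is updated and true
    // ("submission success") is returned.
    boolean append(String entryName, byte[] content) {
        try {
            long offset = storageFile.size();      // position where this entry starts
            storageFile.write(content);
            secondaryIndex.put(entryName, new long[]{offset, content.length});
            return true;
        } catch (IOException e) {
            return false;                          // "submission failure"
        }
    }
}
```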
(2) Search system
Provides the document retrieval function; this module is relied upon to optimize reads from the distributed file system based on the "intermediate result set".
(3) Small-file index
Builds the small-file index, comprising the resource-collection primary index and the resource-entry secondary index, and provides functions such as index-file creation and record addition and deletion.
The primary-index data are stored in a relational database, accessed through the relational-database access interface, and held in a Java Map data structure. Because resource collections are stored in the database, adding a resource entry only requires adding a system-generated value field according to this index, so the data can be kept in the relational database without hurting processing efficiency. The data in the primary index use a key/value structure, and a Java Map can be used to improve query efficiency. In addition, to guarantee retrieval efficiency, this Map object is initialized from the database contents when the service starts and is kept resident; since the primary index contains few entries, the Map object occupies very little memory and the system overhead is limited. Whenever a resource collection is added or deleted, the Map object must be updated.
The secondary index is created with the open-source project Lucene and supports small-file metadata retrieval. Lucene offers a complete solution for index construction, updating and searching; when the indexed files are smaller than 1 GB its search efficiency is very high, and it can be used to build commercial search engines. The index the cloud storage platform creates needs some special functions, such as updating the index file in real time whenever a user adds a resource entry; concurrency control of file writes when multiple users add resource entries under one resource collection simultaneously; and compressing the index file to reduce memory usage.
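The write concurrency control mentioned above can be sketched with a per-collection lock: simultaneous writers to the same resource collection are serialized, while different collections proceed in parallel. The real system builds the index with Lucene; this toy (names are illustrative) only shows the locking idea.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of per-collection write concurrency control: writes to one
// collection's index are serialized by a lock object dedicated to that
// collection. Here the "index" is just an entry counter per collection.
class IndexWriteControl {
    final Map<String, Object> locks = new ConcurrentHashMap<>();
    final Map<String, Integer> entryCounts = new ConcurrentHashMap<>();

    void addEntry(String collection, String entry) {
        Object lock = locks.computeIfAbsent(collection, k -> new Object());
        synchronized (lock) { // one writer per collection at a time
            entryCounts.merge(collection, 1, Integer::sum);
        }
    }
}
```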
(4) Prefetching
To further improve response speed, cache management is provided here for the "intermediate result sets" the user is interested in, including cache-space maintenance, cache updating, and maintenance of the update algorithm.
After a user sends a retrieval request, the Web service queries, according to the user's search conditions, the result set of resource entries that meets the user's needs and returns it to the user, while creating an asynchronous thread to update the cache; the cache contents are updated in the interval between returning the result set and the user, having browsed it, deciding to click download or browse. When the cache module receives a cache-update request, it invokes the index module to retrieve and load the metadata of the current result-set entries into the cache. When the user sends a download or browse request, the Web service calls the distributed-system client, which finds the metadata in the cache and starts reading data and transmitting it to the client.
The system maintains a thread pool with a fixed number of threads and dispatches one thread to handle each cache-update request; if no thread in the pool is idle, the cache task waits. This keeps the system resources consumed by cache-update tasks within a reasonable range without affecting overall performance. The present invention selects a FIFO algorithm to implement the cache module's scheduling, evicting the oldest cache entries in an efficient way. The specific implementation is: a cache pool is built with a configurable size, defaulting to 32 MB, which can hold the metadata of about 200,000 files. The pool stores key/value pairs, with the file name as the key and the combination of the file's data-node ID, start position and length as the value. The pool provides two operations, put and get. Put places data into the pool; if the data in the pool have reached the limit, data are replaced according to the cache-replacement algorithm, and if space remains the data are inserted directly. Get returns the value corresponding to a key, or empty if there is none.
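The FIFO cache pool above can be sketched with an insertion-ordered `LinkedHashMap`, whose `removeEldestEntry` hook gives exactly "evict the oldest when full". This is a minimal sketch, not the patent's implementation: the capacity is counted in entries rather than bytes, and the value is a plain string combining data-node ID, offset and length.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// FIFO cache pool sketch: filename -> "dataNodeId:offset:length".
// Insertion-ordered LinkedHashMap + removeEldestEntry = FIFO eviction.
class FifoCachePool {
    final int capacity;
    final LinkedHashMap<String, String> pool;

    FifoCachePool(int capacity) {
        this.capacity = capacity;
        // accessOrder=false keeps insertion order, i.e. true FIFO
        this.pool = new LinkedHashMap<String, String>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > FifoCachePool.this.capacity; // evict oldest when full
            }
        };
    }

    // put: insert metadata, evicting the oldest entry if the pool is full
    void put(String fileName, String nodeIdOffsetLength) {
        pool.put(fileName, nodeIdOffsetLength);
    }

    // get: return the value for the key, or null if absent
    String get(String fileName) {
        return pool.get(fileName);
    }
}
```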
(5) Distributed-system client
The distributed-system client encapsulates the APIs through which the outside world operates on the file system, including file reading and writing and file-position queries. When the file system receives a file-read request, the file filter first judges the file; for a file that has been merged, the file's metadata are located first in the cache, then, if absent there, in the index file, and if still not found, by contacting the namenode. After the file metadata are found, a SequenceFile object is built, its Reader object is obtained, and a read request is sent to the data node; after the data are transferred to the user, the input stream is closed and the result is returned.
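The three-step metadata lookup in the read path above can be sketched as a chain of fallbacks. Plain maps stand in for the cache, the small-file index and the namenode; all names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the read-path metadata lookup: try the cache first, fall back
// to the small-file index, and contact the namenode only as a last resort.
class ReadPathClient {
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> smallFileIndex = new HashMap<>();
    final Map<String, String> namenode = new HashMap<>();

    String locate(String fileName) {
        String meta = cache.get(fileName);          // 1. cache hit: fastest path
        if (meta != null) return meta;
        meta = smallFileIndex.get(fileName);        // 2. merged-file index
        if (meta != null) return meta;
        return namenode.get(fileName);              // 3. fall back to the namenode
    }
}
```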
A user makes two kinds of requests: write requests that submit files, and read requests that query, browse or obtain resources.
When the Web server receives a user's resource-submission request, it first judges whether small-file merging is needed; if so, the files are merged, and if not, the distributed file system's write interface is used directly. After merging, the distributed-file-system client writes the file into the distributed file system; while the client writes the file, the small-file index-update module is called to update the small-file index. Because the Web server host is separated from the server cluster, the write and the update can be executed by different threads simultaneously without interfering with each other. When the write into the distributed file system succeeds, the Web service returns a submission-success message to the client.
A user sends a file-read request to browse a document's details or to download a file; such requests are the most frequent and consume the most system resources. When the Web server receives a read request, it first retrieves, via the search system, according to the conditions the user submitted; the result set of resource entries the user needs is returned for browsing, the set of entries shown on the first page of the result set in the user interface (20 by default) is sent to the cache module at the same time, and a separate thread is opened to update the cache. When the user, having browsed the returned result-set page, requests a download or a detailed view, the Web service calls the distributed-file-system client to read the file contents; the client locates the file-position information first in the cache, searches the small-file index if it is not found there, and once the position information is found reads the data directly from the data node and returns them to the user.
In summary, the present invention proposes a big data access method that, for the reading and writing of small files used in full-text search, improves retrieval efficiency through index prefetching and maintains the response speed of the cloud storage platform and the overall performance of the distributed file system when large numbers of small files are stored and read.
Obviously, those skilled in the art should appreciate that the modules and steps of the present invention described above can be implemented with a general-purpose computing system: they can be concentrated on a single computing system or distributed over a network formed by multiple computing systems, and optionally they can be implemented with program code executable by a computing system, so that they can be stored in a storage system and executed by the computing system. Thus, the present invention is not restricted to any specific combination of hardware and software.
It should be understood that the above embodiments of the present invention are only for exemplary illustration or explanation of its principles and do not limit it. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the invention shall be included within its protection scope. In addition, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the claims, or the equivalents of such scope and boundaries.
Claims (6)
1. A big data access method for accessing file resources in a cloud storage platform, characterized by comprising:
merging files smaller than a preset size in a distributed file system to form new storage files, and establishing a primary index and a secondary index for the merged files; and
using a prefetch mechanism to cache the indexes before the user requests a resource.
2. The method according to claim 1, characterized in that merging the files smaller than the preset size further comprises:
dividing files into blocks, persisting the namespace of the distributed file system in an image file, and having the namenode load this image file into memory at startup;
after a user submits a file resource to the cloud storage platform, first screening it with a file filter according to filter conditions; for qualifying files, merging a predetermined number of files with the same attribute to generate a new storage file, and updating the index file while writing the new storage file into the system;
for each file read or write, first querying the namespace to locate information such as the file's block address and size, and then retrieving the data in the data-node space.
3. The method according to claim 2, characterized in that, in establishing the primary index and the secondary index for the merged files, the primary index is the resource collection to which a file belongs and the secondary index is the concrete resource entry; a resource collection is a set of correlated resource entries, a resource entry belongs to exactly one resource collection, and files can be partitioned by attribute domain; when a file needs to be read, the primary index and then the secondary index are queried in turn; after filtering, the cloud storage platform merges the filtered files into file blocks, using the resource collection of each file entry as the unit, so that during data processing the resource entries in a new file block can be assigned to the same MapReduce task.
4. The method according to claim 1, characterized in that the prefetch mechanism further comprises: predicting, from the data a user is currently accessing, the data the user will access next, and loading its index into the cache so that the system responds faster when the user accesses it; before downloading or browsing a resource entry, the user obtains an intermediate result set through retrieval or directory browsing and then selects from it the resource entries to access further; in the interval between the user seeing the result-set page and performing a download or browse, the indexes of the resource entries in the intermediate result set are cached in advance, so that when the user clicks download or browse no file-metadata query is executed and the file is transferred directly.
5. The method according to claim 4, characterized in that caching the indexes further comprises: after a user sends a retrieval request, the Web service queries, according to the user's search conditions, the result set of resource entries that meets the user's needs and returns it to the user, while creating an asynchronous thread to update the cache; the cache contents are updated in the interval between returning the result set and the user, having browsed it, deciding to click download or browse; on receiving a cache-update request, the index module is invoked to retrieve and load the metadata of the current result-set entries into the cache; when the user sends a download or browse request, the Web service calls the distributed-file-system client, which finds the metadata in the cache and starts reading data and transmitting it to the client; the server maintains a thread pool with a fixed number of threads, each cache-update request is handled by one thread, and if no thread in the pool is idle the cache task waits; a cache pool is built with a FIFO algorithm and its size is configured; key/value pairs are kept in the pool, with the file name as the key and the combination of the file's data-node ID, start position and length as the value, and the oldest cache entries are evicted; the cache pool provides two operations, put and get: put places data into the pool and, if the data in the pool have reached the limit, replaces data according to the FIFO replacement algorithm, and get returns the value corresponding to a key.
6. method according to claim 3, it is characterized in that, described master index data are stored in relational database, access is provided by relational database access interface, the Map data structure in Java is used to preserve, on the basis of resource collection write into Databasce, be increased in the field be worth by system generation when resource entries is added, data acquisition in master index Key/Value structure, use Map data structure in Java, also exist according to this Map object of content initialization in database when service starts always, add when there being new resource collection or have deleted time, this Map object is upgraded, secondary index is created by open source projects Lucene, supports small documents metadata retrieval, real-time update index file whenever user adds resource entries time, when multiple user adds resource entries under a resource collection simultaneously, realize the con current control of file write,
Merging the files further comprises: creating a SequenceFile object; filtering the files through a filter and merging those that meet the preset condition; looking up in the primary index the resource collection to which the resource entry belongs; after the file path corresponding to the resource collection is found, creating the SequenceFile object, obtaining its Writer object and configuring it in preparation for writing the file; while the file write executes, opening a new thread to write the offset and length information corresponding to the resource entry into the resource-entry secondary index; and, if the resource entry is written successfully, closing the output stream and returning success, otherwise returning failure.
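The merge step appends small files into one container (a Hadoop SequenceFile via its Writer in the claim) while recording each entry's offset and length for the secondary index. To stay self-contained, this sketch replaces the SequenceFile.Writer with an in-memory buffer; the size filter and offset/length bookkeeping are the point, and all names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

/** Sketch of merging small files into one container with an offset/length index. */
public class SmallFileMerger {
    /** (offset, length) pair recorded per entry, as the secondary index would store. */
    public static final class Extent {
        public final long offset;
        public final long length;
        Extent(long offset, long length) { this.offset = offset; this.length = length; }
    }

    private final ByteArrayOutputStream container = new ByteArrayOutputStream();
    private final Map<String, Extent> secondaryIndex = new HashMap<>();
    private final long sizeThreshold;

    public SmallFileMerger(long sizeThreshold) {
        this.sizeThreshold = sizeThreshold;
    }

    /** Filter step: only files below the preset size are merged. */
    public boolean append(String name, byte[] content) {
        if (content.length >= sizeThreshold) {
            return false; // large files are stored directly, not merged
        }
        long offset = container.size();               // position before the write
        container.write(content, 0, content.length);  // append to the container
        secondaryIndex.put(name, new Extent(offset, content.length));
        return true;
    }

    /** Read an entry back out of the container via its recorded (offset, length). */
    public byte[] read(String name) {
        Extent e = secondaryIndex.get(name);
        byte[] all = container.toByteArray();
        byte[] out = new byte[(int) e.length];
        System.arraycopy(all, (int) e.offset, out, 0, (int) e.length);
        return out;
    }
}
```

In the claimed method the index write happens on a separate thread while the file write proceeds; here both are done inline for clarity.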
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510118185.0A CN104679898A (en) | 2015-03-18 | 2015-03-18 | Big data access method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104679898A true CN104679898A (en) | 2015-06-03 |
Family
ID=53314940
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510118185.0A Pending CN104679898A (en) | 2015-03-18 | 2015-03-18 | Big data access method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104679898A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100162230A1 (en) * | 2008-12-24 | 2010-06-24 | Yahoo! Inc. | Distributed computing system for large-scale data handling |
US20100179855A1 (en) * | 2009-01-09 | 2010-07-15 | Ye Chen | Large-Scale Behavioral Targeting for Advertising over a Network |
CN103577339A (en) * | 2012-07-27 | 2014-02-12 | 深圳市腾讯计算机系统有限公司 | Method and system for storing data |
CN103856567A (en) * | 2014-03-26 | 2014-06-11 | 西安电子科技大学 | Small file storage method based on Hadoop distributed file system |
2015-03-18: Application CN201510118185.0A filed in China (CN); publication CN104679898A, status Pending.
Non-Patent Citations (3)
Title |
---|
卞艺杰: ""Hdspace 分布式机构知识库系统的小文件存储"", 《计算机系统应用》 * |
张春明等: ""一种Hadoop小文件存储和读取的方法"", 《计算机应用与软件》 * |
陈光景: ""Hadoop小文件处理技术的研究和实现"", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105005622B (en) * | 2015-07-24 | 2018-12-07 | 肖华 | Method for high-speed storage of high-fidelity continuous-frame queries and image output method thereof |
CN105005622A (en) * | 2015-07-24 | 2015-10-28 | 肖华 | Method for high-speed storage of high-fidelity continuous-frame queries and image output method thereof |
CN105677904A (en) * | 2016-02-04 | 2016-06-15 | 杭州数梦工场科技有限公司 | Distributed file system based small file storage method and device |
CN105677904B (en) * | 2016-02-04 | 2019-07-12 | 杭州数梦工场科技有限公司 | Small documents storage method and device based on distributed file system |
CN106095832A (en) * | 2016-06-01 | 2016-11-09 | 东软集团股份有限公司 | Distributed parallel processing method and device |
CN108460054A (en) * | 2017-02-22 | 2018-08-28 | 北京京东尚科信息技术有限公司 | Method, system and device for improving cloud storage system performance |
CN107040596A (en) * | 2017-04-17 | 2017-08-11 | 山东辰华科技信息有限公司 | The construction method of science service ecosystem platform based on big data cloud computing |
CN107103095A (en) * | 2017-05-19 | 2017-08-29 | 成都四象联创科技有限公司 | Method for computing data based on high performance network framework |
CN107391555A (en) * | 2017-06-07 | 2017-11-24 | 中国科学院信息工程研究所 | A kind of metadata real time updating method towards Spark Sql retrievals |
CN107391555B (en) * | 2017-06-07 | 2020-08-04 | 中国科学院信息工程研究所 | Spark-Sql retrieval-oriented metadata real-time updating method |
CN108053863A (en) * | 2017-12-22 | 2018-05-18 | 中国人民解放军第三军医大学第一附属医院 | Mass medical data storage system and data storage method suitable for large and small files |
CN108053863B (en) * | 2017-12-22 | 2020-09-11 | 中国人民解放军第三军医大学第一附属医院 | Mass medical data storage system and data storage method suitable for large and small files |
CN108427590A (en) * | 2018-02-09 | 2018-08-21 | 福建星网锐捷通讯股份有限公司 | A kind of implementation method of UI Dynamic Distributions |
CN108427590B (en) * | 2018-02-09 | 2021-02-05 | 福建星网锐捷通讯股份有限公司 | Method for realizing UI dynamic layout |
CN108647193B (en) * | 2018-04-20 | 2021-11-19 | 河南中烟工业有限责任公司 | Unique identifier generation method and device applicable to distributed system |
CN108647193A (en) * | 2018-04-20 | 2018-10-12 | 河南中烟工业有限责任公司 | A kind of unique identifier generation method can be applied to distributed system and device |
CN109766318A (en) * | 2018-12-17 | 2019-05-17 | 新华三大数据技术有限公司 | File reading and device |
CN109947718A (en) * | 2019-02-25 | 2019-06-28 | 全球能源互联网研究院有限公司 | Data storage method, storage platform and storage device |
CN110377562B (en) * | 2019-07-23 | 2022-11-01 | 安徽朵朵云网络科技有限公司 | Big data safe storage method based on Hadoop open source platform |
CN110377562A (en) * | 2019-07-23 | 2019-10-25 | 宿州星尘网络科技有限公司 | Big data method for secure storing based on Hadoop Open Source Platform |
CN110688361A (en) * | 2019-08-16 | 2020-01-14 | 平安普惠企业管理有限公司 | Data migration method, electronic device and computer equipment |
CN110995799A (en) * | 2019-11-22 | 2020-04-10 | 山东九州信泰信息科技股份有限公司 | Data interaction method based on Fetch and springMVC |
CN111400247A (en) * | 2020-04-13 | 2020-07-10 | 杭州九州方园科技有限公司 | User behavior auditing method and file storage method |
CN111818021A (en) * | 2020-06-20 | 2020-10-23 | 深圳市众创达企业咨询策划有限公司 | Configuration information safety protection system and method based on new generation information technology |
CN111818021B (en) * | 2020-06-20 | 2021-02-09 | 深圳市众创达企业咨询策划有限公司 | Configuration information safety protection system and method based on new generation information technology |
CN112347044A (en) * | 2020-11-10 | 2021-02-09 | 北京赛思信安技术股份有限公司 | Object storage optimization method based on SPDK |
CN112347044B (en) * | 2020-11-10 | 2024-04-12 | 北京赛思信安技术股份有限公司 | Object storage optimization method based on SPDK |
WO2022141650A1 (en) * | 2021-01-04 | 2022-07-07 | Alibaba Group Holding Limited | Memory-frugal index design in storage engine |
CN114356230A (en) * | 2021-12-22 | 2022-04-15 | 天津南大通用数据技术股份有限公司 | Method and system for improving reading performance of column storage engine |
CN114356230B (en) * | 2021-12-22 | 2024-04-23 | 天津南大通用数据技术股份有限公司 | Method and system for improving read performance of column storage engine |
CN114461146A (en) * | 2022-01-26 | 2022-05-10 | 北京百度网讯科技有限公司 | Cloud storage data processing method, device, system, equipment, medium and product |
CN114461146B (en) * | 2022-01-26 | 2024-05-07 | 北京百度网讯科技有限公司 | Cloud storage data processing method, device, system, equipment, medium and product |
CN118069589A (en) * | 2024-04-17 | 2024-05-24 | 济南浪潮数据技术有限公司 | File access method, device, computer equipment and program product |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104679898A (en) | Big data access method | |
CN104778270A (en) | Storage method for multiple files | |
JP7113040B2 (en) | Versioned hierarchical data structure for distributed data stores | |
CN107247808B (en) | Distributed NewSQL database system and picture data query method | |
Wei et al. | Xstore: Fast rdma-based ordered key-value store using remote learned cache | |
EP2973018B1 (en) | A method to accelerate queries using dynamically generated alternate data formats in flash cache | |
Cambazoglu et al. | Scalability challenges in web search engines | |
CN102169507B (en) | Implementation method of distributed real-time search engine | |
US8364751B2 (en) | Automated client/server operation partitioning | |
US8356050B1 (en) | Method or system for spilling in query environments | |
WO2015094179A1 (en) | Abstraction layer between a database query engine and a distributed file system | |
JP2003006036A (en) | Clustered application server and web system having database structure | |
US9148329B1 (en) | Resource constraints for request processing | |
CN103595797B (en) | Caching method for distributed storage system | |
CN103530387A (en) | Improved method aimed at small files of HDFS | |
CN101184106A (en) | Associated transaction processing method of mobile database | |
US20100274795A1 (en) | Method and system for implementing a composite database | |
CN103023982A (en) | Low-latency metadata access method of cloud storage client | |
US11080207B2 (en) | Caching framework for big-data engines in the cloud | |
CN116108057B (en) | Distributed database access method, device, equipment and storage medium | |
CN106709010A (en) | High-efficient HDFS uploading method based on massive small files and system thereof | |
Durner et al. | Crystal: a unified cache storage system for analytical databases | |
US7743333B2 (en) | Suspending a result set and continuing from a suspended result set for scrollable cursors | |
Marcu | KerA: A Unified Ingestion and Storage System for Scalable Big Data Processing | |
US10387384B1 (en) | Method and system for semantic metadata compression in a two-tier storage system using copy-on-write |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2015-06-03 |