CN106095796A - Distributed data storage method, Apparatus and system - Google Patents
- Publication number
- CN106095796A (application number CN201610371832.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- type
- basic data
- storage
- sub
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/172—Caching, prefetching or hoarding of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a distributed data storage method, apparatus and system. The method includes: screening acquired basic data to determine its type, where the type includes at least a structured type and an unstructured type; and storing the basic data in a first sub-storage cluster and/or a second sub-storage cluster according to its type. The invention solves the technical problem of high data access latency in existing distributed data storage systems.
Description
Technical field
The present invention relates to the Internet field, and in particular to a distributed data storage method, apparatus and system.
Background
Apache Nutch is the origin of Hadoop. Hadoop technology is widely applied in the Internet field and has also attracted broad attention from the research community. For example, Yahoo runs Hadoop on a 4,000-node cluster to support its advertising system and Web search research; Facebook runs Hadoop on a 1,000-node cluster to store log data and to support data analysis and machine learning on that data; Baidu uses Hadoop to process 200 TB of data every week for search log analysis and web data mining; the China Mobile Research Institute developed the "BigCloud" system on Hadoop, which is used not only for related data analysis but also to provide external services; and Taobao's Hadoop system stores and processes e-commerce transaction data.
In addition, domestic universities and research institutions have carried out Hadoop-based research on data storage, resource management, job scheduling, performance optimization, system high availability and security.
However, existing Hadoop technology has the following problems:
1. Data access latency is high, so it is unsuitable for low-latency data access operations.
2. Because data access latency is high, large numbers of small files cannot be stored efficiently.
3. Multi-user management is not supported, so multiple users cannot write and modify data.
No effective solution has yet been proposed for the high data access latency of existing distributed data storage systems.
Summary of the invention
Embodiments of the present invention provide a distributed data storage method, apparatus and system, to at least solve the technical problem of high data access latency in existing distributed data storage systems.
According to one aspect of the embodiments of the present invention, a distributed data storage system is provided, including: a data acquisition server, configured to acquire basic data; a data processing server, connected to the data acquisition server and configured to classify the basic data and determine its type, where the type includes at least a structured type and an unstructured type; and a distributed storage cluster, connected to the data processing server and configured to store basic data of the structured type in a first sub-storage cluster and basic data of the unstructured type in a second sub-storage cluster.
Further, the distributed storage cluster also includes an index server, connected to the first sub-storage cluster and configured to generate data index information from basic data of the structured type.
Further, the system also includes a cache server, connected to the data processing server and configured to cache the basic data collected by the data acquisition server.
Further, the second sub-storage cluster uses the Hadoop HDFS distributed file storage framework.
Further, the system also includes an application server, connected to the distributed storage cluster and configured to provide a data interface for accessing the basic data stored in the distributed storage cluster.
According to another aspect of the embodiments of the present invention, a distributed data storage method is also provided, including: screening acquired basic data to determine its type, where the type includes at least a structured type and an unstructured type; and storing the basic data in a first sub-storage cluster and/or a second sub-storage cluster according to its type.
Further, after the acquired basic data is screened and its type determined, the method also includes: generating, from basic data of the unstructured type, metadata corresponding to that basic data; and storing the metadata in the first sub-storage cluster as basic data of the structured type.
Further, after the basic data is stored in the first sub-storage cluster and/or the second sub-storage cluster according to its type, the method also includes: generating data index information from the basic data, where the data index information includes at least description information and storage location information of the basic data; and storing the data index information in an index server.
Further, storing the basic data in the first sub-storage cluster and/or the second sub-storage cluster according to its type includes: storing the basic data in a cache server according to its type; and, according to a preset storage policy, storing basic data of the structured type in the first sub-storage cluster and basic data of the unstructured type in the second sub-storage cluster.
According to another aspect of the embodiments of the present invention, a distributed data storage apparatus is also provided, including: a screening module, configured to screen acquired basic data and determine its type, where the type includes at least a structured type and an unstructured type; and a first storage module, configured to store the basic data in a first sub-storage cluster and/or a second sub-storage cluster according to its type.
Further, the apparatus also includes: a first generation module, configured to generate, from basic data of the unstructured type, metadata corresponding to that basic data; and a second storage module, configured to store the metadata in the first sub-storage cluster as basic data of the structured type.
Further, the apparatus also includes: a second generation module, configured to generate data index information from basic data of the structured type, where the data index information includes at least description information and storage location information of the basic data; and a third storage module, configured to store the data index information in an index server.
In the embodiments of the present invention, acquired basic data is screened to determine its type, where the type includes at least a structured type and an unstructured type, and the basic data is stored in the first sub-storage cluster and/or the second sub-storage cluster according to that type. This improves the overall storage efficiency of the distributed storage cluster and reduces its latency, thereby solving the technical problem of high data access latency in existing distributed data storage systems.
Description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions serve to explain the present invention and do not constitute an undue limitation on it. In the drawings:
Fig. 1 is a system framework diagram of a distributed data storage system according to an embodiment of the present invention;
Fig. 2 is a system framework diagram of an optional distributed data storage system according to an embodiment of the present invention;
Fig. 3 is a system framework diagram of an optional distributed data storage system according to an embodiment of the present invention;
Fig. 4 is a system framework diagram of an optional distributed data storage system according to an embodiment of the present invention;
Fig. 5 is a flowchart of a distributed data storage method according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of an optional distributed data storage apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of an optional distributed data storage apparatus according to an embodiment of the present invention; and
Fig. 8 is a schematic diagram of an optional distributed data storage apparatus according to an embodiment of the present invention.
Detailed description of the invention
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second" and so on in the specification, claims and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so designated are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product or device.
According to an embodiment of the present invention, a system embodiment of a distributed data storage system is provided. Fig. 1 is a system framework diagram of the distributed data storage system according to an embodiment of the present invention. As shown in Fig. 1, the system includes: a data acquisition server 21, a data processing server 23 and a distributed storage cluster 25.
The data acquisition server 21 is configured to acquire basic data. The data processing server 23, connected to the data acquisition server 21, is configured to classify the basic data and determine its type, where the type includes at least a structured type and an unstructured type. The distributed storage cluster 25, connected to the data processing server 23, is configured to store basic data of the structured type in a first sub-storage cluster 251 and basic data of the unstructured type in a second sub-storage cluster 253.
Specifically, with the data acquisition server 21, the data processing server 23 and the distributed storage cluster 25, before the basic data is stored in a distributed manner, the data processing server 23 classifies the collected basic data by type and, according to that type, stores it in different sub-storage clusters of the distributed storage cluster. In other words, each type of basic data is stored in a sub-storage cluster that uses a storage format suited to that type.
Basic data is divided into at least a structured type and an unstructured type. Basic data of the structured type is row data: it can be stored directly in a database and represented logically as a two-dimensional table. Basic data of the unstructured type, by contrast, is inconvenient to represent with a two-dimensional database table; it includes office documents in all formats, text, pictures, XML, HTML, reports of all kinds, images, and audio/video information.
As an optional embodiment, after classifying the basic data, the data processing server 23 may further screen basic data of the unstructured type: it extracts the content of files that record text information, recognizes content in picture format using optical character recognition (OCR), extracts the corresponding metadata, and stores the metadata in the first sub-storage cluster as structured data.
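As a rough illustration of turning an unstructured object into a structured metadata row, the sketch below records the file name, guessed MIME type, size, SHA-256 digest, a storage location and, for text files only, a short content summary. The field names and the `hdfs://cluster-253/...` location are hypothetical, and the OCR step for pictures is deliberately left out because no particular OCR engine is named in the description.

```python
import hashlib
import mimetypes
from typing import Optional


def generate_metadata(name: str, content: bytes, location: str) -> dict:
    """Build a flat, row-shaped metadata record for one unstructured object."""
    mime, _ = mimetypes.guess_type(name)
    mime = mime or "application/octet-stream"
    summary: Optional[str] = None
    if mime.startswith("text/"):
        # Text files: keep the first 200 characters as a searchable summary.
        summary = content.decode("utf-8", errors="replace")[:200]
    # Scanned images would be passed to an OCR engine here; omitted in this sketch.
    return {
        "name": name,
        "mime_type": mime,
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
        "location": location,  # where the original object lives (hypothetical URI)
        "summary": summary,
    }


first_sub_cluster = []  # stand-in for the structured store
meta = generate_metadata("note.txt", b"Loan approved for customer C-1001",
                         "hdfs://cluster-253/vouchers/note.txt")
first_sub_cluster.append(meta)
```

Because every value in the record is a scalar, the record itself can be stored as a single row alongside other structured data.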
With the data acquisition server 21, the data processing server 23 and the distributed storage cluster 25, basic data can be stored according to its type in a storage mode suited to it. This improves the overall storage efficiency of the distributed storage cluster and reduces its latency, solving the technical problem of high data access latency in existing distributed data storage systems.
As an optional embodiment, as shown in Fig. 2, the distributed storage cluster 25 may further include an index server 255.
The index server 255 is connected to the first sub-storage cluster 251 and is configured to generate data index information from basic data of the structured type.
The index server 255 can generate index data from the storage locations of basic data of the structured type, and can also generate index data from the storage locations of basic data of the unstructured type together with the storage locations of the corresponding metadata. Building on existing exact index queries, metadata queries and structured data queries, the index server 255 combines multiple indexes to enable high-speed retrieval of unstructured data.
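One way to picture "combining multiple indexes" is an inverted index whose per-term posting sets are intersected, with each hit resolved to an index record holding the description and storage location(s). The sketch below is a minimal in-memory version under that assumption; the class name, the whitespace tokenizer and the example locations are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class IndexServer:
    """Toy index server: term -> doc ids (postings), and doc id -> index record."""

    def __init__(self) -> None:
        self.records: Dict[str, dict] = {}
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, description: str, data_location: str,
            metadata_location: Optional[str] = None) -> None:
        # The record holds description info plus storage location(s).
        self.records[doc_id] = {
            "description": description,
            "data_location": data_location,
            "metadata_location": metadata_location,
        }
        for term in description.lower().split():
            self.postings[term].add(doc_id)

    def search(self, *terms: str) -> List[Tuple[str, str]]:
        """Combine per-term indexes (AND) and return (doc_id, data_location)."""
        if not terms:
            return []
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        hits = set.intersection(*sets)
        return sorted((d, self.records[d]["data_location"]) for d in hits)


idx = IndexServer()
idx.add("d1", "Loan contract scan 2016", "hdfs://cluster-253/d1", "db://cluster-251/meta/d1")
idx.add("d2", "Deposit voucher 2016", "hdfs://cluster-253/d2")
print(idx.search("loan", "contract"))  # [('d1', 'hdfs://cluster-253/d1')]
```

A query only touches the small posting sets and then jumps straight to the stored location, which is the property that lets the index server answer without scanning the storage clusters.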
As an optional embodiment, as shown in Fig. 3, the system further includes a cache server 27.
The cache server 27 is connected to the data processing server 23 and is configured to cache the basic data collected by the data acquisition server.
Specifically, the cache server 27 can temporarily store the basic data collected by the data acquisition server 21 and, according to a preset storage policy, upload it in batches to the distributed storage cluster 25.
As an optional embodiment, the cache servers can be arranged in tiers according to data scale. Basic data is obtained tier by tier and aggregated tier by tier, and is collected and organized in the upload format defined by the preset storage policy.
In practical applications, the cache server 27 may include at least: provincial front-end cache servers (level-1 cache servers), a national-centre front-end cache server (level-2 cache server), and a back-end processing server that interacts with the system (level-3 cache server).
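The tier-by-tier aggregation can be sketched as a chain of buffers, each forwarding a batch to the next tier once a size threshold is reached, with the last tier handing batches to the storage clusters. The batch sizes below are arbitrary placeholders standing in for the preset storage policy, and the class name is hypothetical.

```python
from typing import Any, Callable, List


class CacheTier:
    """One cache level: buffers items and forwards them in batches."""

    def __init__(self, name: str, batch_size: int,
                 forward: Callable[[List[Any]], None]) -> None:
        self.name = name
        self.batch_size = batch_size
        self.forward = forward
        self.buffer: List[Any] = []

    def put(self, item: Any) -> None:
        self.buffer.append(item)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def put_many(self, items: List[Any]) -> None:
        for item in items:
            self.put(item)

    def flush(self) -> None:
        # Hand the current batch to the next tier (or the final uploader).
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.forward(batch)


uploaded: List[Any] = []  # stand-in for the sub-storage clusters
level3 = CacheTier("back-end processing server", 4, uploaded.extend)
level2 = CacheTier("national-centre front-end cache", 2, level3.put_many)
level1 = CacheTier("provincial front-end cache", 1, level2.put_many)

for record in range(8):
    level1.put(record)
print(uploaded)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Items that do not fill a batch stay buffered until `flush()` is called, so a real policy would presumably also flush on a timer or at a scheduled upload window.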
Basic data of the unstructured type that is uploaded through plug-ins (for example, by scanning) can be uploaded tier by tier through the cache servers to the second sub-storage cluster, which stores unstructured data. The unstructured data management platform in the second sub-storage cluster stores unstructured data in units of basic storage cells and feeds association information back to the corresponding business system. The basic storage cells in the second sub-storage cluster can be cut by file block size according to the requirements of the business system or a preset storage policy.
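Cutting an object into fixed-size storage cells, and putting it back together on retrieval, can be sketched as below. The block size is a free parameter standing in for the business system's requirement or the preset policy; for reference, HDFS itself defaults to 128 MB blocks in Hadoop 2.x and later, but the helper names here are hypothetical and not part of any Hadoop API.

```python
from typing import List, Tuple


def cut_into_blocks(data: bytes, block_size: int) -> List[Tuple[int, int, bytes]]:
    """Split data into (block_index, byte_offset, block_bytes) tuples."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        (i, offset, data[offset:offset + block_size])
        for i, offset in enumerate(range(0, len(data), block_size))
    ]


def reassemble(blocks: List[Tuple[int, int, bytes]]) -> bytes:
    """Concatenate blocks in index order to recover the original bytes."""
    return b"".join(chunk for _, _, chunk in sorted(blocks, key=lambda b: b[0]))


blocks = cut_into_blocks(b"abcdefghij", 4)
print(blocks)  # [(0, 0, b'abcd'), (1, 4, b'efgh'), (2, 8, b'ij')]
```

The (index, offset) pairs are the kind of association information that could be fed back to the business system so it can later request specific cells.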
When basic data is retrieved, the front-end application service can send a retrieval request message directly to the unstructured data management platform in the second sub-storage cluster. The platform parses the request message, uses a retrieval engine to extract the unstructured data required by the business system, and promptly returns it to the retrieval front-end server, where it is displayed in an integrated way within the business system.
As an optional embodiment, the second sub-storage cluster 253 uses the Hadoop HDFS distributed file storage framework.
In practical applications, the Hadoop HDFS distributed file storage framework replaces the existing storage architecture, mainly because of the characteristics of HDFS, so as to manage basic data better and provide basic data support to business systems.
Hadoop HDFS supports linear scaling and multi-replica backup, which fully meets the unstructured data management platform's requirements for horizontal storage expansion, security, and dynamically balanced node storage of national-centre data. Hadoop can be deployed with a highly available (HA) NameNode, and the industry has many mature and reliable Hadoop HA solutions that can guide the national centre's Master HA deployment. Hadoop's rich functionality can store and manage massive amounts of unstructured and structured data of many types, which lays the foundation for the unstructured data management platform to store unstructured data by category. Hadoop's MapReduce enables flexible cloud computing, providing a basis for building and extending cloud computing on top of future distributed storage. Hadoop also makes it easier to integrate third-party tools and components such as HBase, Hive and ZooKeeper, enabling more powerful critical-path analysis and self-management capabilities, and providing an environment for future big-data statistics.
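Multi-replica backup multiplies the raw capacity the cluster needs. As a back-of-the-envelope check, the sketch below combines the roughly 2 TB/day of voucher and file data cited for bank-class enterprises elsewhere in this description with the HDFS default replication factor of 3 (the `dfs.replication` setting); it ignores compression, temporary space and free-space headroom, so it is only a lower-bound estimate, and the function name is hypothetical.

```python
def raw_capacity_tb(daily_tb: float, days: int, replication: int = 3) -> float:
    """Raw HDFS capacity needed = logical data volume x replication factor."""
    return daily_tb * days * replication


logical_tb = 2 * 365              # one year of logical data: 730 TB
raw_tb = raw_capacity_tb(2, 365)  # with 3 replicas: 2190 TB, about 2.19 PB (decimal units)
print(logical_tb, raw_tb)         # 730 2190
```

This is consistent with the description's statement that stored volumes reach the PB level within a single year.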
As an optional embodiment, the second sub-storage cluster 253 may further use a Master HA storage framework.
In practical applications, the distributed storage used by the unstructured data management platform, which manages basic data of the unstructured type, can adopt a Master-Slave pattern in which the master performs node analysis, data management and similar work for the storage nodes, making the Master service the processing core of the platform. Further, mature existing Hadoop HA schemes can be combined with the actual application deployment so that the dual Master machines are highly available, keeping the platform robust and stable in unexpected situations.
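The core idea behind a dual-Master HA pair, promoting the standby when the active master stops sending heartbeats, can be shown with a toy controller. This is an assumption-laden simplification: real Hadoop HA relies on ZooKeeper-based failover controllers (ZKFC) and fencing to avoid split-brain, none of which is modelled here, and the class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MasterNode:
    name: str
    last_heartbeat: float = float("-inf")


class FailoverController:
    """Toy active/standby switch driven by heartbeat age (no fencing)."""

    def __init__(self, active: MasterNode, standby: MasterNode, timeout: float) -> None:
        self.active, self.standby, self.timeout = active, standby, timeout

    def heartbeat(self, node: MasterNode, now: float) -> None:
        node.last_heartbeat = now

    def _alive(self, node: MasterNode, now: float) -> bool:
        return now - node.last_heartbeat <= self.timeout

    def check(self, now: float) -> str:
        """Promote the standby only if the active is stale and the standby is alive."""
        if not self._alive(self.active, now) and self._alive(self.standby, now):
            self.active, self.standby = self.standby, self.active
        return self.active.name


m1, m2 = MasterNode("master-1"), MasterNode("master-2")
ctl = FailoverController(m1, m2, timeout=5.0)
ctl.heartbeat(m1, 0.0)
ctl.heartbeat(m2, 0.0)
print(ctl.check(3.0))   # master-1
ctl.heartbeat(m2, 8.0)  # master-1 has stopped sending heartbeats
print(ctl.check(10.0))  # master-2
```

Requiring the standby to be alive before promotion avoids handing the role to a node that is itself down; a production setup would also fence the old master before the new one starts serving.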
As an optional embodiment, as shown in Fig. 4, the distributed data storage system may further include an application server 29.
The application server 29 is connected to the distributed storage cluster 25 and is configured to provide a data interface for accessing the basic data stored in the distributed storage cluster.
In practical applications, to ensure that business systems can access data comprehensively and efficiently, a standardized interface service defines the system access standard. The application server 29 provides external systems with a unified interface service that supports multiple access protocols, and through a series of access calls makes the various service components of the unstructured data management platform's base service framework available. Combinations of access interface services can be customized to the business logic and requirements of different systems, giving the simplest possible access pattern and saving time, investment and other costs.
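A unified interface that accepts several access protocols can be sketched as a facade: each protocol registers a small adapter that extracts a document id from its own request shape, and all adapters share one backend retrieval function. The protocol names, request shapes and class name below are hypothetical placeholders, not protocols named in the description.

```python
from typing import Callable, Dict


class UnifiedInterface:
    """Hypothetical application-server facade: protocol adapters in front of one backend."""

    def __init__(self, backend: Callable[[str], bytes]) -> None:
        self.backend = backend
        self.adapters: Dict[str, Callable[[dict], str]] = {}

    def register(self, protocol: str, extract_id: Callable[[dict], str]) -> None:
        # Each adapter only knows how to pull a document id out of its own request shape.
        self.adapters[protocol] = extract_id

    def handle(self, protocol: str, request: dict) -> bytes:
        if protocol not in self.adapters:
            raise ValueError(f"unsupported protocol: {protocol}")
        return self.backend(self.adapters[protocol](request))


store = {"d1": b"voucher image bytes"}
api = UnifiedInterface(store.__getitem__)
api.register("http", lambda req: req["query"]["id"])
api.register("mq", lambda req: req["body"]["doc_id"])
print(api.handle("http", {"query": {"id": "d1"}}))  # b'voucher image bytes'
```

Keeping the adapters thin means a new external system only needs a new adapter, while retrieval logic stays in one place.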
As can be seen from the above, compared with the prior art, the distributed data storage system has the following characteristics:
An open-source distributed system is used to build a unified distributed data storage system that stores and manages massive data. Bank-class enterprises hold huge volumes of unstructured basic data: the vouchers and file data of all kinds produced every day reach as much as 2 TB, and the volume to be stored and managed reaches the PB level. Hadoop, an open-source framework released by the Apache organization following Google's approach to storing and managing massive data, fits these design requirements well. The distributed data storage system uses the Hadoop framework to build the distributed environment, merges massive numbers of small files for storage, and uses ZooKeeper to manage the resulting cluster.
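The small-file merging mentioned above can be illustrated by packing many small files into one container plus an offset index, so the storage layer sees one large object while individual files remain randomly readable. Hadoop provides its own mechanisms for this, such as Hadoop Archives (HAR) and SequenceFile; the sketch below is neither format, only the underlying idea, and its function names are hypothetical.

```python
from typing import Dict, Tuple

Index = Dict[str, Tuple[int, int]]  # file name -> (offset, length) in the container


def pack_small_files(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate many small files into one container plus an offset index."""
    index: Index = {}
    parts = []
    offset = 0
    for name, content in files.items():
        index[name] = (offset, len(content))
        parts.append(content)
        offset += len(content)
    return b"".join(parts), index


def read_small_file(container: bytes, index: Index, name: str) -> bytes:
    """Random access to one file without unpacking the whole container."""
    offset, length = index[name]
    return container[offset:offset + length]


container, index = pack_small_files({"a.jpg": b"AAA", "b.pdf": b"BB", "c.txt": b"C"})
print(index)                                       # {'a.jpg': (0, 3), 'b.pdf': (3, 2), 'c.txt': (5, 1)}
print(read_small_file(container, index, "b.pdf"))  # b'BB'
```

The benefit for HDFS is that the NameNode tracks one large file instead of thousands of tiny ones, which is the main reason small files are costly in that design.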
Large clusters of inexpensive PC servers and low-end arrays replace the hardware architecture of traditional high-end storage solutions. The distributed data storage system based on Hadoop open-source technology uses an open-source technical architecture that not only meets the bank's own requirements for nationally centralized storage and management of massive data and for loosely coupled services for business-system access, but also lays the architectural foundation for mining the value of unstructured and semi-structured basic data in the future. Replacing traditional high-end hardware with inexpensive PC server clusters and low-end arrays saves the enterprise substantial capital investment and reduces data infrastructure costs, while access efficiency is no worse than that of professional high-end storage and is especially high when storing massive unstructured data, increasing the value of unstructured data in a big-data environment.
The distributed data storage system based on Hadoop open-source technology has very good scalability and stability. The distributed storage architecture not only relieves the performance pressure that expansion brings but also makes devices easy to expand, debug and deploy. This saves enterprises the large labor and material costs of upgrades, reduces the potential risks that system upgrades bring, and keeps the platform running stably over the long term.
Based on big-data management, the distributed data storage system built on Hadoop open-source technology provides bank-class enterprises with a solution for storing and sharing massive unstructured data. It also provides full-lifecycle management of unstructured basic data with a complete security authentication mechanism, delivering a complete process implementation for content-driven business in bank-class enterprises.
Distributed full-text indexing complements relational database queries to meet the requirement for efficient data retrieval. Metadata stored in a relational database faces problems such as huge data volume and low retrieval efficiency. Distributed full-text indexing solves the relational database's inability to perform fuzzy search, while batch exact search uses the strengths of the traditional database. The two retrieval modes thus complement each other and meet the bank's requirements for using unstructured data.
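The complementary retrieval modes can be sketched with an in-memory SQLite table for exact metadata lookups and a character-bigram inverted index for fuzzy fragment search, with a combined query intersecting the two. The bigram index is a deliberately simple stand-in for a distributed full-text engine: it returns documents containing all bigrams of the fragment, which is an approximate filter rather than a guaranteed substring match. The table layout and class name are hypothetical.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Set


def bigrams(text: str) -> Set[str]:
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}


class HybridSearch:
    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")  # exact search on structured metadata
        self.db.execute("CREATE TABLE meta (doc_id TEXT PRIMARY KEY, customer_id TEXT, doc_type TEXT)")
        self.grams: Dict[str, Set[str]] = defaultdict(set)  # fuzzy search on content

    def add(self, doc_id: str, customer_id: str, doc_type: str, text: str) -> None:
        self.db.execute("INSERT INTO meta VALUES (?, ?, ?)", (doc_id, customer_id, doc_type))
        for g in bigrams(text):
            self.grams[g].add(doc_id)

    def exact(self, customer_id: str) -> List[str]:
        rows = self.db.execute(
            "SELECT doc_id FROM meta WHERE customer_id = ? ORDER BY doc_id", (customer_id,))
        return [r[0] for r in rows]

    def fuzzy(self, fragment: str) -> List[str]:
        gs = bigrams(fragment)
        if not gs:
            return []
        return sorted(set.intersection(*(self.grams.get(g, set()) for g in gs)))

    def combined(self, customer_id: str, fragment: str) -> List[str]:
        exact = set(self.exact(customer_id))
        return [d for d in self.fuzzy(fragment) if d in exact]


hs = HybridSearch()
hs.add("d1", "c1", "contract", "Mortgage loan agreement")
hs.add("d2", "c1", "voucher", "Cash deposit slip")
hs.add("d3", "c2", "contract", "Personal loan agreement")
print(hs.combined("c1", "loan agree"))  # ['d1']
```

The exact path benefits from the database's indexes and SQL filters, while the fuzzy path handles partial terms a `WHERE col = ?` lookup cannot, mirroring the division of labor described above.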
The distributed data storage system based on Hadoop open-source technology achieves the collection, management and sharing of unstructured basic data across business systems, and optimizes and reengineers business processes, making the management of archives and other unstructured data more scientific and rational. It provides strong basic-platform support for the centralized control and standardized management of image files and data files in the future business development of bank-class enterprises. It turns internal controls into system processes and embeds rules and regulations in business processes, ultimately achieving process optimization and reengineering and laying the foundation for the transition from traditional department-oriented banking to process-oriented banking.
According to an embodiment of the present invention, a method embodiment of a distributed data storage method is provided. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.
Fig. 5 is a flowchart of the distributed data storage method according to an embodiment of the present invention. As shown in Fig. 5, the method includes the following steps:
Step S21: screen the acquired basic data and determine its type, where the type includes at least a structured type and an unstructured type.
Step S23: store the basic data in the first sub-storage cluster and/or the second sub-storage cluster according to its type.
Specifically, in steps S21 to S23, the type of the acquired basic data is determined by data screening, and the basic data is stored in the predetermined storage cluster in the storage format corresponding to its type. This improves the overall storage efficiency of the distributed storage cluster and reduces its latency, solving the technical problem of high data access latency in existing distributed data storage systems.
As an optional embodiment, after the acquired basic data is screened and its type determined in step S21, the method further includes:
Step S221: generate, from basic data of the unstructured type, metadata corresponding to that basic data.
Step S223: store the metadata in the first sub-storage cluster as basic data of the structured type.
Specifically, in steps S221 to S223, after the basic data has been classified, the content of basic data of the unstructured type is extracted to obtain metadata describing that basic data. The metadata is then stored in the first sub-storage cluster as basic data of the structured type to improve read/write efficiency.
As an optional embodiment, after the basic data is stored in the first sub-storage cluster and/or the second sub-storage cluster according to its type in step S23, the method further includes:
Step S25: generate data index information from the basic data, where the data index information includes at least description information and storage location information of the basic data.
Step S27: store the data index information in the index server.
Specifically, in steps S25 to S27, data index information is generated from the content description information, storage location and/or association relationships of the basic data, and is stored in the index server. This reduces the load on the distributed storage cluster and improves the efficiency of the distributed storage system as a whole.
As an optional embodiment, storing the basic data in the first sub-storage cluster and/or the second sub-storage cluster according to its type in step S23 includes:
Step S231: store the basic data in a cache server according to its type.
Step S233: according to a preset storage policy, store basic data of the structured type in the first sub-storage cluster and basic data of the unstructured type in the second sub-storage cluster.
Specifically, cache servers can be deployed in the distributed data storage system, and these cache servers can be arranged in tiers. The cache servers temporarily store the basic data collected by the data acquisition server and by users. According to the preset storage policy, the basic data is grouped by type and uploaded tier by tier to the first sub-storage cluster and the second sub-storage cluster.
According to an embodiment of the present invention, an apparatus embodiment of a distributed data storage apparatus is also provided. As shown in Fig. 6, the distributed data storage apparatus includes a screening module 31 and a first storage module 33.
The screening module 31 is configured to screen acquired basic data and determine its type, where the type includes at least a structured type and an unstructured type. The first storage module 33 is configured to store the basic data in the first sub-storage cluster and/or the second sub-storage cluster according to its type.
Specifically, through the screening module 31 and the first storage module 33, the type of the acquired basic data is determined by data screening, and the basic data is stored in the predetermined storage cluster in the storage format corresponding to its type. This improves the overall storage efficiency of the distributed storage cluster and reduces its latency, solving the technical problem of high data access latency in existing distributed data storage systems.
As an optional embodiment, as shown in Fig. 7, the apparatus may further include a first generation module 321 and a second storage module 323.
The first generation module 321 is configured to generate, from basic data of the unstructured type, metadata corresponding to that basic data. The second storage module 323 is configured to store the metadata in the first sub-storage cluster as basic data of the structured type.
Specifically, after the basic data has been classified, the first generation module 321 and the second storage module 323 extract the content of the basic data of the unstructured type to obtain metadata describing it. The metadata is then stored in the first sub-storage cluster as basic data of the structured type, to improve read/write efficiency.
As an optional embodiment, as shown in Fig. 8, the device may further include: a second generation module 35 and a third storage module 37.
The second generation module 35 is configured to generate data index information from basic data of the structured type, where the data index information includes at least description information and storage location information of the basic data. The third storage module 37 is configured to store the data index information in the index server.
Specifically, the second generation module 35 and the third storage module 37 generate data index information from the content description information, storage location and/or association relationships of the basic data, and store the data index information in the index server. This reduces the load on the distributed storage cluster and improves the efficiency of the distributed storage system as a whole.
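The index generation and index storage steps can be sketched as follows, assuming that the description information is simply the record's text fields and that the index server behaves like an in-memory inverted keyword index. `IndexEntry`, `IndexServer`, `build_index_entry` and the `first://` locations are illustrative names, not part of the original.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IndexEntry:
    key: str             # identifier of the basic data
    description: str     # description information
    location: str        # storage location information
    related: tuple = ()  # optional association relationships (keys)


def build_index_entry(record: dict, location: str) -> IndexEntry:
    """Second generation module 35 (sketch). Hypothetical rule: the
    description is the record's string fields joined together."""
    text = " ".join(v for v in record.values() if isinstance(v, str))
    return IndexEntry(key=str(record["id"]), description=text,
                      location=location, related=tuple(record.get("related", ())))


@dataclass
class IndexServer:
    """In-memory stand-in for the index server: an inverted keyword index."""
    entries: dict = field(default_factory=dict)
    inverted: dict = field(default_factory=dict)

    def put(self, entry: IndexEntry) -> None:
        """Third storage module 37 (sketch): store one index entry."""
        self.entries[entry.key] = entry
        for word in entry.description.lower().split():
            self.inverted.setdefault(word, set()).add(entry.key)

    def lookup(self, word: str) -> list:
        """Return storage locations whose description contains `word`."""
        keys = sorted(self.inverted.get(word.lower(), ()))
        return [self.entries[k].location for k in keys]


if __name__ == "__main__":
    index = IndexServer()
    index.put(build_index_entry({"id": 7, "name": "Pump station", "site": "north"}, "first://rows/7"))
    index.put(build_index_entry({"id": 8, "name": "Pump house"}, "first://rows/8"))
    print(index.lookup("pump"))  # ['first://rows/7', 'first://rows/8']
```

In this sketch a lookup returns storage locations without scanning the sub-storage clusters; the cluster is contacted only to fetch the data itself.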
Further, as an optional embodiment, the first storage module 33 may perform the following steps:
According to the type, the basic data is stored to a caching server. Then, according to a preset storage strategy, the basic data of the structured type is stored to the first sub-storage cluster, and the basic data of the unstructured type is stored to the second sub-storage cluster.
Specifically, caching servers may be provided in the distributed data storage system, and these caching servers may be arranged by category. A caching server can temporarily store the basic data collected by the data acquisition server and by users. According to a preset storage strategy, the basic data is uploaded, grouped by type, to the first sub-storage cluster and the second sub-storage cluster in turn.
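The caching step can be sketched by modelling the preset storage strategy as a per-type batch size: records are buffered by type on the caching server, and a full batch is uploaded to its sub-storage cluster. The batch-size strategy, the structured-first flush order and all names are assumptions; the description leaves the storage strategy unspecified, and the upload functions stand in for whatever client the sub-storage clusters actually expose.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class CachingServer:
    """Caching server (sketch): buffers basic data per type and uploads it
    to the matching sub-storage cluster according to a batch-size strategy."""

    def __init__(self, uploaders: Dict[str, Callable[[List[Any]], None]],
                 batch_size: int = 100) -> None:
        self.uploaders = uploaders        # type name -> upload function of a sub-cluster
        self.batch_size = batch_size      # hypothetical "preset storage strategy"
        self.buffers = defaultdict(list)  # type name -> records waiting for upload

    def put(self, data_type: str, record: Any) -> None:
        """Cache one record under its type; upload the batch once it is full."""
        buf = self.buffers[data_type]
        buf.append(record)
        if len(buf) >= self.batch_size:
            self._upload(data_type)

    def _upload(self, data_type: str) -> None:
        batch = self.buffers.pop(data_type, [])
        if batch:
            self.uploaders[data_type](batch)

    def flush(self) -> None:
        """Upload all remaining batches in turn, structured data first."""
        for data_type in sorted(self.buffers, key=lambda t: t != "structured"):
            self._upload(data_type)


if __name__ == "__main__":
    log = []
    cache = CachingServer({"structured": lambda b: log.append(("first", b)),
                           "unstructured": lambda b: log.append(("second", b))},
                          batch_size=2)
    cache.put("unstructured", b"img-1")
    cache.put("structured", {"id": 1})
    cache.put("structured", {"id": 2})  # batch full: uploaded immediately
    cache.flush()                       # remaining unstructured batch uploaded
    print(log)
```

Batching trades a short delay before data reaches the cluster for fewer, larger writes; a time-based flush could be added in the same way.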
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
The above are only preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.
Claims (10)
1. A distributed data storage system, characterized by comprising:
a data acquisition server, configured to collect basic data;
a data processing server, connected to the data acquisition server and configured to classify the basic data and determine the type of the basic data, wherein the type includes at least a structured type and an unstructured type;
a distributed storage cluster, connected to the data processing server and configured to store the basic data of the structured type to a first sub-storage cluster and to store the basic data of the unstructured type to a second sub-storage cluster.
2. The system according to claim 1, characterized in that the distributed storage cluster comprises:
an index server, connected to the first sub-storage cluster and configured to generate data index information from the basic data of the structured type.
3. The system according to claim 2, characterized in that the system further comprises:
a caching server, connected to the data processing server and configured to cache the basic data collected by the data acquisition server.
4. The system according to claim 1, characterized in that the second sub-storage cluster uses the Hadoop HDFS distributed file storage framework.
5. The system according to any one of claims 1 to 4, characterized in that the system further comprises:
an application server, connected to the distributed storage cluster and configured to provide a data interface for accessing the basic data stored in the distributed storage cluster.
6. A distributed data storage method applied to the system according to any one of claims 1 to 5, characterized by comprising:
screening acquired basic data and determining the type of the basic data, wherein the type includes at least a structured type and an unstructured type;
storing the basic data, according to the type, to a first sub-storage cluster and/or a second sub-storage cluster.
7. The method according to claim 6, characterized in that, after screening the acquired basic data and determining the type of the basic data, the method further comprises:
generating, from the basic data of the unstructured type, metadata corresponding to the basic data;
storing the metadata in the first sub-storage cluster as basic data of the structured type.
8. The method according to claim 7, characterized in that, after storing the basic data to the first sub-storage cluster and/or the second sub-storage cluster according to the type, the method further comprises:
generating data index information from the basic data, wherein the data index information includes at least description information and storage location information of the basic data;
storing the data index information in an index server.
9. The method according to claim 8, characterized in that storing the basic data to the first sub-storage cluster and/or the second sub-storage cluster according to the type comprises:
storing the basic data to a caching server according to the type;
according to a preset storage strategy, storing the basic data of the structured type to the first sub-storage cluster, and storing the basic data of the unstructured type to the second sub-storage cluster.
10. A distributed data storage device, characterized by comprising:
a screening module, configured to screen acquired basic data and determine the type of the basic data, wherein the type includes at least a structured type and an unstructured type;
a first storage module, configured to store the basic data, according to the type, to a first sub-storage cluster and/or a second sub-storage cluster.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610371832.3A CN106095796A (en) | 2016-05-30 | 2016-05-30 | Distributed data storage method, Apparatus and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610371832.3A CN106095796A (en) | 2016-05-30 | 2016-05-30 | Distributed data storage method, Apparatus and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106095796A (en) | 2016-11-09 |
Family
ID=57229470
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610371832.3A Pending CN106095796A (en) | Distributed data storage method, Apparatus and system | 2016-05-30 | 2016-05-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106095796A (en) |
- 2016-05-30: application CN201610371832.3A filed in CN (published as CN106095796A), status: Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104081385A (en) * | 2011-04-29 | 2014-10-01 | 汤姆森路透社全球资源公司 | Representing information from documents |
CN104035943A (en) * | 2013-03-08 | 2014-09-10 | 联想(北京)有限公司 | Data storage method and corresponding server |
CN103440288A (en) * | 2013-08-16 | 2013-12-11 | 曙光信息产业股份有限公司 | Big data storage method and device |
CN103678603A (en) * | 2013-12-13 | 2014-03-26 | 江苏物联网研究发展中心 | Multi-source heterogeneous data efficient converging and storing frame system |
CN104820670A (en) * | 2015-03-13 | 2015-08-05 | 国家电网公司 | Method for acquiring and storing big data of power information |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110088745A (en) * | 2016-12-22 | 2019-08-02 | 日本电信电话株式会社 | Data processing system and data processing method |
CN110088745B (en) * | 2016-12-22 | 2023-08-08 | 日本电信电话株式会社 | Data processing system and data processing method |
CN108255851A (en) * | 2016-12-29 | 2018-07-06 | 北京京东尚科信息技术有限公司 | A kind of combing system and method for project data |
CN106886585A (en) * | 2017-02-14 | 2017-06-23 | 北京数码大方科技股份有限公司 | The data save method and device drawn in applying |
CN107194007A (en) * | 2017-06-20 | 2017-09-22 | 哈尔滨工业大学 | A kind of integrated management system of spacecraft isomery test data |
CN108023957A (en) * | 2017-12-07 | 2018-05-11 | 温州中壹技术服务有限公司 | A kind of collaborative computer network management system for the processing of information Quick Acquisition |
CN109165207A (en) * | 2018-07-16 | 2019-01-08 | 华南农业大学 | Drinking water mass data storage management method and system based on Hadoop |
CN109165207B (en) * | 2018-07-16 | 2021-11-26 | 华南农业大学 | Drinking water mass data storage management method and system based on Hadoop |
CN109388657A (en) * | 2018-09-10 | 2019-02-26 | 平安科技(深圳)有限公司 | Data processing method, device, computer equipment and storage medium |
CN109388657B (en) * | 2018-09-10 | 2023-08-08 | 平安科技(深圳)有限公司 | Data processing method, device, computer equipment and storage medium |
CN109670027A (en) * | 2018-12-27 | 2019-04-23 | 上海农村商业银行股份有限公司 | A kind of image query, caching, storing method and system |
CN109933587A (en) * | 2019-02-26 | 2019-06-25 | 厦门市美亚柏科信息股份有限公司 | Data processing method, device, system and storage medium based on catalogue registration |
CN109933587B (en) * | 2019-02-26 | 2023-04-11 | 厦门市美亚柏科信息股份有限公司 | Data processing method, device and system based on directory registration and storage medium |
CN112015952A (en) * | 2019-06-03 | 2020-12-01 | 食亨(上海)科技服务有限公司 | Data processing system and method |
CN110769072A (en) * | 2019-10-31 | 2020-02-07 | 北京达佳互联信息技术有限公司 | Multimedia resource acquisition method, device and storage medium |
CN111190991A (en) * | 2019-12-10 | 2020-05-22 | 华能集团技术创新中心有限公司 | Unstructured data transmission system and interaction method |
CN111190991B (en) * | 2019-12-10 | 2023-11-10 | 华能集团技术创新中心有限公司 | Unstructured data transmission system and interaction method |
CN110704698A (en) * | 2019-12-13 | 2020-01-17 | 中国人民解放军国防科技大学 | Correlation and query method for unstructured massive network security data |
CN110704698B (en) * | 2019-12-13 | 2020-04-10 | 中国人民解放军国防科技大学 | Correlation and query method for unstructured massive network security data |
CN111210205A (en) * | 2020-01-13 | 2020-05-29 | 上海威派格智慧水务股份有限公司 | Data processing system |
CN113961628A (en) * | 2021-12-20 | 2022-01-21 | 广州市腾嘉自动化仪表有限公司 | Distributed data analysis control system |
Similar Documents
Publication | Title |
---|---|
CN106095796A (en) | Distributed data storage method, Apparatus and system |
DE102016105472B4 (en) | Storage tiering and block-level parallel allocation in file systems |
CN104820670B (en) | A kind of acquisition of power information big data and storage method |
CN101645032B (en) | Performance analysis method of application server and application server |
CN102917009B (en) | A kind of stock certificate data collection based on cloud computing technology and storage means and system |
DE202015009777U1 (en) | Transparent discovery of a semi-structured data scheme |
CN101611399A (en) | Webpage, website modeling and generation |
DE202011110890U1 (en) | System for providing a data storage and data processing service |
CN102609526A (en) | Internet website content management system |
CN106294695A (en) | A kind of implementation method towards the biggest data search engine |
CN105007314B (en) | Towards the big data processing system of magnanimity readers ' reading data |
CN108399199A (en) | A kind of collection of the application software running log based on Spark and service processing system and method |
CN108776672A (en) | Knowledge Management System based on SOLR |
Nguyen et al. | Design of a platform for collecting and analyzing agricultural big data |
CN103995807A (en) | Massive data query and secondary processing method based on Web architecture |
CN106649718A (en) | Large data acquisition and processing method for PDM system |
CN115269743A (en) | Data collection and processing system for data fusion |
Ali et al. | A state of art survey for big data processing and nosql database architecture |
Ma | Traditional music protection system from the ecological perspective based on big data analysis. |
KR101665649B1 (en) | System for analyzing social media data and method for analyzing social media data using the same |
CN101441645A (en) | System and method of technical data analysis |
CN102890708A (en) | Procurement decision auxiliary support system for library |
Trifu et al. | Big data components for business process optimization |
CN113485987A (en) | Enterprise information tag generation method and device |
CN113177150A (en) | Publication resource integration method and publication resource integration system |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161109 |