CN109344161A - A mass data storage method based on MongoDB - Google Patents
- Publication number
- CN109344161A (application CN201811470546.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- library
- character string
- center
- cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a mass data storage method based on MongoDB, implemented as follows: 1) Data storage request: a logical distribution processing center is placed between data storage and user requests; before any record is stored, a request carrying the required parameters must be sent to the processing center. 2) Cluster-level storage: the corresponding cluster is found from the returned storage information. 3) Database-level storage: the corresponding database under that cluster is found from the returned storage information. 4) Table-level storage: the corresponding table is found from the returned storage information, and the record is finally stored there. 5) Retrieval: to query a record, its location must first be obtained from the distribution processing center, i.e. the required parameters are passed in and the center derives and returns the record's location using the same rules as for storage; the record is then looked up at the returned location.
Description
Technical field
The present invention relates to a mass data storage method, specifically a mass data storage method based on MongoDB; the method can also be applied to other databases.
Background art
At present, MongoDB is a non-relational database designed for storing information, but problems arise in actual use:
1. The data volume that a single cluster can support is limited, and partitioning has similar problems;
2. Speeding up queries requires indexes, but indexing in turn slows response times;
3. Even when multiple clusters are used to store massive data, performance bottlenecks remain without a good storage scheme, because multiple clusters make data processing considerably more complex; under a multi-cluster architecture, the allocation of data must be handled centrally and uniformly, which requires a corresponding solution.
Example: when the data stored in MongoDB exceeds one hundred billion records, storage and querying must be solved by splitting the data across clusters, databases and tables; how to split the data in this way requires choosing a suitable allocation scheme for the approach to be effective.
Summary of the invention
Therefore, to remedy the above deficiencies, the present invention provides a mass data storage method based on MongoDB (the invention is also applicable to other databases); it is primarily a storage scheme for handling big data. Example: repeated tests show that once the data reaches the order of hundreds of millions of records, splitting by the MD5 key obtained by hashing the unique key, according to the rules of this method, distributes the data substantially evenly over every table in every database of every cluster, and the differences in data volume between tables are not large.
The invention is realized as follows: a mass data storage method based on MongoDB is constructed, characterized in that it is implemented as follows:
1) Requesting the data storage location:
A logical distribution processing center is placed between data storage and user requests; this processing center handles the distribution of stored data according to rules. The first step for a data terminal is to ask this rule center which node the data belongs to.
Example: the terminal submits the key of the data; the processing center applies MD5 to the key and obtains a 32-character string. This string is in hexadecimal notation, i.e. it consists of the digits 0 to 9 and the letters a, b, c, d, e, f.
2) Cluster-splitting rule: handling big data requires splitting across clusters, databases and tables. Massive data is therefore split according to the MD5 key: splitting on the first character of the string divides all data among 16 clusters, and splitting on the first two characters divides all data among 256 clusters.
3) Database-splitting rule: as with cluster splitting, the 2nd to 4th characters of the MD5 key are used to split into databases; each database is named with the extracted characters.
4) Table-splitting rule: as with database splitting, the 5th to 6th characters of the MD5 key are used to split into tables; each table is named with the extracted characters.
5) Storage: a record requests its location from the processing center; from the returned location data and the rules, the cluster, database and table in which it should be stored are known, and it is stored at that location.
6) Single-record retrieval: to obtain a single record by condition, the required parameters are passed to the processing center, which returns the record's location; from the returned location information and the rules, the cluster, database and table holding the record are known, and the value is read from that location.
7) Multi-record queries: MapReduce is run against each cluster according to the query condition.
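The splitting rules in steps 2) to 4) can be sketched as a small routing function. This is a minimal illustration, not the patent's implementation: the names `Location`, `locate` and `cluster_chars` are hypothetical, and UTF-8 encoding of the key is an assumption.

```python
import hashlib
from typing import NamedTuple


class Location(NamedTuple):
    cluster: str   # 1st character (or first two) of the MD5 hex digest
    database: str  # 2nd to 4th characters
    table: str     # 5th to 6th characters


def locate(key: str, cluster_chars: int = 1) -> Location:
    """Derive the storage location of a record from its unique key."""
    if cluster_chars not in (1, 2):
        raise ValueError("cluster_chars must be 1 (16 clusters) or 2 (256 clusters)")
    # MD5 yields 32 hexadecimal characters drawn from 0-9 and a-f.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return Location(
        cluster=digest[:cluster_chars],
        database=digest[1:4],  # positions 2-4, counted from 1
        table=digest[4:6],     # positions 5-6, counted from 1
    )


if __name__ == "__main__":
    # MD5("hello") is 5d41402abc4b2a76b9719d911017c592
    print(locate("hello"))  # Location(cluster='5', database='d41', table='40')
```

Note that with `cluster_chars=2` the second character of the digest takes part in both the cluster name and the database name, which follows directly from the character positions stated in the rules.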
The present invention provides a mass data storage method based on MongoDB, implemented as follows: 1) Data storage request: a logical distribution processing center is placed between data storage and user requests; before any record is stored, a request carrying the required parameters must be sent to the processing center. The processing center generates an MD5 value from the incoming parameters, and this MD5 value serves as a globally unique value. The processing center then applies the rules to the MD5 value to determine the cluster, database and table for the record and returns this location data (example: splitting on the 1st character of the string divides all data among 16 clusters; splitting on the first 2 characters divides it among 256 clusters, and more than 256 clusters is generally not recommended; the 2nd to 4th characters of the string select the database; the 5th to 6th characters select the table). 2) Cluster-level storage: the corresponding cluster is found from the returned storage information. 3) Database-level storage: the corresponding database under that cluster is found from the returned storage information. 4) Table-level storage: the corresponding table is found from the returned storage information, and the record is finally stored there. 5) Retrieval: to query a record, its location must first be obtained from the distribution processing center, i.e. the required parameters are passed in and the center derives and returns the record's location using the same rules as for storage; the record is then looked up at the returned location.
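The store and retrieve flow just described can be sketched with an in-memory stand-in for the clusters. Everything named here (`locate`, `InMemoryStore`, `put`, `get`) is hypothetical, and nested dictionaries replace real MongoDB connections; the point is only that storage and retrieval derive the location with the same rule.

```python
import hashlib
from collections import defaultdict
from typing import Optional, Tuple


def locate(param: str) -> Tuple[str, str, str]:
    """Role of the processing center: map a parameter to (cluster, database, table)."""
    digest = hashlib.md5(param.encode("utf-8")).hexdigest()
    return digest[0], digest[1:4], digest[4:6]


class InMemoryStore:
    """Stand-in for the clusters: cluster -> database -> table -> {key: record}."""

    def __init__(self) -> None:
        self._data = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, key: str, record: dict) -> Tuple[str, str, str]:
        # Ask the center for the location first, then store at that location.
        cluster, database, table = locate(key)
        self._data[cluster][database][table][key] = record
        return cluster, database, table

    def get(self, key: str) -> Optional[dict]:
        # Retrieval recomputes the location with the same rule used for storage.
        cluster, database, table = locate(key)
        return self._data[cluster][database][table].get(key)


store = InMemoryStore()
print(store.put("hello", {"value": 1}))  # ('5', 'd41', '40')
print(store.get("hello"))                # {'value': 1}
```

Because the location is a pure function of the key, no lookup table of record positions has to be kept; in a real deployment the dictionary accesses would be replaced by connections to the selected cluster, database and table.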
Remarks: once the data exceeds the order of one hundred billion records, splitting by the MD5 key obtained by hashing the unique key, according to the rules above, distributes the data substantially evenly over every table, so the data volume of each table is roughly equal.
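The evenness claim can be checked informally at small scale. The sketch below hashes synthetic keys and counts how many land in each of the 16 clusters; the key format `record-<i>`, the sample size and the function name `cluster_counts` are arbitrary assumptions, and the check says nothing about real workloads or about the patent's own test data.

```python
import hashlib
from collections import Counter


def cluster_counts(n_keys: int) -> Counter:
    """Count synthetic keys per cluster, using the 1st MD5 hex character."""
    counts: Counter = Counter()
    for i in range(n_keys):
        digest = hashlib.md5(f"record-{i}".encode("utf-8")).hexdigest()
        counts[digest[0]] += 1
    return counts


counts = cluster_counts(64_000)
spread = max(counts.values()) / min(counts.values())
print(len(counts), "clusters; largest/smallest =", round(spread, 3))
```

A ratio close to 1 indicates that the largest and smallest clusters hold similar numbers of records.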
The present invention has the following advantage: it provides a mass data storage method based on MongoDB, primarily a storage scheme for handling big data. According to long-term testing, once the data exceeds the order of one hundred billion records, splitting by the MD5 key obtained by hashing the unique key, according to the rules above, distributes the data substantially evenly over every table, and the differences between tables are not especially large.
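As a rough worked example of what the rules imply, the number of tables and the average records per table can be computed directly. The figure of one hundred billion records is taken from the text; the 16-cluster variant and a perfectly even distribution are assumptions made only for this arithmetic.

```python
HEX = 16  # each hex character takes one of 16 values

clusters = HEX ** 1               # 1 character  -> 16 clusters
databases_per_cluster = HEX ** 3  # 3 characters -> 4096 databases
tables_per_database = HEX ** 2    # 2 characters -> 256 tables

total_tables = clusters * databases_per_cluster * tables_per_database
records = 100_000_000_000         # one hundred billion records (assumed total)
records_per_table = records / total_tables

print(total_tables)               # 16777216
print(round(records_per_table))   # 5960
```

Under these assumptions each table would hold on the order of a few thousand records.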
Brief description of the drawings
Fig. 1 is a schematic diagram of the data storage flow of the present invention.
Specific embodiment
The present invention is described in detail below with reference to Fig. 1, and the technical solutions in the embodiments of the invention are described clearly and completely. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
By way of improvement, the present invention provides a mass data storage method based on MongoDB, implemented as follows:
1) Requesting the data storage location:
A logical distribution processing center is placed between data storage and user requests; this processing center handles the distribution of stored data according to rules. The first step for a data terminal is to ask this rule center which node the data belongs to.
Example: the terminal submits the key of the data; the processing center applies MD5 to the key and obtains a 32-character string. This string is in hexadecimal notation, i.e. it consists of the digits 0 to 9 and the letters a, b, c, d, e, f.
2) Cluster-splitting rule: handling big data requires splitting across clusters, databases and tables. Massive data is therefore split according to the MD5 key: splitting on the first character of the string divides all data among 16 clusters, and splitting on the first two characters divides all data among 256 clusters.
3) Database-splitting rule: as with cluster splitting, the 2nd to 4th characters of the MD5 key are used to split into databases; each database is named with the extracted characters.
4) Table-splitting rule: as with database splitting, the 5th to 6th characters of the MD5 key are used to split into tables; each table is named with the extracted characters.
5) Storage: a record requests its location from the processing center; from the returned location data and the rules, the cluster, database and table in which it should be stored are known, and it is stored at that location.
6) Single-record retrieval: to obtain a single record by condition, the required parameters are passed to the processing center, which returns the record's location; from the returned location information and the rules, the cluster, database and table holding the record are known, and the value is read from that location.
7) Multi-record queries: MapReduce is run against each cluster according to the query condition.
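Step 7) gives no detail beyond running MapReduce on each cluster. One way to picture it is a scatter-gather query: the condition is applied to every cluster in parallel (map) and the partial results are merged (reduce). The sketch below uses plain Python lists as stand-ins for clusters; the names `query_cluster` and `query_all_clusters` are hypothetical, and this is not MongoDB's MapReduce API.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, object]
Condition = Callable[[Record], bool]


def query_cluster(records: List[Record], condition: Condition) -> List[Record]:
    """Map step: filter the records of one cluster by the condition."""
    return [record for record in records if condition(record)]


def query_all_clusters(clusters: Dict[str, List[Record]], condition: Condition) -> List[Record]:
    """Run the query on every cluster in parallel, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda recs: query_cluster(recs, condition), clusters.values())
        # Reduce step: concatenate the per-cluster result lists.
        return reduce(lambda merged, part: merged + part, partials, [])


clusters = {
    "0": [{"id": 1, "city": "A"}],
    "f": [{"id": 2, "city": "B"}, {"id": 3, "city": "A"}],
}
matches = query_all_clusters(clusters, lambda r: r["city"] == "A")
print([r["id"] for r in matches])  # [1, 3]
```

Unlike single-record retrieval, a condition that does not include the unique key cannot be routed to one location, so every cluster has to be consulted.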
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be realized in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (1)
1. A mass data storage method based on MongoDB, characterized in that it is implemented as follows:
1) Requesting the data storage location:
A logical distribution processing center is placed between data storage and user requests; this processing center handles the distribution of stored data according to rules. The first step for a data terminal is to ask this rule center which node the data belongs to.
2) Cluster-splitting rule: handling big data requires splitting across clusters, databases and tables. Massive data is therefore split according to the MD5 key: splitting on the first character of the string divides all data among 16 clusters, and splitting on the first two characters divides all data among 256 clusters.
3) Database-splitting rule: as with cluster splitting, the 2nd to 4th characters of the MD5 key are used to split into databases; each database is named with the extracted characters.
4) Table-splitting rule: as with database splitting, the 5th to 6th characters of the MD5 key are used to split into tables; each table is named with the extracted characters.
5) Storage: a record requests its location from the processing center; from the returned location data and the rules, the cluster, database and table in which it should be stored are known, and it is stored at that location.
6) Single-record retrieval: to obtain a single record by condition, the required parameters are passed to the processing center, which returns the record's location; from the returned location information and the rules, the cluster, database and table holding the record are known, and the value is read from that location.
7) Multi-record queries: MapReduce is run against each cluster according to the query condition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811470546.8A CN109344161A (en) | 2018-12-04 | 2018-12-04 | A kind of mass data storage means based on mongodb |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109344161A (en) | 2019-02-15 |
Family
ID=65319607
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811470546.8A Pending CN109344161A (en) | 2018-12-04 | 2018-12-04 | A kind of mass data storage means based on mongodb |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109344161A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113849458A (en) * | 2021-09-18 | 2021-12-28 | 四川长虹网络科技有限责任公司 | MongoDB middleware, data storage method and data migration method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103023970A (en) * | 2012-11-15 | 2013-04-03 | 中国科学院计算机网络信息中心 | Method and system for storing mass data of Internet of Things (IoT) |
US20130332484A1 (en) * | 2012-06-06 | 2013-12-12 | Rackspace Us, Inc. | Data Management and Indexing Across a Distributed Database |
CN105426396A (en) * | 2015-10-28 | 2016-03-23 | 深圳市万姓宗祠网络科技股份有限公司 | Routing algorithm based database sharding method, system and middleware system |
US20170161351A1 (en) * | 2014-03-07 | 2017-06-08 | Adobe Systems Incorporated | Processing data in a distributed database across a plurality of clusters |
CN106909556A (en) * | 2015-12-23 | 2017-06-30 | 中国电信股份有限公司 | The storage equalization methods and device of main memory cluster |
CN107229688A (en) * | 2017-05-12 | 2017-10-03 | 上海前隆金融信息服务有限公司 | A kind of database level point storehouse point table method and system, server |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10467245B2 (en) | System and methods for mapping and searching objects in multidimensional space | |
EP3314477B1 (en) | Systems and methods for parallelizing hash-based operators in smp databases | |
CN110297799B (en) | Data management system and method | |
US7899851B2 (en) | Indexing method of database management system | |
EP3236365A1 (en) | Data query method and device | |
KR101928529B1 (en) | Code Distributed Hash Table based MapReduce System and Method | |
US20140122484A1 (en) | System and Method for Flexible Distributed Massively Parallel Processing (MPP) Database | |
CN106599091B (en) | RDF graph structure storage and index method based on key value storage | |
CN107180031B (en) | Distributed storage method and device, and data processing method and device | |
CN106250226A (en) | Task Scheduling Mechanism based on concordance hash algorithm and system | |
US10509803B2 (en) | System and method of using replication for additional semantically defined partitioning | |
CN111723073B (en) | Data storage processing method, device, processing system and storage medium | |
CN107209768A (en) | Method and apparatus for the expansible sequence of data set | |
CN106815258A (en) | A kind of date storage method and coordinator node | |
CN109117426A (en) | Distributed networks database query method, apparatus, equipment and storage medium | |
KR101255639B1 (en) | Column-oriented database system and join process method using join index thereof | |
CN111400301B (en) | Data query method, device and equipment | |
CN105550180B (en) | The method, apparatus and system of data processing | |
CN117806659A (en) | ES high-availability cluster containerized deployment method and related device | |
CN109344161A (en) | A kind of mass data storage means based on mongodb | |
CN107239568A (en) | Distributed index implementation method and device | |
CN111767287A (en) | Data import method, device, equipment and computer storage medium | |
CN112805695A (en) | Co-sharding and randomized co-sharding | |
KR100907533B1 (en) | Distributed Distributed Processing Systems and Methods | |
US11086689B2 (en) | Method for automatically and dynamically assigning the responsibility for tasks to the available computing components in a highly distributed data-processing system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned | Effective date of abandoning: 20240726 |