CN109344161A

CN109344161A - A mass data storage method based on MongoDB

Info

Publication number: CN109344161A
Application number: CN201811470546.8A
Authority: CN
Inventors: 龙平波; 宣善明
Original assignee: BEIJING XINHUA RUIDE ELECTRONIC READING TECHNOLOGY Co Ltd
Current assignee: BEIJING XINHUA RUIDE ELECTRONIC READING TECHNOLOGY Co Ltd
Priority date: 2018-12-04
Filing date: 2018-12-04
Publication date: 2019-02-15

Abstract

The present invention provides a mass data storage method based on MongoDB, implemented as follows: 1) Data storage request: a logical distribution processing center is added between data storage and user requests; before any record is stored, a request must be sent to the processing center with the required parameters. 2) Cluster storage: the corresponding cluster is located according to the returned storage information. 3) Database storage: the corresponding database within that cluster is located according to the returned storage information. 4) Table storage: the corresponding table is located according to the returned storage information, and the record is finally stored there. 5) Retrieval: to query a record, its location must first be obtained from the distribution processing center; that is, the required parameters are passed in, and the center derives the record's location using the same rule as for storage and returns it. The record is then looked up at the returned location.

Description

A mass data storage method based on MongoDB

Technical field

The present invention relates to a mass data storage method, specifically a mass data storage method based on MongoDB; the method can also be applied to other databases.

Background art

At present, MongoDB is itself a non-relational database designed for storing information, but problems arise in actual use:

1. The data volume that a single cluster can support is still limited, and partitioning has similar problems;

2. Indexes are needed to speed up queries, but index response itself becomes slow;

3. Even when multiple clusters are used to store massive data, performance bottlenecks remain without a good storage scheme, because multiple clusters make data processing very complex; under a multi-cluster architecture, data allocation must be handled centrally and uniformly, and a corresponding solution is required.

Example: when MongoDB stores more than one hundred billion records, storage and querying must be handled by splitting the data across clusters, databases, and tables; carrying out this split effectively requires choosing a suitable allocation scheme.

Summary of the invention

Therefore, to overcome the above shortcomings, the present invention provides a mass data storage method based on MongoDB (the invention is also applicable to other databases); it is mainly intended as a storage scheme for big data. Example: repeated tests show that once the data reaches the order of hundreds of millions of records, splitting by the MD5 key obtained by hashing the unique key, according to the rules of this method, distributes the data roughly evenly over every table of every database of every cluster, and the difference in data volume between tables is not large.

The invention is realized as follows: a mass data storage method based on MongoDB is constructed, characterized in that it is implemented as follows:

1) Data storage location request:

A logical distribution processing center is added between data storage and user requests; the processing center handles the allocation of data storage according to rules. The first step for a data terminal is to request, from this rule center, the node where the data belongs;

Example: the terminal submits the key of the record; after the processing center applies MD5 to the key, a 32-character string is obtained. This string is in hexadecimal notation, that is, it consists of the characters 0 to 9 and a, b, c, d, e, f;

2) Cluster splitting rule: processing big data requires splitting across clusters, databases, and tables. Massive data is therefore split according to the MD5 key: for example, splitting on the first character of the string distributes all data over 16 clusters, and splitting on the first two characters distributes all data over 256 clusters;

3) Database splitting rule: as with cluster splitting, the 2nd to 4th characters of the MD5 key are used to select the database; the database is named with the extracted substring;

4) Table splitting rule: as with database splitting, the 5th to 6th characters of the MD5 key are used to select the table; the table is named with the extracted substring;

5) Storage: when a record is to be stored, its location is requested from the processing center; from the returned location data and the rules, the cluster, database, and table in which it should be stored are known, and it is stored at that position;

6) Single-record retrieval: to fetch a single record by condition, the required parameters are passed to the processing center, which returns the record's location; from the returned location information and the rules, the cluster, database, and table holding the record are known, and the value is read from that position;

7) Multi-record query: MapReduce is used to fetch values from each cluster according to the condition.
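The splitting rules in steps 2) to 4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the names `Location` and `locate` and the parameter `cluster_chars` are hypothetical, and the key is assumed to be a text string encoded as UTF-8 before hashing.

```python
import hashlib
from typing import NamedTuple


class Location(NamedTuple):
    """Hypothetical container for the position returned by the processing center."""
    cluster: str
    database: str
    table: str


def locate(key: str, cluster_chars: int = 1) -> Location:
    """Map a record's unique key to (cluster, database, table) via its MD5 digest.

    cluster_chars=1 gives 16 possible clusters, cluster_chars=2 gives 256.
    """
    if cluster_chars not in (1, 2):
        raise ValueError("cluster_chars must be 1 (16 clusters) or 2 (256 clusters)")
    # hexdigest() returns 32 lowercase hexadecimal characters (0-9, a-f).
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return Location(
        cluster=digest[:cluster_chars],  # 1st character (or 1st and 2nd)
        database=digest[1:4],            # 2nd to 4th characters
        table=digest[4:6],               # 5th to 6th characters
    )
```

For instance, the key "hello" has the MD5 digest 5d41402abc4b2a76b9719d911017c592, so this sketch places it in cluster 5, database d41, table 40. Because the location depends only on the key, storage and retrieval arrive at the same position without any lookup table.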

The present invention provides a mass data storage method based on MongoDB, implemented as follows: 1) Data storage request: a logical distribution processing center is added between data storage and user requests; before any record is stored, a request must be sent to the processing center with the required parameters. The processing center generates an MD5 value from the incoming parameters; this value is treated as globally unique. The processing center then derives, according to the rules, the cluster, database, and table for the record from the MD5 value and returns this location data. (Example: the 1st character of the string is used for cluster splitting, which distributes all data over 16 clusters; splitting on the first 2 characters gives 256 clusters, and more than 256 clusters is generally not recommended. The 2nd to 4th characters of the string select the database; the 5th to 6th characters select the table.) 2) Cluster storage: the corresponding cluster is located according to the returned storage information. 3) Database storage: the corresponding database within that cluster is located according to the returned storage information. 4) Table storage: the corresponding table is located according to the returned storage information, and the record is finally stored there. 5) Retrieval: to query a record, its location must first be obtained from the distribution processing center; that is, the required parameters are passed in, and the center derives the record's location using the same rule as for storage and returns it. The record is then looked up at the returned location.
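The store-then-retrieve flow described above can be illustrated with a small in-memory stand-in. This is a sketch only: the class `InMemoryStore` and the function `locate` are hypothetical names, nested dictionaries take the place of real MongoDB clusters, databases, and collections, and the single-character cluster rule (16 clusters) is assumed.

```python
import hashlib
from collections import defaultdict
from typing import Optional, Tuple


def locate(key: str) -> Tuple[str, str, str]:
    """Role of the processing center: derive (cluster, database, table) from the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return digest[0], digest[1:4], digest[4:6]


class InMemoryStore:
    """Stand-in for the clusters: cluster -> database -> table -> {key: record}."""

    def __init__(self) -> None:
        self._data = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, key: str, record: dict) -> Tuple[str, str, str]:
        # Ask the processing center for the position before storing.
        cluster, database, table = locate(key)
        self._data[cluster][database][table][key] = record
        return cluster, database, table

    def get(self, key: str) -> Optional[dict]:
        # Retrieval applies the same rule as storage, so it lands on the same table.
        cluster, database, table = locate(key)
        return self._data[cluster][database][table].get(key)
```

A caller would write `store.put("order-1001", {...})` and later `store.get("order-1001")`; in an actual deployment the dictionary lookups would be replaced by a connection to the chosen cluster and an operation on the named database and collection.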

Remark: once the data exceeds the order of one hundred billion records, splitting by the MD5 key obtained by hashing the unique key, according to the above rules, distributes the data roughly evenly over every table, and the data volume of each table is roughly equal.

The present invention has the following advantages: it provides a mass data storage method based on MongoDB, mainly intended as a storage scheme for big data. According to long-term testing, once the data exceeds the order of one hundred billion records, splitting by the MD5 key obtained by hashing the unique key, according to the above rules, distributes the data roughly evenly over every table, and the difference between tables is not especially large.
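The evenness remark can be checked on a small scale with synthetic keys. The sketch below is an illustration under assumptions, not the applicant's test: the function name `cluster_counts` and the key pattern `record-<i>` are hypothetical, and only the cluster level (1st character of the digest) is counted.

```python
import hashlib
from collections import Counter


def cluster_counts(n_keys: int) -> Counter:
    """Count how many synthetic keys fall into each of the 16 possible clusters."""
    counts = Counter()
    for i in range(n_keys):
        digest = hashlib.md5(f"record-{i}".encode("utf-8")).hexdigest()
        counts[digest[0]] += 1  # cluster = 1st hexadecimal character
    return counts


if __name__ == "__main__":
    n = 100_000
    counts = cluster_counts(n)
    expected = n / 16
    # Largest relative gap between any cluster's count and a perfectly even share.
    worst = max(abs(c - expected) / expected for c in counts.values())
    print(f"clusters used: {len(counts)}, worst relative deviation: {worst:.2%}")
```

The same counting can be repeated on `digest[1:4]` or `digest[4:6]` to inspect the database and table levels. A sample of this size says nothing conclusive about behavior at one hundred billion records; it only shows the kind of measurement involved.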

Brief description of the drawings

Fig. 1 is a schematic diagram of the data storage flow of the present invention.

Detailed description of the embodiments

The present invention is described in detail below with reference to Fig. 1, and the technical solutions in the embodiments of the present invention are described clearly and completely. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Through improvement, the present invention provides a mass data storage method based on MongoDB, implemented as follows:

1) Data storage location request:

The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be realized in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A mass data storage method based on MongoDB, characterized in that it is implemented as follows:

1) Data storage location request:

A logical distribution processing center is added between data storage and user requests; the processing center handles the allocation of data storage according to rules. The first step for a data terminal is to request, from this rule center, the node where the data belongs;

2) Cluster splitting rule: processing big data requires splitting across clusters, databases, and tables. Massive data is therefore split according to the MD5 key: for example, splitting on the first character of the string distributes all data over 16 clusters, and splitting on the first two characters distributes all data over 256 clusters;

3) Database splitting rule: as with cluster splitting, the 2nd to 4th characters of the MD5 key are used to select the database; the database is named with the extracted substring;

4) Table splitting rule: as with database splitting, the 5th to 6th characters of the MD5 key are used to select the table; the table is named with the extracted substring;

5) Storage: when a record is to be stored, its location is requested from the processing center; from the returned location data and the rules, the cluster, database, and table in which it should be stored are known, and it is stored at that position;

6) Single-record retrieval: to fetch a single record by condition, the required parameters are passed to the processing center, which returns the record's location; from the returned location information and the rules, the cluster, database, and table holding the record are known, and the value is read from that position.