CN109344161A - Mass data storage method based on MongoDB - Google Patents

Info

Publication number: CN109344161A
Authority: CN (China)
Prior art keywords: data, library, character string, center, cluster
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number: CN201811470546.8A
Other languages: Chinese (zh)
Inventors: 龙平波, 宣善明
Current Assignee: BEIJING XINHUA RUIDE ELECTRONIC READING TECHNOLOGY Co Ltd (the listed assignees may be inaccurate)
Original Assignee: BEIJING XINHUA RUIDE ELECTRONIC READING TECHNOLOGY Co Ltd
Priority date: 2018-12-04 (the priority date is an assumption and is not a legal conclusion)
Filing date: 2018-12-04
Publication date: 2019-02-15

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a mass data storage method based on MongoDB, implemented as follows: 1) Data storage request: a data logic distribution processing center is added between data storage and user requests; before any record is stored, a request carrying the required parameters must be sent to the processing center. 2) Cluster storage: the corresponding cluster is located from the returned storage information. 3) Database storage: the corresponding database under that cluster is located from the returned storage information. 4) Table storage: the corresponding table is located from the returned storage information, and the record is finally stored there. 5) Retrieval: to query a record, its location must first be obtained from the distribution processing center; that is, the required parameters are passed in, and the center derives the record's location by the same rule used for storage and returns it. The record is then looked up at the returned location.

Description

A mass data storage method based on MongoDB
Technical field
The present invention relates to a mass data storage method, specifically a mass data storage method based on MongoDB; the method can also be applied to other databases.
Background art
At present, MongoDB is a non-relational database designed for storing information, but problems arise in practical use:
1. The data volume that a single cluster can support is limited, and partitioning approaches have similar problems;
2. Speeding up queries requires indexes, but the indexes themselves become slow to respond;
3. Even when mass data is stored across multiple clusters, performance bottlenecks remain without a good storage scheme. Multiple clusters make data processing considerably more complex, so under a multi-cluster architecture the allocation of data needs to be handled centrally and uniformly, with a corresponding solution.
Example: when MongoDB stores more than a hundred billion records, storage and querying must be handled by splitting the data across clusters, databases, and tables. The question is how to implement this split, and only a suitable allocation scheme achieves the desired effect.
Summary of the invention
Therefore, to address the above deficiencies, the present invention provides a mass data storage method based on MongoDB (the invention is also applicable to other databases). It is primarily a storage scheme for handling big data. Example: repeated tests show that once the data volume is on the order of hundreds of millions of records, splitting on the MD5 digest of each record's unique key according to the rules below distributes the data substantially evenly over every table of every database of every cluster, and the difference in data volume between tables is not large.
The invention is realized by constructing a mass data storage method based on MongoDB, characterized in that it is implemented as follows:
1) Data storage location request:
A data logic distribution processing center is added between data storage and user requests; this processing center handles the distribution of stored data according to rules. The first step for a data terminal is to ask this rule center for the node where the data belongs;
Example: the terminal submits the keyword of the data. The processing center applies MD5 to the keyword and obtains a 32-character string. A feature of this string is that it is in hexadecimal notation, i.e. it consists of the characters 0 to 9 and a, b, c, d, e, f;
2) Cluster-splitting rule: handling big data requires splitting across clusters, databases, and tables. Mass data is therefore split according to the MD5 digest: splitting on the first character of the string distributes all data over 16 clusters, and splitting on the first two characters distributes all data over 256 clusters;
3) Database-splitting rule: as with cluster splitting, the 2nd to 4th characters of the MD5 string determine the database; the database is named with the extracted substring;
4) Table-splitting rule: as with database splitting, the 5th to 6th characters of the MD5 string determine the table; the table is named with the extracted substring;
5) Storage: when a record's location is requested from the processing center, the returned location data and the rules indicate which cluster, which database, and which table the record belongs in, and it can then be stored at that position;
6) Single-record retrieval: to fetch a single record by condition, the required parameters are passed to the processing center, which returns the record's location. The returned location information and the rules indicate which cluster, which database, and which table hold the record, and the value is then read from that position;
7) Multi-record queries: MapReduce is used to retrieve values from each cluster according to the condition.
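The splitting rules in steps 2) to 4) can be illustrated with a minimal Python sketch. The function name `locate` and the `Location` tuple are hypothetical names chosen for illustration; the sketch assumes the single-character cluster rule and uses only the standard `hashlib` module.

```python
import hashlib
from typing import NamedTuple


class Location(NamedTuple):
    cluster: str   # 1st hex character of the digest (16 possible clusters)
    database: str  # 2nd to 4th hex characters, also used as the database name
    table: str     # 5th to 6th hex characters, also used as the table name


def locate(key: str) -> Location:
    """Map a unique keyword to (cluster, database, table) via its MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()  # 32 hex characters
    return Location(cluster=digest[0], database=digest[1:4], table=digest[4:6])


# The MD5 digest of "hello" is 5d41402abc4b2a76b9719d911017c592.
print(locate("hello"))  # Location(cluster='5', database='d41', table='40')
```

Because the location is a pure function of the keyword, the same call serves both storage and retrieval, which is what lets the processing center answer lookups without keeping a per-record index.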
The present invention provides a mass data storage method based on MongoDB, implemented as follows: 1) Data storage request: a data logic distribution processing center is added between data storage and user requests; before any record is stored, a request carrying the required parameters must be sent to the processing center. The processing center generates an MD5 value from the incoming parameters, and this value serves as a globally unique value. Following the rules, the processing center then derives the record's cluster, database, and table from the MD5 value and returns this location data. (Example: the 1st character of the string selects the cluster, which distributes all data over 16 clusters; splitting on the first 2 characters gives 256 clusters, and in general more than 256 clusters is not recommended. The 2nd to 4th characters of the string select the database, and the 5th to 6th characters select the table.) 2) Cluster storage: the corresponding cluster is located from the returned storage information. 3) Database storage: the corresponding database under that cluster is located from the returned storage information. 4) Table storage: the corresponding table is located from the returned storage information, and the record is finally stored there. 5) Retrieval: to query a record, its location must first be obtained from the distribution processing center; that is, the required parameters are passed in, and the center derives the record's location by the same rule used for storage and returns it. The record is then looked up at the returned location.
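The request, store, and retrieve flow described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `ProcessingCenter` and `Client` are hypothetical names, and an in-memory dictionary keyed by (cluster, database, table) stands in for real MongoDB clusters.

```python
import hashlib


class ProcessingCenter:
    """Hypothetical stand-in for the data logic distribution processing center."""

    def locate(self, key: str) -> tuple:
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        # 1st character: cluster; 2nd to 4th: database; 5th to 6th: table.
        return digest[0], digest[1:4], digest[4:6]


class Client:
    """Asks the center for a location before every store or fetch."""

    def __init__(self, center: ProcessingCenter):
        self.center = center
        # Assumption: one dict per (cluster, database, table) replaces real storage.
        self.tables = {}

    def store(self, key: str, record: dict) -> tuple:
        location = self.center.locate(key)
        self.tables.setdefault(location, {})[key] = record
        return location

    def fetch(self, key: str):
        # The same rule is applied as for storage, so no extra index is needed.
        location = self.center.locate(key)
        return self.tables.get(location, {}).get(key)


client = Client(ProcessingCenter())
where = client.store("hello", {"title": "example record"})
print(where, client.fetch("hello"))
```

In a real deployment the dictionary lookup would be replaced by a connection to the chosen cluster, followed by selecting the named database and table.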
Remarks: once the data exceeds the hundred-billion-record scale, splitting on the MD5 digest of the unique key according to the rules above distributes the data substantially evenly over every table, so the data volumes of the tables are roughly equal.
The present invention has the following advantages: it provides a mass data storage method based on MongoDB, primarily a storage scheme for handling big data. According to long-term testing, once the data exceeds the hundred-billion-record scale, splitting on the MD5 digest of the unique key according to the rules above distributes the data substantially evenly over every table, and the disparity between tables is not especially large.
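The evenness claim rests on the patent's own tests, which are not reproduced here. A reader can run a small check of the same kind with the sketch below, which counts how synthetic keys fall across the 16 clusters. The key format `record-{i}` and the sample size are assumptions made for illustration. The sketch also works out the number of tables implied by the rules: one cluster character, three database characters, and two table characters give 16 × 16³ × 16² tables.

```python
import hashlib
from collections import Counter


def cluster_of(key: str) -> str:
    """First hex character of the MD5 digest selects the cluster."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()[0]


n_keys = 160_000  # assumed sample size, far below the scale discussed in the text
counts = Counter(cluster_of(f"record-{i}") for i in range(n_keys))

expected = n_keys / 16
worst = max(abs(c - expected) / expected for c in counts.values())
print(f"clusters used: {len(counts)}, worst relative deviation: {worst:.2%}")

# Tables implied by the rules: 16 clusters x 16**3 databases x 16**2 tables.
total_tables = 16 * 16**3 * 16**2
print(total_tables)  # 16777216
print(round(10**11 / total_tables))  # average records per table at 10**11 records
```

At a hundred billion records the average works out to roughly 6,000 records per table, which suggests why individual tables stay small under this scheme.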
Brief description of the drawings
Fig. 1 is a schematic diagram of the data storage flow of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to Fig. 1, and the technical solutions in its embodiments are described clearly and completely. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the invention.
Through improvement, the present invention provides a mass data storage method based on MongoDB, implemented as follows:
1) Data storage location request:
A data logic distribution processing center is added between data storage and user requests; this processing center handles the distribution of stored data according to rules. The first step for a data terminal is to ask this rule center for the node where the data belongs;
Example: the terminal submits the keyword of the data. The processing center applies MD5 to the keyword and obtains a 32-character string. A feature of this string is that it is in hexadecimal notation, i.e. it consists of the characters 0 to 9 and a, b, c, d, e, f;
2) Cluster-splitting rule: handling big data requires splitting across clusters, databases, and tables. Mass data is therefore split according to the MD5 digest: splitting on the first character of the string distributes all data over 16 clusters, and splitting on the first two characters distributes all data over 256 clusters;
3) Database-splitting rule: as with cluster splitting, the 2nd to 4th characters of the MD5 string determine the database; the database is named with the extracted substring;
4) Table-splitting rule: as with database splitting, the 5th to 6th characters of the MD5 string determine the table; the table is named with the extracted substring;
5) Storage: when a record's location is requested from the processing center, the returned location data and the rules indicate which cluster, which database, and which table the record belongs in, and it can then be stored at that position;
6) Single-record retrieval: to fetch a single record by condition, the required parameters are passed to the processing center, which returns the record's location. The returned location information and the rules indicate which cluster, which database, and which table hold the record, and the value is then read from that position;
7) Multi-record queries: MapReduce is used to retrieve values from each cluster according to the condition.
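Step 7) does not spell out how the per-cluster results are combined. One plausible reading is a scatter-gather pattern: a map step filters each cluster by the condition, and a reduce step merges the partial results. The sketch below simulates this in memory. It does not call MongoDB's own mapReduce facility; the names `query_cluster` and `query_all` and the list-of-dicts cluster contents are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, Dict, List

Record = dict


def query_cluster(records: List[Record], predicate: Callable[[Record], bool]) -> List[Record]:
    """Map step: filter one cluster's records by the query condition."""
    return [r for r in records if predicate(r)]


def query_all(clusters: Dict[str, List[Record]], predicate: Callable[[Record], bool]) -> List[Record]:
    """Run the map step on every cluster in parallel, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda recs: query_cluster(recs, predicate), clusters.values())
        # Reduce step: concatenate per-cluster results in cluster order.
        return reduce(lambda acc, part: acc + part, partials, [])


# Assumed toy data: two clusters keyed by their hex character.
clusters = {
    "0": [{"id": 1, "lang": "zh"}, {"id": 2, "lang": "en"}],
    "a": [{"id": 3, "lang": "zh"}],
}
matches = query_all(clusters, lambda r: r["lang"] == "zh")
print([r["id"] for r in matches])  # [1, 3]
```

Unlike single-record retrieval, a conditional query cannot be routed by the MD5 digest, so every cluster has to be consulted; running the map step in parallel keeps the latency close to that of the slowest cluster.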
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (1)

1. A mass data storage method based on MongoDB, characterized in that it is implemented as follows:
1) Data storage location request:
A data logic distribution processing center is added between data storage and user requests; this processing center handles the distribution of stored data according to rules. The first step for a data terminal is to ask this rule center for the node where the data belongs;
2) Cluster-splitting rule: handling big data requires splitting across clusters, databases, and tables. Mass data is therefore split according to the MD5 digest: splitting on the first character of the string distributes all data over 16 clusters, and splitting on the first two characters distributes all data over 256 clusters;
3) Database-splitting rule: as with cluster splitting, the 2nd to 4th characters of the MD5 string determine the database; the database is named with the extracted substring;
4) Table-splitting rule: as with database splitting, the 5th to 6th characters of the MD5 string determine the table; the table is named with the extracted substring;
5) Storage: when a record's location is requested from the processing center, the returned location data and the rules indicate which cluster, which database, and which table the record belongs in, and it can then be stored at that position;
6) Single-record retrieval: to fetch a single record by condition, the required parameters are passed to the processing center, which returns the record's location. The returned location information and the rules indicate which cluster, which database, and which table hold the record, and the value is then read from that position;
7) Multi-record queries: MapReduce is used to retrieve values from each cluster according to the condition.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811470546.8A CN109344161A (en) 2018-12-04 2018-12-04 Mass data storage method based on MongoDB

Publications (1)

Publication Number Publication Date
CN109344161A (en) 2019-02-15

Family

ID=65319607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811470546.8A Pending CN109344161A (en) 2018-12-04 2018-12-04 Mass data storage method based on MongoDB

Country Status (1)

Country Link
CN (1) CN109344161A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113849458A (en) * 2021-09-18 2021-12-28 四川长虹网络科技有限责任公司 MongoDB middleware, data storage method and data migration method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103023970A (en) * 2012-11-15 2013-04-03 中国科学院计算机网络信息中心 Method and system for storing mass data of Internet of Things (IoT)
US20130332484A1 (en) * 2012-06-06 2013-12-12 Rackspace Us, Inc. Data Management and Indexing Across a Distributed Database
CN105426396A (en) * 2015-10-28 2016-03-23 深圳市万姓宗祠网络科技股份有限公司 Routing algorithm based database sharding method, system and middleware system
US20170161351A1 (en) * 2014-03-07 2017-06-08 Adobe Systems Incorporated Processing data in a distributed database across a plurality of clusters
CN106909556A (en) * 2015-12-23 2017-06-30 中国电信股份有限公司 The storage equalization methods and device of main memory cluster
CN107229688A (en) * 2017-05-12 2017-10-03 上海前隆金融信息服务有限公司 A kind of database level point storehouse point table method and system, server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination