CN109918425A - Method and system for importing data into a non-relational database
- Publication number
- CN109918425A CN201711339911.7A
- Authority
- CN
- China
- Prior art keywords
- data
- region
- imported
- row key
- server
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and system for importing data into a non-relational database, and relates to the field of computer technology. One embodiment of the method comprises: converting the format of the data to be imported to generate files in the storage format of the non-relational database; assigning the files in the storage format of the non-relational database to created regions according to the row keys of the data to be imported; and storing the metadata of the regions. This embodiment imports data into the non-relational database efficiently, and the import process does not occupy CPU or memory resources, so the online use of the non-relational database is not affected.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and system for importing data into a non-relational database.
Background art
With the rapid development of network technology, a large amount of data is generated every day. Relational databases cannot store data of such volume, so it is generally stored in a non-relational (NoSQL) database such as HBase. HBase is an open-source implementation of Google Bigtable and uses Hadoop HDFS as its file storage system. In the prior art, data can only be imported through the MapReduce TableOutputFormat interface, called via the HBase API: the data to be imported into HBase is turned into Put objects, which are in turn wrapped into KeyValue objects. The KeyValue objects are then sent via RPC (Remote Procedure Call) to a region server (regionserver), which hands each received KeyValue object to a region according to its rowkey. The region first writes the data to the WAL (Write-Ahead Log, the write-ahead logging mechanism of HBase); after the WAL write succeeds, the data is written to the memstore. When the memstore exceeds a specific age or reaches a specific size, it is flushed to HDFS, producing an HFile.
In the course of implementing the present invention, the inventors found at least the following problems in the prior art. A large number of KeyValue objects entering a region causes the region to split continuously, and the import can only be carried out online, which seriously affects the stability of the HBase database and slows the query response of the online HBase. Writing to the HBase database is too inefficient and time-consuming to meet business needs. Because the region first writes the data to be imported into the memstore, the written data may be lost if the server crashes at that moment; although the WAL exists, recovering the data is difficult and slow.
Summary of the invention
In view of this, embodiments of the present invention provide a method for importing data into a non-relational database that can import data efficiently. Because the data is imported into the non-relational database offline, the import process does not occupy CPU or memory resources and therefore does not affect the online use of the non-relational database. It also avoids data loss caused by a server crash.
To achieve the above object, according to one aspect of the embodiments of the present invention, a method for importing data into a non-relational database is provided.
The method for importing data into a non-relational database according to an embodiment of the present invention comprises: converting the format of the data to be imported to generate files in the storage format of the non-relational database; assigning the files in the storage format of the non-relational database to created regions according to the row keys of the data to be imported; and storing the metadata of the regions.
Optionally, before the files in the storage format of the non-relational database are assigned to the created regions, the method further comprises: processing the initial row keys of the data to be imported according to a preset row key hashing rule to generate new row keys for the data to be imported. The files in the storage format of the non-relational database are then assigned to the created regions according to the new row keys of the data to be imported.
Optionally, the step of processing the initial row keys of the data to be imported according to the preset row key hashing rule comprises: performing an MD5 operation on the initial row key of the data to be imported to generate a character string; and using several consecutive characters of the character string as a prefix of the initial row key, the combination forming the new row key of the data to be imported.
Optionally, before the files in the storage format of the non-relational database are assigned to the created regions, the method further comprises: determining the data volume of the data to be imported and the data volume that one region stores; determining the number of regions required from the data volume of the data to be imported and the data volume that one region stores; and creating that number of regions.
Optionally, before that number of regions is created, the method further comprises: determining the number of column families according to business needs, and determining the memory of a region server, the fraction of region server memory occupied by the memstore, and the region server memory occupied by a memstore; determining the number of regions under each region server according to the following formula: (memory of the region server * fraction of region server memory occupied by the memstore) / (region server memory occupied by a memstore * number of column families); and determining the number of region servers required from the number of regions required and the number of regions under each region server.
Optionally, the step of storing the metadata of the regions comprises: storing the metadata of the regions in the -ROOT- and .META. tables.
To achieve the above object, according to one aspect of the embodiments of the present invention, a system for importing data into a non-relational database is provided.
The system for importing data into a non-relational database according to an embodiment of the present invention comprises: a format conversion module, configured to convert the format of the data to be imported to generate files in the storage format of the non-relational database; a distribution module, configured to assign the files in the storage format of the non-relational database to created regions according to the row keys of the data to be imported; and a metadata storage module, configured to store the metadata of the regions.
Optionally, the system further comprises a row key generation module, configured to process the initial row keys of the data to be imported according to a preset row key hashing rule to generate new row keys for the data to be imported; the distribution module then assigns the files in the storage format of the non-relational database to the created regions according to the new row keys of the data to be imported.
Optionally, the row key generation module is further configured to: perform an MD5 operation on the initial row key of the data to be imported to generate a character string; and use several consecutive characters of the character string as a prefix of the initial row key, the combination forming the new row key of the data to be imported.
Optionally, the system further comprises a region creation module, configured to: determine the data volume of the data to be imported and the data volume that one region stores; determine the number of regions required from the data volume of the data to be imported and the data volume that one region stores; and create that number of regions.
Optionally, the region creation module is further configured to: determine the number of column families according to business needs, and determine the memory of a region server, the fraction of region server memory occupied by the memstore, and the region server memory occupied by a memstore; determine the number of regions under each region server according to the following formula: (memory of the region server * fraction of region server memory occupied by the memstore) / (region server memory occupied by a memstore * number of column families); and determine the number of region servers required from the number of regions required and the number of regions under each region server.
Optionally, the metadata storage module is further configured to store the metadata of the regions in the -ROOT- and .META. tables.
To achieve the above object, according to one aspect of the embodiments of the present invention, an electronic device is provided.
An electronic device according to an embodiment of the present invention comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any of the above methods for importing data into a non-relational database.
To achieve the above object, according to one aspect of the embodiments of the present invention, a computer-readable medium is provided, on which a computer program is stored; when executed by a processor, the program implements any of the above methods for importing data into a non-relational database.
One embodiment of the above invention has the following advantages or beneficial effects. The format of the data to be imported is converted into the data format of the files in which the non-relational database stores data, regions are created, and the files in the storage format of the non-relational database generated from the data to be imported are assigned to the created regions, so that data is imported into the non-relational database efficiently. Taking the HBase database as an example, this avoids the many steps of the prior art, in which data is first written to the WAL, then written to the memstore after the WAL write succeeds, and the memstore is flushed to HDFS once it reaches a specific size to generate files in the storage format of the non-relational database. Moreover, the data is imported into the non-relational database offline, so the import process does not occupy CPU or memory resources and does not affect the online use of the non-relational database. It also avoids data loss caused by a server crash.
Further effects of the above non-conventional optional approaches are described below in conjunction with specific embodiments.
Brief description of the drawings
The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main flow of a method for importing data into a non-relational database according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the HFile data structure;
Fig. 3 is a schematic diagram of the structure of the -ROOT- and .META. tables;
Fig. 4 is a schematic diagram of a preferred flow of a method for importing data into an HBase database according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the KeyValue data structure;
Fig. 6 is a row record of the -ROOT- table according to an embodiment of the present invention;
Fig. 7 is a row record of the .META. table according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the main modules of a system for importing data into a non-relational database according to an embodiment of the present invention;
Fig. 9 is an exemplary system architecture diagram to which embodiments of the present invention can be applied;
Fig. 10 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described below with reference to the drawings, including various details of the embodiments to aid understanding; they should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of the main flow of a method for importing data into a non-relational database according to an embodiment of the present invention. As shown in Fig. 1, the method for importing data into a non-relational database according to an embodiment of the present invention mainly comprises:
Step S101: convert the format of the data to be imported to generate files in the storage format of the non-relational database.
Taking the HBase database as an example: all data files in HBase are stored on Hadoop HDFS. HFile is the storage format of KeyValue data in HBase; an HFile is a Hadoop binary-format file, and the persistent data of a region is stored in HFiles. The data to be imported is therefore format-converted to generate HFile files. As shown in Fig. 2, an HFile consists of Data, Meta, File Info, Data Index, Meta Index, and Trailer. A Data block consists of a magic number (a random number used to guard against data corruption) and a series of serialized KeyValue objects. A Meta block stores the metadata of the HFile; it is a KeyValue-type value, but only the Value is saved, while the Key is stored in the fifth part, the Meta Index. The File Info block saves HFile-related information, such as the average key length and the average value length. The Data Index block saves, for each Data block, its position in the HFile, its size, and the key of its first cell. The Meta Index block saves, for each piece of metadata, its position in the HFile, its size, and its key. The Trailer block contains pointers to the other blocks; Meta Index, Data Index, and File Info are found through these pointers, and the Trailer is written at the end of the file.
Step S102: assign the files in the storage format of the non-relational database to the created regions according to the row keys (rowkeys) of the data to be imported. A created region is the storage space in which the non-relational database stores data. A rowkey usually consists of several fields with business meaning; without hashing, rowkeys would concentrate in individual regions, causing severe data skew and reducing retrieval speed. Therefore, before step S102, the initial row keys of the data to be imported are processed according to a preset row key hashing rule to generate new row keys for the data to be imported.
The preset row key hashing rule may apply a one-way operation, such as MD5, to the initial row key of the data to be imported. Specifically, an MD5 operation is performed on the initial row key to generate a character string, and several consecutive characters of that string are used as a prefix of the initial row key, the combination forming the new row key of the data to be imported. MD5 is a one-way encryption (hash) algorithm from the field of computer security: even if a single byte of the original data changes, the resulting MD5 value is very different, so it achieves a good hashing effect and distributes rowkeys roughly evenly across the regions. After the original rowkeys have been hashed into new rowkeys, the files in the storage format of the non-relational database are assigned to the created regions according to the new row keys of the data to be imported.
Also, before step S102, the required number of regions is created according to the data volume of the data to be imported. Specifically, the data volume of the data to be imported and the data volume that one region stores are determined; the number of regions required is determined from these two values; and that number of regions is created. For example, if over the next 10 days the data imported each day amounts to 3000 MB and one region can store 600 MB of new data per day, 5 regions are created. Determining the number of regions in this way and creating them in advance makes it possible to assess the data volume of the data to be imported and its growth trend over a coming period (the storage period), to carry out the pre-split design and the rowkey hashing ahead of time, and to use region resources effectively.
Before the required number of regions is created, the method further comprises: determining the number of column families according to business needs, and determining the memory of a region server, the fraction of region server memory occupied by the memstore, and the region server memory occupied by a memstore. The number of regions under each region server is determined according to the following formula:
(memory of the region server * fraction of region server memory occupied by the memstore) / (region server memory occupied by a memstore * number of column families)
The number of region servers required is determined from the number of regions required and the number of regions under each region server. One region server can manage multiple regions, and the corresponding region can be found through its region server; once the number of regions is determined, the number of region servers is determined as well. When the imported data is queried, it must be determined which data resides on region server 1, which on region server 2, and so on.
Step S103: store the metadata of the regions. After the created regions have been filled with data, the data imported into the HBase database cannot be queried if HBase does not know the metadata of the regions, so the metadata of the created regions must be stored before the service can be offered externally. (Metadata, also called intermediary data or relay data, is data about data; it mainly describes the properties of data and supports functions such as indicating storage location, historical data, resource lookup, and file records.)
The metadata of the regions is stored in the -ROOT- and .META. tables. HBase has two metadata tables, -ROOT- and .META., which record the distribution of regions and the details of each region. The -ROOT- table saves all region information of the .META. table; by design, HBase guarantees that the -ROOT- table has only one region and is never split. The .META. table stores the location information of the regions of the actual user tables. Lookup follows a three-level, B+-tree-like scheme: the first level is stored on ZooKeeper and holds the region information of the -ROOT- table; the second level looks up the matching .META. region in the -ROOT- table; the third level retrieves the region information of the user table from the .META. table. The -ROOT- and .META. tables have the same structure, shown in Fig. 3. The RowKey consists of three parts: TableName, StartKey, and TimeStamp; the content stored in the RowKey is called the RegionName. The table has one column family, info, which contains three columns: regioninfo, server, and serverstartcode. regioninfo holds the details of the region, including StartKey, EndKey, and the information of each Family. server stores the IP and port of the RegionServer that manages the region, and serverstartcode stores the time at which the RegionServer started handling the region. This table must be modified whenever a region is split, merged, or reassigned.
Fig. 4 is a schematic diagram of a preferred flow of a method for importing data into an HBase database according to an embodiment of the present invention.
As shown in Fig. 4, the method for importing data into an HBase database according to an embodiment of the present invention comprises:
Step S401: determine the row key hashing rule and create the required number of regions. Creating these regions and the corresponding region servers in advance avoids region splits online, so that HBase can serve requests with high performance.
To take full advantage of distributed parallel computing and avoid query and write hot spots, before the data is imported into the HBase database, the data volume of the data to be imported and its growth trend over a coming period are assessed, and the pre-split design and rowkey hashing are carried out in advance. The reasonable number of regions managed by one region server is calculated by the following formula:
avaNum = ((region server memory) * (memstore fraction)) / ((memstore size) * (column families num))
where avaNum is the reasonable number of regions managed by one region server;
region server memory is the memory of the region server;
memstore fraction is the fraction of region server memory occupied by the memstore;
memstore size is the region server memory occupied by a memstore;
column families num is the number of column families, which is determined by business needs and is usually 1 or 2.
For example, if the memory of a region server is set to 16 GB, the memstore fraction is 0.4, and a memstore occupies the default 128 MB of region server memory, then the reasonable number of regions under this region server is 16384 * 0.4 / (128 * 1) = 51.2, i.e. about 51.
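The avaNum formula and the worked example above can be sketched as a small helper. This is only an illustration: the function name and the choice to round down to a whole number of regions are assumptions, not part of the original text.

```python
import math


def regions_per_server(server_memory_mb, memstore_fraction,
                       memstore_size_mb, column_families):
    """Reasonable number of regions one region server can manage (avaNum).

    Rounding down to a whole region is an assumption made for this sketch.
    """
    memstore_budget_mb = server_memory_mb * memstore_fraction
    per_region_mb = memstore_size_mb * column_families
    return math.floor(memstore_budget_mb / per_region_mb)


# Worked example from the text: 16 GB server, fraction 0.4, 128 MB memstore,
# one column family.
print(regions_per_server(16384, 0.4, 128, 1))
```

With two column families the same server would hold about half as many regions, which is why the number of column families is kept small.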
A region should be neither too large nor too small. To optimize region storage in a highly concurrent production environment, the region size can be set to 5-10 GB. Although the daily volume of data to be imported is large, the data does not need to be kept for long: it is generally stored for 10 days, and data older than 10 days is purged so that regions do not grow too large. In this case, one region can store 500-1000 MB of new data per day.
The number of regions required is calculated by the following formula:
regionTotalnum = dataSetSize / avaregionSize
where regionTotalnum is the number of regions required;
dataSetSize is the data volume of each batch of data to be imported;
avaregionSize is the data volume that one region newly stores for each batch.
For example, if data is imported once a day and the created regions are meant to hold 10 days of data, the data volume of each batch is the daily data volume to be imported, and the data volume that one region newly stores for each batch is the data volume that one region newly stores per day. If it is estimated that over the next 10 days the data to be imported each day amounts to 3000 MB and one region newly stores 600 MB per day, the number of regions required is 3000 / 600 = 5.
The number of regionServers required is calculated by the following formula:
regionServerNum = regionTotalnum / avaNum
where regionServerNum is the number of regionServers required.
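The two sizing formulas above can be combined in a short sketch. The function names are hypothetical, and rounding up to whole regions and to at least one server is an assumption; the text itself only gives the divisions.

```python
import math


def required_regions(data_set_size_mb, ava_region_size_mb):
    """regionTotalnum = dataSetSize / avaregionSize, rounded up (assumption)."""
    return math.ceil(data_set_size_mb / ava_region_size_mb)


def required_region_servers(region_total_num, ava_num):
    """regionServerNum = regionTotalnum / avaNum, rounded up, at least 1 (assumption)."""
    return max(1, math.ceil(region_total_num / ava_num))


# Example from the text: 3000 MB imported per day, 600 MB of new data per region per day.
regions = required_regions(3000, 600)
# 51 is the avaNum value from the earlier 16 GB example.
servers = required_region_servers(regions, 51)
print(regions, servers)
```

Rounding up is the conservative choice here, since a fractional region or server cannot be provisioned.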
Step S402: convert the format of the data to be imported to generate HFile files. KeyValue is the smallest indivisible data unit of an HFile, and the storage of a cell is implemented by a KeyValue. As shown in Fig. 5, a KeyValue consists of four parts: Key Length, Value Length, Key, and Value.
where Key Length: stores the length of the Key, 4 bytes;
Value Length: stores the length of the Value, 4 bytes;
Key: consists of Row Length, Row, Column Family Length, Column Family, Column Qualifier, Time Stamp, and Key Type;
Row Length: stores the length of the Row, i.e. the length of the rowkey, 2 bytes;
Row: stores the actual content of the Row, i.e. the rowkey; its size is Row Length;
Column Family Length: stores the length of the Column Family, 1 byte;
Column Family: stores the actual content of the Column Family; its size is Column Family Length;
Column Qualifier: stores the data of the Column Qualifier;
Time Stamp: stores the timestamp, 8 bytes;
Key Type: stores the type of the Key, 1 byte; the types include Put, Delete, DeleteColumn, DeleteFamilyVersion, DeleteFamily, and so on, and mark the type of this KeyValue;
Value: stores the actual value of the cell.
A Spark program is written according to the format specification of the HFile and the structure of KeyValue; it converts the format of the data to be imported to generate HFile files.
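To make the KeyValue layout listed above concrete, the following sketch packs one cell into bytes in the stated field order and widths. It is an illustration only, not the Spark program of the embodiment and not HBase's own writer: the function name is hypothetical, and big-endian byte order and the numeric code 4 for a Put are assumptions.

```python
import struct

KEY_TYPE_PUT = 4  # assumed numeric code for the Put key type


def encode_key_value(row, family, qualifier, timestamp_ms, value,
                     key_type=KEY_TYPE_PUT):
    """Pack one cell as: Key Length (4B) | Value Length (4B) | Key | Value."""
    key = (
        struct.pack(">H", len(row)) + row            # Row Length (2B) + Row
        + struct.pack(">B", len(family)) + family    # Column Family Length (1B) + Column Family
        + qualifier                                  # Column Qualifier
        + struct.pack(">q", timestamp_ms)            # Time Stamp (8B)
        + struct.pack(">B", key_type)                # Key Type (1B)
    )
    return struct.pack(">ii", len(key), len(value)) + key + value


kv = encode_key_value(b"row1", b"cf", b"q", 1, b"v")
key_length, value_length = struct.unpack(">ii", kv[:8])
print(key_length, value_length, len(kv))
```

Note that the qualifier carries no length field of its own in this layout; its length follows from Key Length minus the fixed-width fields and the row and family lengths.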
Step S403: process the initial rowkeys of the data to be imported according to the preset row key hashing rule to generate the new rowkeys of the data to be imported. An MD5 operation is performed on the initial rowkey, producing a 32-character string. Each character is taken to be one of 26 letters and 10 digits, i.e. 36 possibilities, so three characters give 36 * 36 * 36 = 46656 combinations, which is enough for the possible number of partitions. The first three characters of the generated string are therefore used as the prefix of the initial rowkey, so that rowkeys are distributed roughly evenly across the regions and each region both stores a considerable amount of data and has space left for future data growth.
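The prefixing rule of step S403 can be sketched as follows, assuming the 32-character string is the usual hexadecimal MD5 digest and that the prefix is concatenated directly with the initial rowkey (the function name and the absence of a separator are assumptions). One caveat worth noting: a hexadecimal digest uses only the 16 characters 0-9 and a-f, so a three-character prefix yields 16 * 16 * 16 = 4096 distinct values; reaching the 36-character alphabet described in the text would require a different encoding of the digest, such as base 36.

```python
import hashlib


def salted_row_key(initial_row_key, prefix_len=3):
    """Prefix the initial rowkey with the first characters of its MD5 hex digest."""
    digest = hashlib.md5(initial_row_key.encode("utf-8")).hexdigest()
    return digest[:prefix_len] + initial_row_key


for key in ["order-0001", "order-0002", "order-0003"]:
    print(salted_row_key(key))
```

Because the prefix is derived from the rowkey itself, a reader that knows the initial rowkey can recompute the new rowkey without any lookup table.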
Step S404: assign the HFile files to the created regions according to the new row keys of the data to be imported. Each region is responsible for the data of one rowkey range. The generated HFile files are assigned to the corresponding created regions according to the new row keys of the data to be imported, and a split-keys file (the rowkeys stored by each region) is recorded, with rowkeys sorted in lexicographic order. From the split-keys file, information such as the start_key and end_key of a region can be obtained quickly, and from the start_key and end_key the rowkey range that the region is responsible for can be obtained quickly.
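The range lookup described in step S404 can be illustrated with a binary search over sorted split keys. This is a sketch under assumptions: the split keys are modeled as the start keys of every region except the first, each region covers the half-open range [start_key, end_key), and the helper names and sample keys are hypothetical.

```python
from bisect import bisect_right


def region_index(split_keys, row_key):
    """Index of the region whose [start_key, end_key) range contains row_key.

    split_keys must be sorted lexicographically; region 0 starts at the empty key.
    """
    return bisect_right(split_keys, row_key)


def group_by_region(split_keys, row_keys):
    """Group rowkeys (in lexicographic order) by the region that will hold them."""
    groups = {}
    for row_key in sorted(row_keys):
        groups.setdefault(region_index(split_keys, row_key), []).append(row_key)
    return groups


splits = ["400", "800", "c00"]  # four regions: [,400) [400,800) [800,c00) [c00,)
print(group_by_region(splits, ["900abc", "3ffabc", "400xyz", "fffzzz"]))
```

Grouping the sorted rowkeys per region mirrors the requirement that each generated file only contains keys from a single region's range.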
Step S405: store the metadata of the regions. When data in the HBase database is queried and a user table is accessed for the first time, the regionserver hosting the -ROOT- table is first read from ZooKeeper; then, according to the requested TableName and rowkey, the regionserver hosting the .META. table is read from that regionserver; then the content of the .META. table is read from this second regionserver to obtain the location of the region that the request needs to access; finally, the desired data is fetched from that region's location.
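The three-level lookup described above can be mimicked with in-memory dictionaries. This toy model only illustrates the order of the hops; it is not the HBase client API, and the server addresses, table name, and the layout of the dictionaries are all hypothetical.

```python
from bisect import bisect_right


def locate(entries, key):
    """entries: (start_key, payload) pairs sorted by start_key, the first starting at ''."""
    start_keys = [start for start, _ in entries]
    return entries[bisect_right(start_keys, key) - 1][1]


# Level 1: ZooKeeper knows which server hosts the single -ROOT- region.
ZOOKEEPER = {"root-region-server": "rs0:60020"}
# Level 2: -ROOT- maps .META. key ranges to the server hosting that .META. region.
ROOT = {"rs0:60020": [("", "rs1:60020")]}
# Level 3: .META. maps user-table start keys to the server hosting each user region.
META = {"rs1:60020": {"user_table": [("", "rs2:60020"), ("800", "rs3:60020")]}}


def lookup(table_name, row_key):
    """Return the server that hosts the region containing row_key."""
    root_server = ZOOKEEPER["root-region-server"]
    meta_server = locate(ROOT[root_server], table_name + "," + row_key)
    return locate(META[meta_server][table_name], row_key)


print(lookup("user_table", "3ffabc"), lookup("user_table", "900abc"))
```

If the region metadata written in step S405 were missing from the third level, the final hop would have nothing to return, which is why the imported data cannot be queried until the metadata is stored.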
The metadata of the regions holding the imported data (information such as the list, state, and location of the regions) is stored in the -ROOT- and .META. tables. -ROOT- stores the region information of the .META. table, as shown in Fig. 6, and .META. stores the information of all regions of the user tables, as shown in Fig. 7.
According to the technical solution of this embodiment of the present invention, the format of the data to be imported is converted into the data format of the HFile files in which the HBase database stores data, regions are created, and the HFile files generated from the data to be imported are assigned to the created regions. This avoids the many steps of the prior art, in which data is first written to the WAL, then written to the memstore after the WAL write succeeds, and the memstore is flushed to HDFS once it reaches a specific size to generate HFile files. Data is imported into the HBase database efficiently, and the process does not occupy CPU or memory resources, so the online use of the HBase database is not affected. In addition, hashing the rowkeys solves problems that arise when data is imported into the HBase database, such as data skew, slow queries, and poor scalability.
Fig. 8 is a schematic diagram of the main modules of a system for importing data into a non-relational database according to an embodiment of the present invention. As shown in Fig. 8, the system 800 for importing data into a non-relational database according to an embodiment of the present invention comprises:
Format conversion module 801: configured to convert the format of the data to be imported to generate files in the storage format of the non-relational database.
Distribution module 802: configured to assign the files in the storage format of the non-relational database to the created regions according to the row keys of the data to be imported. A created region is the storage space in which the non-relational database stores data. The system for importing data into a non-relational database according to an embodiment of the present invention further comprises a row key generation module, configured to process the initial row keys of the data to be imported according to a preset row key hashing rule to generate new row keys for the data to be imported; distribution module 802 then assigns the files in the storage format of the non-relational database to the created regions according to the new row keys of the data to be imported. The row key generation module is further configured to: perform an MD5 operation on the initial row key of the data to be imported to generate a character string; and use several consecutive characters of the character string as a prefix of the initial row key of the data to be imported, the combination forming the new row key of the data to be imported.
Metadata storage module 803: configured to store the metadata of the regions. Metadata storage module 803 is further configured to store the metadata of the regions in the -ROOT- and .META. tables.
The system for importing data into a non-relational database according to an embodiment of the present invention further comprises a region creation module, configured to: determine the data volume of the data to be imported and the data volume that one region stores; determine the number of regions required from the data volume of the data to be imported and the data volume that one region stores; and create that number of regions. The region creation module is further configured to: determine the number of column families according to business needs, and determine the memory of a region server, the fraction of region server memory occupied by the memstore, and the region server memory occupied by a memstore; determine the number of regions under each region server according to the following formula: (memory of the region server * fraction of region server memory occupied by the memstore) / (region server memory occupied by a memstore * number of column families); and determine the number of region servers required from the number of regions required and the number of regions under each region server.
According to the technical solution of the embodiment of the present invention, the data to be imported is converted
into files in the storage format that the non-relational database uses to store data, regions are created, and the
files generated from the data to be imported are assigned to the created regions. Data is thereby imported into the
non-relational database efficiently, and the import can be carried out offline without occupying CPU and memory
resources, so the online use of the non-relational database is not affected. Taking the HBase database as an
example, the data to be imported is converted into the HFile format in which the HBase database stores data,
regions are created, and the HFile files generated from the data to be imported are assigned to the created
regions. This avoids the many steps of the prior art, in which data is first written to the WAL, is written to the
memstore after the WAL write succeeds, and is written from the memstore to HDFS to generate HFile files once the
memstore reaches a certain size. Data is imported into the HBase database efficiently, and the process does not
occupy CPU and memory resources, so the online use of the HBase database is not affected. In addition, hashing the
rowkey solves problems that arise when data is imported into the HBase database, such as data skew, slow queries,
and poor scalability.
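How a row key determines the region that receives a file can be illustrated with a small lookup over region boundaries. The sketch below assumes that each created region covers a half-open range of row keys delimited by sorted split points. The split points `"4"`, `"8"`, `"c"` and the function name `region_index` are hypothetical and chosen to match a hexadecimal hash prefix.

```python
from bisect import bisect_right
from typing import Sequence


def region_index(row_key: str, split_points: Sequence[str]) -> int:
    """Return the index of the region whose key range contains row_key.

    split_points must be sorted. Region 0 holds keys below the first split
    point, and region i holds keys from split_points[i - 1] (inclusive) up to
    split_points[i] (exclusive).
    """
    return bisect_right(split_points, row_key)


if __name__ == "__main__":
    splits = ["4", "8", "c"]  # four regions over a hex-prefixed key space
    # Sequential keys without a hash prefix all fall into the same region.
    print([region_index(k, splits) for k in ("order000001", "order000002")])
    # A key carrying a hash prefix is routed by that prefix instead.
    print(region_index("5d41hello", splits))
```

With unprefixed sequential keys every file lands in the last region, which is the data skew the text describes. A hash prefix routes keys by the prefix, so they are no longer tied to one region.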
Fig. 9 shows an exemplary system architecture 900 to which the method or the system for importing data into a
non-relational database according to the embodiment of the present invention can be applied.
As shown in Fig. 9, the system architecture 900 may include terminal devices 901, 902, 903, a network 904, and a
server 905. The network 904 is the medium that provides communication links between the terminal devices 901, 902,
903 and the server 905. The network 904 may include various connection types, such as wired or wireless
communication links, or fiber optic cables.
A user may use the terminal devices 901, 902, 903 to interact with the server 905 through the network 904, to
receive or send messages and the like. Various communication client applications may be installed on the terminal
devices 901, 902, 903, such as shopping applications, web browser applications, search applications, instant
messaging tools, mailbox clients, and social platform software (examples only).
The terminal devices 901, 902, 903 may be various electronic devices that have a display screen and support web
browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 905 may be a server that provides various services, for example a back-end management server (example
only) that supports the shopping websites that users browse with the terminal devices 901, 902, 903. The back-end
management server may analyze and otherwise process received data such as information query requests, and feed the
processing results back to the terminal devices.
It should be noted that the method for importing data into a non-relational database provided by the embodiment of
the present invention is generally executed by the server 905; accordingly, the system for importing data into a
non-relational database is generally disposed in the server 905.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 9 are merely
illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.
Referring now to Fig. 10, it shows a schematic structural diagram of a computer system 1000 suitable for
implementing a terminal device of the embodiment of the present invention. The terminal device shown in Fig. 10 is
only an example and should not impose any limitation on the functions and scope of use of the embodiment of the
present invention.
As shown in Fig. 10, the computer system 1000 includes a central processing unit (CPU) 1001, which can perform
various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1002 or a
program loaded from a storage section 1008 into a random access memory (RAM) 1003. The RAM 1003 also stores
various programs and data required for the operation of the system 1000. The CPU 1001, the ROM 1002, and the RAM
1003 are connected to one another through a bus 1004. An input/output (I/O) interface 1005 is also connected to
the bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a
mouse, and the like; an output section 1007 including a cathode ray tube (CRT), a liquid crystal display (LCD), a
speaker, and the like; a storage section 1008 including a hard disk and the like; and a communication section 1009
including a network interface card such as a LAN card or a modem. The communication section 1009 performs
communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface
1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disc, a magneto-optical disk, or a
semiconductor memory, is mounted on the drive 1010 as needed, so that a computer program read from it can be
installed into the storage section 1008 as needed.
In particular, according to the embodiments disclosed in the present invention, the process described above with
reference to the flow chart may be implemented as a computer software program. For example, an embodiment
disclosed in the present invention includes a computer program product, which includes a computer program carried
on a computer-readable medium, the computer program containing program code for executing the method shown in the
flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the
communication section 1009, and/or installed from the removable medium 1011. When the computer program is executed
by the central processing unit (CPU) 1001, the above functions defined in the system of the present invention are
performed.
It should be noted that the computer-readable medium described in the present invention may be a computer-readable
signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage
medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or
semiconductor system, apparatus, or device, or any combination of the above. More specific examples of a
computer-readable storage medium may include, but are not limited to: an electrical connection with one or more
wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable
programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory
(CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the
present invention, a computer-readable storage medium may be any tangible medium that contains or stores a
program, where the program can be used by or in connection with an instruction execution system, apparatus, or
device. Also in the present invention, a computer-readable signal medium may include a data signal that is
propagated in baseband or as part of a carrier wave and that carries computer-readable program code. Such a
propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical
signal, or any suitable combination of the above. A computer-readable signal medium may also be any
computer-readable medium other than a computer-readable storage medium, and that medium can send, propagate, or
transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The
program code contained on the computer-readable medium may be transmitted over any suitable medium, including but
not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flow charts and block diagrams in the drawings illustrate the architecture, functions, and operations of
possible implementations of systems, methods, and computer program products according to various embodiments of
the present invention. In this regard, each box in a flow chart or block diagram may represent a module, a program
segment, or a portion of code, which contains one or more executable instructions for implementing the specified
logical function. It should also be noted that, in some alternative implementations, the functions marked in the
boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession
may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order,
depending on the functions involved. It should also be noted that each box in a block diagram or flow chart, and
any combination of boxes in a block diagram or flow chart, may be implemented by a dedicated hardware-based system
that performs the specified functions or operations, or by a combination of dedicated hardware and computer
instructions.
The modules described in the embodiments of the present invention may be implemented in software or in hardware.
The described modules may also be provided in a processor; for example, a processor may be described as including
a format conversion module, a distribution module, and a metadata storage module. The names of these modules do
not, under certain circumstances, limit the modules themselves; for example, the format conversion module may also
be described as "a module that converts the format of the data to be imported to generate files in the
non-relational database storage format".
In another aspect, the present invention further provides a computer-readable medium, which may be included in the
device described in the above embodiments, or may exist separately without being assembled into the device. The
computer-readable medium carries one or more programs which, when executed by the device, cause the device to:
convert the format of the data to be imported to generate files in the non-relational database storage format;
assign the files in the non-relational database storage format to the created regions according to the row key of
the data to be imported; and store the metadata of the regions.
According to the technical solution of the embodiment of the present invention, the data to be imported is converted
into files in the storage format that the non-relational database uses to store data, regions are created, and the
files generated from the data to be imported are assigned to the created regions. Data is thereby imported into the
non-relational database efficiently, and the import can be carried out offline without occupying CPU and memory
resources, so the online use of the non-relational database is not affected. Taking the HBase database as an
example, the data to be imported is converted into the HFile format in which the HBase database stores data,
regions are created, and the HFile files generated from the data to be imported are assigned to the created
regions. This avoids the many steps of the prior art, in which data is first written to the WAL, is written to the
memstore after the WAL write succeeds, and is written from the memstore to HDFS to generate HFile files once the
memstore reaches a certain size, so data is imported into the HBase database efficiently. Because the data is
imported into the HBase database offline, the import does not occupy CPU and memory resources and does not affect
the online use of the HBase database. It also avoids the problem of data loss caused by server downtime.
The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the
art should understand that various modifications, combinations, sub-combinations, and substitutions may occur
depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made
within the spirit and principles of the present invention shall be included within the scope of protection of the
present invention.
Claims (14)
1. A method for importing data into a non-relational database, characterized by comprising:
converting the format of data to be imported to generate files in a non-relational database storage format;
assigning the files in the non-relational database storage format to created regions according to the row key of
the data to be imported; and
storing the metadata of the regions.
2. The method according to claim 1, characterized in that, before the files in the non-relational database storage
format are assigned to the created regions, the method further comprises: processing the initial row key of the
data to be imported according to a preset row key hashing rule to generate a new row key of the data to be imported;
the files in the non-relational database storage format are then assigned to the created regions according to the
new row key of the data to be imported.
3. The method according to claim 2, characterized in that the step of processing the initial row key of the data to
be imported according to the preset row key hashing rule comprises:
performing an MD5 operation on the initial row key of the data to be imported to generate a character string; and
using several consecutive characters of the character string as a prefix of the initial row key, and combining
the prefix with the initial row key to generate the new row key of the data to be imported.
4. The method according to claim 1, characterized in that, before the files in the non-relational database storage
format are assigned to the created regions, the method further comprises:
determining the data volume of the data to be imported and the data volume that one region stores;
determining the number of regions required according to the data volume of the data to be imported and the data
volume that one region stores; and
creating that number of regions.
5. The method according to claim 4, characterized in that, before that number of regions is created, the method
further comprises:
determining the number of column families according to business requirements, and determining the memory of the
region server, the proportion of region server memory occupied by the memstore, and the region server memory
occupied by the memstore;
determining the number of regions under each region server according to the following formula:
(memory of the region server * proportion of region server memory occupied by the memstore) / (region server
memory occupied by the memstore * number of column families); and
determining the number of region servers required according to the number of regions required and the number of
regions under each region server.
6. The method according to claim 1, characterized in that the step of storing the metadata of the regions comprises:
storing the metadata of the regions in the -ROOT- and .META. tables.
7. A system for importing data into a non-relational database, characterized by comprising:
a format conversion module, configured to convert the format of data to be imported to generate files in a
non-relational database storage format;
a distribution module, configured to assign the files in the non-relational database storage format to created
regions according to the row key of the data to be imported; and
a metadata storage module, configured to store the metadata of the regions.
8. The system according to claim 7, characterized by further comprising:
a row key generation module, configured to process the initial row key of the data to be imported according to a
preset row key hashing rule to generate a new row key of the data to be imported; and
wherein the distribution module assigns the files in the non-relational database storage format to the created
regions according to the new row key of the data to be imported.
9. The system according to claim 8, characterized in that the row key generation module is further configured to:
perform an MD5 operation on the initial row key of the data to be imported to generate a character string; and use
several consecutive characters of the character string as a prefix of the initial row key, and combine the prefix
with the initial row key to generate the new row key of the data to be imported.
10. The system according to claim 7, characterized by further comprising:
a region creation module, configured to: determine the data volume of the data to be imported and the data volume
that one region stores; determine the number of regions required according to the data volume of the data to be
imported and the data volume that one region stores; and create that number of regions.
11. The system according to claim 10, characterized in that the region creation module is further configured to:
determine the number of column families according to business requirements, and determine the memory of the region
server, the proportion of region server memory occupied by the memstore, and the region server memory occupied by
the memstore; determine the number of regions under each region server according to the following formula: (memory
of the region server * proportion of region server memory occupied by the memstore) / (region server memory
occupied by the memstore * number of column families); and determine the number of region servers required
according to the number of regions required and the number of regions under each region server.
12. The system according to claim 7, characterized in that the metadata storage module is further configured to
store the metadata of the regions in the -ROOT- and .META. tables.
13. An electronic device, characterized by comprising:
one or more processors; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to
implement the method according to any one of claims 1 to 6.
14. A computer-readable medium on which a computer program is stored, characterized in that the program, when
executed by a processor, implements the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711339911.7A CN109918425A (en) | 2017-12-14 | 2017-12-14 | Method and system for importing data into a non-relational database |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109918425A true CN109918425A (en) | 2019-06-21 |
Family
ID=66959362
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711339911.7A Pending CN109918425A (en) | 2017-12-14 | 2017-12-14 | Method and system for importing data into a non-relational database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109918425A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111309719A (en) * | 2020-05-13 | 2020-06-19 | 深圳市赢时胜信息技术股份有限公司 | Data standardization method and system corresponding to HBase database |
CN111651509A (en) * | 2020-04-30 | 2020-09-11 | 中国平安财产保险股份有限公司 | Data importing method and device based on Hbase database, electronic device and medium |
CN112307012A (en) * | 2019-07-30 | 2021-02-02 | 中科云谷科技有限公司 | Mass industrial data storage and reading method |
CN112612805A (en) * | 2020-12-24 | 2021-04-06 | 北京浪潮数据技术有限公司 | Method and related device for indexing hbase data to query engine |
CN113328888A (en) * | 2021-05-31 | 2021-08-31 | 上海明略人工智能(集团)有限公司 | Private domain flow ID processing method, system, medium and equipment |
CN114546989A (en) * | 2022-02-22 | 2022-05-27 | 重庆长安汽车股份有限公司 | Hbase incremental data migration system, method and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103488704A (en) * | 2013-09-06 | 2014-01-01 | 乐视致新电子科技(天津)有限公司 | Method and device for storing data |
US20140059017A1 (en) * | 2012-08-22 | 2014-02-27 | Bitvore Corp. | Data relationships storage platform |
CN103617211A (en) * | 2013-11-20 | 2014-03-05 | 浪潮电子信息产业股份有限公司 | HBase loaded data importing method |
CN104750757A (en) * | 2013-12-31 | 2015-07-01 | 中国移动通信集团公司 | Data storage method and equipment based on HBase |
US20160188797A1 (en) * | 2015-06-15 | 2016-06-30 | ANOME Inc. | Method and system for high-throughput sequencing data analysis |
Non-Patent Citations (1)
Title |
---|
吕明育: "Hadoop架构下数据挖掘与数据迁移系统的设计与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190621 |