CN109918425A

CN109918425A - Method and system for importing data into a non-relational database

Info

Publication number: CN109918425A
Application number: CN201711339911.7A
Authority: CN
Inventors: 李海龙; 王媛; 彭红晓
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2017-12-14
Filing date: 2017-12-14
Publication date: 2019-06-21

Abstract

The invention discloses a method and system for importing data into a non-relational database, and relates to the field of computer technology. One specific embodiment of the method includes: converting the format of the data to be imported to generate files in the storage format of the non-relational database; assigning the files in the non-relational database storage format to created regions according to the row keys of the data to be imported; and storing the metadata of the regions. This embodiment imports data into the non-relational database efficiently, and the import process does not occupy CPU or memory resources, so it does not affect online use of the non-relational database.

Description

Method and system for importing data into a non-relational database

Technical field

The present invention relates to the field of computer technology, and in particular to a method and system for importing data into a non-relational database.

Background art

With the rapid development of network technology, a large amount of data is generated every day. Relational databases cannot meet the storage needs of data at this scale, so it is generally stored in a non-relational (NoSQL) database such as HBase. HBase is an open-source implementation of Google Bigtable and uses Hadoop HDFS as its file storage system. In the prior art, data can only be imported by calling the TableOutputFormat interface of MapReduce through the HBase API: the data to be imported into HBase is turned into Put objects, and each Put object is in turn packaged into KeyValue objects. The KeyValue objects are then sent by RPC (Remote Procedure Call) to a region server (regionserver), which hands each received KeyValue object to the appropriate region according to its rowkey. The region first writes the data to the WAL (Write Ahead Log, the write-ahead logging mechanism of HBase); after the WAL write succeeds, the data is written to the memstore. When the memstore has existed for more than a specific time or reaches a specific size, it is written to HDFS, generating an HFile file.

In the course of realizing the present invention, the inventors found at least the following problems in the prior art. A large number of KeyValue objects entering a region causes the region to split continuously; moreover, the import can only be carried out online, which seriously affects the stability of the HBase database and slows the query responses of the online HBase. The efficiency of writing to the HBase database is too low and the process is very time-consuming, so business requirements cannot be met. Because the region first writes the data to be imported into the memstore, the written data may be lost if the server goes down at that point; although the WAL exists, recovering the data is difficult and slow.

Summary of the invention

In view of this, embodiments of the present invention provide a method for importing data into a non-relational database that can import data efficiently. Because the data is imported into the non-relational database offline, the process does not occupy CPU or memory resources and therefore does not affect online use of the non-relational database. It also avoids the data loss caused by server downtime.

To achieve the above object, according to one aspect of the embodiments of the present invention, a method for importing data into a non-relational database is provided.

The method for importing data into a non-relational database according to an embodiment of the present invention includes: converting the format of the data to be imported to generate files in the non-relational database storage format; assigning the files in the non-relational database storage format to created regions according to the row keys of the data to be imported; and storing the metadata of the regions.

Optionally, before the files in the non-relational database storage format are assigned to the created regions, the method further includes: processing the initial row key of the data to be imported according to a preset row key hashing rule to generate a new row key for the data to be imported. The files in the non-relational database storage format are then assigned to the created regions according to the new row key of the data to be imported.

Optionally, the step of processing the initial row key of the data to be imported according to the preset row key hashing rule includes: performing an MD5 operation on the initial row key of the data to be imported to generate a character string; and using several consecutive characters of the character string as a prefix of the initial row key, the combination forming the new row key of the data to be imported.

Optionally, before the files in the non-relational database storage format are assigned to the created regions, the method further includes: determining the data volume of the data to be imported and the data volume stored by one region; determining the number of required regions from the data volume of the data to be imported and the data volume stored by one region; and creating that number of regions.

Optionally, before that number of regions is created, the method further includes: determining the number of column families according to business requirements, and determining the memory of the region server, the fraction of region server memory occupied by memstores, and the region server memory occupied by one memstore; determining the number of regions under each region server according to the following formula: (memory of the region server * fraction of region server memory occupied by memstores) / (region server memory occupied by one memstore * number of column families); and determining the number of required region servers from the number of required regions and the number of regions under each region server.

Optionally, the step of storing the metadata of the regions includes: storing the metadata of the regions in the -ROOT- and .META. tables.

To achieve the above object, according to one aspect of the embodiments of the present invention, a system for importing data into a non-relational database is provided.

The system for importing data into a non-relational database according to an embodiment of the present invention includes: a format conversion module for converting the format of the data to be imported to generate files in the non-relational database storage format; a distribution module for assigning the files in the non-relational database storage format to created regions according to the row keys of the data to be imported; and a metadata storage module for storing the metadata of the regions.

Optionally, the system further includes a row key generation module for processing the initial row key of the data to be imported according to a preset row key hashing rule to generate a new row key for the data to be imported; the distribution module then assigns the files in the non-relational database storage format to the created regions according to the new row key of the data to be imported.

Optionally, the row key generation module is further configured to: perform an MD5 operation on the initial row key of the data to be imported to generate a character string; and use several consecutive characters of the character string as a prefix of the initial row key, the combination forming the new row key of the data to be imported.

Optionally, the system further includes a region creation module for: determining the data volume of the data to be imported and the data volume stored by one region; determining the number of required regions from the data volume of the data to be imported and the data volume stored by one region; and creating that number of regions.

Optionally, the region creation module is further configured to: determine the number of column families according to business requirements, and determine the memory of the region server, the fraction of region server memory occupied by memstores, and the region server memory occupied by one memstore; determine the number of regions under each region server according to the following formula: (memory of the region server * fraction of region server memory occupied by memstores) / (region server memory occupied by one memstore * number of column families); and determine the number of required region servers from the number of required regions and the number of regions under each region server.

Optionally, the metadata storage module is further configured to store the metadata of the regions in the -ROOT- and .META. tables.

To achieve the above object, according to one aspect of the embodiments of the present invention, an electronic device is provided.

An electronic device according to an embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement any of the above methods for importing data into a non-relational database.

To achieve the above object, according to one aspect of the embodiments of the present invention, a computer-readable medium is provided, on which a computer program is stored; when the program is executed by a processor, it implements any of the above methods for importing data into a non-relational database.

One embodiment of the above invention has the following advantages or beneficial effects. The format of the data to be imported is converted to the data format of the files in which the non-relational database stores data, regions are created, and the files in the non-relational database storage format generated from the data to be imported are assigned to the created regions, so the data is imported into the non-relational database efficiently. Taking the HBase database as an example, this avoids the many steps of the prior art, in which the data is first written to the WAL, then written to the memstore after the WAL write succeeds, and the memstore is written to HDFS once it reaches a specific size, generating files in the non-relational database storage format. Moreover, the data is imported into the non-relational database offline, so the import process does not occupy CPU or memory resources and does not affect online use of the non-relational database. It also avoids the data loss caused by server downtime.

Further effects of the above non-conventional optional implementations are explained below in conjunction with specific embodiments.

Brief description of the drawings

The drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:

Fig. 1 is a schematic diagram of the main flow of a method for importing data into a non-relational database according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the HFile file data structure;

Fig. 3 is a schematic diagram of the structure of the -ROOT- and .META. tables;

Fig. 4 is a schematic diagram of a preferred flow of a method for importing data into an HBase database according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of the data structure of a KeyValue;

Fig. 6 shows a row record of the -ROOT- table according to an embodiment of the present invention;

Fig. 7 shows a row record of the .META. table according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of the main modules of a system for importing data into a non-relational database according to an embodiment of the present invention;

Fig. 9 is a diagram of an exemplary system architecture to which embodiments of the present invention can be applied;

Fig. 10 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present invention.

Detailed description of the embodiments

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.

Fig. 1 is a schematic diagram of the main flow of a method for importing data into a non-relational database according to an embodiment of the present invention. As shown in Fig. 1, the method mainly includes:

Step S101: convert the format of the data to be imported to generate files in the non-relational database storage format. Taking the HBase database as an example: all data files in HBase are stored on Hadoop HDFS. HFile is the storage format of KeyValue data in HBase; it is a binary file format on Hadoop, and the persistent data of a region is stored as HFiles. The data to be imported is therefore format-converted to generate HFile files. As shown in Fig. 2, an HFile consists of Data, Meta, File Info, Data Index, Meta Index, and Trailer blocks. A Data block consists of a magic number that guards against data corruption and a series of serialized KeyValue objects. A Meta block stores the metadata of the HFile; it is of KeyValue type but saves only the value, while the key is stored in the fifth item, the Meta Index. The File Info block saves information about the HFile, such as the average key length and the average value length. The Data Index block saves, for each Data block, its position in the HFile, its size, and the key of its first cell. The Meta Index block saves, for each piece of metadata, its position in the HFile, its size, and its key. The Trailer block contains pointers to the other blocks, through which the Meta Index, Data Index, and File Info are found; the Trailer is written at the end of the file.

Step S102: assign the files in the non-relational database storage format to the created regions according to the row keys (rowkeys) of the data to be imported. The created regions are the storage space in which the non-relational database stores data. A rowkey is usually composed of several fields with business meaning. If rowkeys are not hashed, they may concentrate in a few regions, causing serious data skew and reducing retrieval speed. Therefore, before step S102, the initial row key of the data to be imported is processed according to a preset row key hashing rule to generate a new row key for the data to be imported. The preset row key hashing rule may be a hash operation, such as MD5, applied to the initial row key of the data to be imported. Specifically, an MD5 operation is performed on the initial row key to generate a character string, and several consecutive characters of that string are used as a prefix of the initial row key, the combination forming the new row key. MD5 is a one-way hash algorithm from the field of computer security: even if a single byte of the original data changes, the resulting MD5 value is very different, so it achieves a good hashing effect and distributes rowkeys roughly uniformly across the regions. After the original rowkey has been hashed into a new rowkey, the files in the non-relational database storage format are assigned to the created regions according to the new row key of the data to be imported.

Also, before step S102, the required number of regions is created according to the data volume of the data to be imported. Specifically: determine the data volume of the data to be imported and the data volume stored by one region; determine the number of required regions from these two values; and create that number of regions. For example, if over the next 10 days the daily volume of imported data is 3000MB and one region can take 600MB of newly stored data per day, 5 regions are created. Determining the number of required regions in this way and creating that many regions allows the data volume of the data to be imported and its growth trend over the coming period (the storage period) to be assessed, so that pre-partitioning and rowkey hashing can be designed in advance and region resources are used efficiently.
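Pre-partitioning needs concrete boundaries between the regions once their number is known. The following is a minimal sketch only: it assumes row keys start with a three-character hexadecimal MD5 prefix and that the boundaries are spaced evenly over that prefix space. The even spacing and the helper name `presplit_keys` are illustrative assumptions, not something the source specifies.

```python
def presplit_keys(num_regions, prefix_len=3):
    """Return num_regions - 1 evenly spaced split keys over the hex prefix space.

    Assumption: row keys begin with prefix_len lowercase hexadecimal characters,
    so the prefix space has 16 ** prefix_len values.
    """
    space = 16 ** prefix_len
    return [
        format(i * space // num_regions, f"0{prefix_len}x")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Five regions, as in the 3000MB / 600MB example above.
    print(presplit_keys(5))
```

Each returned key is the start key of the next region; the first region implicitly starts at the empty key.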

Before the required number of regions is created, the method further includes: determining the number of column families according to business requirements, and determining the memory of the region server, the fraction of region server memory occupied by memstores, and the region server memory occupied by one memstore. The number of regions under each region server is determined according to the following formula:

(memory of the region server * fraction of region server memory occupied by memstores) / (region server memory occupied by one memstore * number of column families)

The number of required region servers is determined from the number of required regions and the number of regions under each region server. One region server can manage multiple regions, and the corresponding region can be found through its region server; once the number of regions is determined, the number of region servers is determined as well. When imported data is queried, it must be determined which data is on region server 1, which data is on region server 2, and so on.

Step S103: store the metadata of the regions. After data has been placed in the created regions, HBase cannot query the imported data unless it knows the metadata of the regions, so the metadata of the created regions must be stored before service can be provided externally. (Metadata, also called intermediary data or relay data, is data about data: it mainly describes the properties of data and supports functions such as indicating storage location, historical data, resource lookup, and file records.)

The metadata of the regions is stored in the -ROOT- and .META. tables, the two metadata tables of the HBase database, which record the distribution of regions and the details of each region. The -ROOT- table holds the information of all regions of the .META. table; by design HBase gives the -ROOT- table exactly one region and guarantees that it is never split. The .META. table stores the region location information of the actual user tables. Lookup is a three-layer, B+-tree-like query: the first layer is stored on ZooKeeper and holds the region information of the -ROOT- table; the second layer looks up the matching .META. region in the -ROOT- table; the third layer retrieves the region information of the user table from the .META. table. The -ROOT- and .META. tables have the same table structure, shown in Fig. 3. The RowKey consists of three parts: TableName, StartKey, and TimeStamp; the content stored in the RowKey is also called the RegionName. The table has one column family, info, which contains three columns: regioninfo, server, and serverstartcode. regioninfo holds the details of the region, including its StartKey, EndKey, and the information of each Family. server stores the IP and port of the RegionServer managing the region, and serverstartcode stores the time at which the RegionServer started handling the region. Whenever a region is split, merged, or reassigned, this table must be modified.

Fig. 4 is a schematic diagram of a preferred flow of a method for importing data into an HBase database according to an embodiment of the present invention.

As shown in Fig. 4, the method for importing data into an HBase database according to an embodiment of the present invention includes:

Step S401: determine the row key hashing rule and create the required number of regions. Creating these regions and the corresponding region servers in advance avoids region splits online, ensuring that HBase can serve external requests with high performance.

To give full play to the advantages of distributed parallel computing and avoid query and write hotspots, before the data is imported into the HBase database, the data volume of the data to be imported and its growth trend over the coming period are assessed, and pre-partitioning and rowkey hashing are designed in advance. The reasonable number of regions managed by one region server is calculated by the following formula:

avaNum = ((region server memory) * (memstore fraction)) / ((memstore size) * (column families num))

where avaNum is the reasonable number of regions managed by one region server;

region server memory is the memory of the region server;

memstore fraction is the fraction of region server memory occupied by memstores;

memstore size is the region server memory occupied by one memstore;

column families num is the number of column families, which is determined by business requirements and is usually 1 or 2.

For example, if the region server memory is set to 16GB, the fraction of region server memory occupied by memstores is 0.4, and one memstore occupies 128MB of region server memory by default, then with one column family the reasonable number of regions under this region server is 16384*0.4/(128*1)≈51.
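The avaNum formula can be checked with a few lines of Python. This is a sketch under stated assumptions: the function and parameter names are illustrative, all sizes are in MB, and rounding the quotient down is a choice made here, since the source only gives the rounded figure of 51.

```python
import math


def regions_per_server(server_memory_mb, memstore_fraction,
                       memstore_size_mb, column_families):
    """avaNum = (server memory * memstore fraction) / (memstore size * column families).

    The result is rounded down (an assumption) so the memstores of all
    regions fit inside the memory share reserved for them.
    """
    budget_mb = server_memory_mb * memstore_fraction
    per_region_mb = memstore_size_mb * column_families
    return math.floor(budget_mb / per_region_mb)


if __name__ == "__main__":
    # 16GB server, 0.4 memstore fraction, 128MB memstore, one column family.
    print(regions_per_server(16384, 0.4, 128, 1))
```

With two column families the same server supports roughly half as many regions, because each region then holds two memstores.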

A region should be neither too large nor too small. To optimize region storage in a high-concurrency production environment, the region size can be set to 5-10GB. Although the daily volume of data to be imported is very large, it does not need to be kept for long: it is generally stored for 10 days, and data older than 10 days is cleaned out so that regions do not grow too large. In this case, the amount of data one region newly stores each day can be 500MB-1000MB.

The number of required regions is calculated by the following formula:

regionTotalnum = dataSetSize / avaregionSize

where regionTotalnum is the number of required regions;

dataSetSize is the data volume of each import of data to be imported;

avaregionSize is the data volume one region newly stores per import.

For example, if data is imported once a day and the created regions hold 10 days of data, then the data volume of each import is the daily data volume to be imported, and the data volume one region newly stores per import is the data volume it newly stores per day. If it is assessed that over the next 10 days the daily volume of data to be imported is 3000MB and one region newly stores 600MB per day, the number of required regions is 3000/600=5.

The number of required region servers (regionServer) is calculated by the following formula:

regionServerNum = regionTotalnum / avaNum

where regionServerNum is the number of required region servers.
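The two sizing formulas above can be worked through as follows. This is a sketch only: the function names are illustrative, and rounding both quotients up to a whole number is an assumption made here, since the source gives the formulas as plain divisions.

```python
import math


def required_regions(data_set_size_mb, ava_region_size_mb):
    """regionTotalnum = dataSetSize / avaregionSize, rounded up (assumption)."""
    return math.ceil(data_set_size_mb / ava_region_size_mb)


def required_region_servers(region_total_num, ava_num):
    """regionServerNum = regionTotalnum / avaNum, rounded up (assumption)."""
    return math.ceil(region_total_num / ava_num)


if __name__ == "__main__":
    regions = required_regions(3000, 600)            # example from the text
    servers = required_region_servers(regions, 51)   # avaNum from the 16GB example
    print(regions, servers)
```

Rounding up means a partially filled region or region server is still provisioned, which matches the stated aim of leaving room for growth.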

Step S402: convert the format of the data to be imported to generate HFile files. KeyValue is the smallest indivisible data unit of an HFile; the storage of a cell (Cell) is realized by a KeyValue. As shown in Fig. 5, a KeyValue consists of four main parts: Key Length, Value Length, Key, and Value.

Key Length: stores the length of the Key; occupies 4B;

Value Length: stores the length of the Value; occupies 4B;

Key: consists of Row Length, Row, Column Family Length, Column Family, Column Qualifier, Time Stamp, and Key Type;

Row Length: stores the length of the Row, that is, the length of the rowkey; occupies 2B;

Row: stores the actual content of the Row, that is, the rowkey; its size is Row Length;

Column Family Length: stores the length of the column family (Column Family); occupies 1B;

Column Family: stores the actual content of the Column Family; its size is Column Family Length;

Column Qualifier: stores the data of the Column Qualifier;

Time Stamp: stores the timestamp; occupies 8B;

Key Type: stores the type of the Key; occupies 1B. The types include Put, Delete, DeleteColumn, DeleteFamilyVersion, and DeleteFamily, and mark the type of this KeyValue;

Value: stores the actual value of the cell (Cell).
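The field layout listed above can be illustrated by packing one cell into bytes. This is a sketch under assumptions, not the patent's Spark program: it assumes big-endian integers, a signed 8-byte timestamp, and the numeric code 4 for the Put type (the value used by HBase's KeyValue.Type); the function name `encode_keyvalue` is hypothetical.

```python
import struct

KEY_TYPE_PUT = 4  # assumed numeric code for the Put key type


def encode_keyvalue(row, family, qualifier, timestamp, value,
                    key_type=KEY_TYPE_PUT):
    """Serialize one cell as Key Length | Value Length | Key | Value."""
    key = (
        struct.pack(">H", len(row)) + row            # Row Length (2B) + Row
        + struct.pack(">B", len(family)) + family    # Column Family Length (1B) + Column Family
        + qualifier                                  # Column Qualifier (no length field)
        + struct.pack(">q", timestamp)               # Time Stamp (8B)
        + struct.pack(">B", key_type)                # Key Type (1B)
    )
    # Key Length (4B) and Value Length (4B) precede the key and the value.
    return struct.pack(">II", len(key), len(value)) + key + value


if __name__ == "__main__":
    kv = encode_keyvalue(b"row1", b"cf", b"q", 1, b"v")
    print(len(kv), kv.hex())
```

The qualifier carries no length field of its own; its length follows from Key Length minus the fixed-size fields, the row, and the column family.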

According to the format specification of the HFile file and the KeyValue structure, a Spark program is written to convert the format of the data to be imported and generate the HFile files.

Step S403: process the initial rowkey of the data to be imported according to the preset row key hashing rule to generate the new rowkey of the data to be imported. MD5 is applied to the initial rowkey, producing a 32-character hexadecimal string. Each character takes one of 16 values (the digits 0-9 and the letters a-f), so three characters give 16*16*16=4096 possible combinations, which is sufficient as the number of possible partitions. The first three characters of the generated string are therefore taken as the prefix of the initial rowkey, so that rowkeys are distributed roughly uniformly across the regions and each region both stores a considerable amount of data and keeps room for future data growth.
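The prefixing rule of step S403 can be sketched with Python's standard `hashlib`. The function name `salted_rowkey`, the UTF-8 encoding of the row key, and joining prefix and row key without a separator are assumptions made for illustration; the source only states that the first three characters of the MD5 string become the prefix.

```python
import hashlib


def salted_rowkey(initial_rowkey, prefix_len=3):
    """Prefix the row key with the first prefix_len characters of its MD5 hex digest."""
    digest = hashlib.md5(initial_rowkey.encode("utf-8")).hexdigest()
    return digest[:prefix_len] + initial_rowkey


if __name__ == "__main__":
    for key in ("order-0001", "order-0002", "order-0003"):
        print(salted_rowkey(key))
```

Because the prefix is derived from the row key itself, a reader who knows the initial row key can recompute the prefix and locate the row directly; sequential initial keys nevertheless land in unrelated prefix ranges.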

Step S404: assign the HFile files to the created regions according to the new row keys of the data to be imported. Each region is responsible for one rowkey interval. The generated HFile files are distributed to the corresponding created regions according to the new row keys of the data to be imported, and a split-keys file (recording the rowkeys each region stores) is written, with rowkeys arranged in lexicographic order. From the split-keys file the start_key and end_key of a region can be obtained quickly, and from these the rowkey interval the region is responsible for.
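Mapping a row key to the region whose interval contains it can be sketched with a binary search over sorted split keys. The representation is an assumption: `split_keys` holds the start keys of the second and later regions, the first region starts at the empty key, and each interval includes its start key and excludes its end key. The helper names are illustrative.

```python
import bisect


def region_index(rowkey, split_keys):
    """Return the index of the region whose [start_key, end_key) interval holds rowkey."""
    return bisect.bisect_right(split_keys, rowkey)


def group_by_region(rowkeys, split_keys):
    """Group row keys by region index, keeping lexicographic order inside each group."""
    groups = {}
    for rowkey in sorted(rowkeys):
        groups.setdefault(region_index(rowkey, split_keys), []).append(rowkey)
    return groups


if __name__ == "__main__":
    splits = ["400", "800", "c00"]  # four regions over a hex-prefixed key space
    print(group_by_region(["fff1", "000a", "400b", "7ffz"], splits))
```

Each group corresponds to the cells that would go into the HFile files of one region.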

Step S405: store the metadata of the regions. When data in the HBase database is queried and a user table is accessed for the first time, the regionserver hosting the -ROOT- table is first read from ZooKeeper; then, according to the requested TableName and rowkey, the regionserver hosting the .META. table is read from that regionserver; the content of the .META. table is then read from this second regionserver to obtain the location of the region the request needs to access; finally, the requested data is fetched from that region's location.
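The three-layer lookup described above can be modelled with plain dictionaries. This is a deliberately simplified toy, not HBase's actual catalog format: the ZooKeeper entry name, the nesting of the tables, and the use of bare start keys instead of full RegionName row keys are all assumptions made for illustration.

```python
import bisect


def find_region(entries, key):
    """entries: (start_key, location) pairs sorted by start_key, first start_key ''."""
    starts = [start for start, _ in entries]
    return entries[bisect.bisect_right(starts, key) - 1][1]


def locate(zookeeper, servers, table, rowkey):
    """Resolve the region server holding rowkey of table in three steps."""
    root_server = zookeeper["root-region-server"]                      # layer 1: ZooKeeper
    meta_server = find_region(servers[root_server]["-ROOT-"], table)   # layer 2: -ROOT-
    user_regions = servers[meta_server][".META."][table]               # layer 3: .META.
    return find_region(user_regions, rowkey)


if __name__ == "__main__":
    zookeeper = {"root-region-server": "rs1"}
    servers = {
        "rs1": {"-ROOT-": [("", "rs2")]},
        "rs2": {".META.": {"orders": [("", "rs3"), ("800", "rs4")]}},
    }
    print(locate(zookeeper, servers, "orders", "5d4hello"))
```

The model also shows why the metadata must be stored after an offline import: without the .META. entries for the new regions, the last step has nothing to search.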

The metadata of the regions holding the imported data (information such as the list, state, and location of the regions) is stored in the -ROOT- and .META. tables. -ROOT- stores the region information of the .META. table, as shown in Fig. 6; .META. stores the information of all regions of the user tables, as shown in Fig. 7.

According to the technical solution of the embodiment of the present invention, the format of the data to be imported is converted to the data format of the HFile files in which the HBase database stores data, regions are created, and the HFile files generated from the data to be imported are assigned to the created regions. This avoids the many steps of the prior art, in which the data is first written to the WAL, then written to the memstore after the WAL write succeeds, and the memstore is written to HDFS once it reaches a specific size, generating HFile files. The data is imported into the HBase database efficiently, and the process does not occupy CPU or memory resources, so it does not affect online use of the HBase database. In addition, hashing the rowkeys solves problems that arise when importing data into the HBase database, such as data skew, slow queries, and poor scalability.

Fig. 8 is a schematic diagram of the main modules of a system for importing data into a non-relational database according to an embodiment of the present invention. As shown in Fig. 8, the system 800 for importing data into a non-relational database according to an embodiment of the present invention includes:

Format conversion module 801: for converting the format of the data to be imported to generate files in the non-relational database storage format.

Distribution module 802: for assigning the files in the non-relational database storage format to the created regions according to the row keys of the data to be imported. The created regions are the storage space in which the non-relational database stores data. The system further includes a row key generation module for processing the initial row key of the data to be imported according to a preset row key hashing rule to generate a new row key for the data to be imported; the distribution module 802 then assigns the files in the non-relational database storage format to the created regions according to the new row key. The row key generation module is further configured to: perform an MD5 operation on the initial row key of the data to be imported to generate a character string; and use several consecutive characters of the character string as a prefix of the initial row key, the combination forming the new row key of the data to be imported.

Metadata storage module 803: for storing the metadata of the regions. The metadata storage module 803 is further configured to store the metadata of the regions in the -ROOT- and .META. tables.

The system for importing data into a non-relational database according to an embodiment of the present invention further includes a region creation module for: determining the data volume of the data to be imported and the data volume stored by one region; determining the number of required regions from the data volume of the data to be imported and the data volume stored by one region; and creating that number of regions. The region creation module is further configured to: determine the number of column families according to business requirements, and determine the memory of the region server, the fraction of region server memory occupied by memstores, and the region server memory occupied by one memstore; determine the number of regions under each region server according to the following formula: (memory of the region server * fraction of region server memory occupied by memstores) / (region server memory occupied by one memstore * number of column families); and determine the number of required region servers from the number of required regions and the number of regions under each region server.

According to the technical solution of the embodiment of the present invention, the format of the data to be imported is converted to the data format of the files in which the non-relational database stores data, regions are created, and the files in the non-relational database storage format generated from the data to be imported are assigned to the created regions. The data is thus imported into the non-relational database efficiently, and the import can be carried out offline without occupying CPU or memory resources, so it does not affect online use of the non-relational database. Taking the HBase database as an example, the format of the data to be imported is converted to the data format of the HFile files in which the HBase database stores data, regions are created, and the HFile files generated from the data to be imported are assigned to the created regions. This avoids the many steps of the prior art, in which the data is first written to the WAL, then written to the memstore after the WAL write succeeds, and the memstore is written to HDFS once it reaches a specific size, generating HFile files. The data is imported into the HBase database efficiently, and the process does not occupy CPU or memory resources, so it does not affect online use of the HBase database. In addition, hashing the rowkeys solves problems that arise when importing data into the HBase database, such as data skew, slow queries, and poor scalability.

Fig. 9 shows an exemplary system architecture 900 to which the method or the system for importing data into a non-relational database according to the embodiment of the present invention can be applied.

As shown in Fig. 9, the system architecture 900 may include terminal devices 901, 902, 903, a network 904, and a server 905. The network 904 is the medium that provides communication links between the terminal devices 901, 902, 903 and the server 905. The network 904 may include various connection types, such as wired or wireless communication links or fiber optic cables.

A user may use the terminal devices 901, 902, 903 to interact with the server 905 through the network 904, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 901, 902, 903, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (examples only).

The terminal devices 901, 902, 903 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.

The server 905 may be a server that provides various services, for example a back-end management server that supports shopping websites browsed by users with the terminal devices 901, 902, 903 (example only). The back-end management server may analyze and otherwise process received data such as information query requests, and feed the processing results back to the terminal devices.

It should be noted that the method for importing data into a non-relational database provided by the embodiment of the present invention is generally executed by the server 905; correspondingly, the system for importing data into a non-relational database is generally provided in the server 905.

It should be understood that the numbers of terminal devices, networks, and servers in Fig. 9 are only illustrative. There may be any number of terminal devices, networks, and servers, depending on implementation needs.

Reference is now made to Fig. 10, which shows a schematic structural diagram of a computer system 1000 suitable for implementing a terminal device of the embodiment of the present invention. The terminal device shown in Fig. 10 is only an example and should not impose any limitation on the functions or scope of use of the embodiment of the present invention.

As shown in Fig. 10, the computer system 1000 includes a central processing unit (CPU) 1001, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage section 1008 into a random access memory (RAM) 1003. The RAM 1003 also stores the various programs and data required for the operation of the system 1000. The CPU 1001, the ROM 1002, and the RAM 1003 are connected to one another through a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.

The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a cathode ray tube (CRT) or liquid crystal display (LCD), a loudspeaker, and the like; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card or a modem. The communication section 1009 performs communication processing via a network such as the Internet. A drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1010 as needed, so that a computer program read from it can be installed into the storage section 1008 as needed.

In particular, according to the embodiments disclosed in the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments disclosed in the present invention include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009, and/or installed from the removable medium 1011. When the computer program is executed by the central processing unit (CPU) 1001, the above functions defined in the system of the present invention are performed.

It should be noted that the computer-readable medium of the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and such a medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.

The flowcharts and block diagrams in the accompanying drawings illustrate the architectures, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, and that module, program segment, or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, it may be described as: a processor including a format conversion module, a distribution module, and a metadata storage module. The names of these units do not, in certain cases, constitute a limitation on the units themselves. For example, the format conversion module may also be described as "a module that performs format conversion on data to be imported to generate files in the non-relational database storage format".

In another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs, and when the one or more programs are executed by the device, they cause the device to: perform format conversion on data to be imported to generate files in the non-relational database storage format; assign the files in the non-relational database storage format to the created regions according to the row keys of the data to be imported; and store the metadata of the regions.
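
The step of assigning data to created regions according to row keys can be sketched as a lookup against region boundaries. The text does not specify how a row key is matched to a region, so this is a hedged illustration only: it assumes that regions partition the key space into contiguous ranges delimited by sorted split keys, as HBase regions do, and the names `region_index` and `group_by_region` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def region_index(row_key: bytes, split_keys: Sequence[bytes]) -> int:
    """Index of the region whose key range contains row_key.

    split_keys must be sorted. Region 0 holds keys below split_keys[0];
    region i holds keys in [split_keys[i - 1], split_keys[i]); the last
    region holds keys at or above the final split key.
    """
    return bisect_right(split_keys, row_key)


def group_by_region(row_keys: Iterable[bytes],
                    split_keys: Sequence[bytes]) -> Dict[int, List[bytes]]:
    """Group row keys by target region, keeping each group in sorted order,
    so that one sorted file could be produced per region."""
    groups: Dict[int, List[bytes]] = defaultdict(list)
    for key in sorted(row_keys):
        groups[region_index(key, split_keys)].append(key)
    return dict(groups)
```

Grouping in sorted order reflects the fact that each storage file covers a single key range, so a file never straddles two regions under this assumption.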

According to the technical solution of the embodiment of the present invention, the data to be imported is converted into the file format in which the non-relational database stores its data, regions are created, and the files in the non-relational database storage format generated from the data to be imported are assigned to the created regions. Data is thereby imported into the non-relational database efficiently, and because the import can be carried out offline, it does not occupy CPU and memory resources and therefore does not affect the use of the online non-relational database. Taking the HBase database as an example, the data to be imported is converted into the HFile format in which the HBase database stores data, regions are created, and the HFile files generated from the data to be imported are assigned to the created regions. This avoids the many steps of the prior art, in which data is first written to the WAL, is written to the memstore after the WAL write succeeds, and is flushed to HDFS to generate an HFile file once the memstore reaches a particular size, so data is imported into the HBase database efficiently. Because the data is imported into the HBase database offline, the import does not occupy CPU and memory resources and does not affect the use of the online HBase database. It also avoids the problem of data loss caused by server downtime.

The specific embodiments described above do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims

1. A method for importing data into a non-relational database, characterized by comprising:

performing format conversion on data to be imported to generate files in a non-relational database storage format;

assigning the files in the non-relational database storage format to created regions according to the row keys of the data to be imported;

storing the metadata of the regions.

2. The method according to claim 1, characterized in that, before assigning the files in the non-relational database storage format to the created regions, the method further comprises: processing the initial row keys of the data to be imported according to a preset row key hashing rule to generate new row keys of the data to be imported;

then, assigning the files in the non-relational database storage format to the created regions according to the new row keys of the data to be imported.

3. The method according to claim 2, characterized in that the step of processing the initial row keys of the data to be imported according to the preset row key hashing rule comprises:

performing an MD5 operation on the initial row key of the data to be imported to generate a character string;

using several consecutive characters of the character string as a prefix of the initial row key, and combining them to generate the new row key of the data to be imported.

4. The method according to claim 1, characterized in that, before assigning the files in the non-relational database storage format to the created regions, the method further comprises:

determining the data volume of the data to be imported and the data volume stored by one region;

determining the number of required regions according to the data volume of the data to be imported and the data volume stored by one region;

creating that number of regions.

5. The method according to claim 4, characterized in that, before creating that number of regions, the method further comprises:

determining the number of column families according to business requirements, and determining the memory of a region server, the proportion of region server memory occupied by memstores, and the memory of the region server occupied by a memstore;

determining the number of regions under each region server according to the following formula:

(memory of the region server * proportion of region server memory occupied by memstores) / (memory of the region server occupied by a memstore * number of column families)

determining the number of required region servers according to the number of required regions and the number of regions under each region server.

6. The method according to claim 1, characterized in that the step of storing the metadata of the regions comprises:

storing the metadata of the regions in the -ROOT- and .META. tables.

7. A system for importing data into a non-relational database, characterized by comprising:

a format conversion module, configured to perform format conversion on data to be imported to generate files in a non-relational database storage format;

a distribution module, configured to assign the files in the non-relational database storage format to created regions according to the row keys of the data to be imported;

a metadata storage module, configured to store the metadata of the regions.

8. The system according to claim 7, characterized by further comprising:

a row key generation module, configured to process the initial row keys of the data to be imported according to a preset row key hashing rule to generate new row keys of the data to be imported; and

the distribution module assigns the files in the non-relational database storage format to the created regions according to the new row keys of the data to be imported.

9. The system according to claim 8, characterized in that the row key generation module is further configured to: perform an MD5 operation on the initial row key of the data to be imported to generate a character string; and use several consecutive characters of the character string as a prefix of the initial row key, combining them to generate the new row key of the data to be imported.

10. The system according to claim 7, characterized by further comprising:

a region creation module, configured to: determine the data volume of the data to be imported and the data volume stored by one region; determine the number of required regions according to the data volume of the data to be imported and the data volume stored by one region; and create that number of regions.

11. The system according to claim 10, characterized in that the region creation module is further configured to: determine the number of column families according to business requirements, and determine the memory of a region server, the proportion of region server memory occupied by memstores, and the memory of the region server occupied by a memstore; determine the number of regions under each region server according to the following formula: (memory of the region server * proportion of region server memory occupied by memstores) / (memory of the region server occupied by a memstore * number of column families); and determine the number of required region servers according to the number of required regions and the number of regions under each region server.

12. The system according to claim 7, characterized in that the metadata storage module is further configured to store the metadata of the regions in the -ROOT- and .META. tables.

13. An electronic device, characterized by comprising:

one or more processors;

a storage device, configured to store one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 6.

14. A computer-readable medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1 to 6 is implemented.