CN107016039A

CN107016039A - Database writing method and database system

Info

Publication number: CN107016039A
Application number: CN201710009722.7A
Authority: CN
Inventors: 杜炼; 朱旭光; 唐欣
Original assignee: Alibaba Group Holding Ltd
Current assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd
Priority date: 2017-01-06
Filing date: 2017-01-06
Publication date: 2017-08-04
Anticipated expiration: 2037-01-06
Also published as: CN107016039B

Abstract

An embodiment of the present application discloses a method and system for writing to an HBase database. The method includes the following steps. A first stream processing node receives first log data distributed by a stream distribution node. The first stream processing node parses the logs in the first log data to obtain second log data, and determines the table and region corresponding to each log in the second log data. The first stream processing node routes the content carried by the logs in the second log data, according to table and region, to the corresponding second stream processing node. The second stream processing node merges the received logs according to table and region to form third log data. Finally, the second stream processing node takes the data in the third log data whose table name is a first table and whose region identifier is a first region identifier, and writes it, by means of a put operation, into the region corresponding to the first table and the first region identifier, through the region server in the HBase database that corresponds to that table and region identifier.

Description

Database writing method and database system

Technical field

The present application relates to the field of databases, and in particular to a database writing method and a database system.

Background

In existing JStorm applications, before the original logs are written to an HBase database, the log content and the extracted index are merged into one row by index + minute version and then saved; at the same time, the index is kept in memory and deduplicated, which reduces the write frequency. However, because the number of original logs is very large, the write volume is very large, and writes to the HBase database from stream processing applications such as JStorm become a bottleneck.

The technical problem that the present application aims to solve is how to avoid this HBase write bottleneck and improve the efficiency of writing to an HBase database.

Summary

Embodiments of the present application provide a database writing method and a stream processing system that can improve the efficiency of writing to an HBase database.

In a first aspect, a method for writing to an HBase database is provided. The method includes:

A first stream processing node receives first log data distributed by a stream distribution node. The stream distribution node is a message source node that reads log data to be written to the HBase database and distributes it; the first stream processing node is a message stream processing node that directly receives the data distributed by the stream distribution node;

The first stream processing node parses the logs in the first log data to obtain second log data, and determines the table and region corresponding to each log in the second log data;

The first stream processing node routes the content carried by the logs in the second log data, according to table and region, to the corresponding second stream processing node, the second stream processing node being a terminal message stream processing node of the stream distribution node;

The second stream processing node merges the received logs according to table and region to form third log data;

For the data in the third log data whose table name is a first table and whose region identifier is a first region identifier, the second stream processing node writes that data, by means of a put operation, into the region corresponding to the first table and the first region identifier, through the region server in the HBase database that corresponds to the first table and the first region identifier.

In a second aspect, a stream processing system is provided, including a stream distribution node, a first stream processing node and a second stream processing node, wherein:

the stream distribution node reads log data to be written to the HBase database and distributes it to each first stream processing node;

the first stream processing node receives first log data distributed by the stream distribution node, parses the logs in the first log data to obtain second log data, determines the table and region corresponding to each log in the second log data, and routes the content carried by the logs in the second log data, according to table and region, to the corresponding second stream processing node, wherein the first stream processing node is a message stream processing node that directly receives the data distributed by the stream distribution node;

the second stream processing node merges the received logs according to table and region to form third log data and, for the data in the third log data whose table name is a first table and whose region identifier is a first region identifier, writes that data, by means of a put operation, into the region corresponding to the first table and the first region identifier, through the corresponding region server in the HBase database, wherein the second stream processing node is a terminal message stream processing node of the stream distribution node.

As can be seen from the above technical solutions, the embodiments of the present application route log data, according to table and region, to the second stream processing node corresponding to that table and region; at the second stream processing node, log data belonging to the same region of the same table is merged into the same region and then written to the HBase database through the region server corresponding to that table and region. This reduces the number of Remote Procedure Call Protocol (RPC) connections made by the HBase database client, reduces the number of writes, and improves HBase write performance.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of the Storm or JStorm stream processing framework.

Fig. 2 is a schematic diagram of access to an HBase database.

Fig. 3 is a flowchart of a method for writing to an HBase database according to an embodiment of the present application.

Fig. 4 is a detailed flowchart of a method for writing to an HBase database according to an embodiment of the present application.

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Fig. 6 is a schematic structural diagram of a stream processing system according to an embodiment of the present application.

Detailed description

To make the embodiments of the present application easier to understand, several concepts used in this description are introduced first.

Storm: a real-time processing system developed by BackType. Storm makes it easy to write and scale complex real-time computations on a computer cluster. Storm guarantees that every message is processed, and it is fast: a small cluster can process millions of messages per second. Storm terminology includes message stream (Stream), message source (Spout), message processor (Bolt), task (Task), worker (Worker), stream grouping (Stream Grouping) and Topology. A Stream is the data being processed; a Spout is a data source; a Bolt processes data; a Task is a thread running in a Spout or Bolt; a Worker is a process that runs these threads.

Message streams (Streams): the message stream is the key abstraction in Storm. A message stream is an unbounded sequence of tuples; a tuple is a data structure used in Storm and can be thought of as a Java object without methods. These tuple sequences can be created and processed in parallel across the cluster in a distributed manner. Defining a message stream mainly means defining the tuples in it. To make tuples convenient to use, each field in a tuple is given a name, and corresponding fields of different tuples have the same type: the first fields of two tuples have the same type, as do their second fields, but the first and second fields may have different types.

Message sources (Spouts): in a Storm cluster, a Spout is the producer of message streams in a computation (Topology). A Spout usually loads data from an external source (for example, a database or file system) and then emits messages into the Topology. Each Spout can emit multiple message streams.

Message processors (Bolts): all message processing logic is carried out in Bolts. A Bolt can perform operations such as filtering, classification, aggregation, computation and database queries. A Bolt can do simple processing; for example, it may do nothing at all and simply forward received messages to other Bolts. It can also perform complex stream processing, which may require many Bolts. In practice, a message usually passes through multiple processing steps.

JStorm: a computing system similar to Hadoop MapReduce. It is a real-time computation model open-sourced by Alibaba that rewrites the original Storm model (written in a mix of Clojure and Java) in Java and adds many improvements. In JStorm, each Stream has a source, that is, a source of Tuples, and this abstract source is the Spout. Once there is a Spout, the content it produces must be processed. Likewise, JStorm's abstraction for processing is the Bolt: a Bolt can process any number of input streams flowing into it and can emit new streams for other Bolts to consume. In JStorm, one only needs to start a specific Spout, direct the Tuples flowing out of it to specific Bolts, and have those Bolts process the incoming streams and then direct the results to other Bolts, and so on.

HBase: a distributed, column-oriented open-source database. In HBase, data is stored in tables. An HBase table consists of rows and columns, and the columns are grouped into one or more column families. An HBase table uses the row key as the primary key for retrieval. A table is split by row into multiple regions (Region), and each Region is the smallest unit of distributed storage. An HBase database includes a client (Client), ZooKeeper, a manager (Master) and region servers (Region Server):

Client: contains the interfaces for accessing HBase and maintains caches, such as Region location information, to speed up access to HBase.

ZooKeeper: ensures that at any time there is only one active Master in the cluster; stores the addressing entry point for all Regions; monitors the status of Region Servers in real time and notifies the Master when Region Servers come online or go offline; and stores the HBase schema, including which tables exist and which column families each table has.

Master: assigns Regions to Region Servers; balances load across Region Servers; detects failed Region Servers and reassigns their Regions; reclaims garbage files on GFS; and handles schema update requests.

Region Server: maintains the Regions assigned to it by the Master and handles I/O requests to those Regions; it is also responsible for splitting Regions that grow too large during operation.

To help those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present application.

Fig. 1 is a schematic diagram of the data flow in stream processing tools such as Storm or JStorm. As shown in Fig. 1, the message source (Spout) node distributes data streams randomly to Bolt nodes; after receiving a data stream, a Bolt node processes it and forwards it to the next Bolt node. The data stream may be transmitted in structures such as tuples. The Spout node in Fig. 1 corresponds to the stream distribution node (message stream source node) of the present application, and the Bolt nodes correspond to its message stream processing nodes.

Fig. 2 is a schematic flowchart of how data is saved in an HBase database according to an embodiment of the present application. As shown in Fig. 2, when a user of Storm, JStorm or a similar system calls the HBase client API to perform a batch put (puts) operation, the HBase client looks up which Region Servers the pieces of this batch belong to and sends a separate RPC request to each of them. After a server-side Region Server receives a client write request, it first deserializes the request into Put objects and then performs various checks, such as whether the Region is read-only and whether the memstore size exceeds blockingMemstoreSize. After the checks, it performs the following operations:

(1) acquire the row locks and the Region update shared lock;

(2) start the write transaction;

(3) write to the memstore buffer;

(4) append to the HLog;

(5) release the row locks and the shared lock;

(6) sync the HLog to HDFS;

(7) end the write transaction;

(8) start the flush thread to flush the data to disk.

Fig. 3 is a flowchart of a method for writing to an HBase database according to an embodiment of the present application. The method of Fig. 3 is executed by a Storm or JStorm application and may include:

S301: A first stream processing node receives first log data distributed by a stream distribution node.

The stream distribution node is a message source node that reads log data to be written to the HBase database and distributes it; the first stream processing node is a message stream processing node that directly receives the data distributed by the stream distribution node.

It should be understood that, in a Storm or JStorm system, the stream distribution node is a Spout node, and the first stream processing node is a message stream processing node connected to the Spout node that directly receives the data it distributes.

It should be understood that, in the embodiments of the present application, the stream distribution node continuously reads log data to be written to the HBase database and distributes it as data streams to each first stream processing node. Normally, the stream distribution node distributes log data randomly. Of course, if the stream distribution node uses a different sending mechanism, the embodiments of the present application do not restrict this.

S302: The first stream processing node parses the logs in the first log data to obtain second log data, and determines the table and region corresponding to each log in the second log data.

It should be understood that, when parsing the first log data, the first stream processing node generally parses it log by log and determines the table and region (Region) corresponding to each log.

S303: The first stream processing node routes the content carried by the logs in the second log data, according to table and region, to the corresponding second stream processing node.

The second stream processing node is a terminal message stream processing node of the stream distribution node.

It should be understood that the second stream processing node being a terminal message stream processing node of the stream distribution node means that it is a leaf node of message stream processing: it does not pass the stream on to a further stream processing node. Taking a Storm or JStorm system as an example, the second stream processing node is a terminal stream processing node in that system and does not pass the stream on to a next Bolt node.

It should be understood that, in the embodiments of the present application, logs belonging to the same region of the same table flow to the same terminal message stream processing node.
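The routing rule can be sketched as follows. This is a minimal Python illustration, not Storm or JStorm API code: `route_task`, the table names and the task count are hypothetical, and a stable CRC32 hash stands in for whatever partitioning the framework actually applies (in Storm, a fields grouping on the table and region fields gives the same "same key, same task" guarantee).

```python
import zlib


def route_task(table: str, region_id: int, num_tasks: int) -> int:
    """Return the index of the terminal task that owns (table, region_id).

    zlib.crc32 is used instead of the built-in hash(), which is randomized
    per process for strings, so every upstream node computes the same target.
    """
    key = f"{table}\x00{region_id}".encode("utf-8")
    return zlib.crc32(key) % num_tasks


# Hypothetical (table, region) pairs extracted by the first stream processing node.
logs = [("access_log", 1), ("access_log", 2), ("access_log", 1), ("audit_log", 1)]
targets = [route_task(table, region, num_tasks=4) for table, region in logs]
print(targets)  # entries 0 and 2 are always equal: same table, same region
```

Because the target depends only on the table and region, any number of first stream processing nodes can route independently and still agree on the destination.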

S304: The second stream processing node merges the received logs according to table and region to form third log data.

It should be understood that merging the received logs according to table and region at the second stream processing node means merging the log data that belongs to the same region of the same table into one region.
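A minimal sketch of this merge step, assuming each received record has already been reduced to a hypothetical `(table, region_id, rowkey, value)` tuple; the function and table names are illustrative only.

```python
from collections import defaultdict


def merge_by_region(records):
    """Group records so that all rows of one table region form one batch.

    records: iterable of (table, region_id, rowkey, value) tuples.
    Returns {(table, region_id): [(rowkey, value), ...]}.
    """
    batches = defaultdict(list)
    for table, region_id, rowkey, value in records:
        batches[(table, region_id)].append((rowkey, value))
    return dict(batches)


received = [
    ("access_log", 1, "00099", "a"),
    ("access_log", 2, "00150", "b"),
    ("access_log", 1, "00042", "c"),
]
third_log_data = merge_by_region(received)
print(third_log_data)
```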

S305: For the data in the third log data whose table name is the first table and whose region identifier is the first region identifier, the second stream processing node writes that data, by means of a put operation, into the region corresponding to the first table and the first region identifier, through the region server in the HBase database that corresponds to the first table and the first region identifier.

It should be understood that, in the embodiments of the present application, the region identifier may be a Region number or any other identifier that uniquely identifies a Region.

Of course, the stream distribution node and Bolt nodes in the present application may also be replaced by nodes with similar functions in other stream processing systems; the embodiments of the present application impose no restriction on this.

In the embodiments of the present application, log data is routed, according to table and region, to the second stream processing node corresponding to that table and region; the second stream processing node merges the log data belonging to the same region of the same table into the same region and then writes it to the HBase database through the region server corresponding to that table and region. This reduces the number of RPC connections made by the HBase database client, reduces the number of writes, and improves HBase write performance.

In simulation tests, the method of the embodiments of the present application can increase HBase throughput by 50% while keeping the HBase cluster stable.
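Where the saving comes from can be illustrated with a toy count. The sketch assumes, following the description of Fig. 2, that an upstream client batch costs one RPC per Region Server it touches, and that after routing and merging each table region is written with one request per flush. The region layout and batch contents are invented for illustration and are unrelated to the measured throughput figure.

```python
def rpcs_without_merging(batches, server_of_region):
    """Each upstream batch costs one RPC per distinct region server it touches."""
    return sum(len({server_of_region[r] for r in batch}) for batch in batches)


def rpcs_with_region_merging(batches):
    """After routing and merging, each region is written once per flush."""
    return len({r for batch in batches for r in batch})


# Hypothetical layout: regions 1-2 on rs1, regions 3-4 on rs2.
server_of_region = {1: "rs1", 2: "rs1", 3: "rs2", 4: "rs2"}
# Ten randomly distributed upstream batches, each containing rows for every region.
batches = [[1, 2, 3, 4]] * 10

print(rpcs_without_merging(batches, server_of_region))  # 20
print(rpcs_with_region_merging(batches))                # 4
```

The first count grows with the number of upstream batches, while the second is bounded by the number of regions per flush, which is the intuition behind the reduced RPC connection count.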

Optionally, as one embodiment, step S302 may be implemented as follows:

The first stream processing node parses the logs in the first log data and structures them to obtain structured log data;

The first stream processing node indexes the structured log data according to the table and key to which the structured log data belongs;

The first stream processing node merges the content of the structured log data according to table, index and a predetermined time version, or according to table and index, to form the second log data;

The first stream processing node determines, according to table and index, the table and region corresponding to each log in the second log data.

In the embodiments of the present application, merging log content at the first stream processing node according to table, index and a predetermined time version, or according to table and index, also reduces the number of RPC connections made by the HBase client and the number of writes, improving HBase write performance.

Optionally, as one embodiment, step S303 may be implemented as follows:

The first stream processing node sends the logs in the second log data to a third stream processing node, the third stream processing node being a message processor node, located between the first stream processing node and the second stream processing node, that relays the content of the second log data;

The third stream processing node merges the content of the received second log data according to table, index and a predetermined time version, or according to table and index, to form fourth log data;

The third stream processing node routes the content carried by the fourth log data, according to table and region, to the corresponding second stream processing node.

In the embodiments of the present application, merging log content at a third stream processing node between the first and second stream processing nodes, according to table, index and a predetermined time version, or according to table and index, also reduces the number of RPC connections made by the HBase client and the number of writes, improving HBase write performance.

Optionally, when logs are merged according to table, index and a predetermined time version, the predetermined time version is a 30-second, 1-minute, 5-minute or 10-minute version.

Optionally, as one embodiment, step S305 may be implemented as follows: when the data in the third log data whose table name is the first table and whose region identifier is the first region identifier reaches or exceeds a first predetermined threshold, that data is written, by means of a put operation, into the region corresponding to the first table and the first region identifier, through the corresponding region server in the HBase database.

In the embodiments of the present application, performing the write only when a Region's data reaches or exceeds the first predetermined threshold also reduces the number of RPC connections made by the HBase client and the number of writes, improving HBase write performance.
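A size-triggered buffer for one table region might look like the sketch below. `SizeTriggeredBuffer`, the byte-based size measure and the `write_fn` callback (standing in for the put to the region server) are assumptions made for illustration.

```python
class SizeTriggeredBuffer:
    """Buffers rows for one (table, region) and writes them in one batch
    once the buffered size reaches the first predetermined threshold."""

    def __init__(self, threshold_bytes, write_fn):
        self.threshold_bytes = threshold_bytes
        self.write_fn = write_fn  # hypothetical callback that performs the put
        self.rows = []
        self.size = 0

    def add(self, rowkey: str, value: str) -> bool:
        """Buffer one row; return True if this call triggered a write."""
        self.rows.append((rowkey, value))
        self.size += len(rowkey.encode("utf-8")) + len(value.encode("utf-8"))
        if self.size >= self.threshold_bytes:
            self.write_fn(self.rows)
            self.rows, self.size = [], 0
            return True
        return False


writes = []
buf = SizeTriggeredBuffer(threshold_bytes=20, write_fn=writes.append)
first = buf.add("00001", "aaaa")     # 9 bytes buffered, no write yet
second = buf.add("00002", "bbbbbb")  # 20 bytes reached, one batched write
print(first, second, writes)
```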

Optionally, as another embodiment, step S305 may be implemented as follows: when the time elapsed since the data in the third log data whose table name is the first table and whose region identifier is the first region identifier was last written to the HBase database reaches or exceeds a second predetermined threshold, that data is written, by means of a put operation, into the region corresponding to the first table and the first region identifier, through the corresponding region server in the HBase database.

In the embodiments of the present application, performing the write at intervals given by the second predetermined threshold also reduces the number of RPC connections made by the HBase client and the number of writes, improving HBase write performance.
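The time-based variant can be sketched the same way. `TimeTriggeredBuffer` and its `write_fn` callback are hypothetical; the clock is injectable so the logic can be exercised without waiting.

```python
import time


class TimeTriggeredBuffer:
    """Buffers rows for one (table, region) and writes them once the time since
    the last write reaches the second predetermined threshold."""

    def __init__(self, interval_s, write_fn, clock=time.monotonic):
        self.interval_s = interval_s
        self.write_fn = write_fn  # hypothetical callback that performs the put
        self.clock = clock        # injectable so the logic can be tested
        self.rows = []
        self.last_write = clock()

    def add(self, rowkey: str, value: str) -> bool:
        self.rows.append((rowkey, value))
        return self.maybe_flush()

    def maybe_flush(self) -> bool:
        """Write buffered rows if the interval has elapsed; return True if written."""
        now = self.clock()
        if self.rows and now - self.last_write >= self.interval_s:
            self.write_fn(self.rows)
            self.rows = []
            self.last_write = now
            return True
        return False


# Drive the buffer with a fake clock instead of real time.
fake_now = [0.0]
writes = []
buf = TimeTriggeredBuffer(5.0, writes.append, clock=lambda: fake_now[0])
first = buf.add("00001", "a")   # t = 0 s: interval not reached
fake_now[0] = 6.0
second = buf.add("00002", "b")  # t = 6 s: both rows written together
print(first, second, writes)
```

Because `maybe_flush` only runs when a row arrives, a real bolt would also need to call it periodically (Storm's tick tuples are one common mechanism) so that a quiet region still gets flushed.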

The method of the embodiments of the present application is further described below with reference to a specific embodiment.

Fig. 4 is a detailed flowchart of a method for writing to an HBase database according to an embodiment of the present application. The embodiment shown in Fig. 4 is described using JStorm Spout and Bolt nodes together with an HBase database. Of course, the stream distribution node and Bolt nodes in the present application may also be replaced by nodes with similar functions in other stream processing systems; the embodiments impose no restriction on this. In this embodiment, Bolt nodes fall into three classes: the first stream processing node, which receives data directly from the stream distribution node; the second stream processing node, which is the terminal (leaf) node of message processing; and the third stream processing node, which lies on the routing path between the first and second stream processing nodes.

Of course, although Fig. 4 shows only one third stream processing node on the routing path between the first and second stream processing nodes, in practice that path may include multiple third stream processing nodes. The third stream processing node may also be absent altogether.

The method of Fig. 4 includes:

401: The Spout node distributes log data.

In this embodiment, the Spout node is the stream distribution node of the embodiment shown in Fig. 3.

The Spout node can pull, block by block, the log data to be written to the HBase database and distribute it in batches to downstream nodes.
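The Spout's block-wise reading and batch distribution can be sketched as below. The block size, the number of splitBolt tasks and the random choice of target are illustrative assumptions following the "random distribution" described for Fig. 1; Storm's own shuffle grouping additionally balances tuples evenly across tasks.

```python
import random


def read_blocks(lines, block_size):
    """Yield pending log lines in fixed-size blocks."""
    for start in range(0, len(lines), block_size):
        yield lines[start:start + block_size]


def distribute(lines, num_split_bolts, block_size, rng=random):
    """Send each block to a randomly chosen first stream processing node.

    Returns one list of received blocks per splitBolt task.
    """
    queues = [[] for _ in range(num_split_bolts)]
    for block in read_blocks(lines, block_size):
        queues[rng.randrange(num_split_bolts)].append(block)
    return queues


pending = [f"log-{i}" for i in range(10)]
queues = distribute(pending, num_split_bolts=3, block_size=4, rng=random.Random(7))
print([len(q) for q in queues])  # number of blocks each task received
```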

402: The splitBolt node parses and processes the logs and sends them on.

In this embodiment, the splitBolt node is the first stream processing node of the embodiment shown in Fig. 3.

It should be understood that, after receiving log data distributed by the Spout, the splitBolt node first needs to parse the logs. Parsing the original log turns the log text into structured data. For example,

Original log:

2016-12-12 07:05:32,162[3005][lian.du,S,-,open/abnormalEvents.json, null,null]

When parsing the above original log, the text can be split on ",". In addition, one can define whether the value to be saved is the entire original log or, for example, the corresponding access link, such as open/abnormalEvents.json. For specific implementations, refer to the prior art.

After parsing is complete, an index can be extracted from the data and used as the rowkey for routing.

In the embodiments of the present application, table + Region information is used as the routing policy.

Taking the above log as an example, lian.du can be defined as the search index through configuration. The search index is what is saved as the HBase rowkey, so subsequent queries can look up data by rowkey.

In addition, in the embodiments of the present application, a thread that reads Region information can be set up to determine which Region each log belongs to. Because the Regions of an HBase table are defined by rowkey ranges, the Region to which a log is assigned can be looked up by its rowkey, for example:
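As a sketch, the example line can be structured with a regular expression plus a comma split. The positions used for the index and the value (the first and fourth comma-separated fields) are assumptions based only on this single example; in practice they would come from configuration.

```python
import re

# timestamp, then [thread], then [comma-separated fields]
LOG_PATTERN = re.compile(r"^(?P<time>[^\[]+)\[(?P<thread>[^\]]*)\]\[(?P<body>[^\]]*)\]$")


def parse_log(line: str, key_field: int = 0, value_field: int = 3) -> dict:
    """Turn one raw log line into a structured record.

    key_field / value_field are hypothetical configuration: which field is the
    search index (rowkey) and which field is the value to save.
    """
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    fields = [f.strip() for f in match.group("body").split(",")]
    return {
        "time": match.group("time").strip(),
        "thread": match.group("thread"),
        "rowkey": fields[key_field],
        "value": fields[value_field],
        "fields": fields,
    }


raw = "2016-12-12 07:05:32,162[3005][lian.du,S,-,open/abnormalEvents.json, null,null]"
record = parse_log(raw)
print(record["rowkey"], record["value"])  # lian.du open/abnormalEvents.json
```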

Region ID   Start key   End key
1           00001       00100
2           00101       00200

When the rowkey is 00099, the thread that reads Region information can determine that the corresponding Region ID is 1.

After the splitBolt node determines the table and Region of each log, it can send the structured log data to the next Bolt node according to the table + Region routing policy.

Of course, the splitBolt node may also merge log content according to table + rowkey. The input to log-content merging is the index and value obtained after structuring the log.

For example, assume the two records to be merged are as follows:
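The lookup against the example table can be sketched with a binary search over region start keys. The hard-coded `REGIONS` list mirrors the table above and stands in for what the Region-information thread would load (a Java client could obtain region boundaries through HBase's `RegionLocator`, for instance). Real HBase regions use an inclusive start key and an exclusive end key; this sketch follows the inclusive ranges printed in the example.

```python
import bisect

# (region_id, start_key, end_key) from the example table; both ends inclusive here.
REGIONS = [(1, "00001", "00100"), (2, "00101", "00200")]
START_KEYS = [start for _, start, _ in REGIONS]  # must be sorted


def region_for(rowkey: str) -> int:
    """Return the ID of the region whose key range contains rowkey."""
    i = bisect.bisect_right(START_KEYS, rowkey) - 1
    if i < 0 or rowkey > REGIONS[i][2]:
        raise KeyError(f"no region covers rowkey {rowkey!r}")
    return REGIONS[i][0]


print(region_for("00099"))  # 1
```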

Key       Value
lian.du   open/abnormalEvents.json
lian.du   open/other.json

The output is then:

Key       Value
lian.du   open/abnormalEvents.json,open/other.json

The purpose of merging log content is that, when saving to the HBase database, multiple logs with the same index key are merged into a single log; put simply, the multiple values are saved together, separated by ",".
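A minimal sketch of the table + rowkey merge, reproducing the example above; the table name `access_log` is hypothetical and the "," separator follows the text.

```python
def merge_values(records, sep=","):
    """Merge values that share the same (table, rowkey) into one delimited value.

    records: iterable of (table, rowkey, value); insertion order is preserved.
    """
    merged = {}
    for table, rowkey, value in records:
        merged.setdefault((table, rowkey), []).append(value)
    return {key: sep.join(values) for key, values in merged.items()}


rows = [
    ("access_log", "lian.du", "open/abnormalEvents.json"),
    ("access_log", "lian.du", "open/other.json"),
]
print(merge_values(rows))
# {('access_log', 'lian.du'): 'open/abnormalEvents.json,open/other.json'}
```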

In addition, because the volume of log data is huge, merging too many logs could make the value too large, so merging by table + rowkey + a predetermined time version can be considered. The predetermined time version may be a 30-second, 1-minute, 5-minute or 10-minute version, and so on. For example, merging by table + rowkey + minute version means merging the log content with the same table and rowkey within one minute.
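Adding the time version only changes the merge key. In this sketch the timestamps are hypothetical epoch seconds, and `version_seconds` selects the 30-second, 1-minute, 5-minute or 10-minute window (30, 60, 300 or 600).

```python
def merge_with_version(records, version_seconds=60, sep=","):
    """Merge values sharing table, rowkey and time version.

    records: iterable of (table, rowkey, epoch_seconds, value).
    The time version is the start of the window containing the timestamp.
    """
    merged = {}
    for table, rowkey, ts, value in records:
        version = ts - ts % version_seconds
        merged.setdefault((table, rowkey, version), []).append(value)
    return {key: sep.join(values) for key, values in merged.items()}


rows = [
    ("access_log", "lian.du", 1481526332, "open/abnormalEvents.json"),
    ("access_log", "lian.du", 1481526350, "open/other.json"),
    ("access_log", "lian.du", 1481526395, "open/abnormalEvents.json"),
]
for key, value in merge_with_version(rows).items():
    print(key, value)
# the first two rows share the window starting at 1481526300; the third starts a new one
```

A wider version (for example 600 seconds) merges more rows into one value, trading value size against the number of writes.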

403: The midBolt node processes the data and routes it onward.

In this embodiment, the midBolt node is the third stream processing node of the embodiment shown in Fig. 3.

After receiving the log data routed to it by an upstream node, the midBolt node can process it as actual needs require.

Optionally, in one scheme, the midBolt node routes the log data it receives directly to the next Bolt node.

Alternatively, the midBolt node may merge log content according to table + rowkey, or according to table + rowkey + a predetermined time version, and then route the merged log data to the next Bolt node.

Of course, step 403 is optional.

404: The HbaseRegionBolt node merges and writes.

In this embodiment, the HbaseRegionBolt node is the second stream processing node of the embodiment shown in Fig. 3.

The main function of the HbaseRegionBolt node is to perform Region merging, according to table + Region, on the log data routed to it, that is, to merge logs with the same table name and Region identifier into the same Region. Specifically, the Region identifier may be the Region number.

After Region merging is complete, for each Region the HbaseRegionBolt node can establish a connection to the RegionServer corresponding to that Region and write the log data to the HBase database through that RegionServer by means of a put operation. It should be understood that the RegionServer here is the region server of the embodiment shown in Fig. 3.
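The per-Region write can be sketched as below. `FakeRegionServerClient` is a hypothetical stand-in that only records calls; with the HBase Java client the comparable step would be a batched `Table.put(List<Put>)`, after which the client itself delivers the batch to the hosting RegionServer. The sketch only shows that each merged Region batch becomes one request.

```python
class FakeRegionServerClient:
    """Hypothetical stand-in for a connection to one RegionServer."""

    def __init__(self, name: str):
        self.name = name
        self.put_calls = []  # each entry is one batched put request

    def put(self, table: str, rows):
        self.put_calls.append((table, list(rows)))


def write_regions(batches, server_of_region, clients):
    """Issue one batched put per merged (table, region) batch.

    batches: {(table, region_id): [(rowkey, value), ...]}
    server_of_region: {(table, region_id): server_name}
    clients: {server_name: FakeRegionServerClient}
    """
    for (table, region_id), rows in batches.items():
        server = server_of_region[(table, region_id)]
        clients[server].put(table, rows)


clients = {name: FakeRegionServerClient(name) for name in ("rs1", "rs2")}
batches = {
    ("access_log", 1): [("00099", "a"), ("00042", "c")],
    ("access_log", 2): [("00150", "b")],
}
server_of_region = {("access_log", 1): "rs1", ("access_log", 2): "rs2"}
write_regions(batches, server_of_region, clients)
print({name: len(c.put_calls) for name, c in clients.items()})  # {'rs1': 1, 'rs2': 1}
```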

Specifically, the HbaseRegionBolt node may write a Region's data to the HBase database when the size of that Region's data reaches or exceeds a predetermined threshold.

Alternatively, the HbaseRegionBolt node may write the logs of each Region to the HBase database periodically.

Fig. 5 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application. Referring to Fig. 5, at the hardware level the electronic device includes a processor, an internal bus, a network interface, memory and non-volatile storage, and may also include hardware required by other services. The processor reads the corresponding computer program from non-volatile storage into memory and runs it, forming the corresponding apparatus at the logical level. Besides a software implementation, the present application does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the entity that executes the processing flow is not limited to logic units and may also be hardware or logic devices.

Fig. 6 is a schematic structural diagram of a stream processing system 600 according to an embodiment of the present application. Referring to Fig. 6, in one software implementation, the stream processing system 600 may include a stream distribution node 610, a first stream processing node 620, and a second stream processing node 630, wherein:

The stream distribution node 610 reads the log data to be written into the HBase database and distributes it to each first stream processing node 620;

The first stream processing node 620 receives first log data distributed by the stream distribution node 610, parses the logs in the first log data to obtain second log data, determines the table and region corresponding to each log in the second log data, and routes the content carried by the logs in the second log data, by table and region, to the corresponding second stream processing node 630. The first stream processing node 620 is the message stream processing node that directly receives the data distributed by the stream distribution node 610;
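Routing by table and region means that every log with the same (table, region) pair must reach the same second stream processing node. A stable hash of that pair, taken modulo the number of downstream nodes, is one simple way to achieve this. The sketch below assumes that approach. `route` is a hypothetical helper, and the text does not say which grouping the stream framework actually uses. CRC32 is used because Python's built-in `hash()` of a string is randomized per process.

```python
import zlib


def route(table: str, region: int, num_second_nodes: int) -> int:
    """Return the index of the second stream processing node for (table, region).

    The mapping is deterministic across processes, so all logs of one
    (table, region) pair always land on the same downstream node.
    """
    key = f"{table}\x00{region}".encode("utf-8")
    return zlib.crc32(key) % num_second_nodes


# Group some (table, region) pairs by their target node.
targets = {}
for table, region in [("orders", 1), ("orders", 2), ("users", 1), ("orders", 1)]:
    targets.setdefault(route(table, region, 4), set()).add((table, region))
```

Because the target depends only on the (table, region) pair, the merge in the second stream processing node sees every log of a Region in one place.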

The second stream processing node 630 merges the received logs by table and region to form third log data. For the data in the third log data whose table name is a first table and whose region identifier is a first region identifier, it writes that data, by means of put operations, into the region corresponding to the first table and the first region identifier, through the region server in the HBase database that corresponds to the first table and the first region identifier. The second stream processing node 630 is the terminal message stream processing node downstream of the stream distribution node 610.

In this embodiment of the present application, log data is routed by table and region to the second stream processing node corresponding to that table and region. The second stream processing node merges log data with the same table and region into the same region, and then writes it into the HBase database through the region server corresponding to that table and region. This reduces the number of RPC connections made by the HBase client and the number of writes, thereby improving the write performance of the HBase database.

Optionally, as one embodiment, the first stream processing node 620 parsing the logs in the first log data to obtain the second log data, and determining the table and region corresponding to each log in the second log data, includes:

The first stream processing node 620 parses the logs in the first log data and structures them to obtain structured log data;

The first stream processing node 620 indexes the structured log data according to the table to which the structured log data belongs and the keyword (key);

The first stream processing node 620 merges the content of the structured log data by table, index, and predetermined time version, or by table and index, to form the second log data;

The first stream processing node 620 determines the table and region corresponding to each log in the second log data according to the table and the index.
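The steps above (structuring a raw log, indexing it by table and key, and finding its region) can be sketched as follows. Several assumptions apply. The raw format `table|key|epoch_seconds|payload` is hypothetical, and so are the names `StructuredLog`, `parse`, `index_of` and `locate_region`. The region lookup relies on the fact that an HBase table is split into regions that cover contiguous row-key ranges, each starting at a start key, with the first region's start key being empty.

```python
import bisect
from dataclasses import dataclass


@dataclass
class StructuredLog:
    table: str
    key: str
    ts: int
    payload: str


def parse(line: str) -> StructuredLog:
    # Hypothetical raw format: "table|key|epoch_seconds|payload" (payload may contain '|').
    table, key, ts, payload = line.split("|", 3)
    return StructuredLog(table, key, int(ts), payload)


def index_of(log: StructuredLog):
    # Index by the table the log belongs to and its keyword (key).
    return (log.table, log.key)


def locate_region(row_key: str, start_keys) -> int:
    """Index of the region whose [start_key, next_start_key) range contains row_key.

    start_keys must be sorted and begin with "" (the first region's empty start key).
    """
    return bisect.bisect_right(start_keys, row_key) - 1


log = parse("orders|user42|1700000000|hello|world")
start_keys = ["", "g", "p"]  # illustrative split points for one table
```

With these split points, keys before "g" fall in region 0, keys from "g" up to (but not including) "p" fall in region 1, and the remaining keys fall in region 2.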

Optionally, as another embodiment, the system further includes a third stream processing node, which is a message processor node that transmits the content of the second log data between the first stream processing node 620 and the second stream processing node 630;

The first stream processing node 620 routing the content carried by the logs in the second log data, by table and region, to the corresponding second stream processing node 630 includes: the first stream processing node 620 sends the logs in the second log data to the third stream processing node, which processes them and routes them to the corresponding second stream processing node 630;

Here, the third stream processing node processing the logs and routing them to the corresponding second stream processing node 630 includes: the third stream processing node merges the content of the received second log data by table, index, and predetermined time version, or by table and index, to form fourth log data; the third stream processing node then routes the content carried by the fourth log data, by table and region, to the corresponding second stream processing node 630.
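The intermediate merge in the third stream processing node collapses entries that share a merge key, so that fewer, larger entries travel to the second stream processing node. The sketch below is a minimal illustration. The tuple shape `(table, key, time_version, payload)` and the function name `premerge` are assumptions, not part of the original text. The flag selects between the two merge keys the text allows: (table, index, time version) or (table, index).

```python
def premerge(entries, use_time_version=True):
    """Collapse entries that share a merge key into one entry.

    entries: iterable of (table, key, time_version, payload) tuples.
    Returns [(merge_key, [payload, ...]), ...] in first-seen order.
    """
    merged = {}  # dicts preserve insertion order (Python 3.7+)
    for table, key, version, payload in entries:
        mkey = (table, key, version) if use_time_version else (table, key)
        merged.setdefault(mkey, []).append(payload)
    return list(merged.items())


entries = [
    ("t", "u1", 0, "a"),
    ("t", "u1", 0, "b"),
    ("t", "u1", 60, "c"),
    ("t", "u2", 0, "d"),
]
fourth_by_version = premerge(entries)
fourth_by_index = premerge(entries, use_time_version=False)
```

With the time version in the key, four entries become three. Without it, entries of the same index in different time versions also merge, which leaves two.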

Further, in the two embodiments above, when merging by table, index, and predetermined time version, the predetermined time version is a 30-second version, a 1-minute version, a 5-minute version, or a 10-minute version.
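One plausible reading of a "time version" is a timestamp floored to the start of a fixed-width bucket. Logs whose timestamps fall in the same bucket then share a version and can be merged. The sketch below assumes that reading. The granularity labels and `time_version` are hypothetical names, and the bucket widths follow the four versions listed above.

```python
# Bucket widths in seconds for the four time versions listed above (labels are illustrative).
VERSION_SECONDS = {"30s": 30, "1min": 60, "5min": 300, "10min": 600}


def time_version(ts_seconds: int, granularity: str) -> int:
    """Floor a Unix timestamp to the start of its time-version bucket."""
    width = VERSION_SECONDS[granularity]
    return ts_seconds - ts_seconds % width
```

A coarser version merges more logs into each row, which further reduces writes, at the cost of a lower time resolution in the stored data.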

Optionally, as one embodiment, the second stream processing node 630 writing the data in the third log data whose table name is the first table and whose region identifier is the first region identifier into the corresponding region by means of put operations, through the region server in the HBase database corresponding to the first table and the first region identifier, includes:

When the amount of data in the third log data whose table name is the first table and whose region identifier is the first region identifier is greater than or equal to a first predetermined threshold, the second stream processing node 630 writes that data, by means of put operations, into the region corresponding to the first table and the first region identifier, through the corresponding region server in the HBase database.

Alternatively, when the time interval for writing the data in the third log data whose table name is the first table and whose region identifier is the first region identifier into the HBase database is greater than or equal to a second predetermined threshold, the second stream processing node 630 writes that data, by means of put operations, into the region corresponding to the first table and the first region identifier, through the corresponding region server in the HBase database.

The stream processing system 600 can also perform the method of the embodiment shown in Fig. 3. For the specific implementation, refer to the embodiments shown in Fig. 3 and Fig. 4; details are not repeated here.

An embodiment of the present application further discloses a computer-readable storage medium for storing a computer program, the computer program including instructions for performing the method of the embodiment shown in Fig. 3.

In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors, and switches) or a software improvement (an improvement to a method flow). However, as technology has developed, many present-day improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. Designers program a digital system themselves to "integrate" it onto a single PLD, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compilers used in program development, and the source code before compilation must be written in a specific programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can readily be obtained simply by lightly logic-programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logic-programmed so that the controller achieves the same functions in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the apparatuses it contains for implementing various functions may also be regarded as structures within the hardware component. Indeed, the apparatuses for implementing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.

The systems, apparatuses, modules, or units described in the above embodiments may specifically be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above apparatus is described as divided into various units by function. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.

Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.

The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassette tapes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity, or device that includes the element.

Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.

The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for the relevant parts, refer to the description of the method embodiment.

The foregoing are merely embodiments of the present application and are not intended to limit the present application. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims

1. A method for writing to an HBase database, characterized in that the method comprises:

receiving, by a first stream processing node, first log data distributed by a stream distribution node, wherein the stream distribution node is the source node of the message stream, which reads the log data to be written into the HBase database and distributes it, and the first stream processing node is the message stream processing node that directly receives the data distributed by the stream distribution node;

parsing, by the first stream processing node, the logs in the first log data to obtain second log data, and determining the table and region corresponding to each log in the second log data;

routing, by the first stream processing node, the content carried by the logs in the second log data, by table and region, to the corresponding second stream processing node, wherein the second stream processing node is the terminal message stream processing node of the stream distribution node;

writing, by the second stream processing node, the data in the third log data whose table name is a first table and whose region identifier is a first region identifier into the region corresponding to the first table and the first region identifier by means of push (put) operations, through the region server in the HBase database corresponding to the first table and the first region identifier.

2. The method according to claim 1, characterized in that the first stream processing node parsing the logs in the first log data to obtain second log data, and determining the table and region corresponding to each log in the second log data, comprises:

parsing, by the first stream processing node, the logs in the first log data and structuring them to obtain structured log data;

merging, by the first stream processing node, the content of the structured log data by table, index, and predetermined time version, or by table and index, to form the second log data;

determining, by the first stream processing node, the table and region corresponding to each log in the second log data according to the table and the index.

3. The method according to claim 1 or 2, characterized in that the first stream processing node routing the content carried by the logs in the second log data, by table and region, to the corresponding second stream processing node comprises:

sending, by the first stream processing node, the logs in the second log data to a third stream processing node, wherein the third stream processing node is a message processor node that transmits the content of the second log data between the first stream processing node and the second stream processing node;

merging, by the third stream processing node, the content of the received second log data by table, index, and predetermined time version, or by table and index, to form fourth log data;

routing, by the third stream processing node, the content carried by the fourth log data, by table and region, to the corresponding second stream processing node.

4. The method according to claim 2 or 3, characterized in that

when merging by table, index, and predetermined time version, the predetermined time version is a 30-second version, a 1-minute version, a 5-minute version, or a 10-minute version.

5. The method according to any one of claims 1 to 4, characterized in that

writing the data in the third log data whose table name is the first table and whose region identifier is the first region identifier into the region corresponding to the first table and the first region identifier by means of put operations, through the region server in the HBase database corresponding to the first table and the first region identifier, comprises:

when the amount of data in the third log data whose table name is the first table and whose region identifier is the first region identifier is greater than or equal to a first predetermined threshold, writing that data into the region corresponding to the first table and the first region identifier by means of put operations, through the region server in the HBase database corresponding to the first table and the first region identifier.

6. The method according to any one of claims 1 to 4, characterized in that

when the time interval for writing the data in the third log data whose table name is the first table and whose region identifier is the first region identifier into the HBase database is greater than or equal to a second predetermined threshold, writing that data into the region corresponding to the first table and the first region identifier by means of put operations, through the region server in the HBase database corresponding to the first table and the first region identifier.

7. A stream processing system, characterized by comprising a stream distribution node, a first stream processing node, and a second stream processing node, wherein:

the stream distribution node reads the log data to be written into the HBase database and distributes it to each first stream processing node;

the first stream processing node receives first log data distributed by the stream distribution node, parses the logs in the first log data to obtain second log data, determines the table and region corresponding to each log in the second log data, and routes the content carried by the logs in the second log data, by table and region, to the corresponding second stream processing node, wherein the first stream processing node is the message stream processing node that directly receives the data distributed by the stream distribution node;

the second stream processing node merges the received logs by table and region to form third log data, and writes the data in the third log data whose table name is a first table and whose region identifier is a first region identifier into the region corresponding to the first table and the first region identifier by means of push (put) operations, through the region server in the HBase database corresponding to the first table and the first region identifier, wherein the second stream processing node is the terminal message stream processing node of the stream distribution node.

8. The system according to claim 7, characterized in that the first stream processing node parsing the logs in the first log data to obtain second log data, and determining the table and region corresponding to each log in the second log data, comprises:

the first stream processing node parses the logs in the first log data and structures them to obtain structured log data;

the first stream processing node indexes the structured log data according to the table to which the structured log data belongs and the keyword (key);

the first stream processing node merges the content of the structured log data by table, index, and predetermined time version, or by table and index, to form the second log data;

the first stream processing node determines the table and region corresponding to each log in the second log data according to the table and the index.

9. The system according to claim 7 or 8, characterized in that the system further comprises a third stream processing node, which is a message processor node that transmits the content of the second log data between the first stream processing node and the second stream processing node;

the first stream processing node routing the content carried by the logs in the second log data, by table and region, to the corresponding second stream processing node comprises: the first stream processing node sends the logs in the second log data to the third stream processing node, which processes them and routes them to the corresponding second stream processing node;

wherein the third stream processing node processing the logs and routing them to the corresponding second stream processing node comprises: the third stream processing node merges the content of the received second log data by table, index, and predetermined time version, or by table and index, to form fourth log data; and the third stream processing node routes the content carried by the fourth log data, by table and region, to the corresponding second stream processing node.

10. The system according to claim 8 or 9, characterized in that

11. The system according to any one of claims 7 to 10, characterized in that

the second stream processing node writing the data in the third log data whose table name is the first table and whose region identifier is the first region identifier into the region corresponding to the first table and the first region identifier by means of put operations, through the region server in the HBase database corresponding to the first table and the first region identifier, comprises:

when the amount of data in the third log data whose table name is the first table and whose region identifier is the first region identifier is greater than or equal to a first predetermined threshold, the second stream processing node writes that data into the region corresponding to the first table and the first region identifier by means of put operations, through the region server in the HBase database corresponding to the first table and the first region identifier.

12. The system according to any one of claims 7 to 10, characterized in that

when the time interval for writing the data in the third log data whose table name is the first table and whose region identifier is the first region identifier into the HBase database is greater than or equal to a second predetermined threshold, the second stream processing node writes that data into the region corresponding to the first table and the first region identifier by means of put operations, through the region server in the HBase database corresponding to the first table and the first region identifier.