Data storage method, controller, and system
Technical field
This application relates to the field of storage, and more particularly, to a data storage method, controller, and system.
Background
In a cross-availability-zone (Availability Zone, AZ) active-active architecture, the main cluster provides read and write services to user equipment. The standby cluster is used only to back up the data of the main cluster: when the main cluster fails and loses data, the standby cluster sends the backed-up data to the main cluster so that the data can be restored. The standby cluster does not provide read or write services to user equipment. Data between the main cluster and the standby cluster is synchronized through a write-ahead log (Write-Ahead Logging, WAL). Both clusters use the distributed storage system HBase (Hadoop Database). Each cluster generates its own meta information and HFiles from the WAL data, and each separately performs compaction, the merge operation on HFiles. Compaction brings heavy bandwidth pressure, short bursts of IO pressure, and computing pressure, which greatly degrades HBase performance. Because the main cluster serves reads and writes for user equipment, this degradation on the main cluster directly affects user experience.
Summary of the invention
This application provides a data storage method, controller, and system. The controller of the main cluster receives a merged file sent by the controller of the standby cluster and deletes the unmerged files it originally stored, because their content is already contained in that merged file. The controller of the main cluster therefore does not need to perform the merge operation itself. This avoids the heavy bandwidth pressure and short-term IO pressure caused by compaction, and improves both user experience and system performance.
According to a first aspect, a data storage method is provided. The method is applied to a cluster system that includes a main cluster and a standby cluster, and is performed by a first controller. The first controller is the controller of the main cluster, and a second controller is the controller of the standby cluster. The method includes:
receiving a merged file, a first sequence number, and first identification information sent by the second controller, where the first sequence number is the sequence number of first data in the merged file, and the first identification information indicates that the region number of the merged file in the standby cluster is a first region number; determining, according to the first sequence number, the sequence number of the last-written data in the merged file; comparing the sequence number of the last-written data in the merged file with the sequence number of the last-written data in an unmerged file, where the unmerged file is a file in the main cluster whose region number is the first region number; deleting the unmerged file when the sequence number of its last-written data is less than the sequence number of the last-written data in the merged file; and storing the merged file in the main cluster.
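The receiving-side steps of the first aspect can be sketched in Python. This is a minimal illustration under stated assumptions, not the claimed implementation. `StoreFile` and `apply_merged_file` are hypothetical names. A file is modelled only by its region number and the sequence numbers of its entries. The first sequence number is assumed to already be the sequence number of the merged file's last-written data, which is one of the options described later.

```python
from dataclasses import dataclass


@dataclass
class StoreFile:
    """Hypothetical stand-in for an HFile: a region number and entry sequence numbers."""
    region: str
    seq_numbers: list

    @property
    def last_seq(self) -> int:
        # Sequence numbers only increase, so the maximum belongs to the last write.
        return max(self.seq_numbers)


def apply_merged_file(main_files, merged, first_seq, region):
    """First (main-cluster) controller: drop covered unmerged files, keep the merged one.

    Assumes first_seq is the sequence number of the merged file's last-written data.
    Returns the new list of files stored in the main cluster.
    """
    merged_last_seq = first_seq
    kept = []
    for f in main_files:
        # Delete only same-region files whose last write is strictly older
        # than the merged file's last write; everything else is kept.
        if f.region == region and f.last_seq < merged_last_seq:
            continue
        kept.append(f)
    merged.region = region  # optional step: tag the merged file with the first region number
    kept.append(merged)
    return kept


files = [StoreFile("r1", [1, 2]), StoreFile("r1", [3, 4]),
         StoreFile("r1", [9]), StoreFile("r2", [5])]
result = apply_merged_file(files, StoreFile("", [1, 2, 3, 4, 7]), first_seq=7, region="r1")
```

In this example the two older `r1` files are deleted. The `r1` file whose last write (9) is newer than the merged file is kept, and the `r2` file is untouched.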
Therefore, after receiving the merged file, the first sequence number, and the first identification information from the second controller of the standby cluster, the first controller of the main cluster uses the first sequence number to determine the sequence number of the last-written data in the merged file. It then compares that sequence number with the sequence number of the last-written data in the unmerged file. When the latter is smaller, the first controller deletes the unmerged file and stores the merged file in the main cluster. The first controller therefore does not need to perform the merge operation. This avoids the heavy bandwidth pressure, short-term IO pressure, and computing consumption caused by compaction, improves the performance of the main cluster, and improves user experience.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: setting, in the main cluster, the region number of the merged file to the first region number.
Setting the region number of the merged file to the first region number associates the first region with the merged file, so that the file can be located through the first region.
With reference to the first aspect, in some implementations of the first aspect, the first sequence number is the sequence number of the last-written data in the merged file.
With reference to the first aspect, in some implementations of the first aspect, the method further includes the following steps before the merged file, the first sequence number, and the first identification information sent by the second controller are received. A first log is generated. The first log includes information about the first region, which is the region in the main cluster whose region number is the first region number, and this information includes the unmerged file associated with the first region. The first log is then sent to the second controller.
In this case, after generating the first log containing the information about the first region, the controller of the main cluster sends the first log to the second controller of the standby cluster. The second controller obtains the information about the first region from the log. According to that information, it associates the region in the standby cluster whose region number is the first region number with the unmerged file in the standby cluster. This keeps the information about the first region consistent between the main cluster and the standby cluster.
With reference to the first aspect, in some implementations of the first aspect, the method further includes the following steps before the merged file, the first sequence number, and the first identification information sent by the second controller are received: generating a second log, where the second log includes the data of the unmerged file; sending the second log to the second controller; and generating the unmerged file according to the data of the unmerged file.
According to a second aspect, a data storage method is provided. The method is applied to a cluster system that includes a main cluster and a standby cluster, and is performed by a second controller. The second controller is the controller of the standby cluster, and a first controller is the controller of the main cluster. The method includes:
when the number of unmerged files in a first region in the standby cluster reaches a first threshold, merging the unmerged files to obtain a merged file and recording a first sequence number, where the first sequence number is the sequence number of first data in the merged file; and sending the merged file, the first sequence number, and first identification information to the first controller, where the first identification information indicates that the region number of the merged file in the standby cluster is a first region number.
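The standby-side trigger can be sketched the same way. This is a hypothetical illustration: `StoreFile`, `maybe_merge`, and the threshold value 3 are assumptions, because the text does not fix the first threshold. The merge is deliberately simplified to ordering entries by sequence number. A real HBase compaction also merges cells by key and drops obsolete versions.

```python
from dataclasses import dataclass

FIRST_THRESHOLD = 3  # hypothetical value; the text does not fix the threshold


@dataclass
class StoreFile:
    """Hypothetical stand-in for an HFile: a region number and entry sequence numbers."""
    region: str
    seq_numbers: list


def maybe_merge(standby_files, region, threshold=FIRST_THRESHOLD):
    """Second (standby) controller: merge a region's files once the threshold is reached.

    Returns (files_after, message). message is None when no merge happens, otherwise
    (merged_file, first_seq, region), which would be sent to the first controller.
    """
    unmerged = [f for f in standby_files if f.region == region]
    if len(unmerged) < threshold:
        return standby_files, None
    # Simplified merge: union of all entries, ordered by sequence number.
    merged = StoreFile(region, sorted(s for f in unmerged for s in f.seq_numbers))
    # Record the first sequence number; here, the last-written entry's number.
    first_seq = merged.seq_numbers[-1]
    files_after = [f for f in standby_files if f.region != region] + [merged]
    return files_after, (merged, first_seq, region)


files = [StoreFile("r1", [1, 2]), StoreFile("r1", [3]),
         StoreFile("r2", [4]), StoreFile("r1", [5, 6])]
files_after, message = maybe_merge(files, "r1")
```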
Therefore, the standby cluster merges the unmerged files in its first region to obtain the merged file and records the first sequence number. It then sends the merged file, the first sequence number, and the first identification information to the first controller of the main cluster, so that the main cluster does not need to perform the merge operation. This avoids the heavy bandwidth pressure, short-term IO pressure, and computing consumption caused by compaction, improves the performance of the main cluster, and improves user experience.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: receiving a first log sent by the first controller, where the first log includes information about the first region, which is the region in the main cluster whose region number is the first region number, and this information includes the unmerged file associated with the first region; replaying the first log to obtain the information about the first region; and associating, according to that information, the region in the standby cluster whose region number is the first region number with the unmerged file in the standby cluster.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: receiving a second log sent by the first controller, where the second log includes the data of the unmerged file; and generating the unmerged file according to that data.
According to a third aspect, a controller is provided. The controller is applied to a cluster system that includes a main cluster and a standby cluster. The controller is a first controller, which is the controller of the main cluster, and a second controller is the controller of the standby cluster. The controller includes:
a receiving module, configured to receive a merged file, a first sequence number, and first identification information sent by the second controller, where the first sequence number is the sequence number of first data in the merged file, and the first identification information indicates that the region number of the merged file in the standby cluster is a first region number;
a processing module, configured to determine, according to the first sequence number, the sequence number of the last-written data in the merged file. The processing module is further configured to compare the sequence number of the last-written data in the merged file with the sequence number of the last-written data in an unmerged file, where the unmerged file is a file in the main cluster whose region number is the first region number. The processing module is further configured to delete the unmerged file when the sequence number of its last-written data is less than the sequence number of the last-written data in the merged file; and
a storage module, configured to store the merged file in the main cluster.
With reference to the third aspect, in some implementations of the third aspect, the processing module is further configured to set, in the main cluster, the region number of the merged file to the first region number.
With reference to the third aspect, in some implementations of the third aspect, the first sequence number is the sequence number of the last-written data in the merged file.
With reference to the third aspect, in some implementations of the third aspect, the processing module is further configured to generate a first log. The first log includes information about the first region, which is the region in the main cluster whose region number is the first region number, and this information includes the unmerged file associated with the first region. The controller further includes a sending module, configured to send the first log to the second controller.
With reference to the third aspect, in some implementations of the third aspect, the processing module is further configured to generate a second log, where the second log includes the data of the unmerged file. The sending module is further configured to send the second log to the second controller, and the storage module is further configured to generate the unmerged file according to the data of the unmerged file.
According to a fourth aspect, a data storage controller is provided. The controller is applied to a cluster system that includes a main cluster and a standby cluster. The controller is a second controller, which is the controller of the standby cluster, and a first controller is the controller of the main cluster. The controller includes:
a processing module, configured to: when the number of unmerged files in a first region in the standby cluster reaches a first threshold, merge the unmerged files to obtain a merged file and record a first sequence number, where the first sequence number is the sequence number of first data in the merged file; and
a sending module, configured to send the merged file, the first sequence number, and first identification information to the first controller, where the first identification information indicates that the region number of the merged file in the standby cluster is a first region number.
With reference to the fourth aspect, in some implementations of the fourth aspect, the controller further includes:
a receiving module, configured to receive a first log sent by the first controller. The first log includes information about the first region, which is the region in the main cluster whose region number is the first region number, and this information includes the unmerged file associated with the first region. The processing module is further configured to replay the first log to obtain the information about the first region. The processing module is further configured to associate, according to that information, the region in the standby cluster whose region number is the first region number with the unmerged file in the standby cluster.
With reference to the fourth aspect, in some implementations of the fourth aspect, the receiving module is further configured to receive a second log sent by the first controller, where the second log includes the data of the unmerged file. The processing module is further configured to generate the unmerged file according to that data.
According to a fifth aspect, a system is provided. The system includes the controller according to the third aspect or any implementation of the third aspect, and the controller according to the fourth aspect or any implementation of the fourth aspect.
According to a sixth aspect, a controller is provided, including a processor, a memory, and an interface. The interface is configured to communicate with a second controller. The memory is configured to store computer program code, and the computer program code includes instructions. When the processor executes the instructions, the controller performs the method according to the first aspect or any optional implementation of the first aspect.
According to a seventh aspect, a controller is provided, including a processor, a memory, and an interface. The interface is configured to communicate with a first controller. The memory is configured to store computer program code, and the computer program code includes instructions. When the processor executes the instructions, the controller performs the method according to the second aspect or any optional implementation of the second aspect.
According to an eighth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores computer-executable instructions. When at least one processor of a controller executes the computer-executable instructions, the controller performs the method according to the first aspect or any optional implementation of the first aspect.
According to a ninth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores computer-executable instructions. When at least one processor of a controller executes the computer-executable instructions, the controller performs the method according to the second aspect or any optional implementation of the second aspect.
According to a tenth aspect, a computer program product is provided. The computer program product includes computer-executable instructions stored in a computer-readable storage medium. At least one processor of a controller can read the computer-executable instructions from the computer-readable storage medium and execute them, so that the controller performs the method according to the first aspect or any optional implementation of the first aspect.
According to an eleventh aspect, a computer program product is provided. The computer program product includes computer-executable instructions stored in a computer-readable storage medium. At least one processor of a controller can read the computer-executable instructions from the computer-readable storage medium and execute them, so that the controller performs the method according to the second aspect or any optional implementation of the second aspect.
Brief description of the drawings
Fig. 1 is a system architecture diagram of the data storage method and controller according to this application.
Fig. 2 is an example structural diagram of the controller 11 in Fig. 1.
Fig. 3 is a schematic flowchart of a data storage method according to this application.
Fig. 4 is a schematic flowchart of a data storage method according to this application.
Fig. 5 is a schematic flowchart of a data storage method according to this application.
Fig. 6 is a schematic block diagram of a controller according to this application.
Fig. 7 is a schematic block diagram of a controller according to this application.
Fig. 8 is a schematic block diagram of an apparatus according to this application.
Description of embodiments
The following describes the technical solutions in this application with reference to the accompanying drawings.
Fig. 1 is a system architecture diagram 100 of the data storage method and controller according to this application. The system is a cross-AZ active-active architecture system. As shown in Fig. 1, system 100 includes user equipment 10, a main cluster controller 11, and a standby cluster controller 12. The main cluster controller 11 and the standby cluster controller 12 may each be a computing device such as a server or a desktop computer, on which an operating system and application programs are installed. The main cluster controller 11 can receive input/output (I/O) requests from user equipment, store the data carried in those I/O requests, and write the data to a persistent storage device. The main cluster provides read and write services to user equipment. The standby cluster is used only to back up the data of the main cluster: when the main cluster fails and loses data, the standby cluster sends the backed-up data to the main cluster so that the data can be restored. The standby cluster does not provide read or write services to user equipment. Data between the main cluster and the standby cluster is synchronized through a write-ahead log (Write-Ahead Logging, WAL).
Fig. 1 shows only the main cluster controller 11 and the standby cluster controller 12, not the other devices of the two clusters. The main cluster and the standby cluster may each further include other devices, such as servers and computers.
The main cluster and the standby cluster may use the distributed storage system HBase. HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. When user equipment writes data, the data is first written to the WAL, and only after the WAL write succeeds is it written to the memstore cache. Because the WAL is stored on hard disk, the data survives a power failure. If the memstore loses power after the data has been written to it, the data can still be read back from the hard disk. When the data cached in the memstore meets certain conditions, a flush operation persists it to disk as a data file, an HFile. As more data is written, flushes occur more often and the number of HFiles keeps growing. However, too many data files increase the number of IOs needed per query, so HBase continually tries to merge these files. This merging process is called compaction.
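The write path described above (WAL first, then memstore, flush to a new file, and compaction once files accumulate) can be modelled with a toy Python class. `ToyStore`, `flush_size`, and `compaction_threshold` are hypothetical names and tiny values chosen for illustration. They loosely play the role of HBase's `hbase.hregion.memstore.flush.size` and `hbase.hstore.compactionThreshold` settings, but this is not HBase code. Real HBase also rolls and archives WAL files after a flush, which is omitted here.

```python
class ToyStore:
    """Toy model of the WAL -> memstore -> flush -> compaction write path (not HBase code)."""

    def __init__(self, flush_size=2, compaction_threshold=3):
        self.wal = []        # durable on-disk log, written first
        self.memstore = {}   # in-memory buffer, written second
        self.hfiles = []     # immutable sorted files produced by flushes
        self.flush_size = flush_size
        self.compaction_threshold = compaction_threshold

    def put(self, key, value):
        self.wal.append((key, value))   # 1. append to the WAL so data survives power loss
        self.memstore[key] = value      # 2. then update the memstore
        if len(self.memstore) >= self.flush_size:
            self.flush()

    def flush(self):
        # Persist the memstore as a new sorted file and clear the buffer.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}
        if len(self.hfiles) >= self.compaction_threshold:
            self.compact()

    def compact(self):
        # Merge all files into one; later files win for duplicate keys.
        merged = {}
        for f in self.hfiles:
            merged.update(f)
        self.hfiles = [sorted(merged.items())]

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for f in reversed(self.hfiles):  # newest file first
            for k, v in f:
                if k == key:
                    return v
        return None


store = ToyStore()
for i in range(6):
    store.put(f"k{i}", i)
```

After six writes, three flushes have produced three files, and the third flush triggered a compaction. A read now scans one file instead of three, which is the IO saving that compaction buys.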
HBase stores data in the form of tables. A table consists of rows and columns, and its logical structure is shown in Table 1.
Table 1 Logical structure of an HBase data table
Unlike a traditional two-dimensional table of rows and columns, each row of data in HBase has one row key. The data in a table is kept sorted bytewise by row key + CF:qualifier + timestamp, so a query can locate data quickly by binary search. A row key can be any string with a maximum length of 64 KB, and rows are sorted in lexicographic order. Column family (Column Family, CF): the data in a row is grouped by column family. Column families also affect the physical storage of HBase data, so they must be defined in advance and should not be modified casually. Every row in a table has the same column families, although a row does not need to store data in every column family. Each column in a CF is identified by one qualifier, and a qualifier belongs to exactly one CF. Column qualifier: data within a column family is located by its column qualifier, or column. Column qualifiers need not be defined in advance and need not be consistent across rows. Each piece of data in HBase has a timestamp, and by default HBase can keep three different versions of the same data. When data is written, the timestamp can be assigned automatically by HBase (the current system time, accurate to the millisecond) or assigned explicitly. In HBase, row key + CF + qualifier + timestamp together locate one cell.
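The sort order and binary-search lookup can be illustrated in Python. In HBase, versions of the same cell are ordered newest first. The sketch models this by negating the timestamp inside a tuple key. The cell data, `cell_key`, and `get_latest` are hypothetical. Keys are kept in a plain sorted list and searched with the standard `bisect` module, not with HBase's own KeyValue format.

```python
import bisect


def cell_key(row: bytes, cf: bytes, qualifier: bytes, ts: int):
    # Row key, column family and qualifier sort ascending bytewise; the timestamp
    # is negated so that newer versions of the same cell sort first.
    return (row, cf, qualifier, -ts)


cells = {
    cell_key(b"row2", b"info", b"name", 100): "Bob",
    cell_key(b"row1", b"info", b"name", 100): "Alice (old)",
    cell_key(b"row1", b"info", b"name", 200): "Alice",
    cell_key(b"row1", b"info", b"age", 150): "20",
}
sorted_keys = sorted(cells)  # the table keeps its cells in this order


def get_latest(row: bytes, cf: bytes, qualifier: bytes):
    """Binary-search for the newest version of one cell, or None if it is absent."""
    probe = (row, cf, qualifier, float("-inf"))  # sorts before every version of the cell
    i = bisect.bisect_left(sorted_keys, probe)
    if i < len(sorted_keys) and sorted_keys[i][:3] == (row, cf, qualifier):
        return cells[sorted_keys[i]]
    return None
```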
The physical storage of HBase data is briefly described below. In physical storage, one logical two-dimensional table is split by CF into multiple files: one CF corresponds to one folder, which contains multiple HFiles. The HBase system splits a table in the row direction. Multiple rows form one HRegion, and one table is divided into multiple HRegions. An HRegion is the smallest unit of HBase distributed storage and load balancing. An HRegion contains multiple CFs, and HBase physical storage is organized by CF. An HRegion is therefore divided internally into multiple HStores, each responsible for the physical storage of one CF.
Each HStore consists of an in-memory part (MemStore) and an on-disk part (HFiles). Data is first written to the MemStore, and an HFile is generated when the MemStore overflows. HFiles are ultimately stored in blocks on HDFS.
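The row-direction split can be illustrated by locating the HRegion that owns a given row key. In HBase, each region covers a contiguous row-key range from its start key (inclusive) to its end key (exclusive), and an empty key means the range is unbounded on that side. The region names and split points below are hypothetical. The lookup is a binary search over the sorted start keys.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    start_key: bytes  # inclusive; b"" means "from the first row"
    end_key: bytes    # exclusive; b"" means "to the last row"


# A table split by row key into contiguous, non-overlapping regions (hypothetical split points).
regions = [
    Region("region-a", b"", b"g"),
    Region("region-b", b"g", b"p"),
    Region("region-c", b"p", b""),
]
start_keys = [r.start_key for r in regions]


def locate_region(row_key: bytes) -> Region:
    # The owning region is the last one whose start key is <= row_key.
    i = bisect.bisect_right(start_keys, row_key) - 1
    return regions[i]
```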
In the prior art, the main and standby clusters each generate their own meta information and HFiles from the WAL data, and each performs HFile compaction. Compaction brings heavy bandwidth pressure and short bursts of IO pressure, which greatly degrades HBase performance. Because the main cluster provides read and write services to user equipment, this degradation on the main cluster affects user experience.
Fig. 2 is an example structural diagram of the controller 11 in Fig. 1. As shown in Fig. 2, the controller 11 includes an interface card 110, a processor 112, a memory 111, and an interface card 113.
The interface card 110 is configured to communicate with user equipment and receive the instructions it sends. For example, the controller 11 can receive write instructions from user equipment through the interface card 110. A write instruction includes a key and a value, where the key identifies the value. As a specific example, the value may be various information about a student, and the key may be the student's student number or another identifier of some attribute of the student.
The interface card 113 is configured to communicate with the standby cluster controller 12.
The processor 112 is a central processing unit (CPU). In this embodiment of the present invention, the processor 112 can receive write instructions or read instructions from user equipment and process these instructions. The processor 112 can also send the data in a write instruction to a persistent storage device. The processor 112 is further configured to allocate a logical address for the data and to save the correspondence between the key and the allocated logical address, so that the data can later be read according to that correspondence.
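The key-to-logical-address bookkeeping can be sketched as follows. This is a hypothetical model: `KeyAddressIndex` is an illustrative name, addresses are allocated append-only by data length, and a dictionary stands in for the persistent storage device. The text does not specify the processor's actual allocation policy.

```python
class KeyAddressIndex:
    """Hypothetical sketch: allocate logical addresses and remember key -> address."""

    def __init__(self):
        self.next_address = 0
        self.key_to_address = {}  # the saved correspondence between keys and addresses
        self.storage = {}         # stands in for the persistent storage device

    def write(self, key: str, value: bytes) -> int:
        address = self.next_address
        self.next_address += len(value)  # simple append-only allocation
        self.storage[address] = value
        self.key_to_address[key] = address
        return address

    def read(self, key: str):
        # Later reads go through the saved key -> address correspondence.
        address = self.key_to_address.get(key)
        return None if address is None else self.storage[address]


index = KeyAddressIndex()
addr_a = index.write("20230001", b"Alice,CS")  # key: a student number
addr_b = index.write("20230002", b"Bob")
```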
The memory 111 includes volatile memory, non-volatile memory, or a combination thereof. Volatile memory is, for example, random-access memory (RAM). Non-volatile memory is, for example, any machine-readable medium that can store program code, such as a floppy disk, a hard disk, a solid state disk (SSD), or an optical disc. The memory 111 has a power-failure protection function: data stored in the memory 111 is not lost when the system loses power and is powered on again. There may be one or more memories 111, used to temporarily store data received from the host or read from the persistent storage device. For example, when the controller 11 receives multiple write instructions from user equipment, it can temporarily store the data in those write instructions in the memory 111.
To help understand this application, the embodiments of the present invention are described below with reference to Fig. 3 to Fig. 8, taking a system identical or similar to that shown in Fig. 1 as an example.
Fig. 3 is a schematic flowchart of a data storage method 200 according to this application. The method is applied to a cluster system that includes a main cluster and a standby cluster. The first controller is the controller of the main cluster, and the second controller is the controller of the standby cluster. As shown in Fig. 3, the method 200 includes the following steps.
In 210, when the number of unmerged files in the first region in the standby cluster reaches a first threshold, the second controller merges the unmerged files to obtain a merged file and records a first sequence number, where the first sequence number is the sequence number of first data in the merged file.
In 220, the second controller sends the merged file, the first sequence number, and first identification information to the first controller, where the first identification information indicates that the region number of the merged file in the standby cluster is the first region number.
In 230, the first controller receives the merged file, the first sequence number, and the first identification information sent by the second controller.
In 240, the first controller determines, according to the first sequence number, the sequence number of the last-written data in the merged file.
In 250, the first controller compares the sequence number of the last-written data in the merged file with the sequence number of the last-written data in the unmerged file, where the unmerged file is a file in the main cluster whose region number is the first region number.
In 260, when the sequence number of the last-written data in the unmerged file is less than the sequence number of the last-written data in the merged file, the first controller deletes the unmerged file.
In 270, the first controller stores the merged file in the main cluster.
Therefore, in this embodiment of this application, the second controller merges the unmerged files in the first region of the standby cluster once their number reaches the first threshold, and records the first sequence number. It then sends the merged file, the first sequence number, and the first identification information to the first controller. The first controller uses the first sequence number to determine the sequence number of the last-written data in the merged file, and compares it with the sequence number of the last-written data in the unmerged file. When the latter is smaller, the first controller deletes the unmerged file and stores the merged file. The first controller of the main cluster therefore does not need to perform the merge operation. This avoids the heavy bandwidth pressure, short-term IO pressure, and computing consumption caused by compaction, improves the HBase performance of the main cluster, and improves user experience.
It should be understood that a region in this application may be expressed as a Region, but other representations are also possible.
Optionally, the method 200 further includes: setting, in the main cluster, the region number of the merged file to the first region number.
Optionally, the first sequence number is the sequence number of the last-written data in the merged file.
Specifically, a sequence number is recorded each time a piece of data is written, and sequence numbers are assigned in increasing order. When merging files, the second controller records the sequence number of the last-written data, which is the current maximum sequence number.
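Increasing sequence numbers can be sketched with a simple counter. `SequencedWriter` is a hypothetical helper, not an HBase API. Because each write takes the next value of a monotonically increasing counter, the last-written entry always carries the current maximum, which is the value the second controller records when merging.

```python
import itertools
from typing import Optional


class SequencedWriter:
    """Hypothetical helper: stamp every write with the next value of an increasing counter."""

    def __init__(self, start: int = 1):
        self._next = itertools.count(start)
        self.entries = []  # (sequence_number, data) in write order

    def write(self, data) -> int:
        seq = next(self._next)
        self.entries.append((seq, data))
        return seq

    def current_max(self) -> Optional[int]:
        # Numbers only grow, so the last-written entry holds the current maximum.
        return self.entries[-1][0] if self.entries else None


writer = SequencedWriter()
for item in ("a", "b", "c"):
    writer.write(item)
```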
It should be understood that the first sequence number may be the sequence number of any first data in the merged file. The first data may be the last-written data in the merged file, the first-written data in the merged file, or any other data written in the merged file.
For example, by prior agreement, the second controller may record the sequence number of the second-to-last data. When the first controller receives this sequence number, it knows from the agreement that it belongs to the second-to-last data in the merged file. It can therefore add 1 to obtain the sequence number of the last-written data in the merged file.
As another example, when there is no prior agreement, the second controller sends the first sequence number to the first controller together with a parameter. The parameter indicates the difference between the sequence number of the first data and the sequence number of the last-written data in the merged file. On receiving the first sequence number and the parameter, the first controller determines from them the sequence number of the last-written data in the merged file.
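Both conventions can be written as one receiver-side function. This is a sketch under stated assumptions: `last_written_seq` and `AGREED_POSITIONS_FROM_END` are hypothetical names. The agreement-based case assumes that the merged file's sequence numbers are consecutive, which the "add 1" example above implicitly relies on. The offset-based case needs no such assumption.

```python
from typing import Optional

AGREED_POSITIONS_FROM_END = 1  # hypothetical agreement: the second-to-last entry is reported


def last_written_seq(first_seq: int, offset: Optional[int] = None,
                     positions_from_end: int = AGREED_POSITIONS_FROM_END) -> int:
    """Receiver side (first controller): recover the last-written sequence number."""
    if offset is not None:
        # No prior agreement: the sender carried the difference explicitly.
        return first_seq + offset
    # Prior agreement on which entry was reported; assumes consecutive
    # sequence numbers, so a gap of k positions is a difference of k.
    return first_seq + positions_from_end


merged_seqs = [11, 12, 13, 14]  # sequence numbers in a hypothetical merged file

# Example 1: by agreement, the sender reports the second-to-last entry (13).
by_agreement = last_written_seq(merged_seqs[-1 - AGREED_POSITIONS_FROM_END])
# Example 2: the sender reports the first entry (11) plus an explicit offset (3).
by_offset = last_written_seq(merged_seqs[0], offset=merged_seqs[-1] - merged_seqs[0])
```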
Optionally, before the merged file, the first sequence number, and the first identification information sent by the second controller are received, the method further includes the following steps. The first controller generates a first log. The first log includes information about the first region, which is the region in the main cluster whose region number is the first region number, and this information includes the unmerged file associated with the first region. The first controller then sends the first log to the second controller.
Specifically, can generate the first log when the first area of main cluster changes, which will
First log is sent to the second controller, for the second controller according to first log, generates being somebody's turn to do for cluster
First area.
Optionally, which includes the home key (start key) and end key (end key) of the first area.
For example, in an HBase system, the view information of each Region is stored in the meta table, which maintains information about the cluster and its architecture. When the view information of a Region in the main cluster changes, a WAL log is generated. The first controller sends the WAL log to the second controller, thereby synchronizing the changed Region view information to the second controller, and the second controller replays the WAL log and generates the Region of the standby cluster according to the changed view information.
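The Region view synchronization can be sketched as follows, assuming a hypothetical log record that carries only the region number, start key, and end key. Replaying the records on the standby side reproduces the same region layout, and a row key is routed to the region whose [start key, end key) range contains it, with an empty end key denoting the unbounded last region as in HBase. This is an illustrative model, not the actual meta-table or WAL format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RegionView:
    """Hypothetical log record for one changed Region view."""
    region_number: str
    start_key: bytes
    end_key: bytes  # b"" means "no upper bound" (last region)


def replay_region_log(log: List[RegionView]) -> Dict[str, RegionView]:
    """Standby side: replay records in order; a later record for the same
    region number overwrites the earlier view."""
    regions: Dict[str, RegionView] = {}
    for record in log:
        regions[record.region_number] = record
    return regions


def region_for_row(regions: Dict[str, RegionView], row_key: bytes) -> Optional[str]:
    """Return the region whose [start_key, end_key) range contains row_key."""
    for view in regions.values():
        if view.start_key <= row_key and (view.end_key == b"" or row_key < view.end_key):
            return view.region_number
    return None


standby = replay_region_log([RegionView("01", b"", b"m"), RegionView("02", b"m", b"")])
print(region_for_row(standby, b"apple"), region_for_row(standby, b"zebra"))  # 01 02
```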
Optionally, the second controller receives the first log sent by the first controller, where the first log includes information about the first region whose region number in the main cluster is the first region number, and the information about the first region includes the unmerged files associated with the first region. The second controller replays the first log to obtain the information about the first region and, according to that information, associates the first region whose region number in the standby cluster is the first region number with the unmerged files in the standby cluster.
Optionally, before the merged file, the first sequence number, and the first identification information sent by the second controller are received, the method further includes: generating a second log, where the second log includes the data of the unmerged files; sending the second log to the second controller; and generating the unmerged files according to the data of the unmerged files.
Optionally, the second controller receives the second log sent by the first controller, where the second log includes the data of the unmerged files, and generates the unmerged files according to that data.
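Generating unmerged files from the second log can be sketched as below. The record layout (region number, sequence number, data) and the fixed `flush_size` rule are assumptions made for illustration; because the rule is deterministic, replaying the same log with the same rule on either side yields the same unmerged files under the same region numbers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Record = Tuple[str, int, str]  # (region number, sequence number, data); assumed layout


@dataclass
class UnmergedFile:
    region_number: str
    entries: List[Tuple[int, str]]  # (sequence number, data) in write order

    @property
    def last_seq(self) -> int:
        # Sequence numbers increase with write order, so the last entry is the largest.
        return self.entries[-1][0]


def build_unmerged_files(second_log: List[Record], flush_size: int) -> Dict[str, List[UnmergedFile]]:
    """Replay the second log and cut a new file every flush_size records per region."""
    pending: Dict[str, List[Tuple[int, str]]] = {}
    files: Dict[str, List[UnmergedFile]] = {}
    for region_number, seq, data in second_log:
        buf = pending.setdefault(region_number, [])
        buf.append((seq, data))
        if len(buf) == flush_size:
            files.setdefault(region_number, []).append(UnmergedFile(region_number, buf))
            pending[region_number] = []
    for region_number, buf in pending.items():  # flush any remainder
        if buf:
            files.setdefault(region_number, []).append(UnmergedFile(region_number, buf))
    return files


log = [("01", seq, f"v{seq}") for seq in range(1, 8)]
main_side = build_unmerged_files(log, flush_size=3)
standby_side = build_unmerged_files(log, flush_size=3)
print([f.last_seq for f in main_side["01"]])  # [3, 6, 7]
print(main_side == standby_side)              # True
```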
Fig. 4 is a schematic flowchart of a data storage method 300 according to this application. The method is applied to a cluster system that includes a main cluster and a standby cluster. A third controller is the controller of the main cluster and manages the main cluster system; a fourth controller is the controller of the standby cluster and manages the standby cluster system. A first server stores the data of the main cluster, and a second server stores the data of the standby cluster; the third controller manages the first server, and the fourth controller manages the second server. It should be understood that Fig. 4 shows the data storage method for the case in which the main and standby clusters each deploy the control system and the storage system separately. As shown in Fig. 4, method 300 includes the following steps.
In 301, when the number of unmerged files in the first region of the standby cluster reaches a first threshold, the fourth controller merges the unmerged files into a merged file and records a first sequence number, which is the sequence number of first data in the merged file.
In 302, the fourth controller sends the merged file and the first sequence number to the second server.
In 303, the second server receives the merged file and the first sequence number, and stores the merged file.
In 304, the second server sends the merged file, the first sequence number, and first identification information to the first server, where the first identification information indicates that the region number of the merged file in the standby cluster is the first region number.
In 305, the first server receives the merged file, the first sequence number, and the first identification information sent by the second server.
In 306, the fourth controller sends a first message to the third controller, where the first message indicates that the second server has sent the merged file, the first sequence number, and the first identification information to the first server.
In 307, after receiving the first message, the third controller sends a second message to the first server, where the second message instructs the first server to perform a delete operation.
In 308, the first server receives the second message and determines, according to the first sequence number, the sequence number of the last data written into the merged file.
In 309, the first server compares the sequence number of the last data written into the merged file with the sequence number of the last data written into each unmerged file, where the unmerged files are the files whose region number in the main cluster is the first region number.
In 310, when the sequence number of the last data written into an unmerged file is less than the sequence number of the last data written into the merged file, the first server deletes that unmerged file.
In 311, the first server stores the merged file in the main cluster.
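Steps 301 to 311 can be simulated with a small in-memory Python sketch in which the servers are plain objects and the controllers are functions. Everything here is hypothetical: files are modelled as lists of sequence numbers, the first sequence number is taken to be that of the last written data (the optional embodiment in which the offset in step 308 is zero), the threshold value is arbitrary, and the main cluster's file boundaries are example values. The deletion test is the strict "less than" of step 310.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DataFile:
    region_number: str
    seqs: List[int]  # sequence numbers of the data held, in write order

    @property
    def last_seq(self) -> int:
        return self.seqs[-1]  # sequence numbers increase with write order


@dataclass
class Server:
    """First server (main cluster) or second server (standby cluster)."""
    files: List[DataFile] = field(default_factory=list)
    inbox: Optional[Tuple[DataFile, int, str]] = None


def fourth_controller_merge(second: Server, region: str, threshold: int) -> Optional[Tuple[DataFile, int]]:
    # 301: merge only once the region's unmerged files reach the threshold.
    unmerged = [f for f in second.files if f.region_number == region]
    if len(unmerged) < threshold:
        return None
    merged = DataFile(region, sorted(s for f in unmerged for s in f.seqs))
    first_seq = merged.last_seq  # assumed: seq of the last written data
    # 302-303: the second server stores the merged file in place of its inputs.
    second.files = [f for f in second.files if f.region_number != region] + [merged]
    return merged, first_seq


def first_server_apply(first: Server) -> None:
    # Runs when the second message (delete instruction) arrives.
    merged, first_seq, region = first.inbox
    merged_last = first_seq  # 308: zero offset under the assumption above
    # 309-310: delete region files whose last seq is strictly smaller.
    first.files = [f for f in first.files
                   if not (f.region_number == region and f.last_seq < merged_last)]
    # 311: store the merged file under the first region number.
    first.files.append(DataFile(region, list(merged.seqs)))


def run_method_300(first: Server, second: Server, region: str, threshold: int) -> List[str]:
    messages: List[str] = []
    result = fourth_controller_merge(second, region, threshold)
    if result is None:
        return messages
    merged, first_seq = result
    first.inbox = (merged, first_seq, region)  # 304-305: file, seq, identification info
    messages.append("306: fourth controller -> third controller (first message)")
    messages.append("307: third controller -> first server (second message)")
    first_server_apply(first)
    return messages


standby_srv = Server([DataFile("01", [1, 2]), DataFile("01", [3, 4]), DataFile("01", [5, 6])])
main_srv = Server([DataFile("01", [1, 2, 3]), DataFile("01", [4, 5]), DataFile("01", [6, 7, 8, 9])])
run_method_300(main_srv, standby_srv, "01", threshold=3)
print([f.seqs for f in main_srv.files])  # [[6, 7, 8, 9], [1, 2, 3, 4, 5, 6]]
```

The main-cluster file holding sequence numbers 6 to 9 survives because it still contains data newer than the merged file.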
Therefore, in this embodiment of this application, when the number of unmerged files in the first region of the standby cluster, where file merging is performed, reaches the first threshold, the fourth controller merges the unmerged files into a merged file and records the first sequence number. The second server then sends the merged file, the first sequence number, and the first identification information to the first server. On receiving the delete instruction, the first server determines, according to the first sequence number, the sequence number of the last data written into the merged file and compares it with the sequence number of the last data written into each unmerged file. When the latter is less than the former, the first server deletes the unmerged file and stores the merged file. The main cluster therefore does not need its third controller to perform the merge operation, which avoids the heavy bandwidth pressure, short bursts of I/O pressure, and consumption of computing capability that executing compaction brings, thereby improving the HBase performance of the main cluster and the user experience.
It should be understood that a region in this application may be expressed as a Region, although other representations are also covered.
Optionally, method 300 further includes: in the main cluster, the third controller sets the region number of the merged file to the first region number.
Optionally, the first sequence number is the sequence number of the last data written into the merged file.
Specifically, a sequence number is recorded each time a piece of data is written, and sequence numbers are assigned in increasing order. When merging files, the fourth controller records the sequence number of the last written data, that is, the current maximum sequence number.
For a description of the first sequence number, refer to the related description in method 200; to avoid repetition, details are not repeated here.
Optionally, the method further includes: the third controller generates a first log, where the first log includes information about the first region whose region number in the main cluster is the first region number, and the information about the first region includes the unmerged files associated with the first region; and the third controller sends the first log to the fourth controller.
Specifically, the first log may be generated when the first region of the main cluster changes. The third controller sends the first log to the fourth controller, so that the fourth controller generates the corresponding first region of the standby cluster according to the first log.
For example, in an HBase system, the view information of each Region is stored in the meta table, which maintains information about the cluster and its architecture. When the view information of a Region in the main cluster changes, a WAL log is generated. The third controller sends the WAL log to the fourth controller, thereby synchronizing the changed Region view information to the fourth controller, and the fourth controller replays the WAL log and generates the Region of the standby cluster according to the changed view information.
Optionally, the fourth controller receives the first log sent by the third controller, where the first log includes information about the first region whose region number in the main cluster is the first region number, and the information about the first region includes the unmerged files associated with the first region. The fourth controller replays the first log to obtain the information about the first region and, according to that information, associates the first region whose region number in the standby cluster is the first region number with the unmerged files in the standby cluster.
Optionally, the method further includes: generating a second log, where the second log includes the data of the unmerged files; sending the second log to the fourth controller; and generating the unmerged files according to the data of the unmerged files.
Optionally, the fourth controller receives the second log sent by the third controller, where the second log includes the data of the unmerged files, and generates the unmerged files according to that data.
For a better understanding of the embodiments of this application, a data storage method of this application is now described with reference to Fig. 5. As shown in Fig. 5, the main cluster is divided into region 01 and region 02, and the sequence ID of the most recently stored data recorded under region 01 is 26000. The standby cluster is divided in the same way as the main cluster, into region 01 and region 02, and the sequence ID of the most recently stored data recorded under its region 01 is also 26000. The standby cluster performs a merge operation: the files with sequence numbers less than or equal to 20000 are merged into one file, and the last sequence number of the merged file, 20000, is recorded. The standby cluster sends the merged file and sequence number 20000 to the main cluster; the main cluster deletes all files whose sequence numbers are less than 20000 and associates the merged file with its region 01.
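The Fig. 5 numbers can be checked with the following sketch, in which each file is reduced to the range of sequence numbers it holds. Only 20000 and 26000 come from the example; the individual file boundaries, and the function names, are invented for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (first, last) sequence number held by one file


def standby_merge(files: List[Span], cutoff: int) -> Tuple[Span, List[Span]]:
    """Merge every file whose data all have sequence numbers <= cutoff."""
    to_merge = [f for f in files if f[1] <= cutoff]
    rest = [f for f in files if f[1] > cutoff]
    merged = (min(f[0] for f in to_merge), max(f[1] for f in to_merge))
    return merged, rest


def main_apply(files: List[Span], merged: Span) -> List[Span]:
    """Delete files whose sequence numbers are all below the recorded one,
    then associate the merged file with the region."""
    recorded = merged[1]
    return [f for f in files if f[1] >= recorded] + [merged]


# Region 01 in both clusters ends at sequence ID 26000; boundaries are illustrative.
standby_01 = [(1, 8000), (8001, 15000), (15001, 20000), (20001, 26000)]
main_01 = [(1, 9000), (9001, 18000), (18001, 23000), (23001, 26000)]

merged, standby_rest = standby_merge(standby_01, cutoff=20000)
print(merged)                       # (1, 20000)
print(main_apply(main_01, merged))  # [(18001, 23000), (23001, 26000), (1, 20000)]
```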
It should be understood that Fig. 5 is described from the perspective of the cluster system; for the specific actions performed by the controllers or servers in each cluster, refer to method 200 or method 300. Details are not repeated here.
Fig. 6 is a schematic block diagram of a controller according to this application. As shown in Fig. 6, the controller 400 includes the following modules.
A receiving module 410, configured to receive a merged file, a first sequence number, and first identification information sent by a second controller, where the first sequence number is the sequence number of first data in the merged file, and the first identification information indicates that the region number of the merged file in the standby cluster is a first region number.
A processing module 420, configured to determine, according to the first sequence number, the sequence number of the last data written into the merged file.
The processing module 420 is further configured to compare the sequence number of the last data written into the merged file with the sequence number of the last data written into an unmerged file, where the unmerged file is a file whose region number in the main cluster is the first region number.
The processing module 420 is further configured to delete the unmerged file when the sequence number of the last data written into it is less than the sequence number of the last data written into the merged file.
A storage module 430, configured to store the merged file in the main cluster.
Optionally, the processing module 420 is further configured to set, in the main cluster, the region number of the merged file to the first region number.
Optionally, the first sequence number is the sequence number of the last data written into the merged file.
Optionally, the processing module 420 is further configured to generate a first log, where the first log includes information about the first region whose region number in the main cluster is the first region number, and the information about the first region includes the unmerged files associated with the first region. The controller further includes a sending module, configured to send the first log to the second controller.
Optionally, the processing module 420 is further configured to generate a second log, where the second log includes the data of the unmerged files; the sending module is further configured to send the second log to the second controller; and the storage module 430 is further configured to generate the unmerged files according to the data of the unmerged files.
The controller 400 corresponds fully to the first controller in the embodiment of method 200, and its modules perform the corresponding steps; for details, refer to the corresponding method embodiment.
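One possible mapping of modules 410, 420, and 430 onto code is sketched below as a single class whose methods correspond to the three modules. The class name, the (file name, last sequence number) file model, and the `offset` parameter are hypothetical and serve only to show how the receive, compare-and-delete, and store responsibilities divide.

```python
from typing import Dict, List, Optional, Tuple

File = Tuple[str, int]  # (file name, sequence number of its last written data)


class Controller400:
    """Hypothetical sketch: methods mirror modules 410, 420 and 430 of Fig. 6."""

    def __init__(self, region_files: Dict[str, List[File]], offset: int = 0) -> None:
        self.region_files = region_files  # main cluster: region number -> files
        self.offset = offset              # first seq -> last seq offset (assumed agreed)
        self._received: Optional[Tuple[File, int, str]] = None

    def receive(self, merged: File, first_seq: int, region: str) -> None:
        """Receiving module 410: merged file, first seq, first identification info."""
        self._received = (merged, first_seq, region)

    def process(self) -> List[str]:
        """Processing module 420: derive the last seq, compare, delete older files."""
        if self._received is None:
            raise RuntimeError("nothing received")
        _, first_seq, region = self._received
        merged_last = first_seq + self.offset
        current = self.region_files.get(region, [])
        deleted = [name for name, last in current if last < merged_last]
        self.region_files[region] = [(n, last) for n, last in current if last >= merged_last]
        return deleted

    def store(self) -> None:
        """Storage module 430: keep the merged file under the first region number."""
        if self._received is None:
            raise RuntimeError("nothing received")
        merged, _, region = self._received
        self.region_files.setdefault(region, []).append(merged)


ctrl = Controller400({"01": [("f1", 9000), ("f2", 18000), ("f3", 23000)]})
ctrl.receive(("m1", 20000), 20000, "01")
print(ctrl.process())           # ['f1', 'f2']
ctrl.store()
print(ctrl.region_files["01"])  # [('f3', 23000), ('m1', 20000)]
```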
It should be noted that the receiving module 410, the processing module 420, and the storage module 430 may be provided separately or integrated together and implemented by a single processing chip.
Fig. 7 is a schematic block diagram of a controller according to this application. As shown in Fig. 7, the controller 500 includes:
a processing module 510, configured to: when the number of unmerged files in the first region of the standby cluster reaches a first threshold, merge the unmerged files into a merged file and record a first sequence number, where the first sequence number is the sequence number of first data in the merged file; and
a sending module 520, configured to send the merged file, the first sequence number, and first identification information to the first controller, where the first identification information indicates that the region number of the merged file in the standby cluster is a first region number.
Optionally, the controller further includes a receiving module, configured to receive a first log sent by the first controller, where the first log includes information about the first region whose region number in the main cluster is the first region number, and the information about the first region includes the unmerged files associated with the first region. The processing module 510 is further configured to replay the first log to obtain the information about the first region, and to associate, according to that information, the first region whose region number in the standby cluster is the first region number with the unmerged files in the standby cluster.
Optionally, the receiving module is further configured to receive a second log sent by the first controller, where the second log includes the data of the unmerged files, and the processing module 510 is further configured to generate the unmerged files according to that data.
The controller 500 corresponds fully to the second controller in the embodiment of method 200, and its modules perform the corresponding steps; for details, refer to the corresponding method embodiment.
It should be noted that the processing module 510 and the sending module 520 may be provided separately or integrated together and implemented by a single processing chip.
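Similarly, controller 500 can be sketched as a class in which the processing module triggers a merge when the count of unmerged files in a region reaches the threshold, and the sending module is an injected callback. All names are hypothetical, the threshold value is chosen by the caller, and the first sequence number is again assumed to be that of the last written data.

```python
from typing import Callable, Dict, List, Tuple

File = Tuple[int, ...]  # a file modelled as the sequence numbers it holds, ascending
SendFn = Callable[[File, int, str], None]


class Controller500:
    """Hypothetical sketch of Fig. 7: processing module 510 and sending module 520."""

    def __init__(self, threshold: int, send: SendFn) -> None:
        self.threshold = threshold  # the "first threshold"
        self.send = send            # sending module 520, e.g. a call to the first controller
        self.unmerged: Dict[str, List[File]] = {}
        self.merged: Dict[str, List[File]] = {}

    def add_unmerged(self, region: str, f: File) -> None:
        self.unmerged.setdefault(region, []).append(f)
        self._maybe_merge(region)

    def _maybe_merge(self, region: str) -> None:
        """Processing module 510: merge once the region's unmerged count hits the threshold."""
        files = self.unmerged[region]
        if len(files) < self.threshold:
            return
        merged: File = tuple(sorted(s for f in files for s in f))
        first_seq = merged[-1]  # assumed: seq of the last written data
        self.unmerged[region] = []
        self.merged.setdefault(region, []).append(merged)
        # First identification information: the region number of the merged file.
        self.send(merged, first_seq, region)


sent: List[Tuple[File, int, str]] = []
ctrl500 = Controller500(threshold=3, send=lambda f, seq, region: sent.append((f, seq, region)))
for f in [(1, 2), (3, 4), (5, 6)]:
    ctrl500.add_unmerged("01", f)
print(sent)  # [((1, 2, 3, 4, 5, 6), 6, '01')]
```

Injecting the sending module as a callable keeps the merge logic independent of how the merged file actually travels to the first controller.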
Fig. 8 is a schematic block diagram of a device 600 according to this application. The device 600 includes a memory 610, a processor 620, and an input/output interface 630, which are connected through an internal connection path. The memory 610 stores program instructions, and the processor 620 executes the program instructions stored in the memory 610 to control the input/output interface 630 to receive input data and information and to output data such as operation results.
Optionally, when the program code is executed, the processor 620 can implement each operation of method 200 or method 300; for brevity, details are not repeated here. In this case, the device 600 may be the first controller or the second controller.
It should be understood that, in the embodiments of this application, the processor 620 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
The input/output interface 630 may be configured to implement signal sending and receiving functions.
A person of ordinary skill in the art may be aware that the units and algorithm steps described with reference to the examples disclosed in the embodiments of this specification can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered to go beyond the scope of this application.
It may be clearly understood by a person skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
When the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.