Background
With the development and application of database technology, the volume of data stored in databases grows day by day, and fast, flexible processing of complex queries over large data volumes has become a new requirement. OLAP (On-Line Analytical Processing) is dedicated to supporting complex analytical operations and focuses on decision support for decision makers and senior managers. Typically, an OLAP user only needs to query a small number of data columns, whereas row-based storage loads many unneeded columns, which degrades query performance. The basic query method of distributed columnar storage first reads the meta-information from zookeeper, then goes to each machine in the cluster to read all data files, and then reads the records that satisfy the condition from each data file. This approach directly leads to an excessive amount of data access and hurts OLAP query performance.
Summary of the Application
The purpose of the present application is to provide a method and a device for creating an index table that optimize the underlying storage structure and thereby facilitate data queries.
According to one aspect of the present application, a method for creating an index table is provided, the method comprising:
creating the structure of the meta-information of the data table corresponding to an acquired data source, wherein the meta-information of the data table includes the location information of all data files in the data table, and the data files are stored in columnar form;
creating the data structure of an index table to be created, determining, according to a user's request, the index columns of the index table to be created in the data table, the index columns being some of the columns in the data table, and creating, according to the index columns, the structure of the meta-information corresponding to the index table to be created, wherein that meta-information includes the location information of the index files in the index table to be created;
allocating the data file generated from the current data rows in the data source to a slave node, and updating the meta-information of the data table corresponding to the data file according to the structure of the meta-information of the data table and the allocated location information of the data file;
allocating the information of the index files of the index table to be created into the index file of the corresponding slave node, and updating the meta-information corresponding to the index table to be created.
Further, in the data structure of the index table to be created, the structure of the index file includes a BPlusTree structure, wherein a leaf node of the BPlusTree structure includes a key and a location-information value.
Further, creating the data structure of the index table to be created comprises:
determining the key of the leaf node according to the value of the index column of the index table to be created;
determining the location-information value of the leaf node according to the file name of the data file to which the index column belongs and the offset, within that data file, of the row in which the index column is located.
Further, the index table to be created includes a hash-type global index table and/or a range-type global index table.
Further, when the index table to be created is a hash-type global index table, allocating the information of the index file corresponding to the index column in the index table to be created into the index file of the corresponding slave node of the index table to be created comprises:
allocating, according to the hash value determined from the value of the index column of the hash-type global index table, the key and location-information value corresponding to the index file of the hash-type global index table into the index file of the corresponding slave node.
Further, allocating the key and location-information value corresponding to the index file of the hash-type global index table into the index file of the corresponding slave node, according to the hash value determined from the value of the index column of the hash-type global index table, comprises:
determining the hash value of the index column according to the value of the index column of the hash-type global index table and the number of slave nodes;
allocating the key and location-information value of the leaf node in the BPlusTree structure whose index-column hash value is i to the index file of the (i+1)-th slave node, wherein i is a natural number.
Further, when the index table to be created is a range-type global index table, allocating the information of the index file corresponding to the index column in the index table to be created into the index file of the corresponding slave node of the index table to be created comprises:
determining allocation range intervals according to the result of sampling the values of the index column, and recording each slave node and the allocation range interval of its corresponding index column;
allocating the information of the index file of the range-type global index table into the index file of the corresponding slave node according to the allocation range intervals.
Further, allocating the information of the index file of the range-type global index table into the index file of the corresponding slave node of the range-type global index table according to the allocation range intervals comprises:
comparing the value of the index column of the range-type global index table with the recorded allocation range intervals to determine the allocation range interval in which the value of the index column falls;
allocating, according to the allocation range interval in which the value of the index column falls, the key and location-information value of the leaf node in the BPlusTree structure corresponding to the value of the index column to the index file of the slave node corresponding to that allocation range interval.
Further, when the information of the index file corresponding to the index column in the index table to be created is allocated into the index file of the corresponding slave node of the index table to be created, the method further comprises:
if the key already exists in the index file, merging the new location-information value with the old location-information value in the leaf node corresponding to the key.
Further, when the information of the index file corresponding to the index column in the index table to be created is allocated into the index file of the corresponding slave node of the index table to be created, the method further comprises:
if the key does not exist in the index file, inserting a new leaf node into the BPlusTree structure and storing the key and the location-information value in the new leaf node.
Further, before the data file generated from the current data rows in the data source is allocated to the slave node corresponding to the data table, the method further comprises:
when the number of rows of the data source reaches the preset data-file size threshold, generating the current data rows as a new data file, allocating the newly generated data file to the slave node corresponding to the data table, and updating the meta-information of the data table corresponding to the data file.
Further, before the information of the index file corresponding to the index column in the index table to be created is allocated into the index file of the corresponding slave node of the index table to be created, the method further comprises:
when the size of the index file of the index table to be created reaches the preset index-file size threshold, generating a new index file and updating the location information of the new index file into the meta-information corresponding to the index table to be created.
According to another aspect of the present application, a device for creating an index table is also provided, the device comprising:
a first creation apparatus, configured to create the structure of the meta-information of the data table corresponding to an acquired data source, wherein the meta-information of the data table includes the location information of all data files in the data table, and the data files are stored in columnar form;
a second creation apparatus, configured to create the data structure of an index table to be created, determine, according to a user's request, the index columns of the index table to be created in the data table, the index columns being some of the columns in the data table, and create, according to the index columns, the structure of the meta-information corresponding to the index table to be created, wherein that meta-information includes the location information of the index files in the index table to be created;
a data-file allocation apparatus, configured to allocate the data file generated from the current data rows in the data source to a slave node, and to update the meta-information of the data table corresponding to the data file according to the structure of the meta-information of the data table and the allocated location information of the data file;
an allocation apparatus, configured to allocate the information of the index files of the index table to be created into the index file of the corresponding slave node, and to update the meta-information corresponding to the index table to be created.
Further, in the data structure of the index table to be created, the structure of the index file includes a BPlusTree structure, wherein a leaf node of the BPlusTree structure includes a key and a location-information value.
Further, the second creation apparatus is configured to:
determine the key of the leaf node according to the value of the index column of the index table to be created;
determine the location-information value of the leaf node according to the file name of the data file to which the index column belongs and the offset, within that data file, of the row in which the index column is located.
Further, the index table to be created includes a hash-type global index table and/or a range-type global index table.
Further, when the index table to be created is a hash-type global index table, the allocation apparatus is configured to:
allocate, according to the hash value determined from the value of the index column of the hash-type global index table, the key and location-information value corresponding to the index file of the hash-type global index table into the index file of the corresponding slave node.
Further, the allocation apparatus is configured to:
determine the hash value of the index column according to the value of the index column of the hash-type global index table and the number of slave nodes;
allocate the key and location-information value of the leaf node in the BPlusTree structure whose index-column hash value is i to the index file of the (i+1)-th slave node, wherein i is a natural number.
Further, when the index table to be created is a range-type global index table, the allocation apparatus is configured to:
determine allocation range intervals according to the result of sampling the values of the index column, and record each slave node and the allocation range interval of its corresponding index column;
allocate the information of the index file of the range-type global index table into the index file of the corresponding slave node according to the allocation range intervals.
Further, the allocation apparatus is configured to:
compare the value of the index column of the range-type global index table with the recorded allocation range intervals to determine the allocation range interval in which the value of the index column falls;
allocate, according to the allocation range interval in which the value of the index column falls, the key and location-information value of the leaf node in the BPlusTree structure corresponding to the value of the index column to the index file of the slave node corresponding to that allocation range interval.
Further, the allocation apparatus is also configured to:
if the key already exists in the index file, merge the new location-information value with the old location-information value in the leaf node corresponding to the key.
Further, the allocation apparatus is also configured to:
if the key does not exist in the index file, insert a new leaf node into the BPlusTree structure and store the key and the location-information value in the new leaf node.
Further, the device also includes:
a data-file generation apparatus, configured to, when the number of rows of the data source reaches the preset data-file size threshold, generate the current data rows as a new data file, allocate the newly generated data file to the slave node corresponding to the data table, and update the meta-information of the data table corresponding to the data file.
Further, the device also includes:
an index-file generation apparatus, configured to, when the size of the index file of the index table to be created reaches the preset index-file size threshold, generate a new index file and update the location information of the new index file into the meta-information corresponding to the index table to be created.
Compared with the prior art, the present application creates the structure of the meta-information of the data table corresponding to the acquired data source, wherein the meta-information of the data table includes the location information of all data files in the data table, and the data files are stored in columnar form. It then creates the data structure of the index table to be created, determines the index columns of that index table in the data table according to the user's request, the index columns being some of the columns in the data table, and creates, according to the index columns, the structure of the meta-information corresponding to the index table to be created, wherein that meta-information includes the location information of the index files in the index table to be created. Next, the data file generated from the current data rows in the data source is allocated to a slave node, and the meta-information of the data table corresponding to the data file is updated according to the structure of the meta-information of the data table and the allocated location information of the data file. Finally, the information of the index files of the index table to be created is allocated into the index file of the corresponding slave node, and the meta-information corresponding to the index table to be created is updated. This optimizes the underlying storage structure and makes index-file information available, so that a data query can use that information to locate the data files satisfying the condition quickly, which greatly reduces the amount of data accessed and improves query performance.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings.
In a typical configuration of the present application, the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Fig. 1 shows a schematic flowchart of a method for creating an index table provided according to one aspect of the present application. The method is applied in a distributed system and comprises steps S11 to S14.
In step S11, the structure of the meta-information of the data table corresponding to the acquired data source is created, wherein the meta-information of the data table includes the location information of all data files in the data table, and the data files are stored in columnar form.
In one embodiment of the present application, to create a distributed global index table, the meta-information of the data table corresponding to the data source must be created first. The meta-information includes the location information, on the hard disk of each machine in the cluster, of all data files that the data table contains. It should be noted that the structure of the data table must be created before the meta-information corresponding to the data table is created. The structure of the data table includes the structure of the data files; the data table contains the data of the data source, that is, the data in the data source is stored as a data table, and the data table is stored in the form of data files. The framework of the distributed system in one embodiment of the present application is shown in Fig. 2 and includes a client, a master node (Master), one or several slave nodes (slave), and zookeeper. The data files of each data table may be stored on solid-state disks (SSD). In this embodiment, the meta-information of the data table is stored in zookeeper, where zookeeper is a high-performance coordination service for distributed applications.
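As an illustration of the meta-information described above, the following is a minimal sketch and not the structure used by the application. The names `FileLocation`, `TableMeta`, and `register`, as well as the example path, are hypothetical, and an in-memory dictionary stands in for the zookeeper store.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FileLocation:
    slave: str  # machine in the cluster that holds the file (hypothetical naming)
    path: str   # path of the file on that machine's disk


@dataclass
class TableMeta:
    """Meta-information of one data table: where each of its data files lives."""
    table: str
    data_files: Dict[str, FileLocation] = field(default_factory=dict)

    def register(self, file_name: str, slave: str, path: str) -> None:
        # Called after a data file has been allocated to a slave node.
        self.data_files[file_name] = FileLocation(slave, path)


meta = TableMeta("table_A")
meta.register("FileSegment_0", "slave1", "/ssd/table_A/FileSegment_0")
```

The meta-information of an index table could follow the same shape, with index files in place of data files.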
In step S12, the data structure of the index table to be created is created; the index columns of the index table to be created are determined in the data table according to the user's request, the index columns being some of the columns in the data table; and the structure of the meta-information corresponding to the index table to be created is created according to the index columns, wherein that meta-information includes the location information of the index files in the index table to be created. Continuing the above embodiment, the columns that need a global index are determined according to actual needs, that is, some columns are selected from the data table as index columns, and the selected index columns are then used to create the meta-information corresponding to the distributed global index table, wherein the meta-information includes the location information, on the hard disk of each machine in the cluster, of all index files contained in the global index table. Once created successfully, the index table can be used for data queries in the distributed system: it meets the user's need to query only a small number of data columns while avoiding reading all data files, which greatly reduces the amount of data accessed.
It should be noted that, in one embodiment of the present application, when the structure of the meta-information corresponding to the index table to be created is created according to the index columns, one index table may be established for each index column and the structure of the meta-information corresponding to that index table then created. Alternatively, when an index table is created, several of the index columns corresponding to the user's needs may jointly establish one index table.
Here, the data structure of the global index table is created, and the structure of the global index table includes the structure of the index file. Preferably, in the data structure of the index table to be created, the structure of the index file includes a BPlusTree structure, wherein a leaf node of the BPlusTree structure includes a key and a location-information value. The index files of the global index table may be organized and stored using the BPlusTree structure, whose leaf nodes contain tuples <key, location-information value> (<key, value>). The BPlusTree structure keeps the input index-column data sorted efficiently, so that the position of the record corresponding to an index-column value can be looked up quickly and data query tasks can be answered quickly.
In one specific embodiment of the present application, the data source in the example shown in Fig. 3 contains 1000 records in total and comprises four columns of data: identifier (id), name (name), age (age), and sex (sex). Fig. 4 shows the user's SQL query statements against this data source. The user needs to filter on the data of the id column, so a global index table must be created on the id column. Here, table A denotes the data table corresponding to the data source shown in Fig. 3, and Sql 1, "Select * from table A where id=1", queries data table A for the data corresponding to id = 1; the other SQL statements are expressed similarly.
In step S13, the data file generated from the current data rows in the data source is allocated to a slave node, and the meta-information of the data table corresponding to the data file is updated according to the structure of the meta-information of the data table and the allocated location information of the data file. In the distributed columnar storage platform shown in Fig. 2, when the platform stores the data of the data source, the Master distributes the data files evenly across the machines in the cluster according to the load-balancing principle, and each machine holds several data files (FileSegment) of the data table. The structure of a FileSegment is shown in Fig. 5. The data files are stored in columnar form: when an OLAP user only needs to query a small number of data columns, columnar storage supplies only the data of the columns that need to be read, which greatly improves OLAP query efficiency.
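The benefit of columnar storage for such queries can be sketched as follows. This is only an illustration under assumptions: the row values are invented, the helper names `to_columns` and `read_columns` are hypothetical, and the real FileSegment layout of Fig. 5 is not reproduced here.

```python
# Hypothetical rows; the column names follow the example data source (id, name, age, sex).
rows = [
    {"id": 1, "name": "a", "age": 30, "sex": "F"},
    {"id": 2, "name": "b", "age": 41, "sex": "M"},
]


def to_columns(rows):
    """Pivot row dictionaries into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def read_columns(columns, wanted):
    """Return only the requested columns; the other columns are never touched."""
    return {name: columns[name] for name in wanted}


columns = to_columns(rows)
result = read_columns(columns, ["id", "age"])
```

A query that needs only `id` and `age` reads two column lists, whereas a row layout would have to load `name` and `sex` as well.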
Preferably, before step S13, the method further includes step S13': when the number of rows of the data source reaches the preset data-file size threshold, the current data rows are generated as a new data file, the newly generated data file is allocated to the slave node corresponding to the data table, and the meta-information of the data table corresponding to the data file is updated.
In one embodiment of the present application, whenever the number of rows of data-source data in memory equals the size limit of one data file, the current data rows are generated as a data file. The master node (Master) of the cluster then allocates it, according to the load-balancing principle, to the hard disk of some machine in the cluster for storage and updates the meta-information corresponding to the data table's files. At this point, creation of the global index table for that data file also begins.
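The row-count threshold and the allocation by the Master can be sketched as below. This is a simplified illustration, not the platform's implementation: "least rows stored so far" is only one possible reading of the load-balancing principle, flushing a final partial file is an assumption, and the function and file names are hypothetical.

```python
def split_into_files(rows, max_rows):
    """Group incoming rows into lists of at most max_rows rows; each list is one data file."""
    buffer = []
    for row in rows:
        buffer.append(row)
        if len(buffer) == max_rows:  # threshold reached: emit a data file
            yield buffer
            buffer = []
    if buffer:
        yield buffer  # assumption: a final, partially filled file is also written


def assign_files(files, slaves):
    """Place each data file on the slave currently holding the fewest rows."""
    load = {slave: 0 for slave in slaves}
    placement = {}
    for number, file_rows in enumerate(files):
        target = min(slaves, key=lambda s: load[s])  # ties go to the first slave listed
        load[target] += len(file_rows)
        placement[f"FileSegment_{number}"] = target
    return placement


files = list(split_into_files(range(60), max_rows=25))
placement = assign_files(files, ["slave1", "slave2", "slave3"])
```

The resulting `placement` mapping is what would be written back into the data table's meta-information.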
In step S14, the information of the index files of the index table to be created is allocated into the index file of the corresponding slave node, and the meta-information corresponding to the index table to be created is updated. After the generated data file has been allocated and stored, the information of its corresponding index file must be allocated into the index file of the corresponding slave, and the corresponding meta-information of the global index table updated. The information of the index file may include the value of the index column in the created index table, the file name of the data file to which the index column belongs, the offset within that data file of the row in which the index column is located, and so on.
In one embodiment of the present application, the index table to be created includes a hash-type global index table or a range-type global index table. The two types differ slightly in how their index files are allocated within the cluster. A hash-type global index table decides which machine of the distributed cluster an index file is distributed to according to the hash value of the index column, whereas a range-type global index table assigns index files to the corresponding machine according to the range of index-column values.
Preferably, in step S12, the key of the leaf node is determined according to the value of the index column of the index table to be created, and the location-information value of the leaf node is determined according to the location information of the data file in the meta-information of the data table, the location information of the index file in the meta-information of the index table to be created, and the offset of the index column in the index file.
In one embodiment of the present application, the index files of both the hash-type global index table and the range-type global index table are organized and stored using the BPlusTree structure. The leaf nodes of the BPlusTree contain tuples <key, value>, where the key is the value of the index column, taken from the data file, and the value is the information of the data file in which the index column is located together with the offset, in the index file, of the records satisfying the condition. The Master distributes the data files evenly across the machines in the cluster according to the load-balancing principle. Each machine holds several data files (FileSegment) of the data table; the structure of a FileSegment is shown in Fig. 5. When the distributed columnar storage platform creates the global index table corresponding to the data source, the index file of a hash-type global index table is a HashIndexFileSegment, and the index file of a range-type global index table is a RangeIndexFileSegment.
Preferably, when the index table to be created is a hash-type global index table, in step S14 the key and location-information value corresponding to the index file of the hash-type global index table are allocated into the index file of the corresponding slave node of the hash-type global index table according to the hash value determined from the value of its index column. Specifically, in step S14, the hash value of the index column is determined according to the value of the index column of the hash-type global index table and the number of slave nodes, and the key and location-information value of the leaf node in the BPlusTree structure whose index-column hash value is i are allocated to the index file of the (i+1)-th slave node, wherein i is a natural number. The information of the index files is thus distributed sensibly across the slaves, achieving an even distribution. It should be noted that determining the hash value determines the machine to which the information of the index file is allocated (for example, machine 1); that is, the hash value determines the machine to which the corresponding <key, value> is allocated. Only when the <key, value> has been allocated into the index file of that machine is the leaf node actually created in the corresponding BPlusTree structure.
In one embodiment of the present application, assume the distributed cluster has 1 Master and n slaves. The hash-type global index table computes a hash value for the index column. The value of the index column is the key of the leaf node in the BPlusTree structure, and the information of the data file corresponding to the index column, together with the offset in that data file of the record corresponding to the index column, is the value. A key whose hash value is i, along with the location-information value of its record, is allocated into the index file of the (i+1)-th slave, where the hash value may be determined according to the value of the index column and the number of slaves in the distributed cluster.
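The mapping from a hash value i to the (i+1)-th slave can be sketched as follows. The text only states that the hash value is determined from the index-column value and the number of slaves; taking integer keys modulo n, and hashing other keys with CRC32 first, are assumptions made for this illustration, and the function names are hypothetical.

```python
import zlib


def hash_value(key, n_slaves):
    """Hash value i in 0..n_slaves-1, derived from the index-column value and the slave count."""
    if isinstance(key, int):
        return key % n_slaves
    # Assumption: non-integer keys are hashed with CRC32, which is stable across runs.
    return zlib.crc32(str(key).encode("utf-8")) % n_slaves


def target_slave(key, n_slaves):
    """1-based number of the slave whose index file receives the <key, value> tuple."""
    return hash_value(key, n_slaves) + 1
```

With three slaves, a key of 4 has hash value 1 and is therefore routed to the 2nd slave.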
Preferably, when the index table to be created is a range-type global index table, in step S14 allocation range intervals are determined according to the result of sampling the values of the index column, and each slave node and the allocation range interval of its corresponding index column are recorded; the information of the index file of the range-type global index table is then allocated into the index file of the corresponding slave node of the range-type global index table according to the allocation range intervals. Specifically, in step S14, the value of the index column of the range-type global index table is compared with the recorded allocation range intervals to determine the allocation range interval in which the value of the index column falls; according to that interval, the key and location-information value of the leaf node in the BPlusTree structure corresponding to the value of the index column are allocated to the index file of the slave node corresponding to that allocation range interval.
In one embodiment of the present application, assume the distributed cluster has 1 Master and n slaves. The range-type global index table samples the values of the index column of the data table's files and sets n ranges according to the sampling result, so that the amount of data in each range is as evenly distributed as possible; the range interval of each machine and its corresponding index column are recorded in the Master. When an index file is generated, the value of its index column is compared with the n ranges in the Master, and, according to the range to which it belongs, the data-file information corresponding to the column and the offset in the data file, i.e. <key, value>, are updated into the index file of the slave corresponding to that range.
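One way to derive n ranges from a sample and route a key to its slave is sketched below. Choosing the boundaries at equally spaced positions of the sorted sample is an assumption (the text only requires the ranges to hold similar amounts of data), the sample is assumed to contain at least n values, and the function names are hypothetical.

```python
import bisect


def range_bounds(sample, n):
    """Upper bounds of the first n-1 ranges, taken at equally spaced positions of the sorted sample."""
    ordered = sorted(sample)
    return [ordered[len(ordered) * k // n - 1] for k in range(1, n)]


def slave_for(key, bounds):
    """1-based number of the slave whose range contains key (upper bounds are inclusive)."""
    return bisect.bisect_left(bounds, key) + 1


bounds = range_bounds(range(1, 1000), 3)  # sample of the values 1..999
```

For this sample the bounds are 333 and 666, so keys up to 333 go to the 1st slave, keys from 334 to 666 to the 2nd, and larger keys to the 3rd.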
In one embodiment of the present application, in step S14, if the key already exists in the index file, the new location-information value is merged with the old location-information value in the leaf node corresponding to the key. If the key does not exist in the index file, a new leaf node is inserted into the BPlusTree structure, and the key and the location-information value are stored in the new leaf node.
Here, when a <key, value> is allocated to the index file of the corresponding slave, if a leaf node for the key already exists in that index file, the new value is merged with the old value; if the key does not exist in the index file, a new leaf node is inserted and the <key, value> tuple is stored in it.
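The merge-or-insert rule can be sketched as below. A sorted list searched with `bisect` stands in for the leaf level of the BPlusTree, so this is not a full B+ tree, and treating "merge" as appending the new location to the list already stored for the key is an assumption. The class name `LeafLevel` and its methods are hypothetical.

```python
import bisect


class LeafLevel:
    """Sorted <key, value> entries, standing in for the leaf nodes of one index file."""

    def __init__(self):
        self.keys = []    # index-column values, kept in ascending order
        self.values = []  # values[i] is the list of locations recorded for keys[i]

    def put(self, key, location):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i].append(location)  # key exists: merge new value with old
        else:
            self.keys.insert(i, key)         # key absent: insert a new leaf entry
            self.values.insert(i, [location])

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return []


index_file = LeafLevel()
index_file.put(7, ("FileSegment_0", 3))
index_file.put(2, ("FileSegment_0", 5))
index_file.put(7, ("FileSegment_1", 0))
```

After these calls, key 7 carries two locations while key 2 carries one, and the keys remain in sorted order.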
Preferably, the method further includes step S14': when the size of an index file of the index table to be
created reaches a preset index file size threshold, a new index file is generated, and the location
information of the new index file is updated into the meta information corresponding to the index table to
be created. In one embodiment of the application, when an index file of the hash-type global index table or
of the range-type global index table reaches the maximum size of an index file, a new index file is
generated, and the location information of the index file is updated into the meta information
corresponding to the global index table.
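A minimal sketch of the rollover in step S14' is given below. It rests on several assumptions that the
text does not fix: the size threshold is counted in entries rather than bytes, the names IndexTable,
max_entries and index_file_N are hypothetical, and a plain list stands in for the meta information.

```python
class IndexTable:
    """Toy index table that starts a new index file once the current one is full."""

    def __init__(self, max_entries):
        self.max_entries = max_entries  # assumed threshold, counted in distinct keys
        self.files = {}                 # index file name -> {key: [locations]}
        self.meta = []                  # meta information: names (locations) of all index files
        self._roll()

    def _roll(self):
        # Generate a new index file and record its location in the meta information.
        name = f"index_file_{len(self.meta)}"
        self.files[name] = {}
        self.meta.append(name)

    def add(self, key, location):
        current = self.files[self.meta[-1]]
        if key not in current and len(current) >= self.max_entries:
            self._roll()
            current = self.files[self.meta[-1]]
        current.setdefault(key, []).append(location)

table = IndexTable(max_entries=2)
for key in (1, 2, 3):
    table.add(key, ("fs1", key))
```

The third key does not fit into the first file, so a second index file is created and appended to the
meta information. For brevity the sketch does not look for the key in older, already full files.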
In a preferred embodiment of the application, for example, the distributed system has 1 Master and 3
slaves, and the maximum size of a data file is set to 25 rows. When the number of input data rows equals
the maximum of 25, the distributed columnar storage platform outputs the current data rows as one data
file; the Master, following the load-balancing principle, stores it as a data file (FileSegment) on the SSD
of one machine in the cluster and updates the meta information corresponding to the data table. For the
hash-type global index table, the key is the value of id, and the value is the information of the data file
corresponding to that id together with the offset of the corresponding record in the data file. The hash of
each key is taken, and the <key, value> tuples whose hash results are 0, 1 and 2 are assigned to the index
files (HashIndexFileSegment) of the hash-type index table on the 1st, 2nd and 3rd machines of the cluster,
respectively, as shown in Fig. 6. When id=1, the hash result is 1, so its <key, value> is stored on the 2nd
slave of the cluster; when id=2, the hash result is 2, so its <key, value> is stored on the 3rd slave; when
id=3, the hash result is 0, so its <key, value> is stored on the 1st slave. Here {key | key % 3 = hash
value} means that the hash value is the remainder of the key divided by the number of slaves (3 in this
example), and the key is the value of id.
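The modulo rule of this example can be written out directly. The sketch below assumes integer keys; the
function names slave_for_hash and distribute are hypothetical, and the sample values use the file:offset
notation of the figures purely as illustrative data.

```python
from collections import defaultdict

def slave_for_hash(key, num_slaves):
    """Return the 1-based slave number for an integer key: hash value i maps to slave i + 1."""
    return key % num_slaves + 1

def distribute(entries, num_slaves):
    """Group (key, value) tuples by the slave whose index file should receive them."""
    per_slave = defaultdict(dict)
    for key, value in entries:
        per_slave[slave_for_hash(key, num_slaves)][key] = value
    return dict(per_slave)

# id=1 -> hash 1 -> slave 2; id=2 -> hash 2 -> slave 3; id=3 -> hash 0 -> slave 1
placement = distribute([(1, "fs1:1"), (2, "fs1:2"), (3, "fs1:4")], 3)
```

Because the hash depends only on the key and the number of slaves, any node can recompute where a given
id is indexed without consulting the index files themselves.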
When the range-type global index table is created, the result of dividing ranges after sampling the keys of
the data source is as shown in Fig. 6. The ranges should be divided so that the numbers of records in the
range intervals are as close to each other as possible. The result of the division comprises three
intervals, [1, 333], [334, 666] and [667, 999], and the range corresponding to each of the three slaves is
stored in the Master. When the value of the index column in a data block falls within a range interval, its
index information is stored in the index file (RangeIndexFileSegment) of the range-type index table on the
slave machine corresponding to that interval, in the form of a BPlusTree leaf node. For example, for id=5
the key falls in the first range interval, so its corresponding <key, value> information is stored in the
index file of the first machine.
It should be noted that, for both the hash-type index table and the range-type index table, when a
<key, value> tuple is assigned to the index file of the corresponding slave, it must be determined whether
the id of the corresponding index column already exists in the BPlusTree structure of the index file. If
the id already exists in the BPlusTree, the information of the data block (Block) corresponding to the id
and its row number within the Block are merged directly with the original value. If the key does not exist
in the index file, a new leaf node is inserted and the <key, value> tuple is stored in the new leaf node.
When an index file of either of the above two types of index reaches the maximum size of an index file, a
new index file is generated, and the location information of the index file is updated into the meta
information corresponding to the global index table. The index files corresponding to the data source are
obtained by the method for creating an index table described in this application. Fig. 7 shows an index
file of a hash-type index table created for the data source in one embodiment of the application: because
the hash result of every key is 0, the index file (HashIndexFileSegment) is located on slave 1 of the
cluster; its storage structure is a BPlusTree whose leaf nodes store <key, value>. For example, the value
corresponding to key=3 is fs1:4, which indicates that the record with id=3 is located in FileSegment1 at
offset 4. Fig. 8 shows an index file of a range-type index table created for the data source in one
embodiment of the application: all keys are less than 333, so the index file is located on slave 1 of the
cluster; its storage structure is a BPlusTree whose leaf nodes store <key, value>. For example, the value
corresponding to key=1 is fs1:1:2, which indicates that the records with id=1 are located in FileSegment1
at offsets 1 and 2.
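The value notation used in Figs. 7 and 8 can be decoded as sketched below. The sketch assumes that a
value is a colon-separated string whose first field names the data file (fs1 standing for FileSegment1)
and whose remaining fields are offsets; the helper names parse_value and merge_value are hypothetical.

```python
def parse_value(value):
    """Split a value such as 'fs1:1:2' into the data file name and its list of offsets."""
    file_name, *offsets = value.split(":")
    return file_name, [int(offset) for offset in offsets]

def merge_value(value, new_offset):
    """Merge a further offset in the same data file into an existing value string."""
    file_name, offsets = parse_value(value)
    if new_offset not in offsets:
        offsets.append(new_offset)
    return ":".join([file_name] + [str(offset) for offset in offsets])

location = parse_value("fs1:1:2")  # file "fs1" with offsets 1 and 2
```

Merging offset 2 into the value fs1:1 with merge_value yields fs1:1:2, the form shown for key=1 in
Fig. 8.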
In conclusion optimizing bottom storage organization by the concordance list of herein described creation, can be applied to be distributed
The inquiry of data in formula system meets the number of the part column of user demand by Hash class concordance list and the inquiry of range class concordance list
According to avoiding reading all data files, greatly reduce the data volume of access, improve the processing speed of on-line analytical processing task
Degree.Certainly, the concordance list of herein described creation can also be applied to using scenes such as concordance list distributing system resources, not
It is confined to the inquiry applied to data.
Fig. 9 shows a schematic structural diagram of a device for creating an index table provided according to
another aspect of the application. The device includes a first creating device 11, a second creating device
12, a data file distributor 13 and a distributor 14, and is applied in a distributed system, wherein:
The first creating device 11 is configured to create the structure of the meta information of the data
table corresponding to the acquired data source, wherein the meta information of the data table includes
the location information of all data files in the data table, and the data files are stored in columnar
form. In one embodiment of the application, to create a distributed global index table, the meta
information of the data table corresponding to the data source must first be created; this meta information
includes the location, on the hard disk of each machine in the cluster, of all data files contained in the
data table. It should be noted that the structure of the data table must be created before the meta
information corresponding to the data table is created, and the structure of the data table includes the
structure of the data files. The data table contains the data of the data source: the data in the data
source is stored as a data table, and the data table is stored in the form of data files. The distributed
system framework of one embodiment of the application is shown in Fig. 2 and includes a client (client), a
master node (Master), one or several slave nodes (slave) and zookeeper. The data files of each data table
may be stored on solid-state drives (SSD). In the embodiments of the application, the meta information of
the data table is stored in zookeeper, where zookeeper is a coordination system for high-performance
distributed applications.
The second creating device 12 is configured to create the data structure of the index table to be created,
to determine, according to the user's request, the index column of the index table to be created in the
data table, the index column being a subset of the columns in the data table, and to create, according to
the index column, the structure of the meta information corresponding to the index table to be created,
wherein the meta information corresponding to the index table to be created includes the location
information of the index files in the index table to be created. Continuing the above embodiment, the
columns for which a global index is to be created are determined according to actual needs, that is, a
subset of the columns of the data table is selected as index columns, and the meta information
corresponding to the distributed global index table is then created using the selected index columns. This
meta information includes the location, on the hard disk of each machine in the cluster, of all index files
contained in the global index table. Once the index table to be created has been created successfully, it
can be used for data queries in the distributed system: it satisfies a user's need to query only a few data
columns while avoiding reading all data files, which greatly reduces the amount of data accessed.
It should be noted that, in one embodiment of the application, when the structure of the meta information
corresponding to the index table to be created is created according to the index column, one index table
may be established for each index column and the structure of the meta information corresponding to that
index table then created; of course, when creating an index table, several of the index columns required
by the user may also jointly establish a single index table.
Creating the data structure of the global index table means that the structure of the global index table
includes the structure of the index files. Preferably, the structure of the index files in the data
structure of the index table to be created includes a BPlusTree structure, wherein the leaf nodes of the
BPlusTree structure include a key and a location information value. Here, the index files of the global
index table may be organized and stored using a BPlusTree structure whose leaf nodes contain the tuple
<key, location information value> (<key, value>). The BPlusTree structure keeps the input index column data
sorted efficiently, so that the position of the record corresponding to an index column value can be found
quickly and data query tasks can be answered quickly.
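Why an ordered leaf level speeds up queries can be illustrated with a small sketch. It is not a B+ tree:
two parallel sorted lists stand in for the linked leaf nodes, binary search stands in for the descent
from the root, and the function names and the sample keys and values are assumptions chosen for
illustration.

```python
import bisect

def point_lookup(keys, values, key):
    """Return the location value stored for key, or None if the key is not indexed."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return values[i]
    return None

def range_lookup(keys, values, low, high):
    """Return all (key, value) pairs with low <= key <= high, in key order."""
    start = bisect.bisect_left(keys, low)
    stop = bisect.bisect_right(keys, high)
    return list(zip(keys[start:stop], values[start:stop]))

# Illustrative leaf contents: sorted keys with file:offset location values.
keys = [1, 3, 5, 8]
values = ["fs1:1:2", "fs1:4", "fs1:7", "fs2:3"]
hit = point_lookup(keys, values, 3)
```

Both lookups touch only the entries they return plus a logarithmic number of comparisons, which is the
property the sorted structure provides for equality filters and range filters alike.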
In a specific embodiment of the application, the example data source shown in Fig. 3 contains 1000 records
in total and includes four columns of data: identifier (id), name (name), age (age) and sex (sex). Fig. 4
shows a user's SQL query statements against this data source. The user needs to filter on the data of the
id column, so a global index table must be created for the id column. Table A denotes the data table
corresponding to the data source shown in Fig. 3. SQL 1, "Select * from table A where id=1", queries data
table A for the data of the records whose id column is 1; the other SQL statements have similar meanings.
The data file distributor 13 is configured to assign the data file generated from the current data rows of
the data source to a slave node, and to update the meta information of the data table corresponding to the
data file according to the structure of the meta information of the data table and the assigned location
information of the data file. Here, in the distributed columnar storage platform shown in Fig. 2, when the
platform stores the data of the data source, the Master distributes the data files evenly across the
machines of the cluster according to the load-balancing principle, and each machine holds several data
files (FileSegment) of the data table. The structure of a FileSegment is shown in Fig. 5. Data files are
stored in columnar fashion, so when an OLAP user needs to query only a few data columns, columnar storage
can provide the user with just the data columns that need to be read, which greatly improves OLAP query
efficiency.
Preferably, the device further includes a data file generating device 13', configured to generate a data
file from the current data rows when the number of rows of the data source reaches a preset data file size
threshold, to assign the newly generated data file to the corresponding slave node of the data table, and
to update the meta information of the data table corresponding to the data file.
In one embodiment of the application, whenever the number of rows of data-source data held in memory equals
the size of one data file, the current data rows are generated as a data file; at the same time, the master
node (Master) of the cluster, following the load-balancing principle, assigns the file to the hard disk of
one machine in the cluster for storage and updates the meta information corresponding to the data table. At
this point, creation of the global index table for that data file also begins.
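The segmentation of incoming rows into data files can be sketched as follows. The text does not specify
the load-balancing rule, so the sketch assumes a simple least-loaded choice; the function name
segment_rows, its parameters and the tuple layout of the meta information are likewise hypothetical.

```python
def segment_rows(rows, rows_per_file=25, num_slaves=3):
    """Cut rows into data files of rows_per_file rows and place each on the least-loaded slave.

    Returns (meta, leftover): meta lists (file_name, slave_number, row_count) per data file,
    leftover holds the rows still buffered in memory because they do not yet fill a file.
    """
    meta = []
    load = [0] * num_slaves  # number of data files placed on each slave so far
    buffer = []
    for row in rows:
        buffer.append(row)
        if len(buffer) == rows_per_file:
            slave = load.index(min(load))  # assumed load-balancing rule: least-loaded slave
            load[slave] += 1
            meta.append((f"FileSegment{len(meta) + 1}", slave + 1, len(buffer)))
            buffer = []
    return meta, buffer

# 1000 records at 25 rows per file produce 40 data files and no leftover rows.
meta, leftover = segment_rows(range(1000))
```

In a fuller implementation each completed file would also trigger the index creation for its rows, as
the paragraph above describes.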
The distributor 14 is configured to assign the information of the index file of the index table to be
created to the index file of the corresponding slave node, and to update the meta information corresponding
to the index table to be created. Here, after a generated data file has been assigned to storage, the
information of its corresponding index file must be assigned to the index file of the corresponding slave
and updated into the meta information corresponding to the global index table. The information of the
index file may include the value of the index column in the created index table, the file name of the data
file to which the index column belongs, the offset in that data file of the row containing the index
column, and so on.
In one embodiment of the application, the index table to be created includes a hash-type global index table
or a range-type global index table. The strategies for allocating the index files of the hash-type and
range-type global index tables within the cluster differ slightly. The hash-type global index table
determines the machine in the distributed cluster on which an index file is placed according to the hash
value of the index column, whereas the range-type global index table assigns an index file to the
corresponding machine according to the range in which the index column value falls.
Preferably, the second creating device 12 is configured to determine the key of the leaf node according to
the value of the index column of the index table to be created, and to determine the location information
value of the leaf node according to the location information of the data file in the meta information of
the data table, the location information of the index file in the meta information of the index table to
be created, and the offset of the index column in the index file. In one embodiment of the application, the
index files of the hash-type global index table and of the range-type global index table are both organized
and stored using a BPlusTree structure. The leaf nodes of the BPlusTree contain the tuple <key, value>,
where the key is the value of the index column, taken from the data file, and the value is the information
of the data file in which the index column is located together with the offset, in the index file, of the
record that satisfies the condition. The Master distributes the data files evenly across the machines of
the cluster according to the load-balancing principle. Each machine holds several data files (FileSegment)
of the data table, and the structure of a FileSegment is shown in Fig. 5. When the distributed columnar
storage platform creates the global index table corresponding to the data source, the index file
corresponding to the hash-type global index table is HashIndexFileSegment, and the index file corresponding
to the range-type global index table is RangeIndexFileSegment.
Preferably, when the index table to be created is a hash-type global index table, the distributor 14 is
configured to assign, according to the hash value determined from the value of the index column of the
hash-type global index table, the keys and location information values corresponding to the index file of
the hash-type global index table to the index file of the corresponding slave node of the hash-type global
index table. Specifically, the distributor 14 is configured to determine the hash value of the index column
according to the value of the index column of the hash-type global index table and the number of slave
nodes, and to assign the key and location information value of the leaf node in the BPlusTree structure
whose index column hash value is i to the index file of the (i+1)-th slave node, where i is a natural
number. In this way the index file information is distributed reasonably across the slaves and an even
distribution is achieved. It should be noted that determining the hash value determines the machine to
which the information of the index file is assigned (for example, machine 1), that is, the machine to which
the <key, value> corresponding to that hash value is assigned; only when the <key, value> has been assigned
to the index file of the corresponding machine is the leaf node actually created in the corresponding
BPlusTree structure.
In one embodiment of the application, assume the distributed cluster has 1 Master and n slaves. For the
hash-type global index table, a hash value is computed for the index column; the value of the index column
is the key of the leaf node in the BPlusTree structure, and the information of the data file corresponding
to the index column together with the offset of the corresponding record in the data file is the value. The
key and the location information value of each record whose key has hash value i are assigned to the index
file of the (i+1)-th slave, where the hash value can be determined from the value of the index column and
the number of slaves in the distributed cluster.
Preferably, when the index table to be created is a range-type global index table, the distributor 14 is
configured to determine distribution range intervals according to the result of sampling the values of the
index column, to record each slave node and its corresponding index-column distribution range interval, and
to assign, according to the distribution range intervals, the information of the index file of the
range-type global index table to the index file of the corresponding slave node of the range-type global
index table. Specifically, the distributor 14 is configured to compare the value of the index column of the
range-type global index table with the recorded distribution range intervals, to determine the distribution
range interval in which the value of the index column falls, and, according to that interval, to assign the
key and location information value of the leaf node in the BPlusTree structure corresponding to the value
of the index column to the index file of the slave node corresponding to that distribution range interval.
In one embodiment of the application, assume the distributed cluster has 1 Master and n slaves. For the
range-type global index table, the values of the index column in the data files of the data table are
sampled, and n ranges are set according to the sampling result so that the data volume in each range is as
evenly distributed as possible; each machine and its corresponding index-column range interval are recorded
in the Master. When the information of an index file is generated, the value of the index column is
compared with the n ranges in the Master and, according to the range it belongs to, the information of the
data file corresponding to that value and its offset in the data file, i.e. the <key, value> tuple, is
updated into the index file of the slave corresponding to that range.
In one embodiment of the application, the distributor 14 is configured to merge the new location
information value and the old location information value into the leaf node corresponding to the key if the
key already exists in the index file, and, if the key does not exist in the index file, to insert a new
leaf node into the BPlusTree structure and store the key and the location information value in the new
leaf node.
Here, when a <key, value> tuple is assigned to the index file of the corresponding slave, if a leaf node
corresponding to the key already exists in the index file, the new value is merged with the old value; if
the key does not exist in the index file, a new leaf node is inserted and the <key, value> tuple is stored
in the new leaf node.
Preferably, the device further includes an index file generating device 14', configured to generate a new
index file when the size of an index file of the index table to be created reaches a preset index file size
threshold, and to update the location information of the new index file into the meta information
corresponding to the index table to be created. In one embodiment of the application, when an index file of
the hash-type global index table or of the range-type global index table reaches the maximum size of an
index file, a new index file is generated, and the location information of the index file is updated into
the meta information corresponding to the global index table.
In a preferred embodiment of the application, for example, the distributed system has 1 Master and 3
slaves, and the maximum size of a data file is set to 25 rows. When the number of input data rows equals
the maximum of 25, the distributed columnar storage platform outputs the current data rows as one data
file; the Master, following the load-balancing principle, stores it as a data file (FileSegment) on the SSD
of one machine in the cluster and updates the meta information corresponding to the data table. For the
hash-type global index table, the key is the value of id, and the value is the information of the data file
corresponding to that id together with the offset of the corresponding record in the data file. The hash of
each key is taken, and the <key, value> tuples whose hash results are 0, 1 and 2 are assigned to the index
files (HashIndexFileSegment) of the hash-type index table on the 1st, 2nd and 3rd machines of the cluster,
respectively, as shown in Fig. 6. When id=1, the hash result is 1, so its <key, value> is stored on the 2nd
slave of the cluster; when id=2, the hash result is 2, so its <key, value> is stored on the 3rd slave; when
id=3, the hash result is 0, so its <key, value> is stored on the 1st slave. Here {key | key % 3 = hash
value} means that the hash value is the remainder of the key divided by the number of slaves (3 in this
example), and the key is the value of id.
When the range-type global index table is created, the result of dividing ranges after sampling the keys of
the data source is as shown in Fig. 6. The ranges should be divided so that the numbers of records in the
range intervals are as close to each other as possible. The result of the division comprises three
intervals, [1, 333], [334, 666] and [667, 999], and the range corresponding to each of the three slaves is
stored in the Master. When the value of the index column in a data block falls within a range interval, its
index information is stored in the index file (RangeIndexFileSegment) of the range-type index table on the
slave machine corresponding to that interval, in the form of a BPlusTree leaf node. For example, for id=5
the key falls in the first range interval, so its corresponding <key, value> information is stored in the
index file of the first machine.
It should be noted that, for both the hash-type index table and the range-type index table, when a
<key, value> tuple is assigned to the index file of the corresponding slave, it must be determined whether
the id of the corresponding index column already exists in the BPlusTree structure of the index file. If
the id already exists in the BPlusTree, the information of the data block (Block) corresponding to the id
and its row number within the Block are merged directly with the original value. If the key does not exist
in the index file, a new leaf node is inserted and the <key, value> tuple is stored in the new leaf node.
When an index file of either of the above two types of index reaches the maximum size of an index file, a
new index file is generated, and the location information of the index file is updated into the meta
information corresponding to the global index table. The index files corresponding to the data source are
obtained by the method for creating an index table described in this application. Fig. 7 shows an index
file of a hash-type index table created for the data source in one embodiment of the application: because
the hash result of every key is 0, the index file (HashIndexFileSegment) is located on slave 1 of the
cluster; its storage structure is a BPlusTree whose leaf nodes store <key, value>. For example, the value
corresponding to key=3 is fs1:4, which indicates that the record with id=3 is located in FileSegment1 at
offset 4. Fig. 8 shows an index file of a range-type index table created for the data source in one
embodiment of the application: all keys are less than 333, so the index file is located on slave 1 of the
cluster; its storage structure is a BPlusTree whose leaf nodes store <key, value>. For example, the value
corresponding to key=1 is fs1:1:2, which indicates that the records with id=1 are located in FileSegment1
at offsets 1 and 2.
In conclusion optimizing bottom storage organization by the concordance list of herein described creation, can be applied to be distributed
The inquiry of data in formula system meets the number of the part column of user demand by Hash class concordance list and the inquiry of range class concordance list
According to avoiding reading all data files, greatly reduce the data volume of access, improve the processing speed of on-line analytical processing task
Degree.Certainly, the concordance list of herein described creation can also be applied to using scenes such as concordance list distributing system resources, not
It is confined to the inquiry applied to data.
Obviously, those skilled in the art can make various modifications and variations to the application
without departing from its spirit and scope. Thus, if these modifications and variations of the application
fall within the scope of the claims of the application and their technical equivalents, the application is
also intended to include them.
It should be noted that the application can be implemented in software and/or in a combination of software
and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose
computer or any other similar hardware device. In one embodiment, the software program of the application
can be executed by a processor to implement the steps or functions described above. Likewise, the software
program of the application (including related data structures) can be stored in a computer-readable
recording medium, for example RAM, a magnetic or optical drive, a floppy disk or a similar device. In
addition, some steps or functions of the application may be implemented in hardware, for example as a
circuit that cooperates with a processor to perform each step or function.
In addition, part of the application can be embodied as a computer program product, for example computer
program instructions which, when executed by a computer, can invoke or provide the method and/or technical
solution according to the application through the operation of that computer. The program instructions that
invoke the method of the application may be stored in a fixed or removable recording medium, and/or
transmitted via a data stream in broadcast or other signal-bearing media, and/or stored in the working
memory of a computer device that runs according to the program instructions. Here, one embodiment of the
application includes a device comprising a memory for storing computer program instructions and a processor
for executing the program instructions, wherein, when the computer program instructions are executed by the
processor, the device is triggered to run the methods and/or technical solutions based on the foregoing
embodiments of the application.
It is obvious to a person skilled in the art that the application is not limited to the details of the
above exemplary embodiments, and that the application can be realized in other specific forms without
departing from its spirit or essential characteristics. Therefore, from whichever point of view, the
embodiments are to be regarded as illustrative and not restrictive, and the scope of the application is
defined by the appended claims rather than by the above description; all changes that fall within the
meaning and range of equivalents of the claims are therefore intended to be included in the application.
Any reference sign in the claims should not be construed as limiting the claim concerned. Furthermore, the
word "comprising" obviously does not exclude other units or steps, and the singular does not exclude the
plural. A plurality of units or devices stated in a device claim may also be implemented by a single unit
or device through software or hardware. Words such as "first" and "second" are used to denote names and do
not indicate any particular order.