Background
With the development and application of database technology, the volume of data stored in databases grows day by day, and processing complex queries over large data volumes quickly and flexibly has become a new requirement. OLAP (On-Line Analytical Processing) is dedicated to supporting complex analytical operations and focuses on decision support for decision-makers and senior management. Typically, an OLAP user only needs to query a small number of data columns; with row-oriented storage, many useless data columns are loaded, which degrades query performance. The basic query method of distributed columnar storage first reads metadata from ZooKeeper, then reads all data files on every machine in the cluster, and finally reads the records that satisfy the condition from each data file. This approach causes an excessive amount of data access and hurts OLAP query performance.
Summary
The purpose of the present application is to provide a method and an apparatus for creating an index table that optimize the underlying storage organization and thereby facilitate data queries.
According to one aspect of the present application, a method for creating an index table is provided, the method comprising:
creating a metadata structure of a data table corresponding to an acquired data source, wherein the metadata of the data table includes location information of all data files in the data table, and the data files are stored in columnar form;
creating a data structure of an index table to be created; determining, according to a user's request, the index columns of the index table to be created in the data table, wherein the index columns are a subset of the columns of the data table; and creating, according to the index columns, a metadata structure corresponding to the index table to be created, wherein that metadata includes location information of the index files in the index table to be created;
distributing a data file generated from the current data rows in the data source to a slave node, and updating the metadata of the data table corresponding to the data file according to the metadata structure of the data table and the location information allocated to the data file;
distributing information of the index files of the index table to be created into the index files of the corresponding slave nodes, and updating the metadata corresponding to the index table to be created.
Further, in the data structure of the index table to be created, the structure of an index file includes a BPlusTree structure, wherein a leaf node of the BPlusTree structure includes a key and a location value.
Further, creating the data structure of the index table to be created includes: determining the key of the leaf node according to the value of the index column of the index table to be created; and determining the location value of the leaf node according to the filename of the data file to which the index column belongs and the offset, within that data file, of the row in which the index column value is located.
Further, the index table to be created includes a hash-based global index table and/or a range-based global index table.
Further, when the index table to be created is a hash-based global index table, distributing the information of the index files corresponding to the index column of the index table to be created into the index files of the corresponding slave nodes includes: distributing the keys and location values of the index files of the hash-based global index table into the index files of the corresponding slave nodes according to a hash value determined from the value of the index column of the hash-based global index table.
Further, distributing the keys and location values of the index files of the hash-based global index table into the index files of the corresponding slave nodes according to the hash value determined from the value of the index column includes: determining the hash value of the index column according to the value of the index column of the hash-based global index table and the number of slave nodes; and distributing the key and location value of each leaf node in the BPlusTree structure whose index-column hash value is i into the index file of the (i+1)-th slave node, wherein i is a natural number.
Further, when the index table to be created is a range-based global index table, distributing the information of the index files corresponding to the index column of the index table to be created into the index files of the corresponding slave nodes includes: determining distribution ranges according to the result of sampling the values of the index column, and recording each slave node together with its distribution range of the index column; and distributing the information of the index files of the range-based global index table into the index files of the corresponding slave nodes according to the distribution ranges.
Further, distributing the information of the index files of the range-based global index table into the index files of the corresponding slave nodes according to the distribution ranges includes: comparing the value of the index column of the range-based global index table with the recorded distribution ranges to determine the range in which the value falls; and distributing, according to that range, the key and location value of the corresponding leaf node in the BPlusTree structure into the index file of the slave node corresponding to that range.
Further, distributing the information of the index files corresponding to the index column of the index table to be created into the index files of the corresponding slave nodes also includes: if the key already exists in the index file, merging the new location value with the old location value into the leaf node corresponding to that key.
Further, distributing the information of the index files corresponding to the index column of the index table to be created into the index files of the corresponding slave nodes also includes: if the key does not exist in the index file, inserting a new leaf node into the BPlusTree structure and storing the key and the location value in the new leaf node.
Further, before distributing the data file generated from the current data rows in the data source to the slave node corresponding to the data table, the method also includes: when the number of rows of the data source reaches a preset data file size threshold, generating a new data file from the current data rows, distributing the newly generated data file to the slave node corresponding to the data table, and updating the metadata of the data table corresponding to the data file.
Further, before distributing the information of the index files corresponding to the index column of the index table to be created into the index files of the corresponding slave nodes, the method also includes: when the size of an index file of the index table to be created reaches a preset index file size threshold, generating a new index file and updating the location information of the new index file into the metadata corresponding to the index table to be created.
According to another aspect of the present application, an apparatus for creating an index table is also provided, the apparatus comprising:
a first creating device, configured to create a metadata structure of a data table corresponding to an acquired data source, wherein the metadata of the data table includes location information of all data files in the data table, and the data files are stored in columnar form;
a second creating device, configured to create a data structure of an index table to be created, determine, according to a user's request, the index columns of the index table to be created in the data table, wherein the index columns are a subset of the columns of the data table, and create, according to the index columns, a metadata structure corresponding to the index table to be created, wherein that metadata includes location information of the index files in the index table to be created;
a data file distributing device, configured to distribute a data file generated from the current data rows in the data source to a slave node, and update the metadata of the data table corresponding to the data file according to the metadata structure of the data table and the location information allocated to the data file;
a distributing device, configured to distribute information of the index files of the index table to be created into the index files of the corresponding slave nodes, and update the metadata corresponding to the index table to be created.
Further, in the data structure of the index table to be created, the structure of an index file includes a BPlusTree structure, wherein a leaf node of the BPlusTree structure includes a key and a location value.
Further, the second creating device is configured to: determine the key of the leaf node according to the value of the index column of the index table to be created; and determine the location value of the leaf node according to the filename of the data file to which the index column belongs and the offset, within that data file, of the row in which the index column value is located.
Further, the index table to be created includes a hash-based global index table and/or a range-based global index table.
Further, when the index table to be created is a hash-based global index table, the distributing device is configured to: distribute the keys and location values of the index files of the hash-based global index table into the index files of the corresponding slave nodes according to a hash value determined from the value of the index column of the hash-based global index table.
Further, the distributing device is configured to: determine the hash value of the index column according to the value of the index column of the hash-based global index table and the number of slave nodes; and distribute the key and location value of each leaf node in the BPlusTree structure whose index-column hash value is i into the index file of the (i+1)-th slave node, wherein i is a natural number.
Further, when the index table to be created is a range-based global index table, the distributing device is configured to: determine distribution ranges according to the result of sampling the values of the index column, and record each slave node together with its distribution range of the index column; and distribute the information of the index files of the range-based global index table into the index files of the corresponding slave nodes according to the distribution ranges.
Further, the distributing device is configured to: compare the value of the index column of the range-based global index table with the recorded distribution ranges to determine the range in which the value falls; and distribute, according to that range, the key and location value of the corresponding leaf node in the BPlusTree structure into the index file of the slave node corresponding to that range.
Further, the distributing device is also configured to: if the key already exists in the index file, merge the new location value with the old location value into the leaf node corresponding to that key.
Further, the distributing device is also configured to: if the key does not exist in the index file, insert a new leaf node into the BPlusTree structure and store the key and the location value in the new leaf node.
Further, the apparatus also includes: a data file generating device, configured to, when the number of rows of the data source reaches a preset data file size threshold, generate a new data file from the current data rows, distribute the newly generated data file to the slave node corresponding to the data table, and update the metadata of the data table corresponding to the data file.
Further, the apparatus also includes: an index file generating device, configured to, when the size of an index file of the index table to be created reaches a preset index file size threshold, generate a new index file and update the location information of the new index file into the metadata corresponding to the index table to be created.
Compared with the prior art, the present application first creates a metadata structure of the data table corresponding to an acquired data source, wherein the metadata of the data table includes location information of all data files in the data table, and the data files are stored in columnar form. It then creates the data structure of the index table to be created, determines the index columns according to the user's request (the index columns being a subset of the columns of the data table), and creates, according to the index columns, the metadata structure corresponding to the index table to be created, which includes location information of the index files in that index table. Next, it distributes the data file generated from the current data rows in the data source to a slave node and updates the metadata of the data table according to the metadata structure of the data table and the location information allocated to the data file. Finally, it distributes the information of the index files of the index table to be created into the index files of the corresponding slave nodes and updates the metadata corresponding to that index table. This optimizes the underlying storage structure and makes index file information available to data queries, so that data files satisfying a query condition can be located quickly from the index file information, greatly reducing the amount of data accessed and improving query performance.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings.
In a typical configuration of the present application, the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassette tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
Fig. 1 shows a flow diagram of a method for creating an index table according to one aspect of the present application. The method is applied in a distributed system and includes steps S11 to S14.
In step S11, a metadata structure of the data table corresponding to an acquired data source is created, wherein the metadata of the data table includes location information of all data files in the data table, and the data files are stored in columnar form.
In one embodiment of the present application, to create a distributed global index table, the metadata of the data table corresponding to the data source must be created first; this metadata includes the location, on the hard disk of each machine in the cluster, of every data file contained in the data table. Note that the structure of the data table must be created before its metadata, and the structure of the data table includes the structure of its data files. The data table holds the data of the data source: the data in the data source is stored as a data table, and the data table is stored in the form of data files. As shown in Fig. 2, the distributed system in this embodiment includes a client, a master node (Master), one or more slave nodes (slaves), and ZooKeeper. The data files of each data table can be stored on solid-state drives (SSDs). In this embodiment, the metadata of the data table is stored in ZooKeeper, a coordination service for distributed applications.
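As a rough illustration of step S11, the sketch below models the data-table metadata as a list of data-file locations and publishes it to a registry. The plain dict standing in for ZooKeeper, the path scheme `/tables/<name>`, and the class names `FileLocation` and `DataTableMeta` are all hypothetical; a real deployment would write a znode through a ZooKeeper client instead.

```python
from dataclasses import dataclass, field


@dataclass
class FileLocation:
    """Where one data file lives: the slave machine and the path on its disk."""
    slave_id: int
    path: str


@dataclass
class DataTableMeta:
    """Metadata of a data table: its columns and the locations of all its data files."""
    table_name: str
    columns: list
    files: list = field(default_factory=list)

    def add_file(self, location: FileLocation) -> None:
        self.files.append(location)


# Hypothetical stand-in for ZooKeeper: a dict keyed by znode-like paths.
registry = {}


def publish(meta: DataTableMeta) -> None:
    """Store (or overwrite) the table metadata under its registry path."""
    registry[f"/tables/{meta.table_name}"] = meta


meta = DataTableMeta("A", ["id", "name", "age", "sex"])
meta.add_file(FileLocation(slave_id=1, path="/ssd/A/seg-0"))
publish(meta)
```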
In step S12, the data structure of the index table to be created is created; the index columns of the index table to be created in the data table are determined according to the user's request, the index columns being a subset of the columns of the data table; and the metadata structure corresponding to the index table to be created is created according to the index columns, wherein that metadata includes location information of the index files in the index table to be created. Continuing the above embodiment, the columns on which a global index is needed are determined according to actual requirements, that is, some columns are selected from the data table as index columns, and the selected index columns are used to create the metadata corresponding to the distributed global index table. This metadata includes the location, on the hard disk of each machine in the cluster, of every index file contained in the global index table. Once the index table has been created successfully, it can be used for data queries in the distributed system: it serves users who query only a few data columns while avoiding reading all data files, greatly reducing the amount of data accessed.
It should be noted that, in one embodiment of the present application, when the metadata structure corresponding to the index table to be created is created according to the index columns, one index table may be built for each index column and a metadata structure then created for each index table; alternatively, several of the index columns in the user's request may jointly form one index table.
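The choice between one index table per index column and one index table over several columns can be sketched as follows. `IndexTableMeta`, `create_index_metas`, and the `idx_` naming scheme are hypothetical names chosen for illustration; the index-file list starts empty and would be filled in as index files are distributed to slaves.

```python
from dataclasses import dataclass, field


@dataclass
class IndexTableMeta:
    """Metadata of a global index table: its index columns and index-file locations."""
    name: str
    base_table: str
    index_columns: tuple                              # one column, or several for a composite index
    index_files: list = field(default_factory=list)   # (slave_id, path) pairs, filled in later


def create_index_metas(table, table_columns, requested, composite=False):
    """Create index-table metadata for requested columns, which must be a subset of the table's."""
    missing = [c for c in requested if c not in table_columns]
    if missing:
        raise ValueError(f"not columns of {table}: {missing}")
    if composite:
        # Several requested columns jointly form a single index table.
        return [IndexTableMeta("idx_" + "_".join(requested), table, tuple(requested))]
    # Otherwise build one index table per index column.
    return [IndexTableMeta(f"idx_{c}", table, (c,)) for c in requested]
```

For example, `create_index_metas("A", ["id", "name", "age", "sex"], ["id", "age"])` yields two index tables named `idx_id` and `idx_age`, while passing `composite=True` yields one table `idx_id_age` over both columns.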
The data structure of the global index table is then created; it includes the structure of the index files. Preferably, in the data structure of the index table to be created, the structure of an index file includes a BPlusTree structure, wherein a leaf node of the BPlusTree structure includes a key and a location value. Here, the index files of the global index table can be organized and stored using a BPlusTree structure whose leaf nodes contain tuples <key, location value> (<key, value>). The BPlusTree keeps the input index column data sorted, so that the position of the record corresponding to an index column value can be found quickly and query tasks can be answered promptly.
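A minimal sketch of why sorted leaves speed up lookups, assuming comparable keys and a location value of the form (data file, row offset). It is deliberately not a full B+ tree: internal nodes, node splitting, and leaf sibling pointers are omitted, duplicate keys are not handled, and `SortedLeafLevel` is a hypothetical name. Keeping only the sorted leaf level is enough to show binary-search point lookups and ordered range scans.

```python
import bisect


class SortedLeafLevel:
    """Leaf level of a B+ tree, flattened into two parallel sorted lists.

    A real BPlusTree adds internal nodes above the leaves; this sketch keeps
    only the sorted leaves, which is what allows point and range lookups by
    binary search instead of a full scan.
    """

    def __init__(self):
        self.keys = []    # sorted index-column values (assumed unique here)
        self.values = []  # location values, e.g. (data_file, row_offset)

    def insert(self, key, value):
        pos = bisect.bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.values.insert(pos, value)

    def get(self, key):
        """Location value for key, or None if the key is absent."""
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.values[pos]
        return None

    def range(self, lo, hi):
        """All (key, value) pairs with lo <= key <= hi, in key order."""
        start = bisect.bisect_left(self.keys, lo)
        end = bisect.bisect_right(self.keys, hi)
        return list(zip(self.keys[start:end], self.values[start:end]))
```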
In a specific embodiment of the present application, the example data source shown in Fig. 3 contains 1000 records with four columns: identifier (id), name (name), age (age), and sex (sex). Fig. 4 shows the user's SQL queries against this data source. The user needs to filter on the id column, so a global index table must be created on the id column. Here, table A denotes the data table corresponding to the data source shown in Fig. 3. Sql 1, "Select * from table A where id=1", retrieves the rows of data table A whose id is 1; the other SQL statements have similar meanings.
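To illustrate the effect on a query such as Sql 1, the sketch below compares a full scan of every data file with a lookup through a global index on id. The three tiny data files and their rows are made-up stand-ins, not the 1000 records of Fig. 3, and the index is a plain in-memory dict rather than a distributed index table.

```python
from collections import defaultdict

# Hypothetical table A split into data files; each file is a list of rows.
data_files = {
    "seg-0": [{"id": 1, "name": "a", "age": 30, "sex": "F"},
              {"id": 2, "name": "b", "age": 41, "sex": "M"}],
    "seg-1": [{"id": 3, "name": "c", "age": 25, "sex": "F"},
              {"id": 1, "name": "d", "age": 52, "sex": "M"}],
    "seg-2": [{"id": 4, "name": "e", "age": 33, "sex": "M"}],
}


def build_id_index(files):
    """Global index on id: key -> list of (data file name, row offset)."""
    index = defaultdict(list)
    for fname, rows in files.items():
        for offset, row in enumerate(rows):
            index[row["id"]].append((fname, offset))
    return index


def full_scan(files, id_value):
    """Without an index, every data file is read and filtered."""
    touched = set(files)
    hits = [row for rows in files.values() for row in rows if row["id"] == id_value]
    return hits, touched


def indexed_lookup(files, index, id_value):
    """With the index, only the files named in the location values are read."""
    locations = index.get(id_value, [])
    touched = {fname for fname, _ in locations}
    hits = [files[fname][offset] for fname, offset in locations]
    return hits, touched
```

Both paths return the same two rows for id=1, but the indexed path reads only `seg-0` and `seg-1`, while the full scan reads all three files.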
In step S13, the data file generated from the current data rows in the data source is distributed to a slave node, and the metadata of the data table corresponding to the data file is updated according to the metadata structure of the data table and the location information allocated to the data file. Here, on the distributed columnar storage platform shown in Fig. 2, when source data is stored, the Master distributes the data files evenly across the machines in the cluster according to the load-balancing principle, so that each machine holds several data files (FileSegment) of the data table; the structure of a FileSegment is shown in Fig. 5. Data files are stored column by column, so when an OLAP user queries only a few data columns, columnar storage supplies only the columns that need to be read, greatly improving OLAP query efficiency.
Preferably, before step S13, the method also includes step S13': when the number of rows of the data source reaches the preset data file size threshold, a new data file is generated from the current data rows, the newly generated data file is distributed to the slave node corresponding to the data table, and the metadata of the data table corresponding to the data file is updated.
In one embodiment of the present application, whenever the number of data rows of the data source held in memory equals the size of one data file, the current data rows are generated as a data file. The Master in the cluster then, according to the load-balancing principle, distributes the file to the hard disk of a machine in the cluster for storage and updates the metadata of the data table. At this point, creation of the global index table for that data file also begins.
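The row-count rollover of step S13' can be sketched as a writer that seals a data file every `max_rows` rows and asks the master to place it. The source only says the Master follows a load-balancing principle; modelling that as "assign the file to the slave holding the fewest files, ties to the lowest id" is an assumption, and `Master`, `DataFileWriter`, and the `seg-N` file names are hypothetical.

```python
class Master:
    """Hypothetical master: assigns each new data file to the least-loaded slave."""

    def __init__(self, num_slaves):
        self.files_per_slave = {s: 0 for s in range(1, num_slaves + 1)}
        self.table_meta = []  # data-table metadata: (file name, slave id)

    def place(self, file_name):
        # Load balancing modelled as "fewest files wins"; ties go to the lowest id.
        slave = min(self.files_per_slave, key=lambda s: (self.files_per_slave[s], s))
        self.files_per_slave[slave] += 1
        self.table_meta.append((file_name, slave))  # update the table metadata
        return slave


class DataFileWriter:
    """Buffers incoming rows and seals a data file once max_rows is reached."""

    def __init__(self, master, max_rows):
        self.master, self.max_rows = master, max_rows
        self.buffer, self.sealed = [], 0

    def append(self, row):
        self.buffer.append(row)
        if len(self.buffer) == self.max_rows:
            name = f"seg-{self.sealed}"
            self.master.place(name)  # distribute the new file to a slave
            self.sealed += 1
            self.buffer = []
```

With three slaves and 25-row files, 100 input rows produce four data files placed on slaves 1, 2, 3, and then 1 again.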
In step S14, the information of the index files of the index table to be created is distributed into the index files of the corresponding slave nodes, and the metadata corresponding to the index table to be created is updated. Here, after the generated data file has been distributed and stored, the information of its corresponding index entries must be allocated into the index files of the corresponding slaves, and the corresponding metadata of the global index table must be updated. The information of an index file may include the values of the index column in the created index table, the filename of the data file to which the index column belongs, the offset of the row containing the index column value within that data file, and so on.
In one embodiment of the present application, the index table to be created includes a hash-based global index table or a range-based global index table. The two differ slightly in how their index files are allocated in the cluster: a hash-based global index table decides which machine in the distributed cluster receives an index entry according to the hash value of the index column, whereas a range-based global index table assigns index entries to machines according to the range in which the index column value falls.
Preferably, in step S12, the key of the leaf node is determined according to the value of the index column of the index table to be created; and the location value of the leaf node is determined according to the location information of the data file in the metadata of the data table, the location information of the index file in the metadata of the index table to be created, and the offset of the index column value within the indexed file.
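Because data files are stored column by column, leaf entries for an index can be produced by reading only the index column. The sketch assumes a data file represented as a dict mapping column name to a list of values, and a location value of (data file name, row offset); `leaf_entries` and the file name `seg-4` are hypothetical.

```python
def leaf_entries(file_name, columns, index_column):
    """Yield (key, location value) pairs for one columnar data file.

    `columns` maps column name -> list of values (column-wise storage), so only
    the index column is read. The location value is (file_name, row_offset).
    """
    for offset, key in enumerate(columns[index_column]):
        yield key, (file_name, offset)


segment = {"id": [7, 3, 9], "name": ["x", "y", "z"]}
print(list(leaf_entries("seg-4", segment, "id")))
# [(7, ('seg-4', 0)), (3, ('seg-4', 1)), (9, ('seg-4', 2))]
```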
In one embodiment of the present application, the index files of both the hash-based and the range-based global index tables are organized and stored in BPlusTree structures. A BPlusTree leaf node contains a tuple <key, value>, wherein key is the value of the index column, and value is taken from the data file: it records the information of the data file containing the index column value and the offset, within that file, of the record satisfying the condition. The Master distributes data files evenly across the machines in the cluster according to the load-balancing principle; each machine holds several data files (FileSegment) of the data table, whose structure is shown in Fig. 5. When the distributed columnar storage platform creates the global index table for a data source, the index file of a hash-based global index table is a HashIndexFileSegment, and the index file of a range-based global index table is a RangeIndexFileSegment.
Preferably, when the index table to be created is a hash-based global index table, in step S14, the keys and location values of the index files of the hash-based global index table are distributed into the index files of the corresponding slave nodes according to the hash value determined from the value of the index column. Specifically, in step S14, the hash value of the index column is determined from the value of the index column and the number of slave nodes, and the key and location value of each leaf node in the BPlusTree structure whose index-column hash value is i are distributed into the index file of the (i+1)-th slave node, wherein i is a natural number, so that index file information is spread reasonably and evenly across the slaves. Note that determining the hash value only determines which machine the index file information is allocated to (for example, machine 1), that is, which machine receives the <key, value> corresponding to that hash value; the leaf node in the corresponding BPlusTree structure is actually created only when the <key, value> has been written into the index file of that machine.
In one embodiment of the present application, assume the distributed cluster has 1 Master and n slaves. The hash-based global index table computes a hash value of the index column. The value of the index column is the key of the leaf node in the BPlusTree structure, and the information of the data file containing the index column value, together with the offset of the corresponding record in that data file, is the value. The key and location value of each record whose key hash value is i are assigned into the index file of the (i+1)-th slave, where the hash value can be determined from the value of the index column and the number of slaves in the distributed cluster.
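The routing rule above (hash value i goes to slave i+1) can be sketched as follows. Integer keys use `key % n`, matching the worked example at the end of this description; hashing non-integer keys with CRC32 is an assumption added here, chosen because Python's built-in `hash()` on strings is randomized per process and would route the same key differently on different machines. `hash_slave` and `distribute` are hypothetical names, and the grouping only decides the target slave: the leaf node itself is created when that slave writes the entry into its index file.

```python
import zlib


def hash_slave(key, num_slaves):
    """1-based slave number for an index entry: hash value i goes to slave i + 1."""
    if isinstance(key, int):
        i = key % num_slaves  # modulo hash, as for integer ids
    else:
        # Assumption: non-integer keys use a stable CRC32 digest of their text.
        i = zlib.crc32(str(key).encode("utf-8")) % num_slaves
    return i + 1


def distribute(entries, num_slaves):
    """Group (key, value) leaf entries into per-slave index-file batches."""
    index_files = {s: [] for s in range(1, num_slaves + 1)}
    for key, value in entries:
        index_files[hash_slave(key, num_slaves)].append((key, value))
    return index_files
```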
Preferably, when the index table to be created is a range-based global index table, in step S14, distribution ranges are determined from the result of sampling the values of the index column, and each slave node is recorded together with its distribution range of the index column; the information of the index files of the range-based global index table is then distributed into the index files of the corresponding slave nodes according to the distribution ranges. Specifically, in step S14, the value of the index column of the range-based global index table is compared with the recorded distribution ranges to determine the range in which it falls; according to that range, the key and location value of the corresponding leaf node in the BPlusTree structure are distributed into the index file of the slave node corresponding to that range.
In one embodiment of the present application, assume the distributed cluster has 1 Master and n slaves. The range-based global index table samples the values of the index column of the data files and sets n ranges according to the sampling result so that the amount of data in each range is as uniform as possible; the Master records each machine together with its range of the index column. When an index file is generated, the value of the index column in the index file is compared with the n ranges recorded in the Master, and, according to the range it falls in, the data file information of that row and its offset in the data file, that is, <key, value>, are written into the index file of the slave corresponding to that range.
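A minimal sketch of the sampling step and the range lookup, assuming orderable keys. Split points are taken at evenly spaced quantiles of a random sample, so each slave receives roughly the same share of keys; slave k (1-based) owns keys from `bounds[k-2]` (inclusive) up to `bounds[k-1]` (exclusive), the first slave owns everything below `bounds[0]`, and the last owns everything from `bounds[-1]` up. The function names and the quantile rule are assumptions, not the application's exact procedure.

```python
import bisect
import random


def sample_boundaries(values, num_slaves, sample_size, seed=0):
    """Pick num_slaves - 1 split points from a random sample of index values."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    if not sample:
        raise ValueError("need at least one value to sample")
    # Evenly spaced quantiles of the sample approximate equal-volume ranges.
    return [sample[(len(sample) * k) // num_slaves] for k in range(1, num_slaves)]


def route(key, bounds):
    """1-based slave number whose recorded range contains key."""
    return bisect.bisect_right(bounds, key) + 1
```

Sampling all of `range(1000)` for four slaves gives split points `[250, 500, 750]`, so keys 0 to 249 go to slave 1 and key 250 goes to slave 2.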
In one embodiment of the present application, in step S14, if the key already exists in the index file, the new location value and the old location value are merged into the leaf node corresponding to that key; if the key does not exist in the index file, a new leaf node is inserted into the BPlusTree structure, and the key and location value are stored in the new leaf node.
Here, when a <key, value> is assigned to the index file of the corresponding slave, if a leaf node for that key already exists in the index file, the new value is merged with the old value; if the key is not present in the index file, a new leaf node is inserted and the <key, value> tuple is stored in it.
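The merge-or-insert rule can be sketched as follows. "Merging" is modelled here as appending the new location value to the list already stored under that key, which is an assumption about what a merge means; the sorted lists stand in for the leaf level of a BPlusTree, and `IndexFile.upsert` is a hypothetical name.

```python
import bisect


class IndexFile:
    """Per-slave index file sketch: sorted leaf keys, each with a list of locations."""

    def __init__(self):
        self.keys = []
        self.locations = []  # locations[i] lists every (data_file, offset) for keys[i]

    def upsert(self, key, location):
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            # Key already present: merge the new location value into its leaf.
            self.locations[pos].append(location)
        else:
            # Key absent: insert a new leaf entry at its sorted position.
            self.keys.insert(pos, key)
            self.locations.insert(pos, [location])
```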
Preferably, the method further includes step S14': when the size of an index file of the index table to be created reaches a preset index-file size threshold, a new index file is generated, and the location information of the new index file is updated into the metadata of the index table to be created. In one embodiment of the present application, when the index file of the hash-class or range-class global index table reaches the maximum size of an index file, a new index file is generated, and its location information is updated into the metadata of the global index table.
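Step S14' is essentially a rollover rule. The hypothetical `IndexFileRoller` class below sketches it in Python, assuming that index-file size is counted in entries and that the index table's metadata can be modelled as a list of index-file names; neither assumption comes from the application.

```python
from typing import Dict, List, Tuple

class IndexFileRoller:
    """Appends index entries, opening a new index file when the current one is full.

    Size is measured in entries (an assumption); every new file's name is
    appended to `index_meta`, standing in for the index table's metadata.
    """

    def __init__(self, prefix: str, max_entries: int, index_meta: List[str]):
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self.prefix = prefix
        self.max_entries = max_entries
        self.index_meta = index_meta
        self.files: Dict[str, List[Tuple[int, str]]] = {}
        self.current = self._open_new()

    def _open_new(self) -> str:
        name = f"{self.prefix}-{len(self.files)}"
        self.files[name] = []
        self.index_meta.append(name)  # record the new file's location in the metadata
        return name

    def add(self, key: int, value: str) -> str:
        """Store one <key, value> entry and return the index file that received it."""
        if len(self.files[self.current]) >= self.max_entries:
            self.current = self._open_new()
        self.files[self.current].append((key, value))
        return self.current
```

The rollover happens lazily, on the first entry that would overflow the current file, so an empty index file is never created at the end of a load.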
In a preferred embodiment of the present application, suppose the distributed system has 1 Master and 3 slaves, and the maximum size of a data file is set to 25 rows. When the number of input data rows reaches this maximum of 25, the distributed column-store platform outputs the current data rows as one data file; following the load-balancing principle, the Master stores it as a data file (FileSegment) on the SSD of a machine in the cluster and updates the metadata of the data table. For the hash-class global index table, the key is the id value, and the value consists of the information of the data file containing that id and the offset of the id's record within that data file. The key is hashed, and the <key, value> tuples whose hash results are 0, 1 and 2 are assigned to the index files (HashIndexFileSegment) of the hash-class index table on the 1st, 2nd and 3rd machines of the cluster respectively, as shown in Fig. 6: for id=1 the hash result is 1, so its <key, value> is stored on the 2nd slave of the cluster; for id=2 the hash result is 2, so its <key, value> is stored on the 3rd slave; for id=3 the hash result is 0, so its <key, value> is stored on the 1st slave. Here {key | key%3 = hash value} means that the hash value is the remainder of the key divided by the number of slaves (3 in this example), the key being the id value.
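The worked example maps hash value i = key % 3 to slave i+1. The short Python sketch below reproduces that mapping and groups <key, value> pairs into one per-slave index file; dicts stand in for the BPlusTree index files, and the function names and location strings are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

def hash_slave(key: int, num_slaves: int) -> int:
    """Hash value i = key % num_slaves; entries with hash value i go to slave i + 1."""
    return key % num_slaves + 1

def distribute_by_hash(entries: Iterable[Tuple[int, str]],
                       num_slaves: int) -> Dict[int, Dict[int, List[str]]]:
    """Group <key, value> pairs into one index file per slave (dicts stand in for BPlusTrees)."""
    per_slave: Dict[int, Dict[int, List[str]]] = {s: {} for s in range(1, num_slaves + 1)}
    for key, value in entries:
        per_slave[hash_slave(key, num_slaves)].setdefault(key, []).append(value)
    return per_slave
```

For ids 1, 2 and 3 with 3 slaves, `hash_slave` returns 2, 3 and 1, matching the assignments described for Fig. 6.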
When the range-class global index table is created, the key values of the data source are sampled, and the resulting range partition is shown in Fig. 6. The partitioning should keep the number of records in each range as close as possible; the result contains three ranges, [1, 333], [334, 666] and [667, 999], and the Master stores the range corresponding to each of the three slaves. When the index-column value in a data block falls within a range, its index information is stored, in the form of a BPlusTree leaf node, in the data file (RangeIndexFileSegment) of the range-class index table on the slave corresponding to that range. For example, for id=5 the key falls in the first range, so its <key, value> information is stored in the index file of the first machine.
It should be noted that, for both the hash-class and the range-class index table, when a <key, value> pair is assigned to the index file of the corresponding slave, it must be determined whether the id of the index column already exists in the BPlusTree structure of the index file. If the id already exists in the BPlusTree, the information of the data block (Block) containing that id and the row number within the Block are merged directly with the existing value. If the key does not exist in the index file, a new leaf node is inserted and the <key, value> is stored in it.
When the index file of either of the above two classes of index reaches the maximum size of an index file, a new index file is generated, and its location information is updated into the metadata of the global index table. With the method for creating an index table described in this application, the index files of the data source are obtained. Fig. 7 shows the index file of the hash-class index table created for the data source in one embodiment of the application: because all of the keys hash to 0, the index file (HashIndexFileSegment) is located on slave 1 of the cluster; its storage structure is a BPlusTree whose leaf nodes store <key, value>. For example, the value for key=3 is fs1:4, meaning that the record with id=3 is located in FileSegment1 at offset 4. Fig. 8 shows the index file of the range-class index table created for the data source in one embodiment of the application: all of the keys are less than 333, so the index file (RangeIndexFileSegment) is located on slave 1 of the cluster; its storage structure is a BPlusTree whose leaf nodes store <key, value>. For example, the value for key=1 is fs1:1:2, meaning that the records with id=1 are located in FileSegment1 at offsets 1 and 2.
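The leaf values quoted for Fig. 7 and Fig. 8 pack a data-file name and one or more offsets into a colon-separated string. The Python sketch below decodes and re-encodes such values. It assumes that each value names exactly one data file followed by integer offsets, which fits the two examples given but is not stated as a general format; the function names are hypothetical.

```python
from typing import List, Tuple

def decode_location(value: str) -> Tuple[str, List[int]]:
    """Split a leaf value such as 'fs1:1:2' into ('fs1', [1, 2])."""
    file_name, *offsets = value.split(":")
    if not file_name or not offsets:
        raise ValueError(f"malformed location value: {value!r}")
    return file_name, [int(o) for o in offsets]

def encode_location(file_name: str, offsets: List[int]) -> str:
    """Inverse of decode_location: ('fs1', [1, 2]) -> 'fs1:1:2'."""
    return ":".join([file_name, *(str(o) for o in offsets)])
```

`decode_location("fs1:4")` gives `("fs1", [4])`, the location of the record with id=3 in Fig. 7.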
In summary, the index tables created as described in this application optimize the underlying storage structure and can be used for data queries in a distributed system. Querying through the hash-class and range-class index tables retrieves the data of the specific columns requested by the user without reading all data files, which greatly reduces the amount of data accessed and speeds up online analytical processing tasks. The index tables created as described herein are not limited to data queries; they can also be used in other scenarios, such as allocating system resources by means of index tables.
Fig. 9 is a schematic structural diagram of a device for creating an index table according to another aspect of the present application. The device, which is applied in a distributed system, includes a first creating device 11, a second creating device 12, a data file distributor 13 and a distributor 14.
The first creating device 11 is configured to create the metadata structure of the data table corresponding to the acquired data source, where the metadata of the data table includes the location information of all data files in the data table, and the data files are stored in columnar form. In one embodiment of the present application, to create a distributed global index table, the metadata of the data table corresponding to the data source must be created first; this metadata includes the location, on each machine's hard disk in the cluster, of every data file contained in the data table.
It should be noted that the structure of the data table must be created before its metadata, and that the structure of the data table includes the structure of its data files: the data of the data source are stored as a data table, and the data table is stored in the form of data files. As shown in Fig. 2, the distributed system in one embodiment of the application includes a client, a master node (Master), one or more slave nodes (slave) and zookeeper. The data files of each data table can be stored on solid-state drives (SSD); in the embodiments of this application, the metadata of the data table is stored in zookeeper, a high-performance coordination service for distributed applications.
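The table metadata described above is essentially a list of data-file locations per table. The Python sketch below shows one possible in-memory shape for it, plus a JSON form that could be stored as a zookeeper node payload. The class names, field names, machine names and paths are hypothetical, and the zookeeper client call itself is deliberately left out.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

@dataclass
class DataFileLocation:
    machine: str  # slave that stores the file (hypothetical naming)
    path: str     # location on that machine's SSD (hypothetical path)

@dataclass
class DataTableMeta:
    table: str
    columns: List[str]
    data_files: List[DataFileLocation] = field(default_factory=list)

    def add_data_file(self, machine: str, path: str) -> None:
        self.data_files.append(DataFileLocation(machine, path))

    def files_on(self, machine: str) -> List[str]:
        return [f.path for f in self.data_files if f.machine == machine]

    def to_json(self) -> str:
        """Serialize for storage, e.g. as the payload of a zookeeper node."""
        return json.dumps(asdict(self), sort_keys=True)
```

`dataclasses.asdict` recurses into the nested `DataFileLocation` entries, so the serialized metadata is plain JSON that any client can read back.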
The second creating device 12 is configured to create the data structure of the index table to be created; to determine, according to the user's request, the index columns of the index table to be created in the data table, the index columns being a subset of the columns of the data table; and to create, according to the index columns, the metadata structure of the index table to be created, where that metadata includes the location information of the index files of the index table. Continuing the above embodiment, the columns on which a global index is needed are determined according to actual requirements, i.e. some columns of the data table are selected as index columns, and the selected index columns are used to create the metadata of the distributed global index table; this metadata includes the location, on each machine's hard disk in the cluster, of every index file contained in the global index table. Once created, the index table can be used for data queries in the distributed system: it serves users who only need to query a few data columns while avoiding reading all data files, which greatly reduces the amount of data accessed.
It should be noted that, in one embodiment of the present application, when the metadata structure of the index table to be created is created according to the index columns, a separate index table can be built for each index column, each with its own metadata structure; alternatively, several of the index columns corresponding to the user's request can jointly form one index table.
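The two options above, one index table per column or one index table over several columns, can be expressed as a small planning step. The sketch below uses hypothetical names (`IndexTableMeta`, `plan_index_tables`) and leaves the index-file locations empty, since in the application they are filled in only as index files are generated.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

@dataclass
class IndexTableMeta:
    table: str
    index_columns: Tuple[str, ...]
    index_files: List[str] = field(default_factory=list)  # locations, filled in later

def plan_index_tables(table: str, table_columns: Sequence[str],
                      requested: Sequence[str], composite: bool = False) -> List[IndexTableMeta]:
    """One index table per requested column, or a single table over all of them."""
    unknown = [c for c in requested if c not in table_columns]
    if unknown:
        raise ValueError(f"not columns of {table}: {unknown}")
    if composite:
        return [IndexTableMeta(table, tuple(requested))]
    return [IndexTableMeta(table, (c,)) for c in requested]
```

For table A with columns id, name, age and sex, requesting ["id", "age"] produces two single-column index tables by default, or one index table over (id, age) when `composite=True`.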
The data structure of the global index table is then created; it includes the structure of the index files. Preferably, in the data structure of the index table to be created, the index file structure includes a BPlusTree structure whose leaf nodes contain a key and a location value. Here, the index files of the global index table can be organized and stored as BPlusTree structures whose leaf nodes contain the tuple <key, location value> (<key, value>). A BPlusTree keeps the input index-column data sorted, so the location of the record corresponding to an index-column value can be found quickly and query tasks can be answered promptly.
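The benefit claimed for the BPlusTree comes from keeping leaf keys in sorted order, which allows both point lookups and range scans. The Python sketch below models only that sorted leaf level with a flat sorted list and is not a full B+ tree: it has no internal nodes, so each insert shifts part of the list. The class name is hypothetical.

```python
from bisect import bisect_left, bisect_right, insort
from typing import Dict, List, Tuple

class SortedLeafLevel:
    """Keys kept in sorted order, as in the leaf level of a B+ tree.

    A flat sorted list stands in for the tree; a real BPlusTree adds
    internal nodes so inserts do not shift the whole key array.
    """

    def __init__(self) -> None:
        self.keys: List[int] = []
        self.values: Dict[int, List[str]] = {}

    def insert(self, key: int, value: str) -> None:
        if key not in self.values:
            insort(self.keys, key)  # keep the key array sorted
            self.values[key] = []
        self.values[key].append(value)

    def get(self, key: int) -> List[str]:
        return self.values.get(key, [])

    def range_scan(self, low: int, high: int) -> List[Tuple[int, List[str]]]:
        """All (key, values) with low <= key <= high, in key order."""
        lo = bisect_left(self.keys, low)
        hi = bisect_right(self.keys, high)
        return [(k, self.values[k]) for k in self.keys[lo:hi]]
```

Sorted leaves are also what make the range-class index useful: a predicate such as `id BETWEEN 2 AND 5` becomes a single contiguous scan.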
In a specific embodiment of the present application, the example data source is shown in Fig. 3. It contains 1000 records with four columns: identifier (id), name (name), age (age) and sex (sex). Fig. 4 shows the user's SQL queries against this data source. Since the user needs to filter on the id column, a global index table is created on the id column. Here, table A denotes the data table corresponding to the data source in Fig. 3; Sql 1, "Select * from table A where id=1", queries the rows of data table A whose id is 1, and the other SQL statements have similar meanings.
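The saving for a query such as Sql 1 is that the index names the data files to read, instead of every file being scanned. The Python sketch below contrasts the two access paths on a toy table. The data, the 0-based offsets and the function names are all hypothetical, and "files read" is only a stand-in for I/O cost.

```python
from typing import Dict, List, Tuple

Location = Tuple[str, int]  # (data file name, row offset)

def full_scan(data_files: Dict[str, Dict[str, list]], column: str,
              target) -> Tuple[List[Location], int]:
    """Read `column` from every data file; returns (matches, number of files read)."""
    matches = []
    for name, segment in data_files.items():
        for offset, value in enumerate(segment[column]):
            if value == target:
                matches.append((name, offset))
    return matches, len(data_files)

def index_lookup(index: Dict[object, List[Location]], target) -> Tuple[List[Location], int]:
    """Consult the global index; only files named in the leaf value need reading."""
    matches = index.get(target, [])
    return matches, len({name for name, _ in matches})
```

For `where id=1` over three data files whose id columns are [1, 1, 2, 3], [4, 5, 6] and [7, 8, 9], both paths return the rows at offsets 0 and 1 of fs1, but the scan reads three files while the index lookup touches only one.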
The data file distributor 13 is configured to distribute the data files generated from the current data rows of the data source to slave nodes, and to update the metadata of the data table according to the metadata structure of the data table and the assigned location information of the data files. Here, in the distributed column-store platform shown in Fig. 2, when the platform stores data from the data source, the Master distributes the data files evenly across the machines of the cluster according to the load-balancing principle. Each machine holds several data files (FileSegment) of the data table, whose structure is shown in Fig. 5. Data files are stored column by column, so when an OLAP user only needs to query a few data columns, the columnar layout lets only the required columns be read, which greatly improves OLAP query efficiency.
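Columnar storage means that a FileSegment keeps one array per column instead of one record per row. The Python sketch below pivots rows into that layout and reads back only the requested columns; the function names and sample rows are hypothetical, and the on-disk format of Fig. 5 is not modelled.

```python
from typing import Dict, Iterable, List, Sequence

def rows_to_segment(rows: Iterable[Sequence], columns: Sequence[str]) -> Dict[str, List]:
    """Pivot row tuples into one list per column, i.e. a columnar FileSegment."""
    segment: Dict[str, List] = {c: [] for c in columns}
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"row {row!r} does not match columns {list(columns)}")
        for column, value in zip(columns, row):
            segment[column].append(value)
    return segment

def read_columns(segment: Dict[str, List], wanted: Sequence[str]) -> Dict[str, List]:
    """Return only the requested columns; the others are never touched."""
    return {c: segment[c] for c in wanted}
```

A query that filters on id alone needs only `read_columns(segment, ["id"])`; the name, age and sex arrays stay unread, which is the effect the paragraph above attributes to column storage.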
Preferably, the device further includes a data file generating device 13', configured to, when the number of rows from the data source reaches the preset data-file size threshold, generate a new data file from the current data rows, distribute the newly generated data file to the slave node corresponding to the data table, and update the metadata of the data table to which the data file belongs. In one embodiment of the present application, whenever the number of data-source rows in memory equals the size of one data file, the current data rows are generated as a data file; the master node (Master) of the cluster distributes it according to the load-balancing principle for storage on the hard disk of a machine in the cluster, and updates the metadata of the data table. At this point, creation of the global index table for that data file also begins.
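The cutting and placement of data files can be sketched as a row buffer that emits a file every `max_rows` rows. In the Python sketch below, "load balancing" is interpreted as placing each new file on the slave that currently holds the fewest data files, ties broken by name; that policy, the class name and the fs1, fs2, ... naming are assumptions, since the application does not define the balancing rule.

```python
from typing import Dict, List, Optional, Sequence, Tuple

class SegmentWriter:
    """Buffers incoming rows and cuts a data file every `max_rows` rows.

    Placement picks the slave holding the fewest data files (ties broken by
    name), one possible reading of the load-balancing principle.
    """

    def __init__(self, max_rows: int, slaves: Sequence[str]):
        self.max_rows = max_rows
        self.buffer: List[tuple] = []
        self.file_counts: Dict[str, int] = {s: 0 for s in slaves}
        self.files: Dict[str, List[tuple]] = {}      # file name -> rows it holds
        self.table_meta: List[Tuple[str, str]] = []  # (file name, slave) per data file

    def append(self, row: tuple) -> Optional[str]:
        self.buffer.append(row)
        if len(self.buffer) == self.max_rows:
            return self.flush()
        return None

    def flush(self) -> Optional[str]:
        """Emit the buffered rows as one data file; returns its name, or None if empty."""
        if not self.buffer:
            return None
        slave = min(self.file_counts, key=lambda s: (self.file_counts[s], s))
        name = f"fs{len(self.table_meta) + 1}"
        self.files[name] = self.buffer
        self.file_counts[slave] += 1
        self.table_meta.append((name, slave))  # update the data table's metadata
        self.buffer = []
        return name
```

With the 25-row threshold and 3 slaves of the worked example, feeding 60 rows produces fs1 on slave1 and fs2 on slave2, leaving 10 rows buffered; a final `flush()` writes them as fs3 on slave3. The returned file name is the natural hook for starting index creation for that file.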
The distributor 14 is configured to distribute the index-file information of the index table to be created into the index files of the corresponding slave nodes, and to update the metadata of the index table to be created. Here, after a generated data file has been distributed and stored, its index information must be assigned to the index file of the corresponding slave, and the metadata of the global index table must be updated. The index information may include the index-column value in the created index table, the file name of the data file containing that value, the offset of the row containing that value within the data file, and so on.
In one embodiment of the present application, the index table to be created is a hash-class global index table or a range-class global index table. The two differ slightly in how their index files are allocated in the cluster: a hash-class global index table determines the machine in the distributed cluster for each index entry from the hash value of the index column, whereas a range-class global index table assigns index entries to machines according to the range into which the index-column value falls.
Preferably, the second creating device 12 is configured to determine the key of a leaf node from the index-column value of the index table to be created, and to determine the location value of the leaf node from the location information of the data files in the metadata of the data table, the location information of the index files in the metadata of the index table to be created, and the offset of the index column within the data file. In one embodiment of the present application, the index files of both the hash-class and the range-class global index tables are organized and stored as BPlusTree structures. A BPlusTree leaf node contains the tuple <key, value>, where the key is the index-column value and the value consists of the information of the data file containing that index-column value and the offset of the matching record within that data file. The Master distributes the data files evenly across the machines of the cluster according to the load-balancing principle; each machine holds several data files (FileSegment) of the data table, whose structure is shown in Fig. 5. When the distributed column-store platform creates the global index tables of a data source, the index files of the hash-class global index table are HashIndexFileSegment, and the index files of the range-class global index table are RangeIndexFileSegment.
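Deriving the <key, value> pairs for one data file is a single pass over its index column: the key is the column value, and the value is the file name together with the row offsets at which the key occurs. The Python sketch below assumes 0-based row offsets and collects repeated keys into one entry, mirroring leaf values such as fs1:1:2. The function name and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

def leaf_entries(file_name: str, segment: Dict[str, list],
                 index_column: str) -> List[Tuple[object, Tuple[str, List[int]]]]:
    """Derive sorted <key, value> pairs for one data file.

    key   = index-column value
    value = (data file name, offsets of the rows holding that key)
    """
    offsets_by_key: Dict[object, List[int]] = {}
    for offset, key in enumerate(segment[index_column]):
        offsets_by_key.setdefault(key, []).append(offset)
    return [(key, (file_name, offsets)) for key, offsets in sorted(offsets_by_key.items())]
```

Each resulting pair can then be routed to a slave by hash value or by range, and merged into that slave's index file.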
Preferably, when the index table to be created is a hash-class global index table, the distributor 14 is configured to distribute the key and location value of each index entry of the hash-class global index table, according to the hash value determined from its index-column value, into the index file of the corresponding slave node of the hash-class global index table. Specifically, the distributor 14 determines the hash value of the index column from the index-column value of the hash-class global index table and the number of slave nodes, and distributes the key and location value of the leaf node in the BPlusTree structure whose index column has hash value i to the index file of slave node i+1, where i is a natural number; in this way the index information is spread reasonably across the slaves and an even distribution is achieved. It should be noted that determining the hash value only determines the machine to which the index information will be allocated (for example, machine 1), i.e. the destination of the <key, value> corresponding to that hash value; the leaf node is actually created in the corresponding BPlusTree structure only when the <key, value> has been written into the index file of that machine.
In one embodiment of the present application, assume that the distributed cluster has 1 master and n slaves. For the hash-class global index table, the hash value of the index column is computed; the index-column value is the key of a leaf node in the BPlusTree structure, and the information of the data file containing the index-column value together with the offset of the corresponding record within that data file is the value. The key and location value of each record whose hash value is i are assigned to the index file of the (i+1)-th slave, where the hash value can be determined from the index-column value and the number of slaves in the distributed cluster.
Preferably, when the index table to be created is a range-class global index table, the distributor 14 is configured to determine distribution ranges from the result of sampling the index-column values, to record each slave node and the distribution range of its index column, and to distribute the index-file information of the range-class global index table, according to those ranges, into the index files of the corresponding slave nodes of the range-class global index table. Specifically, the distributor 14 compares the index-column value of the range-class global index table with the recorded distribution ranges to determine the range in which the value falls, and, according to that range, distributes the key and location value of the leaf node in the BPlusTree structure corresponding to the index-column value to the index file of the slave node corresponding to that range.
In one embodiment of the present application, assume that the distributed cluster has 1 Master and n slaves. For the range-class global index table, the index-column values of the data files are sampled, and n ranges are set according to the sampling result so that the amount of data in each range is as even as possible; the Master records the range interval of each machine and its corresponding index column. When an index file is generated, the index-column value is compared with the n ranges recorded in the Master, and, according to the range to which it belongs, the information of the data file containing the row and the row's offset within that data file, i.e. the <key, value> pair, are written into the index file of the slave corresponding to that range.
In one embodiment of the present application, the distributor 14 is configured so that, if the key already exists in the index file, the new location value is merged with the old location value in the leaf node corresponding to that key; if the key does not exist in the index file, a new leaf node is inserted into the BPlusTree structure, and the key and location value are stored in the new leaf node.
Here, when a <key, value> pair is assigned to the index file of the corresponding slave, if a leaf node for that key already exists in the index file, the new value is merged with the old value; if the key is not present in the index file, a new leaf node is inserted and the <key, value> tuple is stored in it.
Preferably, the device further includes an index file generating device 14', configured to, when the size of an index file of the index table to be created reaches a preset index-file size threshold, generate a new index file and update the location information of the new index file into the metadata of the index table to be created. In one embodiment of the present application, when the index file of the hash-class or range-class global index table reaches the maximum size of an index file, a new index file is generated, and its location information is updated into the metadata of the global index table.
In a preferred embodiment of the present application, suppose the distributed system has 1 Master and 3 slaves, and the maximum size of a data file is set to 25 rows. When the number of input data rows reaches this maximum of 25, the distributed column-store platform outputs the current data rows as one data file; following the load-balancing principle, the Master stores it as a data file (FileSegment) on the SSD of a machine in the cluster and updates the metadata of the data table. For the hash-class global index table, the key is the id value, and the value consists of the information of the data file containing that id and the offset of the id's record within that data file. The key is hashed, and the <key, value> tuples whose hash results are 0, 1 and 2 are assigned to the index files (HashIndexFileSegment) of the hash-class index table on the 1st, 2nd and 3rd machines of the cluster respectively, as shown in Fig. 6: for id=1 the hash result is 1, so its <key, value> is stored on the 2nd slave of the cluster; for id=2 the hash result is 2, so its <key, value> is stored on the 3rd slave; for id=3 the hash result is 0, so its <key, value> is stored on the 1st slave. Here {key | key%3 = hash value} means that the hash value is the remainder of the key divided by the number of slaves (3 in this example), the key being the id value.
When the range-class global index table is created, the key values of the data source are sampled, and the resulting range partition is shown in Fig. 6. The partitioning should keep the number of records in each range as close as possible; the result contains three ranges, [1, 333], [334, 666] and [667, 999], and the Master stores the range corresponding to each of the three slaves. When the index-column value in a data block falls within a range, its index information is stored, in the form of a BPlusTree leaf node, in the data file (RangeIndexFileSegment) of the range-class index table on the slave corresponding to that range. For example, for id=5 the key falls in the first range, so its <key, value> information is stored in the index file of the first machine.
It should be noted that, for both the hash-class and the range-class index table, when a <key, value> pair is assigned to the index file of the corresponding slave, it must be determined whether the id of the index column already exists in the BPlusTree structure of the index file. If the id already exists in the BPlusTree, the information of the data block (Block) containing that id and the row number within the Block are merged directly with the existing value. If the key does not exist in the index file, a new leaf node is inserted and the <key, value> is stored in it.
When the index file of either of the above two classes of index reaches the maximum size of an index file, a new index file is generated, and its location information is updated into the metadata of the global index table. With the method for creating an index table described in this application, the index files of the data source are obtained. Fig. 7 shows the index file of the hash-class index table created for the data source in one embodiment of the application: because all of the keys hash to 0, the index file (HashIndexFileSegment) is located on slave 1 of the cluster; its storage structure is a BPlusTree whose leaf nodes store <key, value>. For example, the value for key=3 is fs1:4, meaning that the record with id=3 is located in FileSegment1 at offset 4. Fig. 8 shows the index file of the range-class index table created for the data source in one embodiment of the application: all of the keys are less than 333, so the index file (RangeIndexFileSegment) is located on slave 1 of the cluster; its storage structure is a BPlusTree whose leaf nodes store <key, value>. For example, the value for key=1 is fs1:1:2, meaning that the records with id=1 are located in FileSegment1 at offsets 1 and 2.
In summary, the index tables created as described in this application optimize the underlying storage structure and can be used for data queries in a distributed system. Querying through the hash-class and range-class index tables retrieves the data of the specific columns requested by the user without reading all data files, which greatly reduces the amount of data accessed and speeds up online analytical processing tasks. The index tables created as described herein are not limited to data queries; they can also be used in other scenarios, such as allocating system resources by means of index tables.
Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present application and their technical equivalents, the present application is intended to include them as well.
It should be noted that the present application may be implemented in software and/or in a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer or any other similar hardware device. In one embodiment, the software program of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present application (including related data structures) may be stored in a computer-readable recording medium, such as RAM, a magnetic or optical drive, a floppy disk or similar devices. In addition, some steps or functions of the present application may be implemented in hardware, for example as circuitry that cooperates with a processor to perform each step or function.
In addition, part of the present application may be embodied as a computer program product, such as computer program instructions which, when executed by a computer, can invoke or provide the methods and/or technical solutions according to the present application through the operation of that computer. The program instructions that invoke the methods of the present application may be stored in fixed or removable recording media, transmitted via broadcast or data streams in other signal-bearing media, and/or stored in the working memory of a computer device that runs according to those program instructions. Here, one embodiment of the present application includes an apparatus comprising a memory for storing computer program instructions and a processor for executing program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to run the methods and/or technical solutions of the foregoing embodiments of the present application.
It is obvious to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments and can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and non-restrictive; the scope of the present application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced in the present application. Any reference sign in the claims should not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. The words "first", "second" and the like denote names and do not indicate any particular order.