Background
As database technology has developed and spread, the volume of data stored in databases grows every day, and fast, efficient processing of complex queries over large data volumes has become a new requirement. OLAP (On-Line Analytical Processing) is designed to support complex analytical operations, with an emphasis on decision support for decision-makers and senior management. An OLAP user usually needs to query only a few data columns, but row-oriented storage loads many unneeded columns, which degrades query performance. The basic query method of distributed column storage first reads metadata from ZooKeeper, then visits every machine in the cluster to read all of its data files, and then reads the records that satisfy the condition from each data file. This approach leads directly to an excessive amount of data access and harms OLAP query performance.
Summary
The purpose of the present application is to provide an index-table-based query method and apparatus that solve a problem of the prior art: too much data is accessed when querying, which degrades the query performance of online analytical processing.
According to one aspect of the present application, an index-table-based query method at the master node side is provided, the method comprising:
looking up, according to a user's query request, the index column of the index table through the data structure of the index table, and judging whether the value of the index column is a determinate value, to obtain a judgment result;
determining the type of the index table according to the judgment result;
determining, by the type of the index table, the slave node where the index file corresponding to the index table is located;
sending, according to the metadata corresponding to the index table, the location information of the index files on the determined slave node to that slave node, and sending, according to the metadata of the data table, the location information of the data files on the determined slave node to that slave node.
Further, in the data structure of the index table, the structure of the index file comprises a BPlusTree structure.
Further, the BPlusTree structure comprises the keys and location values of its leaf nodes, wherein the key is determined according to the value of the index column of the index table, and the location value is determined according to the file name of the data file to which the index column belongs and the offset, within that data file, of the row where the index column is located.
According to another aspect of the present application, an index-table-based query method at the slave node side is provided, the method comprising:
determining, according to the location information of the index file sent by the master node and through the data structure of the index file, the value of the index column in the index table where the index file is located;
acquiring the information of the index file in the index table according to the value of the index column;
judging, according to the location information of the data files in the metadata of the data table sent by the master node, whether the slave node holds the data file corresponding to the user's query request, and if so, acquiring the data file according to the information of the index file.
Further, when the information of the index file comprises the key and the location value in the data structure of the index file, acquiring the information of the index file in the index table according to the value of the index column comprises:
determining, according to the value of the index column, the key corresponding to that value in the data structure of the index file;
acquiring the location value in the data structure of the index file according to the key corresponding to the value of the index column.
Further, the location value in the data structure of the index table comprises: the file name of the data file to which the index column belongs and the offset, within that data file, of the row where the index column is located.
According to yet another aspect of the present application, an index-table-based query master node device is further provided, the master node device comprising:
a judging device, configured to look up, according to a user's query request, the index column of the index table through the data structure of the index table, and to judge whether the value of the index column is a determinate value, to obtain a judgment result;
a type determining device, configured to determine the type of the index table according to the judgment result;
a positioning device, configured to determine, by the type of the index table, the slave node where the index file corresponding to the index table is located;
a sending device, configured to send, according to the metadata corresponding to the index table, the location information of the index files on the determined slave node to that slave node, and to send, according to the metadata of the data table, the location information of the data files on the determined slave node to that slave node.
Further, in the data structure of the index table, the structure of the index file comprises a BPlusTree structure.
Further, the BPlusTree structure comprises the keys and location values of its leaf nodes, wherein the key is determined according to the value of the index column of the index table, and the location value is determined according to the file name of the data file to which the index column belongs and the offset, within that data file, of the row where the index column is located.
According to still another aspect of the present application, an index-table-based query slave node device is further provided, the slave node device comprising:
a determining device, configured to determine, according to the location information of the index file sent by the master node and through the data structure of the index file, the value of the index column in the index table where the index file is located;
an acquiring device, configured to acquire the information of the index file in the index table according to the value of the index column;
a querying device, configured to judge, according to the location information of the data files in the metadata of the data table sent by the master node, whether the slave node holds the data file corresponding to the user's query request, and if so, to acquire the data file according to the information of the index file.
Further, when the information of the index file comprises the key and the location value in the data structure of the index file, the acquiring device is configured to:
determine, according to the value of the index column, the key corresponding to that value in the data structure of the index file;
acquire the location value in the data structure of the index file according to the key corresponding to the value of the index column.
Further, the location value in the data structure of the index table comprises: the file name of the data file to which the index column belongs and the offset, within that data file, of the row where the index column is located.
Compared with the prior art, the present application looks up, according to the user's query request, the index column of the index table through the data structure of the index table, and judges whether the value of the index column is a determinate value, obtaining a judgment result; determines the type of the index table according to the judgment result; determines, by the type of the index table, the slave node where the corresponding index file is located; and sends to that slave node, according to the metadata corresponding to the index table, the location information of the index files on it and, according to the metadata of the data table, the location information of the data files on it. At the slave node side, the value of the index column in the index table where the index file is located is determined from the location information of the index file sent by the master node, through the data structure of the index file; the information of the index file is acquired according to the value of the index column; and, according to the location information of the data files in the data table metadata sent by the master node, the slave node judges whether it holds the data file corresponding to the user's query request, and if so, acquires the data file according to the information of the index file. In this way, a suitable global index table is selected dynamically according to the query condition, and the index file locations are determined quickly from the metadata of the global index table. The index files are loaded into memory, and the data files and offsets that satisfy the combined filter conditions are picked out. If a data file is present locally, subsequent query processing proceeds directly; otherwise, the master node reassigns the query task and index information to the machine where the data file resides. Finally, the machine holding the data file loads the qualifying data file into memory, reads the data from it according to the offsets, and returns the query result. The query method of the present application supports the user's various filter conditions, effectively filters out the data files that satisfy them, greatly reduces the amount of data read during a query, shortens query time, and improves the query efficiency of OLAP.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings.
In a typical configuration of the present application, the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include non-persistent memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
Fig. 1 is a flow diagram of an index-table-based query method at the master node side according to one aspect of the present application. The method comprises steps S11 to S14 and is preferably applied to data queries in a distributed system.
In step S11, the index column of the index table is looked up, according to the user's query request, through the data structure of the index table, and whether the value of the index column is a determinate value is judged, to obtain a judgment result. In one embodiment of the present application, the index column of the index table is determined from the user's query request, and the user's query condition is used to judge whether the value of the found index column is a determinate value; the data structure of the index table contains keys determined from the values of the index column. The judgment result makes it possible to select the corresponding index table dynamically for the user.
In one embodiment of the present application, in the data structure of the index table, the structure of the index file comprises a BPlusTree structure. Preferably, in this embodiment, the BPlusTree structure comprises the keys and location values of its leaf nodes: each leaf node holds a tuple <key, location value> (<key, value>). The BPlusTree structure keeps the input index-column data sorted, so the position of the record corresponding to an index-column value can be found quickly and query tasks can be answered quickly. The key is determined according to the value of the index column of the index table, and the location value is determined according to the file name of the data file to which the index column belongs and the offset, within that data file, of the row where the index column is located. In other words, the key is the index-column value taken from the data file, and the value, stored in the index file, is the information of the data file containing that index column together with the offset of the qualifying record.
In step S12, the type of the index table is determined according to the judgment result. Here, the type of the index table is selected dynamically according to whether the value of the index column in the judgment result is a determinate value: if the value of the index column is a determinate value, the index table is determined to be a hash-class global index table; if the value of the index column is not a determinate value, the index table is determined to be a range-class global index table.
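The selection rule of step S12 reduces to a check on the comparison operator of the query condition. The sketch below assumes a hypothetical `Condition` record produced by some SQL parser; treating only `=` as a determinate value is an assumption consistent with the Fig. 3 examples, and other predicate forms (for instance `IN` lists) are not covered.

```python
from dataclasses import dataclass

HASH_GLOBAL_INDEX = "hash"    # chosen for a determinate value
RANGE_GLOBAL_INDEX = "range"  # chosen for a non-determinate value


@dataclass(frozen=True)
class Condition:
    """Hypothetical parsed predicate on one index column, e.g. id < 5."""
    column: str
    op: str       # one of "=", "<", "<=", ">", ">="
    value: object


def is_determinate(cond: Condition) -> bool:
    # Judgment result of step S11: only an equality pins down one value.
    return cond.op == "="


def choose_index_type(cond: Condition) -> str:
    # Step S12: map the judgment result to a global index table type.
    return HASH_GLOBAL_INDEX if is_determinate(cond) else RANGE_GLOBAL_INDEX


print(choose_index_type(Condition("id", "=", 1)))  # hash
print(choose_index_type(Condition("id", "<", 5)))  # range
```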
In one embodiment of the present application, the types of index table include a hash-class global index table and/or a range-class global index table. The two types allocate their index files across the cluster slightly differently: the hash-class global index table decides which machine in the distributed cluster receives an index file according to the hash value of the index column, whereas the range-class global index table assigns index files to machines according to the range into which the index-column value falls. The type of index table therefore has to be selected dynamically according to the query condition; once the type is determined, the slave node where the index file is located is determined according to that type's allocation strategy.
Fig. 2 shows the framework of the distributed system in one embodiment of the present application, which includes a client, a master node (Master), one or more slave nodes (slave), and ZooKeeper. The data files of each data table may be stored on solid-state drives (SSDs). In this embodiment, the metadata of the data tables is stored in ZooKeeper, a high-performance coordination service for distributed applications.
Continuing the above embodiment, Fig. 3 shows query SQL statements issued by a user against one data source. For example, Sql 1, "Select * from table A where id = 1", queries the data in data table A whose id equals 1. From the query condition of Sql 1, the value of the index column is the id value 1, a determinate value, so the hash-class global index table is selected for the query. By contrast, Sql 3, "Select * from table A where id < 5", queries the data in data table A whose id is less than 5. From the query condition of Sql 3, the value of the index column is any value less than 5, which is not a determinate value, so Sql 3 requires a different index table from Sql 1, and the range-class global index table is selected for the query.
In one embodiment of the present application, when the type of the index table is a hash-class global index table, the method further comprises, before step S13, step S12': distributing the information of the index files of the hash-class global index table to the corresponding slave nodes according to an allocation rule determined by the hash value of the index column of the hash-class global index table, wherein the information of the index file includes the keys and location values of the leaf nodes in the BPlusTree structure of the index file. Here, the columns on which a global index is to be created are chosen according to actual needs, that is, some columns of the data table are selected as index columns, and the selected index columns are then used to create the metadata of the distributed global index table. This metadata includes the location, on the hard disk of each machine in the cluster, of every index file that the global index table contains. Once created, the index table can be used for data queries in the distributed system: it serves users who need to query only a few data columns while avoiding reading all data files, which greatly reduces the volume of data accessed.
Specifically, in step S12', the hash value of the index column is determined from the value of the index column of the hash-class global index table and the number of slave nodes, and the keys and location values of the leaf nodes in the corresponding BPlusTree structure whose index column hashes to i are distributed to the index file of the (i+1)-th slave node, where i is a natural number. The index file information is thus spread evenly across the slaves. Note that determining the hash value determines which machine the index file information is assigned to (for example, machine 1), that is, the machine to which the <key, value> tuples with that hash value are distributed.
In one embodiment of the present application, assume the distributed cluster has 1 master and n slaves. The hash-class global index table computes a hash value for the index column. The index-column value is the key of a leaf node in the BPlusTree structure, and the information of the data file corresponding to the index column together with the offset of the corresponding record within that data file is the value. Location values whose key hashes to i are assigned to the index file of the (i+1)-th slave, where the hash value may be determined from the index-column value and the number of slaves in the distributed cluster.
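The allocation rule of step S12' can be sketched as follows. The text leaves the hash function open, saying only that it may depend on the index-column value and the number of slaves; this sketch assumes integer keys and uses the key modulo the slave count, matching the worked example for Fig. 4. The function names are hypothetical, and slaves are numbered from 1 so that hash value i maps to slave i + 1.

```python
from collections import defaultdict


def hash_value(key: int, num_slaves: int) -> int:
    # Assumed hash: integer key modulo the number of slaves.
    return key % num_slaves


def distribute_hash_entries(entries, num_slaves):
    """Group leaf-node <key, value> tuples into per-slave index files.

    Returns {slave_number: [(key, value), ...]} with 1-based slave numbers:
    entries whose key hashes to i go to slave i + 1.
    """
    index_files = defaultdict(list)
    for key, value in entries:
        i = hash_value(key, num_slaves)
        index_files[i + 1].append((key, value))
    return dict(index_files)


entries = [(k, ("FileSegment1", k - 1)) for k in range(1, 7)]
files = distribute_hash_entries(entries, 3)
for slave, tuples in sorted(files.items()):
    print(slave, [k for k, _ in tuples])
# 1 [3, 6]
# 2 [1, 4]
# 3 [2, 5]
```

A modulo hash on consecutive integer keys spreads entries round-robin, which is the even distribution the text aims for; skewed or non-integer keys would need a real hash function.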
In one embodiment of the present application, when the type of the index table is a range-class global index table, the method further comprises, before step S13, steps S121 and S122. In step S121, allocation range intervals are determined from the result of sampling the values of the index column, and each slave node is recorded together with the allocation range interval of its corresponding index column. In step S122, the information of the index files of the range-class global index table is distributed to the corresponding slave nodes according to the allocation rule determined by the allocation range intervals, wherein the information of the index file includes the keys and location values of the leaf nodes in the BPlusTree structure of the index file. Specifically, in step S122, the value of the index column of the range-class global index table is compared with the recorded allocation range intervals to determine the interval in which the value falls; according to that interval, the key and location value of the corresponding leaf node in the BPlusTree structure are distributed to the index file of the slave node that owns the interval.
In one embodiment of the present application, assume the distributed cluster has 1 Master and n slaves. The range-class global index table samples the index-column values of the data files and sets n ranges according to the sampling result, so that the data volume in each range is as even as possible; the range interval of each machine and its corresponding index column is recorded in the Master. When an index file is generated, the index-column values of the index file are compared with the n ranges in the Master, and according to the range each value falls in, the corresponding data file information and the offset in the data file, i.e., the <key, value> tuple, are written into the index file of the slave that owns that range.
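Steps S121 and S122 can be sketched with sample quantiles as cut points. The function names, the optional `sample_size` and `seed` parameters, and the use of quantiles are assumptions; the text only requires that each range hold roughly the same number of records. With keys 1 to 999 and three slaves, this sketch reproduces the intervals [1, 333], [334, 666], and [667, 999] given for the Fig. 5 example.

```python
import random
from bisect import bisect_left


def build_range_bounds(values, num_slaves, sample_size=None, seed=0):
    """Step S121 sketch: choose n contiguous ranges from a sample of keys.

    Returns num_slaves - 1 inclusive upper bounds; slave k (1-based) owns
    keys in (bounds[k-2], bounds[k-1]], and the last slave is open-ended.
    Cutting at sample quantiles keeps the record count per range roughly even.
    """
    sample = list(values)
    if sample_size is not None and sample_size < len(sample):
        sample = random.Random(seed).sample(sample, sample_size)
    if len(sample) < num_slaves:
        raise ValueError("need at least one sampled key per slave")
    sample.sort()
    return [sample[len(sample) * k // num_slaves - 1]
            for k in range(1, num_slaves)]


def slave_for_key(key, bounds):
    """Step S122 sketch: 1-based slave whose range contains `key`."""
    return bisect_left(bounds, key) + 1


bounds = build_range_bounds(range(1, 1000), 3)
print(bounds)                      # [333, 666]
print(slave_for_key(5, bounds))    # 1
print(slave_for_key(334, bounds))  # 2
```

Only the bound list needs to live on the Master; assigning a key to a slave is then a single binary search.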
It should be noted that, when the index table is created and an index file reaches a preset index file size threshold, a new index file is generated, and the Master updates the location information of the new index file into the metadata corresponding to the index table.
In one embodiment of the present application, Fig. 4 shows an example data source of 1000 records with four columns: identifier (id), name (name), age (age), and gender (sex). Using the data in Fig. 4 as the data source, assume the distributed system has 1 Master and 3 slaves, and the maximum size of a data file is set to 25 rows. When the number of input rows reaches the maximum of 25, the distributed column storage platform treats the current rows as one data file; the Master writes the corresponding data file (FileSegment) to the SSD of a machine in the cluster according to a load-balancing principle and updates the corresponding metadata of the data table. For the hash-class global index table, the key is the id value, and the value is the information of the data file corresponding to that id together with the offset of the id's record within the data file. The key is hashed, and <key, value> tuples whose hash result is 0, 1, or 2 are placed in the hash-class index file (HashIndexFileSegment) of the 1st, 2nd, or 3rd machine in the cluster, respectively, as shown in Fig. 5. When id = 1, the hash result is 1, so its <key, value> is stored on the 2nd slave in the cluster; when id = 2, the hash result is 2, so its <key, value> is stored on the 3rd slave; when id = 3, the hash result is 0, so its <key, value> is stored on the 1st slave. Here {key | key % 3 = hash value} means that the hash value is the key modulo the number of slaves (3 here), where the key is the id value.
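The Fig. 4 walkthrough can be reproduced end to end under a few assumptions: rows arrive in id order, data files are named `FileSegment1`, `FileSegment2`, and so on, and offsets are 0-based row positions inside each file (the text does not fix the offset convention). The helper names are hypothetical, and load-balanced placement of the data files on SSDs is omitted.

```python
ROWS_PER_FILE = 25  # maximum data file size from the example
NUM_SLAVES = 3


def segment_rows(ids, rows_per_file=ROWS_PER_FILE):
    """Cut the input into data files and emit <key, value> leaf tuples.

    key   = id value
    value = (data file name, 0-based row offset within that file)
    """
    tuples = []
    for pos, row_id in enumerate(ids):
        file_no = pos // rows_per_file + 1
        tuples.append((row_id, (f"FileSegment{file_no}", pos % rows_per_file)))
    return tuples


def hash_index_file_for(key, num_slaves=NUM_SLAVES):
    # HashIndexFileSegment on machine (key % 3) + 1, as in the walkthrough.
    return key % num_slaves + 1


tuples = segment_rows(range(1, 1001))
files = {value[0] for _, value in tuples}
print(len(files))                                     # 40
print(tuples[25])                                     # (26, ('FileSegment2', 0))
print([hash_index_file_for(k) for k in (1, 2, 3)])    # [2, 3, 1]
```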
When the range-class global index table is created, the range division obtained by sampling the keys of the data source is shown in Fig. 5. The division should make the number of records in each range interval as close as possible; the resulting partition consists of three intervals, [1, 333], [334, 666], and [667, 999], and the Master stores the range area corresponding to each of the three slaves. When the value of the index column in a data block falls into a range interval, its index information is stored, in the form of a BPlusTree leaf node, in the range-class index file (RangeIndexFileSegment) of the slave that owns that interval. For example, for id = 5, the key falls into the first range interval, so its <key, value> information is stored in the index file of the first machine.
Continuing the above embodiment, in step S13, the slave node where the index file corresponding to the index table is located is determined by the type of the index table. In one embodiment of the present application, once the type of index table is determined, the slave node in the distributed cluster that holds the qualifying index file is calculated according to the allocation strategy of that type, which may be implemented as follows:
When the selected index table is a hash-class global index table, the hash value of the index column is determined through the hash-class global index table, and the slave node where the index file corresponding to the hash-class global index table is located is determined according to that hash value. Here, when the query condition gives the index column a determinate value, the hash-class global index table is selected and the hash value j of the column value in the query condition is calculated; the qualifying index file is then on the (j+1)-th slave in the distributed cluster.
When the selected index table is a range-class global index table, the allocation range interval to which the value of the index column belongs is determined through the range-class global index table, and the slave node where the corresponding index file is located is determined according to that interval. Here, if the query condition gives the index column a non-determinate value, the range-class global index table is selected, and the slave in the cluster holding the qualifying index file is determined from the range area to which the index column value belongs.
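Step S13 can be sketched for both index types in one function. It assumes integer keys, the modulo hash used in the worked example, and range bounds stored as inclusive upper limits; the function name and operator strings are hypothetical. A range condition may span more than one interval, so the sketch returns every slave whose interval could hold qualifying keys; this is an assumption beyond the single-interval example in the text.

```python
from bisect import bisect_left, bisect_right


def locate_slaves(op, value, num_slaves, bounds):
    """Step S13 sketch: 1-based slaves holding the qualifying index files.

    op "=" uses the hash-class rule (key % num_slaves + 1); any other
    comparison uses the range-class bounds (inclusive upper bounds of
    slaves 1..n-1; the last slave is open-ended).
    """
    if op == "=":
        return [value % num_slaves + 1]
    first, last = 1, num_slaves
    if op in ("<", "<="):
        # Conservative for "<": may keep one extra slave at a boundary.
        last = bisect_left(bounds, value) + 1
    elif op == ">":
        first = bisect_right(bounds, value) + 1
    elif op == ">=":
        first = bisect_left(bounds, value) + 1
    else:
        raise ValueError(f"unsupported operator: {op}")
    return list(range(first, last + 1))


bounds = [333, 666]
print(locate_slaves("=", 1, 3, bounds))   # [2]
print(locate_slaves("=", 27, 3, bounds))  # [1]
print(locate_slaves("<", 5, 3, bounds))   # [1]
```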
In one specific embodiment of the present application, for the queries Sql 1 and Sql 2 shown in Fig. 3, the keys are determinate values, so they are hashed: id = 1 has hash value 1, so its index file is on slave2 of the cluster, and id = 27 has hash value 0, so its index file is on slave1. The key in Sql 3 is not a determinate value; its query condition, id < 5, falls entirely within the interval [1, 333], so its index file is determined to be on slave1. Once the machine holding the index file is determined, the subsequent query steps are the same for both types of global index table.
In step S14, the location information of the index files on the determined slave node is sent to that slave node according to the metadata corresponding to the index table, and the location information of the data files on the determined slave node is sent to that slave node according to the metadata of the data table. Here, the Master obtains, from the metadata of the corresponding global index table, the location information of all index files on that slave and sends it to the slave; at the same time, it uses the metadata of the data table to send along the location information of the data files stored on that slave. For example, for Sql 1, the Master finds from the global index table's metadata the location information of all index files on slave2 and sends it to slave2, together with the location information of all data files stored on slave2.
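A minimal sketch of step S14, assuming both kinds of metadata are available to the Master as plain mappings from a file identifier to the slave that stores it. In the embodiment the data table metadata lives in ZooKeeper; the dictionaries, file identifiers, and function name here are hypothetical stand-ins, and no ZooKeeper client calls are shown.

```python
def build_dispatch(slave, index_meta, table_meta):
    """Step S14 sketch: what the Master sends to the chosen slave.

    index_meta: {index file id: slave} from the global index table metadata
    table_meta: {data file name: slave} from the data table metadata
    """
    return {
        "slave": slave,
        # every index file of the global index table stored on this slave
        "index_files": sorted(p for p, s in index_meta.items() if s == slave),
        # every data file of the data table stored on this slave
        "data_files": sorted(f for f, s in table_meta.items() if s == slave),
    }


index_meta = {"slave1/HashIndexFileSegment": 1, "slave2/HashIndexFileSegment": 2}
table_meta = {"fs1": 1, "fs2": 2, "fs3": 2}
msg = build_dispatch(2, index_meta, table_meta)
print(msg["data_files"])  # ['fs2', 'fs3']
```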
It should be noted that, after step S14, the method further comprises step S15: receiving the location values in the data structure of the index file fed back by the slave node. Here, when the Master sends data file information to a slave and that slave does not hold the data file referenced by a location value (value) in the data structure of the index file, the Master receives the values fed back by the slave. After the fed-back location values are received, step S16 is executed: the slave node where the data file is located is redetermined from those location values and the metadata of the data table, and the location information of the index files included in the index table and of the data files included in the data table is resent to the redetermined slave node. Here, the Master reassigns the task, according to the location of the data file, to the other slave that stores the data file, and sends the corresponding data file location information and value information to that slave. For example, from the information about its own data files sent by the Master, slave2 finds that fs1 is not among its local data files, which means fs1 is on another slave in the cluster. The Master then receives the <key, value> information found by slave2, finds by looking up the data table metadata that fs1 is on slave1, sends the <key, value> information to slave1, and hands the subsequent query task to slave1.
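On the Master side, steps S15 and S16 amount to grouping the fed-back tuples by the slave that the data table metadata names as the owner of each data file. The sketch assumes the metadata is a plain mapping and that slaves are identified by number; the function name is hypothetical. It mirrors the fs1 example, where slave2's findings are forwarded to slave1.

```python
from collections import defaultdict


def redispatch(fed_back, table_meta):
    """Steps S15 and S16 sketch: route fed-back <key, value> tuples.

    fed_back:   [(key, (data file name, offset)), ...] reported by a slave
                that does not hold the referenced data files
    table_meta: {data file name: slave} from the data table metadata
    Returns {slave: [(key, value), ...]}: the new target slave per tuple.
    """
    targets = defaultdict(list)
    for key, (file_name, offset) in fed_back:
        owner = table_meta[file_name]  # raises KeyError for an unknown file
        targets[owner].append((key, (file_name, offset)))
    return dict(targets)


table_meta = {"fs1": 1, "fs2": 2}
print(redispatch([(1, ("fs1", 1)), (1, ("fs1", 2))], table_meta))
# {1: [(1, ('fs1', 1)), (1, ('fs1', 2))]}
```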
Fig. 6 is a flow diagram of an index-table-based query method at the slave node side according to another aspect of the present application. The method comprises steps S21 to S23 and is preferably applied to data queries in a distributed system.
In step S21, the slave node uses the location information of the index file sent by the host node to determine, through the data structure of the index file, the value of the index column in the index table to which the index file belongs. In one embodiment of the application, the slave node loads the index file into memory according to the received location information and then finds the value of the index column in the index table where the index file resides.
In step S22, the information of the index file in the index table is obtained according to the value of the index column. Here, when the information of the index file comprises the key and the location information value in the data structure of the index file, the key corresponding to the value of the index column is determined in the data structure of the index file, and the location information value in that data structure is then obtained from this key. The location information value comprises the file name of the data file to which the index column belongs and the offset, within that data file, of the row containing the index column. Continuing the above embodiment, the index file is stored as a BPlusTree (B+ tree) structure whose leaf-node keys are values of the index column. The keys that satisfy the query condition can therefore be found quickly in the index file from the index column value in the query condition, and their location information values (value), i.e., the data file and offsets of the corresponding records, can be read. For example, slave2 loads its local index file into memory, finds the node with key=1 in each BPlusTree structure, and reads its value, fs1:1 2, which indicates that the records with id=1 are located in data file 1 (FileSegment1) at offsets 1 and 2.
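The leaf-level lookup in steps S21 and S22 can be sketched as follows. This is a minimal Python sketch, not the application's implementation: it assumes integer keys, models the leaf level of one BPlusTree index file as a sorted array searched with `bisect` instead of descending through internal nodes, and assumes a value string in the `fs1:1 2` form used in the example. The names `Location`, `parse_location` and `LeafLevel` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Location:
    file_name: str               # data file holding the record, e.g. "fs1"
    offsets: Tuple[int, ...]     # row offsets inside that data file


def parse_location(raw: str) -> Location:
    """Parse a leaf value such as 'fs1:1 2' into a file name and offsets."""
    file_name, offsets = raw.split(":", 1)
    return Location(file_name, tuple(int(x) for x in offsets.split()))


class LeafLevel:
    """Hypothetical stand-in for the sorted leaf entries of a B+ tree index file."""

    def __init__(self, entries: List[Tuple[int, str]]):
        self.entries = sorted(entries)            # (key, raw value), ordered by key
        self.keys = [k for k, _ in self.entries]

    def lookup(self, key: int) -> Optional[Location]:
        """Exact-match search, as used for a determined value such as id = 1."""
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return parse_location(self.entries[i][1])
        return None

    def scan(self, lo: int, hi: int) -> List[Tuple[int, Location]]:
        """Inclusive range scan over adjacent leaf entries, as used for id < 5."""
        start, stop = bisect_left(self.keys, lo), bisect_right(self.keys, hi)
        return [(k, parse_location(v)) for k, v in self.entries[start:stop]]
```

For example, with illustrative entries `LeafLevel([(1, "fs1:1 2"), (4, "fs1:7"), (27, "fs2:3")])`, `lookup(1)` returns `Location("fs1", (1, 2))`. A real B+ tree reaches the leaf through its internal nodes and follows leaf links for range scans; the sorted array reproduces only the leaf-level behaviour.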
In step S23, according to the location information of the data files in the metadata of the data table sent by the host node, the slave node judges whether it holds the data file corresponding to the user's query request; if so, it obtains that data file according to the information of the index file. In one embodiment of the application, the slave combines this with the data file information sent by the Master: if the slave holds data files containing the records referenced by value, it queries those data files directly, reads the qualifying data from each data file at the offsets given in value, and returns the query result.
Preferably, if the slave node does not hold the data file corresponding to the user's query request, it feeds the location information value in the data structure back to the host node. In one embodiment of the application, for data files that are not on the slave, the slave passes the remaining value information to the Master; the Master reassigns the task, according to the location of the data files, to the other slaves that store them and sends those slaves the corresponding data file location information and value information. For example, using the data file information the Master sent to it, slave2 finds that no information about fs1 exists among its local data files, which means fs1 is on another slave of the cluster. slave2 therefore sends the <key, value> information it found to the Master; the Master looks up the metadata of the data table, finds that fs1 is located on slave1, sends the <key, value> information to slave1, and hands the subsequent query task to slave1. slave1 loads the data file into memory, reads the qualifying data from each data file at the offsets in value, and returns the query result. For example, after receiving the information and task sent by the Master, slave1 loads fs1 into memory, reads the two records at offsets 1 and 2 given in value, and returns them as the query result.
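The split made by a slave in step S23, between records it can read locally and entries it must feed back to the host node, can be sketched as below. This is an illustrative Python sketch under stated assumptions: `execute_on_slave` is a hypothetical name, each index hit is a `(key, file_name, offsets)` tuple, and a loaded data file is modelled as an in-memory list of rows indexed by offset.

```python
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[int, str, Tuple[int, ...]]   # (key, data file name, row offsets)


def execute_on_slave(
    hits: Sequence[Hit], local_files: Dict[str, List[str]]
) -> Tuple[List[str], List[Hit]]:
    """Return (rows read locally, <key, value> entries to feed back to the Master)."""
    rows: List[str] = []
    feedback: List[Hit] = []
    for key, file_name, offsets in hits:
        data = local_files.get(file_name)
        if data is None:
            # The data file is on another slave: only the <key, value> goes back.
            feedback.append((key, file_name, offsets))
        else:
            # The data file is local: read only the rows at the indexed offsets.
            rows.extend(data[off] for off in offsets)
    return rows, feedback
```

Mirroring the example, a slave that holds only some other file feeds the hit `(1, "fs1", (1, 2))` back, while a slave holding fs1 reads the rows at offsets 1 and 2 directly.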
It should be noted that, in the above embodiments, when the slave node does not hold the data file corresponding to the user's query request, it passes only value to the host node. The host node then only needs to look up the metadata of the data table to find the slave node holding the data file and the location of that file, and sends the task to that slave node, which loads the data file directly from its location and extracts the data at the corresponding offsets, thereby improving query efficiency.
In summary, at query time a suitable global index table is first selected dynamically according to the query condition and the location of the index file is determined quickly; the index file is then loaded into memory according to the metadata of the global index table, and the data files and offsets that satisfy the filter condition are selected. If a data file is local, the subsequent query processing is performed directly; otherwise the host node reassigns the query task and index information to the machine where the data file resides. Finally, the machine holding the qualifying data file loads it into memory, reads the data at the given offsets, and returns the query result. By creating two classes of distributed global index tables, the method serves the user's different filter conditions, efficiently selects the data files that satisfy the condition, greatly reduces the amount of data read per query, shortens query time, and thereby improves OLAP query efficiency.
According to another aspect of the application, a host node device for index-table-based querying is further provided; Fig. 7 shows its structural schematic diagram. The host node device comprises a judgment device 11, a type determining device 12, a positioning device 13 and a sending device 14, and is preferably applied to data queries in a distributed system.
The judgment device 11 is configured to look up the index column of the index table through the data structure of the index table according to the user's query request, and to judge whether the value of the index column is a determined value (a single exact value), obtaining a judgment result. In one embodiment of the application, the index column of the index table is determined from the user's query request, and whether the value of that index column is a determined value is judged from the user's query condition; the data structure of the index table contains keys determined by the values of the index column. The judgment result makes it possible to select the corresponding index table dynamically for the user.
In one embodiment of the application, the index file in the data structure of the index table is organised as a BPlusTree structure. Preferably, the BPlusTree structure comprises the keys and location information values of its leaf nodes, each leaf node holding a tuple <key, location information value> (<key, value>). The BPlusTree structure keeps the input index column data sorted, so the positions of the records corresponding to an index column value can be found quickly and query tasks answered promptly. The key is determined by the value of the index column of the index table, i.e., it is an index column value taken from the data file. The location information value (value) is determined by the file name of the data file to which the index column belongs and the offset, within that data file, of the row containing the index column; that is, it records the data file information and the offsets of the qualifying records.
The type determining device 12 is configured to determine the type of the index table according to the judgment result. Here, the type of the index table is selected dynamically according to whether the value of the index column in the judgment result is a determined value: if it is, the index table is determined to be a hash-class global index table; if it is not, the index table is determined to be a range-class global index table.
In one embodiment of the application, the types of index table include a hash-class global index table and/or a range-class global index table. The two classes distribute their index files across the cluster slightly differently: a hash-class global index table decides on which machine of the distributed cluster an index entry is placed according to the hash value of the index column, whereas a range-class global index table assigns index entries to machines according to the range in which the index column value falls. The index table type therefore has to be selected dynamically according to the query condition; once the type is determined, the slave node holding the index file is determined according to the corresponding allocation strategy.
The distributed system framework of one embodiment of the application is shown in Fig. 2. It comprises a client, a host node (Master), one or more slave nodes (slave) and zookeeper. The data files of each data table can be stored on solid state drives (SSD), and in this embodiment the metadata of the data tables is stored in zookeeper, a high-performance coordination service for distributed applications.
Continuing the above embodiment, Fig. 3 shows query SQL statements issued by a user against one data source. Sql1, "Select * from table A where id = 1", queries the data corresponding to id 1 in data table A. From the query condition of Sql1, the value of the index column is the id value 1, which is a determined value, so the hash-class global index table is chosen for the query. Sql3, "Select * from table A where id < 5", queries the data whose id is less than 5 in data table A. From its query condition, the value of the index column is less than 5, which is not a determined value, so Sql3 requires an index table different from that of Sql1, and the range-class global index table is chosen for the query.
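The selection rule applied by the judgment device 11 and the type determining device 12 can be illustrated with the two statements above. This is a minimal Python sketch, assuming the query condition has already been parsed into a single-column comparison; `Predicate` and `choose_index_type` are hypothetical names, and SQL parsing is out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    column: str   # index column, e.g. "id"
    op: str       # one of "=", "<", "<=", ">", ">="
    value: int


def choose_index_type(pred: Predicate) -> str:
    """Pick the global index class from the shape of a single-column predicate."""
    if pred.op == "=":
        # A determined value: one key, located by its hash.
        return "hash"
    if pred.op in ("<", "<=", ">", ">="):
        # Not a determined value: a key interval, located by range.
        return "range"
    raise ValueError(f"unsupported operator: {pred.op}")
```

Under these assumptions, Sql1 (`id = 1`) maps to `"hash"` and Sql3 (`id < 5`) maps to `"range"`.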
In one embodiment of the application, when the index table is a hash-class global index table, the host node device further comprises a distribution device 12', configured to distribute the information of the index files of the hash-class global index table to the corresponding slave nodes according to an allocation rule determined by the hash value of its index column, wherein the information of an index file comprises the keys and location information values of the leaf nodes in the BPlusTree structure of that index file. Here, the columns on which a global index is to be created are determined according to actual demand, i.e., some columns of the data table are selected as index columns, and the selected index columns are used to create the metadata of the distributed global index table. This metadata includes the location, on the hard disk of each machine in the cluster, of every index file that the global index table contains. Once created, the index table can serve data queries in the distributed system: it meets the user's need to query only a few data columns while avoiding reading all data files, greatly reducing the volume of data accessed.
Specifically, the distribution device 12' is configured to determine the hash value of the index column from the value of the index column of the hash-class global index table and the number of slave nodes, and to distribute the keys and location information values of the leaf nodes in the BPlusTree structure whose index column hash value is i to the index file of the (i+1)-th slave node, where i is a natural number, so that index file information is spread evenly across the slaves. Note that the hash value determines the machine to which the index information is assigned (e.g., machine 1); that is, the hash value determines the machine to which the corresponding <key, value> is distributed.
In one embodiment of the application, assume the distributed cluster has 1 master and n slaves. The hash-class global index table computes a hash value of the index column. The value of the index column is the key of a leaf node in the BPlusTree structure, and the information of the data file containing the index column, together with the offset of the corresponding record in that data file, is the value. The location information value of a record whose key hashes to i is assigned to the index file of the (i+1)-th slave, where the hash value can be determined from the value of the index column and the number of slaves in the cluster.
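The hash allocation rule can be sketched with the remainder `key % n` that the worked example with three slaves applies to integer ids. This is illustrative Python under that assumption; `hash_slave` and `distribute_hash` are hypothetical names, and a non-integer index column would first need a real hash function.

```python
from typing import Dict, Iterable, List, Tuple


def hash_slave(key: int, num_slaves: int) -> int:
    """Hash value i = key % n; the entry goes to the (i+1)-th slave (1-based)."""
    return key % num_slaves + 1


def distribute_hash(
    entries: Iterable[Tuple[int, str]], num_slaves: int
) -> Dict[int, List[Tuple[int, str]]]:
    """Group <key, value> leaf entries by the slave whose index file receives them."""
    buckets: Dict[int, List[Tuple[int, str]]] = {s: [] for s in range(1, num_slaves + 1)}
    for key, value in entries:
        buckets[hash_slave(key, num_slaves)].append((key, value))
    return buckets
```

With n = 3 this matches the placements given for id=1, 2 and 3 in the Fig. 5 example: id=1 on the 2nd slave, id=2 on the 3rd, id=3 on the 1st.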
In one embodiment of the application, when the index table is a range-class global index table, the host node device further comprises an interval determining device 121 and an information distribution device 122. The interval determining device 121 is configured to determine allocation range intervals from the result of sampling the values of the index column, and to record each slave node together with the allocation range interval of the index column assigned to it. The information distribution device 122 is configured to distribute the information of the index files of the range-class global index table to the corresponding slave nodes according to an allocation rule determined by the allocation range intervals, wherein the information of an index file comprises the keys and location information values of the leaf nodes in the BPlusTree structure of that index file. Specifically, the information distribution device 122 compares the value of the index column of the range-class global index table with the recorded allocation range intervals to determine the interval in which the value falls, and then distributes the keys and location information values of the leaf nodes in the BPlusTree structure corresponding to that value to the index file of the slave node assigned to that interval.
In one embodiment of the application, assume the distributed cluster has 1 Master and n slaves. The range-class global index table samples the index column values of the data table files and sets n ranges according to the sampling result, so that the data volume in each range is as even as possible; the Master records the range interval of the index column assigned to each machine. When an index file is generated, the value of its index column is compared with the n ranges recorded in the Master, and, according to the range it belongs to, the information of the data file corresponding to the column and the offset in that data file, i.e., <key, value>, are written into the index file of the slave corresponding to that range.
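The sampling and range assignment can be sketched with equal-frequency (quantile) boundaries, which is one way to keep the data volume per range as even as possible. The application does not specify the boundary rule, so this choice and the names `range_bounds` and `range_slave` are assumptions; integer keys are also assumed.

```python
from bisect import bisect_left
from typing import List, Sequence


def range_bounds(sample: Sequence[int], n: int) -> List[int]:
    """Choose n-1 upper bounds so each range holds about the same number of samples.

    Slave 1 takes keys <= bounds[0], slave k takes bounds[k-2] < key <= bounds[k-1],
    and slave n takes keys > bounds[-1].
    """
    s = sorted(sample)
    if n < 1 or len(s) < n:
        raise ValueError("need at least one sample per range")
    return [s[len(s) * k // n - 1] for k in range(1, n)]


def range_slave(key: int, bounds: Sequence[int]) -> int:
    """1-based slave whose recorded range contains key."""
    return bisect_left(bounds, key) + 1
```

Sampling ids 1 to 999 with n = 3 yields the bounds [333, 666], i.e. the intervals [1, 333], [334, 666] and [667, 999] of the Fig. 5 example, and `range_slave(5, [333, 666])` places id=5 on the first machine.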
It should be noted that, when the index table is created and the size of an index file reaches a preset index file size threshold, a new index file is generated, and the Master updates the location information of the new index file into the metadata corresponding to the index table. In one embodiment of the application, the example data source shown in Fig. 4 contains 1000 records with four columns: identifier (id), name (name), age (age) and gender (sex). Using the data in Fig. 4 as the data source, assume the distributed system has 1 Master and 3 slaves and that the maximum size of a data file is 25 rows. When the number of input rows reaches this maximum of 25, the distributed column storage platform treats the current rows as one data file; the Master writes it, following a load-balancing principle, as a data file (FileSegment) to the SSD of some machine in the cluster and updates the corresponding metadata of the data table. For the hash-class global index table, key is the value of id, and value is the information of the data file corresponding to that id together with the offset of the id's record in that data file. The key is hashed, and the <key, value> tuples whose hash results are 0, 1 and 2 are placed into the index files of the hash-class index table (HashIndexFileSegment) on the 1st, 2nd and 3rd machines of the cluster respectively, as shown in Fig. 5. For id=1 the hash result is 1, so its <key, value> is stored on the 2nd slave of the cluster; for id=2 the hash result is 2, so its <key, value> is stored on the 3rd slave; for id=3 the hash result is 0, so its <key, value> is stored on the 1st slave. Here {key | key % 3 = hash value} means that the hash value is the remainder of key divided by the number of slaves (3 in this example), key being the value of id.
When the range-class global index table is created, the key values of the data source are sampled and divided into ranges, with the result shown in Fig. 5. The division should make the number of records in each range interval as close as possible; here it yields three intervals, [1, 333], [334, 666] and [667, 999], and the Master stores the range corresponding to each of the three slaves. When the value of the index column in a data block falls within a range interval, its index information is stored, as a BPlusTree leaf node, in the range-class index file (RangeIndexFileSegment) of the slave corresponding to that interval. For example, the key of id=5 falls in the first range interval, so its <key, value> information is stored in the index file of the first machine.
Continuing the above embodiment, the positioning device 13 is configured to determine, according to the type of the index table, the slave node where the index file corresponding to the index table is located. In one embodiment of the application, after the type of index table is determined, the slave node in the distributed cluster holding the index file that satisfies the condition is computed according to the allocation strategy of that type, which may be implemented as follows.
When the selected index table is a hash-class global index table, the hash value of the index column is determined through the hash-class global index table, and the slave node holding the corresponding index file is determined from that hash value. Here, when the query condition gives a determined value for the index column, the hash-class global index table is chosen and the hash value j of the column value in the query condition is computed; the index file satisfying the condition is then on the (j+1)-th slave of the cluster.
When the selected index table is a range-class global index table, the allocation range interval to which the value of the index column belongs is determined through the range-class global index table, and the slave node holding the corresponding index file is determined from that interval. Here, when the query condition does not give a determined value for the index column, the range-class global index table is chosen, and the index files satisfying the condition are located on the slaves of the cluster corresponding to the ranges to which the index column values belong.
In a specific embodiment of the application, for queries sql1 and sql2 shown in Fig. 3, the key values are determined values and are therefore hashed: id=1 has hash value 1, so its index file is on slave2 of the cluster, and id=27 has hash value 0, so its index file is on slave1. The key in sql3 is not a determined value; its query condition id < 5 falls entirely within the interval [1, 333], so its index file is on slave1. Once the machine holding the index file is determined, the subsequent query steps are the same for both classes of global index table.
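The positioning step for both index classes can be combined in one routine. This is an illustrative Python sketch, assuming integer keys, a single-column comparison, the hash value `key % n`, and the n-1 range upper bounds recorded by the Master; `target_slaves` is a hypothetical name.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def target_slaves(op: str, value: int, num_slaves: int, bounds: Sequence[int]) -> List[int]:
    """1-based slaves whose index files may hold keys satisfying `id op value`.

    bounds: n-1 upper range bounds recorded by the Master for the range index.
    """
    if op == "=":
        # Determined value -> hash-class global index: slave j+1 with j = key % n.
        return [value % num_slaves + 1]
    # Not a determined value -> range-class global index: turn the predicate
    # into an inclusive integer interval [lo, hi] (None means unbounded).
    lo: Optional[int] = None
    hi: Optional[int] = None
    if op == "<":
        hi = value - 1
    elif op == "<=":
        hi = value
    elif op == ">":
        lo = value + 1
    elif op == ">=":
        lo = value
    else:
        raise ValueError(f"unsupported operator: {op}")
    first = 1 if lo is None else bisect_left(bounds, lo) + 1
    last = num_slaves if hi is None else bisect_left(bounds, hi) + 1
    return list(range(first, last + 1))
```

With three slaves and bounds [333, 666], this reproduces the example: id=1 is routed to slave2, id=27 to slave1, and id < 5 to slave1 only.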
The sending device 14 is configured to send the location information of the index files on the determined slave node to that slave node according to the metadata of the index table, and to send the location information of the data files on the determined slave node to that slave node according to the metadata of the data table. Here, the Master obtains from the metadata of the corresponding global index table the location information of all index files on that slave and sends it to the slave; at the same time, it sends the location information of the data files stored on that slave, obtained from the metadata of the data table. For example, for sql1, the Master finds the locations of all index files on slave2 from the metadata of the global index table and sends them to slave2, together with the location information of all data files stored on slave2.
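The message assembled by the sending device 14 can be sketched as two filters over the metadata. This is hypothetical Python: both kinds of metadata are modelled as plain dictionaries mapping a file to the slave that holds it (in the embodiment the metadata is kept in zookeeper), and `build_dispatch` and the file paths are illustrative.

```python
from typing import Dict, List


def build_dispatch(
    slave: str, index_meta: Dict[str, str], table_meta: Dict[str, str]
) -> Dict[str, List[str]]:
    """Collect what the Master sends to one slave.

    index_meta: index file path -> slave holding it (global index table metadata)
    table_meta: data file name -> slave holding it (data table metadata)
    """
    return {
        "index_files": sorted(p for p, node in index_meta.items() if node == slave),
        "data_files": sorted(f for f, node in table_meta.items() if node == slave),
    }
```

For sql1 the Master would call this for slave2 only, since positioning has already narrowed the query to that machine.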
It should be noted that the host node device further comprises a reception device 15, configured to receive the location information value in the data structure of the index file fed back by the slave node. Here, when the Master has sent data file information to a slave and that slave holds no data file matching the location information value (value) found in the data structure of the index file, the Master receives the value fed back by that slave. After receiving it, step S16 is performed: the slave node holding the data file is redetermined from the metadata of the data table according to the location information value, and the location information of the index file contained in the index table and of the data file contained in the data table is re-sent to the redetermined slave node. Here, the Master reassigns the task, according to the location of the data file, to the other slave storing that data file, and sends it the corresponding data file location information and value information. For example, using the data file information the Master sent to it, slave2 finds that no information about fs1 exists among its local data files, which means fs1 is on another slave of the cluster. On receiving the <key, value> information found and fed back by slave2, the Master looks up the metadata of the data table, finds that fs1 is located on slave1, sends the <key, value> information to slave1, and hands the subsequent query task to slave1.
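The redetermination in step S16 amounts to grouping the fed-back entries by the owner of each data file. This is a hypothetical Python sketch, assuming the data table metadata maps each data file name to its slave; `redispatch` is an illustrative name, and network transport is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[int, str, Tuple[int, ...]]   # fed-back <key, value>: (key, data file, offsets)


def redispatch(feedback: Sequence[Hit], table_meta: Dict[str, str]) -> Dict[str, List[Hit]]:
    """Group fed-back entries by the slave that stores each data file (step S16)."""
    tasks: Dict[str, List[Hit]] = defaultdict(list)
    for key, file_name, offsets in feedback:
        owner = table_meta[file_name]   # look up the data table metadata
        tasks[owner].append((key, file_name, offsets))
    return dict(tasks)
```

In the example, the entry `(1, "fs1", (1, 2))` fed back by slave2 is grouped under slave1, which then reads offsets 1 and 2 from fs1.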
According to a further aspect of the application, a slave node device for index-table-based querying is also provided; Fig. 8 shows its structural schematic diagram. The slave node device comprises a determining device 21, an acquisition device 22 and a query device 23, and is preferably applied to data queries in a distributed system.
The determining device 21 is configured to determine, from the location information of the index file sent by the host node and through the data structure of the index file, the value of the index column in the index table to which the index file belongs. In one embodiment of the application, the slave node loads the index file into memory according to the received location information and then finds the value of the index column in the index table where the index file resides. The acquisition device 22 is configured to obtain the information of the index file in the index table according to the value of the index column. Here, when the information of the index file comprises the key and the location information value in the data structure of the index file, the key corresponding to the value of the index column is determined in the data structure of the index file, and the location information value in that data structure is then obtained from this key. The location information value comprises the file name of the data file to which the index column belongs and the offset, within that data file, of the row containing the index column. Continuing the above embodiment, the index file is stored as a BPlusTree structure whose leaf-node keys are values of the index column, so the keys that satisfy the query condition can be found quickly in the index file from the index column value in the query condition, and their location information values (value), i.e., the data file and offsets of the corresponding records, can be read. For example, slave2 loads its local index file into memory, finds the node with key=1 in each BPlusTree structure, and reads its value, fs1:1 2, which indicates that the records with id=1 are located in data file 1 (FileSegment1) at offsets 1 and 2.
The query device 23 is configured to judge, according to the location information of the data files in the metadata of the data table sent by the host node, whether the slave node holds the data file corresponding to the user's query request and, if so, to obtain that data file according to the information of the index file. In one embodiment of the application, the slave combines this with the data file information sent by the Master: if it holds data files containing the records referenced by value, it queries those data files directly, reads the qualifying data from each data file at the offsets in value, and returns the query result.
Preferably, the slave node device further comprises a feedback device 24, configured to feed the location information value in the data structure back to the host node if the slave node does not hold the data file corresponding to the user's query request. In one embodiment of the application, for data files that are not on the slave, the slave passes the remaining value information to the Master; the Master reassigns the task, according to the location of the data files, to the other slaves that store them and sends those slaves the corresponding data file location information and value information. For example, using the data file information the Master sent to it, slave2 finds that no information about fs1 exists among its local data files, which means fs1 is on another slave of the cluster. slave2 therefore sends the <key, value> information it found to the Master; the Master looks up the metadata of the data table, finds that fs1 is located on slave1, sends the <key, value> information to slave1, and hands the subsequent query task to slave1. slave1 loads the data file into memory, reads the qualifying data from each data file at the offsets in value, and returns the query result. For example, after receiving the information and task sent by the Master, slave1 loads fs1 into memory, reads the two records at offsets 1 and 2 given in value, and returns them as the query result.
It should be noted that, in the above embodiments, when the slave node does not hold the data file corresponding to the user's query request, it passes only value to the host node. The host node then only needs to look up the metadata of the data table to find the slave node holding the data file and the location of that file, and sends the task to that slave node, which loads the data file directly from its location and extracts the data at the corresponding offsets, thereby improving query efficiency.
In summary, at query time a suitable global index table is first selected dynamically according to the query condition and the location of the index file is determined quickly; the index file is then loaded into memory according to the metadata of the global index table, and the data files and offsets that satisfy the filter condition are selected. If a data file is local, the subsequent query processing is performed directly; otherwise the host node reassigns the query task and index information to the machine where the data file resides. Finally, the machine holding the qualifying data file loads it into memory, reads the data at the given offsets, and returns the query result. By creating two classes of distributed global index tables, the method serves the user's different filter conditions, efficiently selects the data files that satisfy the condition, greatly reduces the amount of data read per query, shortens query time, and thereby improves OLAP query efficiency.
Obviously, those skilled in the art can make various modifications and variations to the application without departing from its spirit and scope. Accordingly, if these modifications and variations fall within the scope of the claims of the application and their technical equivalents, the application is intended to include them as well.
It should be noted that the application may be implemented in software and/or a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer or any other similar hardware device. In one embodiment, the software program of the application may be executed by a processor to implement the steps or functions described above. Likewise, the software program of the application (including the associated data structures) may be stored in a computer-readable recording medium, for example RAM, a magnetic or optical drive, a floppy disk or similar devices. In addition, some steps or functions of the application may be implemented in hardware, for example as circuits that cooperate with a processor to perform the individual steps or functions.
In addition, part of the application may be embodied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the application through the operation of that computer. The program instructions that invoke the method of the application may be stored in a fixed or removable recording medium, and/or transmitted via broadcast or a data stream in another signal-bearing medium, and/or stored in the working memory of a computer device that runs according to those program instructions. Here, one embodiment of the application includes an apparatus comprising a memory for storing computer program instructions and a processor for executing them, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the application.
It is obvious to a person skilled in the art that the application is not limited to the details of the above exemplary embodiments, and that it can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be regarded in all respects as illustrative and not restrictive; the scope of the application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the application. Any reference signs in the claims should not be construed as limiting the claims concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as first and second are used to denote names and do not indicate any particular order.