CN106940715B - Method and apparatus for querying based on an index table - Google Patents

Info

Publication number
CN106940715B
Authority
CN
China
Prior art keywords
index
value
file
node
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710138728.4A
Other languages
Chinese (zh)
Other versions
CN106940715A (en)
Inventor
张常淳
周立
吕程
周翠翠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Transwarp Technology Shanghai Co Ltd
Original Assignee
Xinghuan Information Technology (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinghuan Information Technology (shanghai) Co Ltd
Priority to CN201710138728.4A
Publication of CN106940715A
Application granted
Publication of CN106940715B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2425 Iterative querying; Query formulation based on the results of a preceding query


Abstract

The purpose of the application is to provide a method and apparatus for querying based on an index table. On the master node side, the application judges, according to the user's query request, whether the value of the index column is a definite value, and selects the corresponding index table accordingly. The master node then sends the location information of the index files on the determined slave node to that slave node according to the metadata of the index table, and sends the location information of the data files on the determined slave node to that slave node according to the metadata of the data table. On the slave node side, the slave node judges, according to the location information of the data files in the data-table metadata sent by the master node, whether the data file corresponding to the user's query request exists on the slave node; if so, it obtains the data file according to the information of the index file. The method thereby meets the user's different filter conditions, effectively selects the data files that satisfy them, greatly reduces the amount of data read during a query, shortens query time, and improves the efficiency of data queries.

Description

Method and apparatus for querying based on an index table
Technical field
This application relates to the computer field, and in particular to a method and apparatus for querying based on an index table.
Background
With the development and application of database technology, the volume of data stored in databases grows day by day, and fast, flexible processing of complex queries over large data volumes has become a new requirement. OLAP (On-Line Analytical Processing) is dedicated to supporting complex analytical operations and focuses on decision support for decision-makers and senior management. Typically, an OLAP user only needs to query a few data columns, whereas row-oriented storage loads many useless data columns, which degrades query performance. The basic query method of distributed columnar storage first reads the metadata from zookeeper, then goes to every machine in the cluster to read all data files, and then reads the records that satisfy the conditions from each data file. This approach directly leads to an excessive amount of data access and hurts the query performance of OLAP.
Summary of the application
The purpose of the application is to provide a method and apparatus for querying based on an index table, solving the prior-art problem that an excessive amount of data is accessed when querying data, which affects the query performance of on-line analytical processing.
According to one aspect of the application, a method for querying based on an index table at a master node is provided, the method comprising:
Looking up the index column of the index table through the data structure of the index table according to the user's query request, and judging whether the value of the index column is a definite value, to obtain a judgment result;
Determining the type of the index table according to the judgment result;
Determining, by the type of the index table, the slave node where the index file corresponding to the index table is located;
Sending the location information of the index files on the determined slave node to the slave node according to the metadata corresponding to the index table, and sending the location information of the data files on the determined slave node to the slave node according to the metadata of the data table.
Further, the structure of the index file in the data structure of the index table includes a BPlusTree structure.
Further, the BPlusTree structure includes key values and location information values of leaf nodes, wherein the key value is determined according to the value of the index column of the index table,
and the location information value is determined according to the file name of the data file to which the index column belongs and the offset, within that data file, of the row where the index column is located.
According to another aspect of the application, a method for querying based on an index table at a slave node is provided, the method comprising:
Determining, according to the location information of the index file sent by the master node and through the data structure of the index file, the value of the index column in the index table to which the index file belongs;
Obtaining the information of the index file in the index table according to the value of the index column;
Judging, according to the location information of the data files in the metadata of the data table sent by the master node, whether the data file corresponding to the user's query request exists on the slave node, and if so, obtaining the data file according to the information of the index file.
Further, when the information of the index file includes the key values and location information values in the data structure of the index file, obtaining the information of the index file in the index table according to the value of the index column comprises:
Determining, according to the value of the index column, the key value corresponding to that value in the data structure of the index file;
Obtaining the location information value in the data structure of the index file according to the key value corresponding to the value of the index column.
Further, the location information value in the data structure of the index table comprises:
The file name of the data file to which the index column belongs, and the offset, within that data file, of the row where the index column is located.
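The slave-side steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: a plain dict stands in for the index file, lists of rows stand in for data files, and the names `Location` and `fetch_rows`, the file names, and the row contents are all hypothetical.

```python
from typing import NamedTuple

class Location(NamedTuple):
    file_name: str   # data file that holds the row
    offset: int      # row offset inside that data file

def fetch_rows(index, local_files, column_value):
    """Return (rows, missing) for one index column value.

    index:       dict mapping key -> list of Location (stand-in for the index file)
    local_files: dict mapping data file name -> list of rows stored on this node
    """
    rows, missing = [], []
    for loc in index.get(column_value, []):
        data_file = local_files.get(loc.file_name)
        if data_file is None:
            missing.append(loc)               # data file lives on another node
        else:
            rows.append(data_file[loc.offset])
    return rows, missing

# Made-up contents: id 27 is the second row of data file "fs2" on this node.
index = {1: [Location("fs1", 0)], 27: [Location("fs2", 1)]}
local_files = {"fs2": [(26, "name26", 30, "F"), (27, "name27", 41, "M")]}
```

Calling `fetch_rows(index, local_files, 27)` reads the row locally, while `fetch_rows(index, local_files, 1)` returns the location in `missing`, which corresponds to the case where the data file is not on this slave node.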
According to another aspect of the application, a master node device for querying based on an index table is also provided, the master node device comprising:
A judgment device, configured to look up the index column of the index table through the data structure of the index table according to the user's query request, and to judge whether the value of the index column is a definite value, to obtain a judgment result;
A type determination device, configured to determine the type of the index table according to the judgment result;
A positioning device, configured to determine, by the type of the index table, the slave node where the index file corresponding to the index table is located;
A sending device, configured to send the location information of the index files on the determined slave node to the slave node according to the metadata corresponding to the index table, and to send the location information of the data files on the determined slave node to the slave node according to the metadata of the data table.
Further, the structure of the index file in the data structure of the index table includes a BPlusTree structure.
Further, the BPlusTree structure includes key values and location information values of leaf nodes, wherein
the key value is determined according to the value of the index column of the index table,
and the location information value is determined according to the file name of the data file to which the index column belongs and the offset, within that data file, of the row where the index column is located.
According to another aspect of the application, a slave node device for querying based on an index table is also provided, the slave node device comprising:
A determining device, configured to determine, according to the location information of the index file sent by the master node and through the data structure of the index file, the value of the index column in the index table to which the index file belongs;
An acquisition device, configured to obtain the information of the index file in the index table according to the value of the index column;
A query device, configured to judge, according to the location information of the data files in the metadata of the data table sent by the master node, whether the data file corresponding to the user's query request exists on the slave node, and if so, to obtain the data file according to the information of the index file.
Further, when the information of the index file includes the key values and location information values in the data structure of the index file, the acquisition device is configured to:
Determine, according to the value of the index column, the key value corresponding to that value in the data structure of the index file;
Obtain the location information value in the data structure of the index file according to the key value corresponding to the value of the index column.
Further, the location information value in the data structure of the index table comprises:
The file name of the data file to which the index column belongs, and the offset, within that data file, of the row where the index column is located.
Compared with the prior art, on the master node side the application looks up the index column of the index table through the data structure of the index table according to the user's query request and judges whether the value of the index column is a definite value, obtaining a judgment result; determines the type of the index table according to the judgment result; determines, by the type of the index table, the slave node where the corresponding index file is located; and sends the location information of the index files on the determined slave node to the slave node according to the metadata of the index table, and the location information of the data files on the determined slave node to the slave node according to the metadata of the data table. On the slave node side, the value of the index column in the index table to which the index file belongs is determined through the data structure of the index file according to the location information of the index file sent by the master node; the information of the index file in the index table is obtained according to the value of the index column; and, according to the location information of the data files in the data-table metadata sent by the master node, the slave node judges whether the data file corresponding to the user's query request exists locally and, if so, obtains the data file according to the information of the index file. In this way, a suitable global index table is selected dynamically according to the query condition and the position of the index file is determined quickly; the index file is loaded into memory according to the metadata of the global index table, and the data files and offsets that satisfy the condition are then selected by applying the filter conditions. If a data file is stored locally, subsequent query processing proceeds directly; otherwise the master node reassigns the query task and index information to the machine where the data file resides. Finally, the machine holding the data file loads the data files that satisfy the condition into memory, reads data from them according to the offsets, and returns the query result. The query method of the application meets the user's different filter conditions, effectively selects the data files that satisfy them, greatly reduces the amount of data read during a query, shortens query time, and improves the query efficiency of OLAP.
Brief description of the drawings
Other features, objects, and advantages of the application will become more apparent from the following detailed description of non-restrictive embodiments, read with reference to the accompanying drawings:
Fig. 1 shows a flow diagram of a method for querying based on an index table at a master node, provided according to one aspect of the application;
Fig. 2 shows the distributed system framework in an embodiment of the application;
Fig. 3 shows the SQL query statements over the data source in an embodiment of the application;
Fig. 4 shows the data source of an embodiment of the application;
Fig. 5 shows the distribution of index file information after the index table is created in an embodiment of the application;
Fig. 6 shows a flow diagram of a method for querying based on an index table at a slave node, provided according to another aspect of the application;
Fig. 7 shows a structural diagram of a master node device for querying based on an index table, provided according to another aspect of the application;
Fig. 8 shows a structural diagram of a slave node device for querying based on an index table, provided according to another aspect of the application.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description
The application is described in further detail below with reference to the accompanying drawings.
In a typical configuration of this application, the terminal, the device of the service network, and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Fig. 1 shows a flow diagram of a method for querying based on an index table at a master node, provided according to one aspect of the application. The method includes steps S11 to S14 and is preferably applied to data queries in a distributed system.
In step S11, the index column of the index table is looked up through the data structure of the index table according to the user's query request, and whether the value of the index column is a definite value is judged, to obtain a judgment result. In an embodiment of the application, whether the value of the found index column is a definite value is judged according to the user's query condition. The index column of the index table is determined according to the user's query request, and the data structure of the index table includes key values determined according to the values of the index column. The judgment result makes it possible to dynamically select the appropriate index table for the user.
In an embodiment of the application, the structure of the index file in the data structure of the index table includes a BPlusTree structure; that is, the index file may be organized as a BPlusTree. Preferably, in this embodiment, the BPlusTree structure includes key values and location information values of leaf nodes: each leaf node holds tuples <key, value>. The BPlusTree structure keeps the input index column data sorted, so that the position of the record corresponding to an index column value can be found quickly and data queries can be answered quickly. The key is determined according to the value of the index column of the index table, and the location information value (value) is determined according to the file name of the data file to which the index column belongs and the offset, within that data file, of the row where the index column is located. In other words, key is the value of the index column, taken from the data file, and value is the information of the data file containing that index column value together with the offset of the matching record in the data file.
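To make the <key, value> layout concrete, here is a simplified sketch that keeps leaf entries in sorted order. It is not a real B+ tree (there are no internal nodes or page splits); a sorted list with `bisect` stands in for the leaf level only. The class name `LeafLevel` and the sample file names and offsets are assumptions made for illustration.

```python
import bisect

class LeafLevel:
    """Sorted <key, value> entries, a stand-in for the B+ tree leaf level."""

    def __init__(self):
        self._keys = []     # index column values, kept sorted
        self._values = []   # (data file name, row offset), parallel to _keys

    def insert(self, key, file_name, offset):
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._values.insert(pos, (file_name, offset))

    def get(self, key):
        """Location values of every entry whose key equals `key`."""
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        return self._values[lo:hi]

    def scan_less_than(self, bound):
        """All <key, value> entries with key < bound, in key order."""
        hi = bisect.bisect_left(self._keys, bound)
        return list(zip(self._keys[:hi], self._values[:hi]))

leaves = LeafLevel()
for key, file_name, offset in [(3, "fs1", 2), (1, "fs1", 0), (27, "fs2", 1)]:
    leaves.insert(key, file_name, offset)
```

Because the keys stay sorted, a point lookup such as `leaves.get(1)` and a range scan such as `leaves.scan_less_than(5)` both reduce to binary searches, which is the property the paragraph above attributes to the BPlusTree structure.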
In step S12, the type of the index table is determined according to the judgment result. Here, the type of index table is selected dynamically according to whether the value of the index column in the judgment result is a definite value: if the judgment result is that the value of the index column is a definite value, the type of the index table is determined to be a hash-type global index table; if the judgment result is that the value of the index column is not a definite value, the type of the index table is determined to be a range-type global index table.
In an embodiment of the application, the type of the index table includes a hash-type global index table and/or a range-type global index table. The strategies for distributing index files within the cluster differ slightly between the two. A hash-type global index table decides which machine in the distributed cluster an index file is distributed to according to the hash value of the index column, whereas a range-type global index table assigns index files to machines according to the range of index column values. The type of index table must therefore be selected dynamically according to the query condition; once the type is determined, the slave node where the index file is located is determined according to the corresponding distribution strategy.
The distributed system framework in an embodiment of the application is shown in Fig. 2. It includes a client, a master node (Master), one or several slave nodes (slave), and zookeeper. The data files of each data table may be stored on solid-state drives (SSD). In the embodiments of the application, the metadata of the data table is stored in zookeeper, where zookeeper is a coordination system for high-performance distributed applications.
Continuing the above embodiment, Fig. 3 shows SQL query statements that a user issues against a data source. For example, sql 1, "Select * from table A where id=1", queries the data in data table A whose id is 1. From the query condition of sql 1, the value of the index column is the id value 1, which is a definite value, so the hash-type global index table is chosen for the query. In contrast, sql 3, "Select * from table A where id < 5", queries the data in data table A whose id is less than 5. From its query condition, the value of the index column is "less than 5", which is not a definite value, so sql 3 needs a different index table from sql 1: the range-type global index table is chosen for the query.
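The selection rule in step S12 can be sketched as a small function. This is an assumption-laden illustration: it only understands a single predicate of the form `column operator integer`, and the function name `choose_index_type` and the returned strings are hypothetical, not terms from the patent.

```python
import re

def choose_index_type(where_clause):
    """Return 'hash' for an equality predicate on the index column, else 'range'."""
    match = re.fullmatch(r"\s*(\w+)\s*(=|<=|>=|<|>)\s*(\d+)\s*", where_clause)
    if match is None:
        raise ValueError(f"unsupported predicate: {where_clause!r}")
    _, operator, _ = match.groups()
    # An equality test pins the index column to one definite value.
    return "hash" if operator == "=" else "range"
```

With the two statements discussed above, `choose_index_type("id=1")` selects the hash-type table and `choose_index_type("id < 5")` selects the range-type table.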
In an embodiment of the application, when the type of the index table is a hash-type global index table, before step S13 the method further includes step S12': according to the distribution rule determined by the hash value of the index column of the hash-type global index table, distributing the information of the index file of the hash-type global index table to the corresponding slave node, wherein the information of the index file includes the key values and location information values of the leaf nodes in the BPlusTree structure of the index file. Here, the columns for which a global index is to be created are determined according to actual needs; that is, some columns are selected from the data table as index columns, and the selected index columns are then used to create the metadata of the distributed global index table, where the metadata includes the location information, on the hard disk of each machine in the cluster, of all index files contained in the global index table. Once the index table has been created it can be used for data queries in the distributed system: it meets the user's need to query only a few data columns while avoiding reading all data files, greatly reducing the amount of data accessed.
Specifically, in step S12', the hash value of the index column is determined according to the value of the index column of the hash-type global index table and the number of slave nodes; the key values and location information values of the leaf nodes in the BPlusTree structure whose index column hash value is i are distributed to the index file of the (i+1)-th slave node, where i is a natural number. The information of the index files is thus distributed sensibly across the slaves, achieving an even distribution. Note that determining the hash value determines the machine to which the index file information is assigned (for example, machine 1), which is the machine that receives the <key, value> corresponding to that hash value.
In an embodiment of the application, assume the distributed cluster has 1 master and n slaves. The hash-type global index table takes the hash of the index column; the value of the index column is the key of the leaf node in the BPlusTree structure, and the information of the corresponding data file together with the offset of the corresponding record in that data file is the value. The key and location information value of each record whose key hashes to i are assigned to the index file of the (i+1)-th slave, where the hash value may be determined according to the value of the index column and the number of slaves in the distributed cluster.
In an embodiment of the application, when the type of the index table is a range-type global index table, before step S13 the method further includes steps S121 and S122. In step S121, distribution range intervals are determined according to the result of sampling the values of the index column, and each slave node and its corresponding index-column distribution range interval are recorded. In step S122, according to the distribution rule determined by the distribution range intervals, the information of the index file of the range-type global index table is distributed to the corresponding slave node, wherein the information of the index file includes the key values and location information values of the leaf nodes in the BPlusTree structure of the index file. Specifically, in step S122, the value of the index column of the range-type global index table is compared with the recorded distribution range intervals to determine the interval in which the value lies; according to that interval, the key value and location information value of the leaf node in the BPlusTree structure corresponding to the index column value are distributed to the index file of the slave node corresponding to the interval.
In an embodiment of the application, assume the distributed cluster has 1 Master and n slaves. The range-type global index table samples the values of the index column in the data table's files and sets n ranges according to the sampling result, so that the data volume in each range is as even as possible; the range interval of the index column corresponding to each machine is recorded in the Master. When an index file is generated, the index column values of the index file are compared with the n ranges in the Master, and according to the range to which each value belongs, the corresponding data file information and offset in the data file, i.e. <key, value>, are updated into the index file of the slave corresponding to that range.
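One way to derive n balanced ranges from a sample and then route a key is sketched below. The patent does not specify how split points are chosen, so the quantile rule here is an assumption, as are the function names `range_upper_bounds` and `slave_for_key`. The sketch also assumes a non-empty sample with at least as many values as slaves.

```python
import bisect

def range_upper_bounds(sample, n_slaves):
    """Pick n_slaves - 1 split points so each range holds a similar share of the sample."""
    ordered = sorted(sample)
    return [ordered[len(ordered) * i // n_slaves - 1] for i in range(1, n_slaves)]

def slave_for_key(upper_bounds, key):
    """1-based slave number whose range contains key; the last range is open-ended."""
    return bisect.bisect_left(upper_bounds, key) + 1

# Assumption for illustration: every id from 1 to 999 ends up in the sample.
sample = list(range(1, 1000))
bounds = range_upper_bounds(sample, 3)
```

With this sample the split points are 333 and 666, so keys up to 333 go to the first slave, keys from 334 to 666 to the second, and larger keys to the third. A real sampler would see only a fraction of the values, so the boundaries would be approximate.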
Note that when an index table is being created, once the size of an index file reaches the preset index file size threshold, a new index file is generated, and the Master updates the location information of the new index file into the metadata corresponding to the index table. In an embodiment of the application, the example data source is shown in Fig. 4. It contains 1000 records in four columns: identifier (id), name (name), age (age), and gender (sex). Taking the data in Fig. 4 as the data source, assume the distributed system has 1 Master and 3 slaves and that the maximum size of a data file is set to 25 rows. When the number of input data rows reaches the maximum of 25, the distributed columnar storage platform treats the current rows as one data file; the Master writes the data file (FileSegment) to the SSD of some machine in the cluster according to the load-balancing principle and updates the metadata of the data table. For the hash-type global index table, key is the value of id, and value is the information of the data file corresponding to the id together with the offset of the corresponding record in that data file. The key is hashed, and the <key, value> tuples whose hash results are 0, 1, and 2 are assigned to the index files (HashIndexFileSegment) of the hash-type index table on the 1st, 2nd, and 3rd machines of the cluster respectively, as shown in Fig. 5. When id=1, the hash result is 1, so its <key, value> is stored on the 2nd slave of the cluster; when id=2, the hash result is 2, so its <key, value> is stored on the 3rd slave; when id=3, the hash result is 0, so its <key, value> is stored on the 1st slave. Here {key | key%3 = hash value} means that the hash value is the remainder of key divided by the number of slaves (3 here), and key is the value of id.
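The modulo rule in this example can be checked with a short sketch. The function names `slave_for_hash_key` and `distribute` are hypothetical, and the file names and offsets in `entries` are made up; only the rule "hash value i = key % 3 goes to slave i + 1" comes from the text.

```python
def slave_for_hash_key(key, n_slaves):
    """1-based slave number: hash value i = key % n_slaves maps to slave i + 1."""
    return key % n_slaves + 1

def distribute(entries, n_slaves):
    """Group <key, value> entries by the slave whose index file should hold them."""
    index_files = {slave: [] for slave in range(1, n_slaves + 1)}
    for key, value in entries:
        index_files[slave_for_hash_key(key, n_slaves)].append((key, value))
    return index_files

# value = (data file name, row offset); these particular values are illustrative.
entries = [(1, ("fs1", 0)), (2, ("fs1", 1)), (3, ("fs1", 2)), (27, ("fs2", 1))]
placed = distribute(entries, 3)
```

Here `placed[2]` holds the entry for id=1, `placed[3]` the entry for id=2, and `placed[1]` the entries for id=3 and id=27, matching the placement described above.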
When the range-type global index table is created, the result of dividing ranges after sampling the key values of the data source is shown in Fig. 5. The division principle is that the number of records in each range interval should be as close as possible. The result comprises three intervals, [1,333], [334,666], and [667,999], and the range interval corresponding to each of the three slaves is stored in the Master. When the index column value in a data block falls within some range interval, its index information is stored in the index file (RangeIndexFileSegment) of the range-type index table on the slave machine corresponding to that interval, in the form of a BPlusTree leaf node. For example, for id=5 the key falls in the first range interval, so its <key, value> information is stored in the index file of the first machine.
Above-described embodiment is connect, in step s 13, the corresponding index of the concordance list is determined by the type of the concordance list Slave node where file;In one embodiment of the application, after the type for determining concordance list, according to the type pair of respective concordance list The allocation strategy answered calculates slave node of the index file of the condition of satisfaction in distributed type assemblies, and specific implementation can lead to Cross following embodiment realization:
When the concordance list selected is Hash class global index's table, the rope is determined by Hash class global index table Draw the cryptographic Hash of column;The slave section where the corresponding index file of Hash class global index table is determined according to the cryptographic Hash Point.Here, then choosing Hash class global index table when querying condition is index column when determining value, calculating in querying condition The cryptographic Hash j of train value then meets in+1 slave of jth of the index file of condition in distributed type assemblies.
When the concordance list selected is range class global index's table, the rope is determined by range class global index table Draw range of distribution section belonging to the value of column;Determine that range class global index's table is corresponding according to determining range of distribution section Index file where slave node.Here, if querying condition is the uncertain value of index column, selection range class global index Table, the range areas according to belonging to index column determine that the index file for the condition that meets is distributed in the corresponding slave of cluster.
In a specific embodiment of the application, the key values of queries sql1 and sql2 in Fig. 3 are definite values, so they are hashed: id=1 has hash value 1, so its index file is on slave2 of the cluster, and id=27 has hash value 0, so its index file is on slave1. The key in sql3 is not a definite value: the query condition is id<5, which falls entirely within the interval [1, 333], so its index file is on slave1. Once the machine holding the index file has been determined, the subsequent query steps are the same for both types of global index table.
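The routing decision for the three example queries can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the hash is the remainder of the key divided by the number of slaves (as in the three-slave example), and the function names and the inclusive-bounds representation of a range predicate are hypothetical.

```python
from bisect import bisect_left
from typing import List

NUM_SLAVES = 3
# Upper bounds of the example intervals [1, 333], [334, 666], [667, 999].
RANGE_UPPER_BOUNDS = [333, 666, 999]


def slave_for_point_query(key: int, num_slaves: int = NUM_SLAVES) -> int:
    """Hash-type index: hash value j = key % n, index file is on slave j + 1."""
    return key % num_slaves + 1


def slaves_for_range_query(low: int, high: int) -> List[int]:
    """Range-type index: slaves whose interval overlaps the inclusive range [low, high]."""
    first = bisect_left(RANGE_UPPER_BOUNDS, low)
    last = min(bisect_left(RANGE_UPPER_BOUNDS, high), len(RANGE_UPPER_BOUNDS) - 1)
    return [i + 1 for i in range(first, last + 1)]


# sql1 (id=1) and sql2 (id=27) are point queries; sql3 (id<5) covers ids 1..4.
print(slave_for_point_query(1), slave_for_point_query(27), slaves_for_range_query(1, 4))
```

A range predicate that spans several intervals would return more than one slave, which the example queries do not exercise.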
In step S14, the location information of the index files on the determined slave node is sent to that slave node according to the metadata of the index table, and the location information of the data files on that slave node is sent to it according to the metadata of the data table. Here, the Master obtains the location information of all index files on that slave from the metadata of the corresponding global index table and sends it to the slave; at the same time, the Master sends the location information of the data files stored on the slave, taken from the metadata of the data table. For example, for sql1, the location information of all index files on slave2 is found from the metadata of the global index table and sent to slave2, together with the location information of all data files stored on slave2.
It should be noted that after step s 14, it is described further include: step S15 receives the institute from node feeding back State the location information value in the data structure of index file.Here, when Master sends data file information to a certain slave, And the data file without containing the location information value (value) in the data structure for meeting index file in the slave, then Master can receive the value in the data structure by the slave index file fed back, receive described from node feeding back After location information value in the data structure of the index file, step S16 is executed, is worth according to the positional information from described The metamessage of tables of data redefines the slave node where the data file;The position for the index file for including by the concordance list The location information for the data file that confidence breath and the tables of data include is re-transmitted to the slave node redefined.Here, Task is reassigned to other slave for storing the data file by Master according to the position of data file, and will be corresponding Data file location information and value information be sent to other slave.For example, being sent to this itself number according to Master According to the file information, find the relevant information that fs1 is not found in local data file, then illustrate fs1 cluster other In slave, at this point, receive slave2 found<key, value>information feedback, Matser by searching for tables of data member Information finds fs1 and is located in slave1 file, then general<key, and value>information is sent to slave1, and inquiry later Task hands to slave1.
Fig. 6 shows a flow diagram of a method for index-table-based querying at the slave node side according to another aspect of the application. The method includes steps S21 to S23 and is preferably applied to data queries in a distributed system.
In step S21, the value of the index column of the index table to which the index file belongs is determined through the data structure of the index file, according to the location information of the index file sent by the master node. In one embodiment of the application, the slave node loads the index file into memory according to the received location information and then finds the value of the index column in the index table to which the index file belongs. In step S22, the information of the index file in the index table is obtained according to the value of the index column. When the information of the index file comprises the keys and location values in the data structure of the index file, the key corresponding to the value of the index column is determined in that data structure, and the location value is then obtained from that key. The location value comprises the file name of the data file to which the index column belongs and the offset, within that data file, of the row containing the index column. Continuing the above embodiment, the index file is stored as a BPlusTree structure whose leaf-node keys are values of the index column, so the key matching the index column value in the query condition can be found quickly in the index file and its location value (value) read, that is, the data file and offset of the matching record. For example, slave2 loads its local index file information into memory, finds the node with key=1 in each BPlusTree structure and reads its value, fs1:1 2, which indicates that the record with id=1 is located in data file 1 (FileSegment1) at offsets 1 and 2.
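The lookup in step S22 can be illustrated with a sorted list of leaf entries standing in for the leaf level of a BPlusTree. This is a simplified sketch: a real B+ tree also has internal nodes, the `file:offsets` text encoding of the value is assumed from the `fs1:1 2` example, and the entries for keys 4 and 7 are invented.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Sorted leaf entries of one index file on slave2, as <key, value> tuples.
LEAVES: List[Tuple[int, str]] = [(1, "fs1:1 2"), (4, "fs2:7"), (7, "fs1:9")]


def lookup(leaves: List[Tuple[int, str]], key: int) -> Optional[str]:
    """Binary-search the sorted leaf keys and return the location value, if any."""
    keys = [k for k, _ in leaves]
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return leaves[pos][1]
    return None


def parse_value(value: str) -> Tuple[str, List[int]]:
    """Split 'fs1:1 2' into the data file name and the list of offsets."""
    file_name, offsets = value.split(":", 1)
    return file_name, [int(o) for o in offsets.split()]


value = lookup(LEAVES, 1)
if value is not None:
    print(parse_value(value))
```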
In step S23, according to the location information of the data files in the metadata of the data table sent by the master node, it is judged whether the slave node holds the data file corresponding to the user's query request; if so, the data file is obtained according to the information of the index file. In one embodiment of the application, the slave checks the data file information that the Master sent for it; if the slave holds data files referenced by the values, the query is carried out directly on those data files: the data satisfying the condition is read from each data file at the offsets given in the values, and the query result is returned.
Preferably, if the slave node does not hold the data file corresponding to the user's query request, the location values in the data structure are fed back to the master node. In one embodiment of the application, for data files not on the slave, the slave passes the remaining value information to the Master, which reassigns the task, based on the position of the data files, to the slaves that store them and sends those slaves the corresponding data file location information and value information. For example, slave2 checks the data file information the Master sent for it and finds no entry for fs1 among its local data files, which means fs1 is on another slave of the cluster. slave2 therefore sends the <key, value> information it found to the Master; the Master looks up the metadata of the data table, finds that fs1 is located on slave1, sends the <key, value> information to slave1 and hands the remaining query task to slave1. slave1 loads the data file into memory, reads the data satisfying the condition at the offsets given in the value and returns the query result. Concretely, slave1 receives the information and task from the Master, loads fs1 into memory, reads the two records at offsets 1 and 2 of the data file according to the value, and returns the query result.
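The slave-side decision in step S23 and the feedback path can be sketched as a single function that reads what is local and returns the rest for the Master. The function name, the in-memory representation of data files as lists of rows, and the treatment of an offset as a list index are all assumptions for illustration.

```python
from typing import Dict, List, Tuple


def process_hits(
    hits: Dict[str, List[int]],
    local_files: Dict[str, List[str]],
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return (rows read from local data files, hits to feed back to the Master)."""
    rows: List[str] = []
    feedback: Dict[str, List[int]] = {}
    for file_name, offsets in hits.items():
        if file_name in local_files:
            # Data file is local: read the matching rows at the given offsets.
            rows.extend(local_files[file_name][o] for o in offsets)
        else:
            # Data file is on another slave: hand the location value back.
            feedback[file_name] = offsets
    return rows, feedback


hits = {"fs1": [1, 2]}  # parsed from the value "fs1:1 2"
print(process_hits(hits, {"fs2": ["a", "b", "c"]}))          # on slave2: nothing local
print(process_hits(hits, {"fs1": ["row0", "row1", "row2"]}))  # on slave1: rows are read
```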
It should be noted that, in the above embodiments of the application, when the slave node does not hold the data file corresponding to the user's query request, the slave node passes only the value to the master node. The master node then only needs to look up the metadata of the data table to find the slave node holding the data file and the location information of the data file, and sends the task to that slave node, which loads the data file directly from its location information and extracts the data at the given offsets. This improves the efficiency of data queries.
In summary, at query time a suitable global index table is first selected dynamically according to the query condition and the position of the index file is determined quickly. The index file is then loaded into memory according to the metadata of the global index table, and the data files and offsets satisfying the condition are selected using the filter condition. If a data file is local, the subsequent query processing is carried out directly; otherwise the master node reassigns the query task and index information to the machine holding the data file. Finally, that machine loads the data file satisfying the condition into memory, reads the data from it at the given offsets and returns the query result. By creating two types of distributed global index table, the method supports the different filter conditions of users, effectively filters out the data files that satisfy the condition, greatly reduces the amount of data read during a query, shortens query time and improves OLAP query efficiency.
Fig. 7 shows a structural schematic diagram of a master node device for index-table-based querying according to another aspect of the application. The master node device includes a judgment device 11, a type determining device 12, a positioning device 13 and a sending device 14, and is preferably applied to data queries in a distributed system.
The judgment device 11 is configured to look up the index column of the index table through the data structure of the index table according to the user's query request, and to judge whether the value of the index column is a definite value, obtaining a judgment result. In one embodiment of the application, the index column of the index table is determined from the user's query request, and the user's query condition is used to judge whether the value of the index column found is a definite value. The data structure of the index table contains keys determined from the values of the index column. The resulting judgment makes it possible to select the appropriate index table dynamically for the user.
In one embodiment of the application, the index files in the data structure of the index table have a BPlusTree structure. Preferably, the BPlusTree structure comprises the keys and location values of leaf nodes: each leaf node of the BPlusTree structure contains a tuple <key, location value> (<key, value>). The BPlusTree structure keeps the input index column data sorted, so the position of the record corresponding to an index column value can be looked up quickly and query tasks can be answered quickly. The key is determined from the value of the index column of the index table, and the location value (value) is determined from the file name of the data file to which the index column belongs and the offset, within that data file, of the row containing the index column. That is, the key is the value of the index column, taken from the data file, and the value is the information of the data file containing the index column together with the offset of the matching record in that data file.
The type determining device 12 is configured to determine the type of the index table according to the judgment result. The type of index table is selected dynamically according to whether the value of the index column is a definite value: if the judgment result is that the value of the index column is a definite value, the type of the index table is determined to be the hash-type global index table; if the judgment result is that the value is not a definite value, the type is determined to be the range-type global index table.
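As a rough illustration of this selection rule, the sketch below classifies a single-column predicate: an equality comparison counts as a definite value and selects the hash-type table, any other comparison selects the range-type table. The regular expression, the function name and the restriction to integer literals are assumptions; the application does not specify how the predicate is parsed.

```python
import re

# One predicate of the form "<column> <operator> <integer>".
PREDICATE = re.compile(r"^\s*(\w+)\s*(<=|>=|=|<|>)\s*(\d+)\s*$")


def choose_index_type(predicate: str) -> str:
    """Return 'hash' for a definite value (equality), otherwise 'range'."""
    match = PREDICATE.match(predicate)
    if match is None:
        raise ValueError(f"unsupported predicate: {predicate!r}")
    _column, operator, _value = match.groups()
    return "hash" if operator == "=" else "range"


print(choose_index_type("id = 1"))  # condition of sql1
print(choose_index_type("id < 5"))  # condition of sql3
```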
In one embodiment of the application, the types of index table include the hash-type global index table and/or the range-type global index table. The two types differ in how their index files are allocated within the cluster. A hash-type global index table uses the hash value of the index column to decide which machine of the distributed cluster holds an index file, whereas a range-type global index table assigns index files to machines according to the range of the index column value. The type of index table therefore has to be selected dynamically according to the query condition; once it is determined, the slave node holding the index file is determined according to the corresponding allocation strategy.
The distributed system framework in one embodiment of the application is shown in Fig. 2. It includes a client, a master node (Master), one or more slave nodes (slave) and zookeeper. The data files of each data table can be stored on solid state drives (SSD). In the embodiment of the application, the metadata of the data tables is stored in zookeeper, which is a high-performance coordination service for distributed applications.
Continuing the above embodiment, Fig. 3 shows SQL query statements issued by a user against a data source. For example, sql1, "Select * from table A where id=1", queries data table A for the data whose id is 1. From the query condition of sql1, the value of the index column is the id value 1, a definite value, so the hash-type global index table is chosen for the query. In contrast, sql3, "Select * from table A where id<5", queries data table A for the data whose id is less than 5. From its query condition, the value of the index column is "less than 5", which is not a definite value, so sql3 requires a different index table from sql1: the range-type global index table is chosen for the query.
In one embodiment of the application, when the type of the index table is the hash-type global index table, the master node device further includes a distributor 12', configured to distribute the information of the index files of the hash-type global index table to the corresponding slave nodes according to an allocation rule determined by the hash value of the index column, where the information of an index file comprises the keys and location values of the leaf nodes in the BPlusTree structure of that index file. Here, the columns that need a global index are determined according to actual demand, that is, some columns of the data table are selected as index columns, and the selected index columns are then used to create the metadata of the distributed global index table. This metadata includes the location, on the hard disk of each machine in the cluster, of every index file belonging to the global index table. Once created, the index table can be used to query data in the distributed system. A user who needs only a few data columns can be served without reading all data files, which greatly reduces the amount of data accessed.
Specifically, the distributor 12' is configured to determine the hash value of the index column from the value of the index column of the hash-type global index table and the number of slave nodes, and to distribute the keys and location values of the BPlusTree leaf nodes whose index column hash value is i to the index file of the (i+1)-th slave node, where i is a natural number. The information of the index files is thereby spread sensibly over the slaves and distributed evenly. It should be noted that determining the hash value determines the machine to which the index file information is assigned (for example, machine 1); that is, the hash value determines the machine to which the corresponding <key, value> is distributed.
In one embodiment of the application, assume the distributed cluster has 1 Master and n slaves. The hash-type global index table computes a hash value for the index column. The value of the index column is the key of a leaf node in the BPlusTree structure, and the value is the information of the data file corresponding to the index column together with the offset, in that data file, of the record corresponding to the index column. A key whose hash value is i, and the location value of its record, are assigned to the index file of the (i+1)-th slave, where the hash value can be determined from the value of the index column and the number of slaves in the distributed cluster.
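The distribution rule can be sketched as below. The text only states that the hash value depends on the index column value and the number of slaves; taking the remainder `key % n` is an assumption borrowed from the document's three-slave example, and the function name and sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def distribute_by_hash(
    entries: List[Tuple[int, str]], num_slaves: int
) -> Dict[int, List[Tuple[int, str]]]:
    """Assign each <key, value> with hash value i to the index file of slave i + 1."""
    index_files: Dict[int, List[Tuple[int, str]]] = {
        slave: [] for slave in range(1, num_slaves + 1)
    }
    for key, value in entries:
        i = key % num_slaves  # hash value, assumed to be a simple remainder
        index_files[i + 1].append((key, value))
    return index_files


entries = [(1, "fs1:1"), (2, "fs1:2"), (3, "fs1:3")]
print(distribute_by_hash(entries, 3))
```

With sequential ids this spreads entries evenly; heavily skewed keys would need a different hash function.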
In one embodiment of the application, when the type of the index table is the range-type global index table, the master node device further includes an interval determining device 121 and an information distribution device 122. The interval determining device 121 is configured to determine the range intervals from the result of sampling the values of the index column, and to record each slave node and the range interval of the index column assigned to it. The information distribution device 122 is configured to distribute the information of the index files of the range-type global index table to the corresponding slave nodes according to an allocation rule determined by the range intervals, where the information of an index file comprises the keys and location values of the leaf nodes in the BPlusTree structure of that index file. Specifically, the information distribution device 122 compares the value of the index column of the range-type global index table with the recorded range intervals to determine the interval containing that value, and then distributes the key and location value of the BPlusTree leaf node corresponding to that value to the index file of the slave node assigned to that interval.
In one embodiment of the application, assume the distributed cluster has 1 Master and n slaves. The range-type global index table samples the values of the index column of the data table's files and sets n ranges according to the sampling result, so that the data volume in each range is as even as possible; the range interval of the index column assigned to each machine is recorded in the Master. When an index file is generated, the value of its index column is compared with the n ranges in the Master, and the corresponding data file information and offset in the data file, i.e. the <key, value>, are written to the index file of the slave assigned to the matching range.
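One way to derive n ranges from a sample and then place a key is sketched below. The application does not specify the splitting rule; equal-count quantiles over the sorted sample are an assumption, both function names are hypothetical, and the sketch requires at least n sampled values.

```python
from bisect import bisect_left
from typing import List


def range_upper_bounds(sample: List[int], n: int) -> List[int]:
    """Upper bounds of n intervals holding nearly equal numbers of sampled keys."""
    ordered = sorted(sample)
    return [ordered[(len(ordered) * (i + 1)) // n - 1] for i in range(n)]


def slave_for_key(bounds: List[int], key: int) -> int:
    """1-based slave whose interval contains key; keys above the last bound go to the last slave."""
    return min(bisect_left(bounds, key), len(bounds) - 1) + 1


bounds = range_upper_bounds(list(range(1, 1000)), 3)
print(bounds, slave_for_key(bounds, 5))
```

Sampling keys 1 to 999 uniformly reproduces the three intervals of the document's example, and id=5 lands on the first slave.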
It should be noted that, when an index table is being created and the size of an index file reaches a preset index file size threshold, a new index file is generated and the Master updates the metadata of the index table with the location information of the new index file. In one embodiment of the application, the example data source shown in Fig. 4 has 1000 records with four columns: identifier (id), name (name), age (age) and gender (sex). Take the data in Fig. 4 as the data source and assume the distributed system has 1 Master and 3 slaves, with the maximum size of a data file set to 25 rows. When the number of input rows reaches this maximum of 25, the distributed columnar storage platform treats the current rows as one data file; the Master writes it, following a load-balancing principle, as a data file (FileSegment) on the SSD of some machine in the cluster and updates the metadata of the data table. For the hash-type global index table, the key is the value of id, and the value is the information of the data file corresponding to the id together with the offset of the corresponding record in that data file. The key is hashed, and the <key, value> tuples whose hash result is 0, 1 or 2 are assigned to the index files (HashIndexFileSegment) of the hash-type index table on the 1st, 2nd and 3rd machine of the cluster, respectively, as shown in Fig. 5. When id=1 the hash result is 1, so its <key, value> is stored on the 2nd slave of the cluster; when id=2 the hash result is 2, so its <key, value> is stored on the 3rd slave; when id=3 the hash result is 0, so its <key, value> is stored on the 1st slave. Here {key | key%3 = hash value} means that the hash value is the remainder of the key divided by the number of slaves (3 here), where the key is the value of id.
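The cutting of input rows into data files of at most 25 rows can be sketched as follows. The text says only that the Master places files according to a load-balancing principle; round-robin placement, the `fsN` naming and the shape of the metadata are assumptions made for this illustration.

```python
from itertools import cycle
from typing import Dict, List, Tuple

MAX_ROWS = 25  # maximum number of rows per data file in the example


def segment_rows(
    row_count: int, slaves: List[str], max_rows: int = MAX_ROWS
) -> Dict[str, Tuple[str, int]]:
    """Map each data file name to (slave storing it, number of rows it holds)."""
    metadata: Dict[str, Tuple[str, int]] = {}
    targets = cycle(slaves)  # round-robin stands in for load balancing
    for n, start in enumerate(range(0, row_count, max_rows), start=1):
        rows_in_file = min(max_rows, row_count - start)
        metadata[f"fs{n}"] = (next(targets), rows_in_file)
    return metadata


metadata = segment_rows(1000, ["slave1", "slave2", "slave3"])
print(len(metadata), metadata["fs1"], metadata["fs2"])
```

For the 1000-record data source this yields 40 data files, and the returned dictionary plays the role of the data-table metadata that the Master updates.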
When a range-type global index table is created, the key values of the data source are sampled and divided into ranges, as shown in Fig. 5. The ranges should be chosen so that each interval holds roughly the same number of records. In this example the division yields three intervals, [1, 333], [334, 666] and [667, 999], and the Master stores the interval assigned to each of the three slaves. When the value of the index column in a data block falls within an interval, its index information is stored, as a BPlusTree leaf node, in the range-type index file (RangeIndexFileSegment) of the slave corresponding to that interval. For example, for id=5 the key falls in the first interval, so its <key, value> information is stored in the index file of the first machine.
Continuing the above embodiment, the positioning device 13 is configured to determine, from the type of the index table, the slave node holding the index file of the index table. In one embodiment of the application, once the type of the index table has been determined, the slave node in the distributed cluster that holds the index file satisfying the query condition is computed according to the allocation strategy of that index type. This can be implemented as follows:
When the selected index table is the hash-type global index table, the hash value of the index column is determined through the hash-type global index table, and the slave node holding the corresponding index file is determined from that hash value. That is, when the query condition specifies a definite value of the index column, the hash-type global index table is chosen and the hash value j of the column value in the query condition is computed; the index file satisfying the condition is then on the (j+1)-th slave of the distributed cluster.
When the selected index table is the range-type global index table, the range interval to which the value of the index column belongs is determined through the range-type global index table, and the slave node holding the corresponding index file is determined from that interval. That is, if the query condition does not specify a definite value of the index column, the range-type global index table is chosen, and the index file satisfying the condition is located on the slave of the cluster that corresponds to the range interval of the index column.
In a specific embodiment of the application, the key values of queries sql1 and sql2 in Fig. 3 are definite values, so they are hashed: id=1 has hash value 1, so its index file is on slave2 of the cluster, and id=27 has hash value 0, so its index file is on slave1. The key in sql3 is not a definite value: the query condition is id<5, which falls entirely within the interval [1, 333], so its index file is on slave1. Once the machine holding the index file has been determined, the subsequent query steps are the same for both types of global index table.
The sending device 14 is configured to send the location information of the index files on the determined slave node to that slave node according to the metadata of the index table, and to send the location information of the data files on that slave node to it according to the metadata of the data table. Here, the Master obtains the location information of all index files on that slave from the metadata of the corresponding global index table and sends it to the slave; at the same time, the Master sends the location information of the data files stored on the slave, taken from the metadata of the data table. For example, for sql1, the location information of all index files on slave2 is found from the metadata of the global index table and sent to slave2, together with the location information of all data files stored on slave2.
It should be noted that the master node device further includes a reception device 15, configured to receive the location values in the data structure of the index file that are fed back by the slave node. When the Master has sent data file information to a slave and that slave does not hold the data file referenced by a location value (value) found in the index file, the Master receives the value fed back by that slave. After receiving it, the operation of step S16 is performed: the slave node holding the data file is redetermined from the metadata of the data table according to the location value, and the location information of the index file contained in the index table and the location information of the data file contained in the data table are sent to the redetermined slave node. In other words, the Master reassigns the task to the slave that stores the data file, based on the position of the data file, and sends that slave the corresponding data file location information and value information. For example, slave2 checks the data file information that the Master sent for slave2 itself and finds no entry for fs1 among its local data files, which means fs1 is on another slave of the cluster. The Master then receives the <key, value> information that slave2 found, looks up the metadata of the data table, finds that fs1 is located on slave1, sends the <key, value> information to slave1 and hands the remaining query task to slave1.
Fig. 8 shows a structural schematic diagram of a slave node device for index-table-based querying according to another aspect of the application. The slave node device includes a determining device 21, an acquisition device 22 and a query device 23, and is preferably applied to data queries in a distributed system.
The determining device 21 is configured to determine, through the data structure of the index file and according to the location information of the index file sent by the master node, the value of the index column of the index table to which the index file belongs. In one embodiment of the application, the slave node loads the index file into memory according to the received location information and then finds the value of the index column in the index table to which the index file belongs. The acquisition device 22 is configured to obtain the information of the index file in the index table according to the value of the index column. When the information of the index file comprises the keys and location values in the data structure of the index file, the key corresponding to the value of the index column is determined in that data structure, and the location value is then obtained from that key. The location value comprises the file name of the data file to which the index column belongs and the offset, within that data file, of the row containing the index column. Continuing the above embodiment, the index file is stored as a BPlusTree structure whose leaf-node keys are values of the index column, so the key matching the index column value in the query condition can be found quickly in the index file and its location value (value) read, that is, the data file and offset of the matching record. For example, slave2 loads its local index file information into memory, finds the node with key=1 in each BPlusTree structure and reads its value, fs1:1 2, which indicates that the record with id=1 is located in data file 1 (FileSegment1) at offsets 1 and 2.
The query device 23 is configured to judge, according to the location information of the data files in the metadata of the data table sent by the master node, whether the slave node holds the data file corresponding to the user's query request and, if so, to obtain the data file according to the information of the index file. In one embodiment of the application, the slave checks the data file information that the Master sent for it; if the slave holds data files referenced by the values, the query is carried out directly on those data files: the data satisfying the condition is read from each data file at the offsets given in the values, and the query result is returned.
Preferably, the slave node device further includes a feedback device 24, configured to feed the location values in the data structure back to the master node if the slave node does not hold the data file corresponding to the user's query request. In one embodiment of the application, for data files not on the slave, the slave passes the remaining value information to the Master, which reassigns the task, based on the position of the data files, to the slaves that store them and sends those slaves the corresponding data file location information and value information. For example, slave2 checks the data file information the Master sent for it and finds no entry for fs1 among its local data files, which means fs1 is on another slave of the cluster. slave2 therefore sends the <key, value> information it found to the Master; the Master looks up the metadata of the data table, finds that fs1 is located on slave1, sends the <key, value> information to slave1 and hands the remaining query task to slave1. slave1 loads the data file into memory, reads the data satisfying the condition at the offsets given in the value and returns the query result. Concretely, slave1 receives the information and task from the Master, loads fs1 into memory, reads the two records at offsets 1 and 2 of the data file according to the value, and returns the query result.
It should be noted that, in the above embodiments of the application, when the data file corresponding to the user's query request does not exist on the slave node, the slave node passes only the value to the host node. The host node then only needs to look up, in the metadata of the data table, the slave node that holds the data file and the location information of the data file, and send the task to that slave node. That slave node loads the data file directly according to its location information and extracts the data at the corresponding offsets, which improves the efficiency of the data query.
In conclusion it is simultaneously quickly true to be dynamically selected suitable global index's table according to querying condition first in inquiry Determine index file position, index file is loaded onto memory, combined filtering condition sieve then according to the metamessage of global index's table Select the data file and offset of the condition of satisfaction.If data file is present in local, subsequent query processing is directly carried out, Otherwise query task and index information are again assigned to the machine where data file by host node, finally, where data file Machine the data file for the condition that meets is loaded onto memory, data are read from data file according to offset, return to inquiry As a result.This method passes through the creation of two class distribution global index tables, meets the different screening conditions of user, effectively filters out The data file for meeting condition greatly reduces reading data amount when inquiry, shortens query time, preferably improves OLAP Search efficiency.
Obviously, those skilled in the art can make various modifications and variations to the application without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the application and their technical equivalents, the application is intended to include them as well.
It should be noted that the application can be implemented in software and/or in a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the application can be executed by a processor to implement the steps or functions described above. Likewise, the software program of the application (including related data structures) can be stored in a computer-readable recording medium, for example RAM, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the application can be implemented in hardware, for example as a circuit that cooperates with a processor to execute each step or function.
In addition, part of the application can be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the application through the operation of that computer. The program instructions that invoke the method of the application may be stored in a fixed or removable recording medium, and/or transmitted via broadcast or a data stream in another signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the application includes a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the application.
It is obvious to those skilled in the art that the application is not limited to the details of the above exemplary embodiments and that the application can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments are therefore to be regarded in all respects as illustrative and not restrictive, and the scope of the application is defined by the appended claims rather than by the above description; all changes that fall within the meaning and range of equivalents of the claims are intended to be included in the application. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.

Claims (36)

1. A method of index-table-based query at the host node side, wherein the method comprises:
looking up the index column of the index table through the data structure of the index table according to a user's query request, and judging whether the value of the index column is a determinate value, to obtain a judgment result;
determining the type of the index table according to the judgment result;
determining, by the type of the index table, the slave node where the index file corresponding to the index table is located;
sending the location information of the index file on the determined slave node to the slave node according to the metadata corresponding to the index table, and sending the location information of the data file on the determined slave node to the slave node according to the metadata of the data table to which the index column belongs.
2. The method according to claim 1, wherein the structure of the index file in the data structure of the index table includes a BPlusTree structure.
3. The method according to claim 2, wherein the BPlusTree structure includes the key and the location information value of a leaf node, wherein
the key is determined according to the value of the index column of the index table;
the location information value is determined according to the file name of the data file to which the index column belongs and the offset, in the data file, of the row where the index column is located.
4. The method according to claim 1, wherein the type of the index table includes a Hash-type global index table and/or a range-type global index table.
5. The method according to claim 4, wherein determining the type of the index table according to the judgment result comprises:
if the judgment result is that the value of the index column is a determinate value, determining that the type of the index table is the Hash-type global index table.
6. The method according to claim 4, wherein determining the type of the index table according to the judgment result comprises:
if the judgment result is that the value of the index column is a non-determinate value, determining that the type of the index table is the range-type global index table.
7. The method according to claim 5, wherein, when the type of the index table is the Hash-type global index table, before determining, by the type of the index table, the slave node where the index file corresponding to the index table is located, the method comprises:
distributing the information of the index file of the Hash-type global index table to the corresponding slave node according to an allocation rule determined by the hash value of the index column of the Hash-type global index table, wherein the information of the index file includes the key and the location information value of a leaf node in the BPlusTree structure of the index file.
8. The method according to claim 7, wherein distributing the information of the index file of the Hash-type global index table to the corresponding slave node according to the allocation rule determined by the hash value of the index column of the Hash-type global index table comprises:
determining the hash value of the index column according to the value of the index column and the number of slave nodes;
distributing the key and the location information value for which the hash value of the index column is i into the index file of the (i+1)-th slave node, wherein i is a natural number.
9. The method according to claim 8, wherein determining, by the type of the index table, the slave node where the index file corresponding to the index table is located comprises:
determining the hash value of the index column through the Hash-type global index table;
determining, according to the hash value, the slave node where the index file corresponding to the Hash-type global index table is located.
10. The method according to claim 6, wherein, when the type of the index table is the range-type global index table, before determining, by the type of the index table, the slave node where the index file corresponding to the index table is located, the method comprises:
determining distribution range intervals according to the result of sampling the values of the index column, and recording each slave node and the distribution range interval of the index column corresponding to it;
distributing the information of the index file of the range-type global index table to the corresponding slave node according to an allocation rule determined by the distribution range intervals, wherein the information of the index file includes the key and the location information value of a leaf node in the BPlusTree structure of the index file.
11. The method according to claim 10, wherein distributing the information of the index file of the range-type global index table to the corresponding slave node according to the allocation rule determined by the distribution range intervals comprises:
comparing the value of the index column of the range-type global index table with the recorded distribution range intervals, and determining the distribution range interval in which the value of the index column falls;
distributing, according to the distribution range interval in which the value of the index column falls, the key and the location information value of the leaf node in the BPlusTree structure corresponding to the value of the index column into the index file of the slave node corresponding to that distribution range interval.
12. The method according to claim 11, wherein determining, by the type of the index table, the slave node where the index file corresponding to the index table is located comprises:
determining, through the range-type global index table, the distribution range interval to which the value of the index column belongs;
determining, according to the determined distribution range interval, the slave node where the index file corresponding to the range-type global index table is located.
13. The method according to claim 1, wherein, after sending the location information of the data file on the determined slave node to the slave node according to the metadata of the data table to which the index column belongs, the method further comprises:
receiving the location information value in the data structure of the index file fed back by the slave node.
14. The method according to claim 13, wherein, after receiving the location information value in the data structure of the index file fed back by the slave node, the method comprises:
re-determining, according to the location information value, the slave node where the data file is located from the metadata of the data table to which the index column belongs;
resending the location information of the index file included in the index table and the location information of the data file included in the data table to the re-determined slave node.
15. A method of index-table-based query at the slave node side, wherein the method comprises:
determining, according to the location information of the index file sent by the host node and through the data structure of the index file, the value of the index column in the index table where the index file is located;
obtaining the information of the index file in the index table according to the value of the index column;
judging, according to the location information of the data file in the metadata, sent by the host node, of the data table to which the index column belongs, whether the data file corresponding to the user's query request exists on the slave node, and if so, obtaining the data file according to the information of the index file.
16. The method according to claim 15, wherein, when the information of the index file includes the key and the location information value in the data structure of the index file, obtaining the information of the index file in the index table according to the value of the index column comprises:
determining, according to the value of the index column, the key corresponding to the value of the index column in the data structure of the index file;
obtaining the location information value in the data structure of the index file according to the key corresponding to the value of the index column.
17. The method according to claim 16, wherein the location information value in the data structure of the index table comprises:
the file name of the data file to which the index column belongs and the offset, in the data file, of the row where the index column is located.
18. The method according to claim 15, wherein, after judging, according to the location information of the data file in the metadata, sent by the host node, of the data table to which the index column belongs, whether the data file corresponding to the user's query request exists on the slave node, the method comprises:
if the data file corresponding to the user's query request does not exist on the slave node, feeding the location information value in the data structure back to the host node.
19. A host node device for index-table-based query, wherein the host node device comprises:
a judging device, configured to look up the index column of the index table through the data structure of the index table according to a user's query request, and to judge whether the value of the index column is a determinate value, to obtain a judgment result;
a type determining device, configured to determine the type of the index table according to the judgment result;
a positioning device, configured to determine, by the type of the index table, the slave node where the index file corresponding to the index table is located;
a sending device, configured to send the location information of the index file on the determined slave node to the slave node according to the metadata corresponding to the index table, and to send the location information of the data file on the determined slave node to the slave node according to the metadata of the data table to which the index column belongs.
20. The host node device according to claim 19, wherein the structure of the index file in the data structure of the index table includes a BPlusTree structure.
21. The host node device according to claim 20, wherein the BPlusTree structure includes the key and the location information value of a leaf node, wherein
the key is determined according to the value of the index column of the index table;
the location information value is determined according to the file name of the data file to which the index column belongs and the offset, in the data file, of the row where the index column is located.
22. The host node device according to claim 19, wherein the type of the index table includes a Hash-type global index table and/or a range-type global index table.
23. The host node device according to claim 22, wherein the type determining device is configured to:
determine, if the judgment result is that the value of the index column is a determinate value, that the type of the index table is the Hash-type global index table.
24. The host node device according to claim 22, wherein the type determining device is configured to:
determine, if the judgment result is that the value of the index column is a non-determinate value, that the type of the index table is the range-type global index table.
25. The host node device according to claim 23, wherein, when the type of the index table is the Hash-type global index table, the host node device further comprises:
a distribution device, configured to distribute the information of the index file of the Hash-type global index table to the corresponding slave node according to an allocation rule determined by the hash value of the index column of the Hash-type global index table, wherein the information of the index file includes the key and the location information value of a leaf node in the BPlusTree structure of the index file.
26. The host node device according to claim 25, wherein the distribution device is configured to:
determine the hash value of the index column according to the value of the index column and the number of slave nodes;
distribute the key and the location information value for which the hash value of the index column is i into the index file of the (i+1)-th slave node, wherein i is a natural number.
27. The host node device according to claim 26, wherein the positioning device is configured to:
determine the hash value of the index column through the Hash-type global index table;
determine, according to the hash value, the slave node where the index file corresponding to the Hash-type global index table is located.
28. The host node device according to claim 24, wherein, when the type of the index table is the range-type global index table, the host node device further comprises:
an interval determining device, configured to determine distribution range intervals according to the result of sampling the values of the index column, and to record each slave node and the distribution range interval of the index column corresponding to it;
an information distribution device, configured to distribute the information of the index file of the range-type global index table to the corresponding slave node according to an allocation rule determined by the distribution range intervals, wherein the information of the index file includes the key and the location information value of a leaf node in the BPlusTree structure of the index file.
29. The host node device according to claim 28, wherein the information distribution device is configured to:
compare the value of the index column of the range-type global index table with the recorded distribution range intervals, and determine the distribution range interval in which the value of the index column falls;
distribute, according to the distribution range interval in which the value of the index column falls, the key and the location information value of the leaf node in the BPlusTree structure corresponding to the value of the index column into the index file of the slave node corresponding to that distribution range interval.
30. The host node device according to claim 29, wherein the positioning device is configured to:
determine, through the range-type global index table, the distribution range interval to which the value of the index column belongs;
determine, according to the determined distribution range interval, the slave node where the index file corresponding to the range-type global index table is located.
31. The host node device according to claim 19, wherein the host node device further comprises:
a receiving device, configured to receive the location information value in the data structure of the index file fed back by the slave node.
32. The host node device according to claim 31, wherein the host node device further comprises:
a redistribution device, configured to re-determine, according to the location information value, the slave node where the data file is located from the metadata of the data table to which the index column belongs;
and to resend the location information of the index file included in the index table and the location information of the data file included in the data table to the re-determined slave node.
33. A slave node device for index-table-based query, wherein the slave node device comprises:
a determining device, configured to determine, according to the location information of the index file sent by the host node and through the data structure of the index file, the value of the index column in the index table where the index file is located;
an obtaining device, configured to obtain the information of the index file in the index table according to the value of the index column;
a query unit, configured to judge, according to the location information of the data file in the metadata, sent by the host node, of the data table to which the index column belongs, whether the data file corresponding to the user's query request exists on the slave node, and if so, to obtain the data file according to the information of the index file.
34. The slave node device according to claim 33, wherein, when the information of the index file includes the key and the location information value in the data structure of the index file, the obtaining device is configured to:
determine, according to the value of the index column, the key corresponding to the value of the index column in the data structure of the index file;
obtain the location information value in the data structure of the index file according to the key corresponding to the value of the index column.
35. The slave node device according to claim 34, wherein the location information value in the data structure of the index table comprises:
the file name of the data file to which the index column belongs and the offset, in the data file, of the row where the index column is located.
36. The slave node device according to claim 33, wherein the slave node device further comprises:
a feedback device, configured to feed the location information value in the data structure back to the host node if the data file corresponding to the user's query request does not exist in the index file.
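The two allocation rules recited in the claims (hash value i goes to the (i+1)-th slave node; a value goes to the slave node whose distribution range interval contains it) can be illustrated with the sketch below. It is a sketch under stated assumptions, not the claimed implementation: the claims do not specify the hash function or the sampling procedure, so taking the hash as `value mod num_slaves` and cutting sorted samples into equal-sized parts are assumptions, and all function names are hypothetical.

```python
from bisect import bisect_right
from typing import List


def hash_slave(value: int, num_slaves: int) -> int:
    """Return the 1-based slave number for a Hash-type index entry.

    Assumption: the hash value is `value mod num_slaves`. An entry with
    hash value i is placed in the index file of the (i+1)-th slave.
    """
    i = value % num_slaves
    return i + 1


def split_points(samples: List[int], num_slaves: int) -> List[int]:
    """Derive num_slaves - 1 interval boundaries from sampled column values.

    Assumption: the sorted samples are cut into equal-sized parts.
    """
    ordered = sorted(samples)
    step = len(ordered) / num_slaves
    return [ordered[int(step * k)] for k in range(1, num_slaves)]


def range_slave(value: int, boundaries: List[int]) -> int:
    """Return the 1-based slave number whose interval contains the value."""
    return bisect_right(boundaries, value) + 1
```

With three slaves and samples `[10, 20, 30, 40, 50, 60]`, `split_points` yields the boundaries `[30, 50]`, so values below 30 map to slave 1, values from 30 up to but not including 50 map to slave 2, and the rest map to slave 3.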
CN201710138728.4A 2017-03-09 2017-03-09 A kind of method and apparatus of the inquiry based on concordance list Active CN106940715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710138728.4A CN106940715B (en) 2017-03-09 2017-03-09 A kind of method and apparatus of the inquiry based on concordance list

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710138728.4A CN106940715B (en) 2017-03-09 2017-03-09 A kind of method and apparatus of the inquiry based on concordance list

Publications (2)

Publication Number Publication Date
CN106940715A CN106940715A (en) 2017-07-11
CN106940715B true CN106940715B (en) 2019-11-15

Family

ID=59469094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710138728.4A Active CN106940715B (en) 2017-03-09 2017-03-09 A kind of method and apparatus of the inquiry based on concordance list

Country Status (1)

Country Link
CN (1) CN106940715B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522791B (en) * 2020-04-30 2023-05-30 电子科技大学 Distributed file repeated data deleting system and method
CN111782632A (en) * 2020-06-28 2020-10-16 百度在线网络技术(北京)有限公司 Data processing method, device, equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1965316A (en) * 2004-04-09 2007-05-16 甲骨文国际公司 Index for accessing XML data
CN101674334A (en) * 2009-09-30 2010-03-17 华中科技大学 Access control method of network storage equipment
CN102033954A (en) * 2010-12-24 2011-04-27 东北大学 Full text retrieval inquiry index method for extensible markup language document in relational database
CN103384878A (en) * 2011-02-25 2013-11-06 数创株式会社 Distributed data base system and data structure for distributed data base

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
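GML data indexing method based on element interval encoding; Yu Shicai et al.; Journal of Lanzhou University of Technology; 2013-05-31; Vol. 39 (No. 3); pp. 88-93 *
Research on B+ tree indexing based on dynamic node popularity; Wang Dong; China Master's Theses Full-text Database, Information Science and Technology; 2015-03-15; I138-2875 *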

Also Published As

Publication number Publication date
CN106940715A (en) 2017-07-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 200233 11-12 / F, building B, 88 Hongcao Road, Xuhui District, Shanghai

Patentee after: Star link information technology (Shanghai) Co.,Ltd.

Address before: 200233 11-12 / F, building B, 88 Hongcao Road, Xuhui District, Shanghai

Patentee before: TRANSWARP TECHNOLOGY (SHANGHAI) Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A method and device for querying based on index tables

Effective date of registration: 20230616

Granted publication date: 20191115

Pledgee: Bank of China Limited by Share Ltd. Shanghai Xuhui branch

Pledgor: Star link information technology (Shanghai) Co.,Ltd.

Registration number: Y2023310000252