CN106960020B - Method and apparatus for creating an index table - Google Patents

Method and apparatus for creating an index table

Info

Publication number
CN106960020B
Authority
CN
China
Prior art keywords
index
data
file
created
index table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710140132.8A
Other languages
Chinese (zh)
Other versions
CN106960020A (en
Inventor
张常淳
吕程
周立
周翠翠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Transwarp Technology Shanghai Co Ltd
Original Assignee
Xinghuan Information Technology (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinghuan Information Technology (shanghai) Co Ltd filed Critical Xinghuan Information Technology (shanghai) Co Ltd
Priority to CN201710140132.8A priority Critical patent/CN106960020B/en
Publication of CN106960020A publication Critical patent/CN106960020A/en
Application granted granted Critical
Publication of CN106960020B publication Critical patent/CN106960020B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2264 Multidimensional index structures
    • G06F16/2272 Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The application provides a method and apparatus for creating an index table. The structure of the metadata of the data table corresponding to an acquired data source is created. The data structure of the index table to be created is then created: the index columns of the index table are determined from the data table according to a user request, and the structure of the index table's metadata is created from those index columns. The data files generated from the current data rows of the data source are distributed to slave nodes, and the index file information of the index table is distributed to the index files of the corresponding slave nodes. This optimizes the underlying storage structure: at query time, the index file information allows the data files that satisfy the query condition to be located quickly, which greatly reduces the amount of data accessed and improves query performance.

Description

Method and apparatus for creating an index table
Technical field
This application relates to the field of computers, and in particular to a method and apparatus for creating an index table.
Background
With the development and application of database technology, the volume of data stored in databases grows daily, and fast, flexible processing of complex queries over large data volumes has become a new requirement. OLAP (On-Line Analytical Processing) is dedicated to supporting complex analytical operations, with an emphasis on decision support for decision makers and senior managers. An OLAP user usually needs to query only a few columns, so row-oriented storage loads many unneeded columns and degrades query performance. The basic query method of distributed columnar storage first reads the metadata from zookeeper, then reads all data files on every machine in the cluster, and finally reads the records that satisfy the condition from each data file. This leads directly to excessive data access and hurts OLAP query performance.
Summary of the application
The purpose of the application is to provide a method and apparatus for creating an index table that optimize the underlying storage structure and thereby make data queries more convenient.
According to one aspect of the application, a method for creating an index table is provided, the method comprising:
creating the structure of the metadata of the data table corresponding to an acquired data source, wherein the metadata of the data table includes the location information of all data files in the data table, and the data files are stored in columnar form;
creating the data structure of the index table to be created, determining the index columns of the index table to be created from the data table according to a user request, the index columns being a subset of the columns of the data table, and creating the structure of the metadata corresponding to the index table to be created according to the index columns, wherein that metadata includes the location information of the index files in the index table to be created;
distributing the data file generated from the current data rows of the data source to a slave node, and updating the metadata of the data table to which the data file belongs according to the structure of the data table's metadata and the assigned location of the data file;
distributing the index file information of the index table to be created to the index file of the corresponding slave node, and updating the metadata corresponding to the index table to be created.
Further, the structure of the index files in the data structure of the index table to be created includes a BPlusTree (B+ tree) structure, wherein each leaf node of the BPlusTree structure includes a key and a location value.
Further, creating the data structure of the index table to be created comprises:
determining the key of the leaf node from the value of the index column of the index table to be created;
determining the location value of the leaf node from the file name of the data file to which the index column value belongs and the offset, within that data file, of the row containing the index column value.
Further, the index table to be created includes a hash-type global index table and/or a range-type global index table.
Further, when the index table to be created is a hash-type global index table, distributing the index file information corresponding to the index columns of the index table to be created to the index file of the corresponding slave node comprises:
distributing the keys and location values of the index file of the hash-type global index table to the index file of the corresponding slave node, according to a hash value determined from the value of the index column of the hash-type global index table.
Further, distributing the keys and location values of the index file of the hash-type global index table to the index file of the corresponding slave node according to the hash value comprises:
determining the hash value of the index column from the value of the index column of the hash-type global index table and the number of slave nodes;
distributing the key and location value of each BPlusTree leaf node whose index column hash value is i to the index file of the (i+1)-th slave node, where i is a natural number.
Further, when the index table to be created is a range-type global index table, distributing the index file information corresponding to the index columns of the index table to be created to the index file of the corresponding slave node comprises:
determining the distribution ranges from the result of sampling the values of the index column, and recording each slave node together with the distribution range of the index column assigned to it;
distributing the index file information of the range-type global index table to the index file of the corresponding slave node according to the distribution ranges.
Further, distributing the index file information of the range-type global index table to the index file of the corresponding slave node according to the distribution ranges comprises:
comparing the value of the index column of the range-type global index table with the recorded distribution ranges to determine the distribution range in which the value falls;
distributing, according to the distribution range in which the index column value falls, the key and location value of the BPlusTree leaf node corresponding to that value to the index file of the slave node assigned to that distribution range.
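The range-based distribution described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say how the distribution ranges are derived from the sample, so equal-count quantile boundaries are assumed here, and the function names `split_points` and `range_target_slave` are hypothetical.

```python
import bisect

def split_points(sample, num_slaves):
    """Derive num_slaves - 1 range boundaries from sampled index-column values.

    Assumption: boundaries are equal-count quantiles of the sample.
    """
    ordered = sorted(sample)
    return [ordered[len(ordered) * k // num_slaves] for k in range(1, num_slaves)]

def range_target_slave(index_value, boundaries):
    """Return the 1-based number of the slave whose range contains index_value."""
    # bisect_right counts how many boundaries are <= index_value.
    return bisect.bisect_right(boundaries, index_value) + 1

# Sample every tenth id of a hypothetical 1000-row source: 1, 11, ..., 991.
sample = list(range(1, 1001, 10))
boundaries = split_points(sample, num_slaves=4)
```

With these boundaries, each index column value is compared against the recorded ranges, and its key and location value go to the index file of the slave that owns the matching range.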
Further, distributing the index file information corresponding to the index columns of the index table to be created to the index file of the corresponding slave node further comprises:
if the key already exists in the index file, merging the new location value with the old location value in the leaf node corresponding to the key.
Further, distributing the index file information corresponding to the index columns of the index table to be created to the index file of the corresponding slave node further comprises:
if the key does not exist in the index file, inserting a new leaf node into the BPlusTree structure and storing the key and location value in the new leaf node.
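The merge-or-insert rule can be illustrated with a short Python sketch. A sorted list stands in for the leaf level of the BPlusTree; it is not a real B+ tree, and the class name `IndexFile` and its methods are hypothetical.

```python
import bisect

class IndexFile:
    """Sorted-list stand-in for the leaf level of a B+ tree (illustrative only)."""

    def __init__(self):
        self.keys = []    # sorted index-column values
        self.values = []  # values[i] holds the location values for keys[i]

    def put(self, key, location):
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            # Key already present: merge the new location with the old ones.
            self.values[pos].append(location)
        else:
            # Key absent: insert a new leaf entry at its sorted position.
            self.keys.insert(pos, key)
            self.values.insert(pos, [location])

    def get(self, key):
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.values[pos]
        return []

index_file = IndexFile()
index_file.put(5, ("FileSegment_0", 3))
index_file.put(1, ("FileSegment_0", 0))
index_file.put(5, ("FileSegment_1", 7))  # same key, second location is merged
```

Keeping one entry per key with a list of locations means a lookup for a key returns every data file and offset that holds a matching row.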
Further, before distributing the data file generated from the current data rows of the data source to the slave node corresponding to the data table, the method further comprises:
when the number of rows from the data source reaches the preset data file size threshold, generating a new data file from the current data rows, distributing the newly generated data file to the slave node corresponding to the data table, and updating the metadata of the data table to which the data file belongs.
Further, before distributing the index file information corresponding to the index columns of the index table to be created to the index file of the corresponding slave node, the method further comprises:
when the size of an index file of the index table to be created reaches the preset index file size threshold, generating a new index file and updating the metadata corresponding to the index table to be created with the location information of the new index file.
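A minimal Python sketch of the index file rollover follows. It assumes that file size is measured in number of keys, that the threshold is 2 (chosen only to keep the example small), and that an existing key is still merged into the file that already holds it; the class `SlaveIndex` and the file naming scheme are hypothetical.

```python
MAX_KEYS_PER_INDEX_FILE = 2  # assumed threshold; the text only says "preset"

class SlaveIndex:
    """Index files held by one slave, with rollover to a new file when full."""

    def __init__(self, slave, index_metadata, max_keys=MAX_KEYS_PER_INDEX_FILE):
        self.slave = slave
        self.index_metadata = index_metadata  # index file name -> slave
        self.max_keys = max_keys
        self.files = []
        self._open_new_file()

    def _open_new_file(self):
        name = f"{self.slave}/IndexFileSegment_{len(self.files)}"
        self.files.append({"name": name, "entries": {}})
        # Record the location of the new index file in the index table metadata.
        self.index_metadata[name] = self.slave

    def put(self, key, location):
        for index_file in self.files:
            if key in index_file["entries"]:
                index_file["entries"][key].append(location)
                return
        if len(self.files[-1]["entries"]) >= self.max_keys:
            self._open_new_file()
        self.files[-1]["entries"][key] = [location]

metadata = {}
slave_index = SlaveIndex("slave1", metadata)
for key in (1, 2, 3):
    slave_index.put(key, ("FileSegment_0", key - 1))
slave_index.put(1, ("FileSegment_1", 0))  # merged, no new file
```

In this sketch the third key triggers the creation of a second index file, and the shared metadata dictionary then lists both files for `slave1`.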
According to another aspect of the application, an apparatus for creating an index table is also provided, the apparatus comprising:
a first creation device for creating the structure of the metadata of the data table corresponding to an acquired data source, wherein the metadata of the data table includes the location information of all data files in the data table, and the data files are stored in columnar form;
a second creation device for creating the data structure of the index table to be created, determining the index columns of the index table to be created from the data table according to a user request, the index columns being a subset of the columns of the data table, and creating the structure of the metadata corresponding to the index table to be created according to the index columns, wherein that metadata includes the location information of the index files in the index table to be created;
a data file distribution device for distributing the data file generated from the current data rows of the data source to a slave node, and updating the metadata of the data table to which the data file belongs according to the structure of the data table's metadata and the assigned location of the data file;
a distribution device for distributing the index file information of the index table to be created to the index file of the corresponding slave node, and updating the metadata corresponding to the index table to be created.
Further, the structure of the index files in the data structure of the index table to be created includes a BPlusTree (B+ tree) structure, wherein each leaf node of the BPlusTree structure includes a key and a location value.
Further, the second creation device is configured to:
determine the key of the leaf node from the value of the index column of the index table to be created;
determine the location value of the leaf node from the file name of the data file to which the index column value belongs and the offset, within that data file, of the row containing the index column value.
Further, the index table to be created includes a hash-type global index table and/or a range-type global index table.
Further, when the index table to be created is a hash-type global index table, the distribution device is configured to:
distribute the keys and location values of the index file of the hash-type global index table to the index file of the corresponding slave node, according to a hash value determined from the value of the index column of the hash-type global index table.
Further, the distribution device is configured to:
determine the hash value of the index column from the value of the index column of the hash-type global index table and the number of slave nodes;
distribute the key and location value of each BPlusTree leaf node whose index column hash value is i to the index file of the (i+1)-th slave node, where i is a natural number.
Further, when the index table to be created is a range-type global index table, the distribution device is configured to:
determine the distribution ranges from the result of sampling the values of the index column, and record each slave node together with the distribution range of the index column assigned to it;
distribute the index file information of the range-type global index table to the index file of the corresponding slave node according to the distribution ranges.
Further, the distribution device is configured to:
compare the value of the index column of the range-type global index table with the recorded distribution ranges to determine the distribution range in which the value falls;
distribute, according to the distribution range in which the index column value falls, the key and location value of the BPlusTree leaf node corresponding to that value to the index file of the slave node assigned to that distribution range.
Further, the distribution device is also configured to:
if the key already exists in the index file, merge the new location value with the old location value in the leaf node corresponding to the key.
Further, the distribution device is also configured to:
if the key does not exist in the index file, insert a new leaf node into the BPlusTree structure and store the key and location value in the new leaf node.
Further, the apparatus further comprises:
a data file generation device for, when the number of rows from the data source reaches the preset data file size threshold, generating a new data file from the current data rows, distributing the newly generated data file to the slave node corresponding to the data table, and updating the metadata of the data table to which the data file belongs.
Further, the apparatus further comprises:
an index file generation device for, when the size of an index file of the index table to be created reaches the preset index file size threshold, generating a new index file and updating the metadata corresponding to the index table to be created with the location information of the new index file.
Compared with the prior art, the application creates the structure of the metadata of the data table corresponding to an acquired data source, wherein the metadata of the data table includes the location information of all data files in the data table and the data files are stored in columnar form. It then creates the data structure of the index table to be created, determines the index columns of the index table from the data table according to a user request, the index columns being a subset of the columns of the data table, and creates the structure of the metadata corresponding to the index table according to the index columns, wherein that metadata includes the location information of the index files in the index table. Next, the data file generated from the current data rows of the data source is distributed to a slave node, and the metadata of the data table to which the data file belongs is updated according to the structure of the data table's metadata and the assigned location of the data file. Finally, the index file information of the index table is distributed to the index file of the corresponding slave node, and the metadata corresponding to the index table is updated. The underlying storage structure is thereby optimized: at query time, the index file information allows the data files that satisfy the condition to be located quickly, which greatly reduces the amount of data accessed and improves query performance.
Brief description of the drawings
Other features, objects and advantages of the application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 shows a flow diagram of a method for creating an index table according to one aspect of the application;
Fig. 2 shows the distributed system framework of an embodiment of the application;
Fig. 3 shows the data source of an embodiment of the application;
Fig. 4 shows the SQL query statements for the data source in an embodiment of the application;
Fig. 5 shows a structural diagram of a data file of the data table in an embodiment of the application;
Fig. 6 shows the distribution of index file information after the index table is created in an embodiment of the application;
Fig. 7 shows a structural diagram of an index file of the hash-type index table in an embodiment of the application;
Fig. 8 shows a structural diagram of an index file of the range-type index table in an embodiment of the application;
Fig. 9 shows a structural diagram of an apparatus for creating an index table according to another aspect of the application.
The same or similar reference numerals denote the same or similar components in the drawings.
Detailed description
The application is described in further detail below with reference to the accompanying drawings.
In a typical configuration of this application, the terminal, the device of the service network, and the trusted party each include one or more processors (CPU), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Fig. 1 shows a flow diagram of a method for creating an index table according to one aspect of the application. The method comprises steps S11 to S14 and is applied in a distributed system.
In step S11, the structure of the metadata of the data table corresponding to the acquired data source is created, wherein the metadata of the data table includes the location information of all data files in the data table, and the data files are stored in columnar form. In one embodiment of the application, to create a distributed global index table, the metadata of the data table corresponding to the data source must first be created; this metadata includes the location, on the hard disk of each machine in the cluster, of every data file contained in the data table. It should be noted that the structure of the data table must be created before the metadata of the data table is created. The structure of the data table includes the structure of its data files. The data table contains the data of the data source: the data in the data source is stored as the data table, and the data table is stored in the form of data files. The distributed system framework of one embodiment of the application is shown in Fig. 2 and includes a client, a master node (Master), one or more slave nodes (slave), and zookeeper. The data files of each data table may be stored on solid-state drives (SSD). In this embodiment, the metadata of the data table is stored in zookeeper, which is a high-performance coordination system for distributed applications.
In step S12, the data structure of the index table to be created is created, the index columns of the index table to be created are determined from the data table according to a user request, the index columns being a subset of the columns of the data table, and the structure of the metadata corresponding to the index table to be created is created according to the index columns, wherein that metadata includes the location information of the index files in the index table to be created. Continuing the above embodiment, the columns that need a global index are determined according to actual needs; that is, a subset of the data table's columns is selected as index columns, and the metadata of the distributed global index table is then created from the selected index columns. This metadata includes the location, on the hard disk of each machine in the cluster, of every index file contained in the global index table. Once created, the index table can be used for data queries in the distributed system: it serves users who need to query only a few columns while avoiding reading all data files, which greatly reduces the amount of data accessed.
It should be noted that, in one embodiment of the application, when the structure of the metadata corresponding to the index table to be created is created from the index columns, one index table may be built for each index column and the metadata structure then created for each index table. Alternatively, several of the index columns requested by the user may jointly form a single index table.
The data structure of the global index table is created, and the structure of the global index table includes the structure of its index files. Preferably, the structure of the index files in the data structure of the index table to be created includes a BPlusTree structure, wherein each leaf node of the BPlusTree structure includes a key and a location value. Here, the index files of the global index table may be organized and stored as a BPlusTree structure whose leaf nodes hold tuples <key, location value> (<key, value>). The BPlusTree structure keeps the input index column data sorted, so the position of the record corresponding to an index column value can be found quickly and data queries can be answered quickly.
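As an illustration of the <key, value> tuple held by a leaf node, the following Python sketch models the location value as a list of (data file name, row offset) pairs. The class and function names (`Location`, `LeafEntry`, `make_leaf_entry`) are hypothetical and only show one possible in-memory shape, not the on-disk layout.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Location:
    file_name: str   # name of the data file that holds the row
    row_offset: int  # offset of the row inside that data file

@dataclass
class LeafEntry:
    key: object                                   # value of the index column
    locations: List[Location] = field(default_factory=list)

def make_leaf_entry(index_value, file_name, row_offset):
    """Build the <key, value> tuple stored in a leaf node."""
    return LeafEntry(key=index_value, locations=[Location(file_name, row_offset)])

# A row with id = 1 stored at offset 0 of a hypothetical data file.
entry = make_leaf_entry(1, "FileSegment_0", 0)
```

Holding a list of locations rather than a single one leaves room for several rows that share the same index column value.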
In one specific embodiment of the application, the example data source shown in Fig. 3 contains 1000 records with four columns: identifier (id), name (name), age (age), and sex (sex). Fig. 4 shows the SQL query statements that the user issues against the data source. The user needs to filter on the id column, so a global index table must be created on the id column. Here, table A denotes the data table corresponding to the data source shown in Fig. 3, and Sql 1: "Select * from table A where id=1" means that query statement 1 retrieves from data table A the data of the rows whose id is 1. The other SQL statements are read in the same way.
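To make the benefit for a query such as `where id=1` concrete, the Python sketch below compares a full scan with a lookup through an id index and records which data files each approach touches. The rows, file names and function names are invented for illustration, and rows are kept as tuples per file for brevity rather than in columnar form.

```python
# Hypothetical data files: file name -> rows of (id, name, age, sex).
data_files = {
    "FileSegment_0": [(1, "Ann", 30, "F"), (2, "Bob", 25, "M")],
    "FileSegment_1": [(3, "Cid", 41, "M"), (4, "Dee", 37, "F")],
}

def build_id_index(files):
    """Map each id to the (file name, row offset) pairs where it occurs."""
    index = {}
    for file_name, rows in files.items():
        for offset, row in enumerate(rows):
            index.setdefault(row[0], []).append((file_name, offset))
    return index

def query_full_scan(files, wanted_id):
    """Read every data file and keep the matching rows."""
    touched = set(files)
    rows = [row for rows in files.values() for row in rows if row[0] == wanted_id]
    return rows, touched

def query_with_index(files, index, wanted_id):
    """Read only the data files that the index points to."""
    locations = index.get(wanted_id, [])
    touched = {file_name for file_name, _ in locations}
    rows = [files[file_name][offset] for file_name, offset in locations]
    return rows, touched

id_index = build_id_index(data_files)
scan_rows, scan_touched = query_full_scan(data_files, 1)
index_rows, index_touched = query_with_index(data_files, id_index, 1)
```

Both queries return the same row, but the indexed lookup opens only the file that actually contains it.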
In step S13, the data file generated from the current data rows of the data source is distributed to a slave node, and the metadata of the data table to which the data file belongs is updated according to the structure of the data table's metadata and the assigned location of the data file. Here, in the distributed columnar storage platform shown in Fig. 2, when the platform stores the data of the data source, the Master distributes the data files evenly across the machines in the cluster according to the load-balancing principle, so that each machine holds several data files (FileSegment) of the data table. The structure of a FileSegment is shown in Fig. 5. Data files are stored in columnar form; when an OLAP user needs to query only a few columns, columnar storage supplies only the data of the columns that need to be read, which greatly improves OLAP query efficiency.
Preferably, before step S13, the method further includes step S13': when the number of rows from the data source reaches the preset data file size threshold, a new data file is generated from the current data rows, the newly generated data file is distributed to the slave node corresponding to the data table, and the metadata of the data table to which the data file belongs is updated.
In one embodiment of the application, whenever the number of data-source rows held in memory equals the size of one data file, the current rows are generated as one data file. The master node (Master) of the cluster assigns the file, according to the load-balancing principle, to the hard disk of a machine in the cluster for storage, and the metadata of the data table is updated. At this point, creation of the global index table entries for that data file also begins.
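The buffering and load-balanced assignment just described might look like the following Python sketch. The threshold of 250 rows, the "fewest files so far" balancing rule, the flush of a final partial buffer, and all names (`Master`, `load`, `FileSegment_<n>`) are assumptions made for the example; writing the rows to disk is omitted.

```python
ROWS_PER_FILE = 250  # assumed threshold; the text only says "preset"

class Master:
    """Tracks which slave stores each data file (illustrative only)."""

    def __init__(self, slaves):
        self.file_count = {slave: 0 for slave in slaves}
        self.table_metadata = {}  # data file name -> slave that stores it
        self._next_id = 0

    def flush(self, rows):
        # Assumed balancing rule: pick the slave holding the fewest files.
        slave = min(self.file_count, key=self.file_count.get)
        name = f"FileSegment_{self._next_id}"
        self._next_id += 1
        self.file_count[slave] += 1
        self.table_metadata[name] = slave  # update the data table metadata
        return name, slave

def load(master, source_rows, rows_per_file=ROWS_PER_FILE):
    """Buffer rows in memory and emit one data file per full buffer."""
    buffer = []
    for row in source_rows:
        buffer.append(row)
        if len(buffer) == rows_per_file:
            master.flush(buffer)
            buffer = []
    if buffer:  # assumption: leftover rows form a final, smaller file
        master.flush(buffer)

master = Master(["slave1", "slave2", "slave3", "slave4"])
load(master, range(1000))
```

With 1000 rows and four slaves, this sketch produces four data files, one per slave, and the metadata dictionary records where each file lives.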
In step S14, the index file information of the index table to be created is distributed to the index file of the corresponding slave node, and the metadata corresponding to the index table to be created is updated. Here, after a generated data file has been distributed and stored, its corresponding index file information must be distributed to the index file of the corresponding slave, and the corresponding metadata of the global index table must be updated. The index file information may include the value of the index column in the created index table, the file name of the data file to which the index column value belongs, the offset within that data file of the row containing the index column value, and so on.
In one embodiment of the application, the index table to be created includes a hash-type global index table or a range-type global index table. The strategies for distributing the index files of hash-type and range-type global index tables across the cluster differ slightly. A hash-type global index table decides which machine in the distributed cluster an index file entry goes to from the hash value of the index column, whereas a range-type global index table assigns it to the corresponding machine according to the range in which the index column value falls.
Preferably, in step S12, the key of the leaf node is determined from the value of the index column of the index table to be created, and the location value of the leaf node is determined from the location information of the data file in the metadata of the data table, the location information of the index file in the metadata of the index table to be created, and the offset of the index column in the index file. In one embodiment of the application, the index files of both the hash-type global index table and the range-type global index table are organized and stored as BPlusTree structures. A leaf node of the BPlusTree holds a tuple <key, value>, where the key is the value of the index column, taken from the data file, and the value is the information of the data file containing the index column value together with the offset of the matching record in the index file. The Master distributes the data files evenly across the machines in the cluster according to the load-balancing principle, so that each machine holds several data files (FileSegment) of the data table; the structure of a FileSegment is shown in Fig. 5. When the distributed columnar storage platform creates the global index table corresponding to the data source, the index file of a hash-type global index table is a HashIndexFileSegment and the index file of a range-type global index table is a RangeIndexFileSegment.
Preferably, when the index table to be created is a hash global index table, in step S14 the keys and location values of the index file of the hash global index table are distributed to the index files of the corresponding slave nodes according to the hash value determined from the value of the index column. Specifically, in step S14 the hash value of the index column is determined from the value of the index column of the hash global index table and the number of slave nodes; the key and location value of the leaf node in the BPlusTree structure whose index column hash value is i are distributed to the index file of the (i+1)-th slave node, where i is a natural number. The information of the index files is thereby spread sensibly over the slaves, achieving an even distribution. Note that determining the hash value determines the machine to which the index file information is assigned (for example, machine 1), that is, the machine that receives the <key, value> tuple for that hash value; the leaf node in the corresponding BPlusTree structure is actually created only once the <key, value> tuple has been assigned to the index file of that machine.
In one embodiment of the application, assume that the distributed cluster has 1 Master and n slaves. The hash global index table computes a hash value over the index column. The value of the index column is the key of a leaf node in the BPlusTree structure, and the information of the data file containing that value, together with the offset of the corresponding record in the data file, is the value. The key and location value of a record whose key hashes to i are assigned to the index file of the (i+1)-th slave, where the hash value can be determined from the value of the index column and the number of slaves in the distributed cluster.
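The hash allocation rule above can be sketched as follows. This is a minimal illustration, assuming integer keys and the modulo hash that the 3-slave example in this description uses (key % number of slaves); the function name `slave_for_key` is hypothetical and does not come from the application.

```python
def slave_for_key(key: int, num_slaves: int) -> int:
    """Return the 1-based number of the slave whose index file receives this key."""
    if num_slaves < 1:
        raise ValueError("the cluster needs at least one slave")
    hash_value = key % num_slaves  # the hash value i (assumed modulo hash)
    return hash_value + 1          # entry goes to the (i+1)-th slave


# With 3 slaves, ids 1, 2 and 3 hash to 1, 2 and 0 respectively.
assignments = {key: slave_for_key(key, 3) for key in (1, 2, 3)}
```

Other hash functions would serve equally well, provided the result is reduced to the range 0 to n-1 before the slave is chosen.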
Preferably, when the index table to be created is a range global index table, in step S14 the distribution range intervals are determined from the result of sampling the values of the index column, and each slave node is recorded together with its distribution range interval of the index column; the information of the index file of the range global index table is then distributed to the index files of the corresponding slave nodes according to the distribution range intervals. Specifically, in step S14 the value of the index column of the range global index table is compared with the recorded distribution range intervals to determine the interval in which the value falls; according to that interval, the key and location value of the leaf node in the BPlusTree structure corresponding to the value of the index column are distributed to the index file of the slave node associated with the interval.
In one embodiment of the application, assume that the distributed cluster has 1 Master and n slaves. The range global index table samples the values of the index column in the files of the data table and sets n ranges according to the sampling result, so that the amount of data in each range is as even as possible; each machine and its range interval of the index column are recorded in the Master. When an index file is generated, the values of its index column are compared with the n ranges in the Master and, according to the range to which a value belongs, the corresponding data file information and the offset in the data file, i.e. the <key, value> tuple, are written to the index file of the slave associated with that range.
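One way to derive the n ranges from a sample is to cut the sorted sample into n parts of nearly equal size. The sketch below does exactly that; the application does not specify the sampling or partitioning procedure, so the quantile-style split and the name `range_boundaries` are assumptions made for illustration.

```python
def range_boundaries(sample: list[int], n: int) -> list[int]:
    """Return the n-1 inclusive upper bounds that split the sample into n even ranges."""
    if n < 1 or len(sample) < n:
        raise ValueError("need at least one sampled value per range")
    ordered = sorted(sample)
    # The k-th bound is the last value of the k-th of n equally sized slices.
    return [ordered[(len(ordered) * k) // n - 1] for k in range(1, n)]


# Sampling every id from 1 to 999 and asking for 3 ranges.
bounds = range_boundaries(list(range(1, 1000)), 3)
```

For this sample the bounds are 333 and 666, which correspond to the intervals [1, 333], [334, 666] and [667, 999] used in the 3-slave example of this description.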
In one embodiment of the application, in step S14, if the key already exists in the index file, the new location value is merged with the old location value in the leaf node corresponding to the key. If the key does not exist in the index file, a new leaf node is inserted into the BPlusTree structure and the key and location value are stored in that new leaf node.
Here, when a <key, value> tuple is assigned to the index file of the corresponding slave, if a leaf node for the key already exists in the index file, the new value is merged with the old value; if the key does not exist in the index file, a new leaf node is inserted and the <key, value> tuple is stored in it.
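The merge-or-insert rule can be sketched as below. A plain dictionary stands in for the leaf level of the BPlusTree, and a location value is modelled as a mapping from data file name to a list of offsets; both choices, and the name `upsert`, are assumptions for illustration rather than the structure used in the application.

```python
def upsert(leaves: dict[int, dict[str, list[int]]], key: int,
           file_name: str, offset: int) -> None:
    """Merge a new location into an existing leaf, or create the leaf if absent."""
    if key in leaves:
        # Key already indexed: merge the new location with the old value.
        leaves[key].setdefault(file_name, []).append(offset)
    else:
        # Key not yet indexed: insert a new leaf holding <key, value>.
        leaves[key] = {file_name: [offset]}


leaves: dict[int, dict[str, list[int]]] = {}
upsert(leaves, 1, "fs1", 1)
upsert(leaves, 1, "fs1", 2)  # same key again, so the offsets are merged
upsert(leaves, 3, "fs1", 4)
```

After these calls the leaf for key 1 holds offsets 1 and 2 in fs1, and key 3 has its own leaf with offset 4.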
Preferably, the method further includes step S14': when the size of an index file of the index table to be created reaches a preset index file size threshold, a new index file is generated and the location information of the new index file is updated in the metadata of the index table to be created. In one embodiment of the application, when the index files of the hash global index table and of the range global index table each reach the maximum size of one index file, a new index file is generated and its location information is updated in the metadata of the global index table.
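The rollover in step S14' can be illustrated with a small sketch. It assumes the size threshold is counted in index entries and that location information is a simple file name; the class `IndexTable` and the naming scheme `index_file_N` are hypothetical and are not taken from the application.

```python
class IndexTable:
    """Toy index table that starts a new index file once the current one is full."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self.files: list[dict[int, str]] = [{}]  # each dict stands for one index file
        self.meta: list[str] = ["index_file_0"]  # location info kept in the metadata

    def add(self, key: int, value: str) -> None:
        current = self.files[-1]
        if len(current) >= self.max_entries:
            # Threshold reached: create a new index file and record its location.
            current = {}
            self.files.append(current)
            self.meta.append(f"index_file_{len(self.files) - 1}")
        current[key] = value


table = IndexTable(max_entries=2)
for key in (1, 2, 3):
    table.add(key, f"fs1:{key}")
```

A real implementation would more likely measure the threshold in bytes and would persist the metadata rather than keep it in memory.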
In one preferred embodiment of the application, the distributed system has, for example, 1 Master and 3 slaves, and the maximum size of a data file is set to 25 rows. When the number of input data rows equals the maximum of 25, the distributed columnar storage platform takes the current data rows as one data file; the Master, following the load balancing principle, writes it as a data file (FileSegment) to the SSD of some machine in the cluster and updates the metadata of the data table. For the hash global index table, key is the value of id, and value is the information of the data file containing that id together with the offset of the corresponding record in the data file. The key is hashed, and the <key, value> tuples whose hash results are 0, 1 and 2 are assigned to the index files (HashIndexFileSegment) of the hash index table on machines 1, 2 and 3 of the cluster respectively, as shown in Fig. 6. When id=1 the hash result is 1, so its <key, value> tuple is stored on the 2nd slave of the cluster; when id=2 the hash result is 2, so its <key, value> tuple is stored on the 3rd slave; when id=3 the hash result is 0, so its <key, value> tuple is stored on the 1st slave. Here {key | key % 3 = hash value} means that the hash value is the remainder of key divided by the number of slaves (3 here), where key is the value of id.
When the range global index table is created, the range division obtained after sampling the key values of the data source is shown in Fig. 6. The division should make the number of records in each range interval as close as possible; the result comprises the three intervals [1, 333], [334, 666] and [667, 999], and the range assigned to each of the three slaves is stored in the Master. When the value of the index column in a data block falls in some range interval, its index information is stored in the index file (RangeIndexFileSegment) of the range index table on the slave machine associated with that interval, in the form of a BPlusTree leaf node. For example, for id=5 the key falls in the first range interval, so its <key, value> information is stored in the index file of the first machine.
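Looking up the slave for a key in this example amounts to finding the first interval whose upper bound is not smaller than the key. The sketch below uses binary search over the three upper bounds given above; the names `UPPER_BOUNDS` and `slave_for_range_key` are hypothetical.

```python
from bisect import bisect_left

# Inclusive upper bounds of the intervals [1, 333], [334, 666] and [667, 999].
UPPER_BOUNDS = [333, 666, 999]


def slave_for_range_key(key: int) -> int:
    """Return the 1-based number of the slave whose range interval contains key."""
    if not 1 <= key <= UPPER_BOUNDS[-1]:
        raise ValueError(f"key {key} lies outside every recorded range")
    # bisect_left returns the position of the first bound that is >= key.
    return bisect_left(UPPER_BOUNDS, key) + 1


first_machine = slave_for_range_key(5)  # id=5 falls in the first interval
```

Binary search keeps the lookup logarithmic in the number of ranges, which matters little for 3 slaves but scales to larger clusters.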
Note that, for both the hash index table and the range index table, when a <key, value> tuple is assigned to the index file of the corresponding slave, it must be determined whether the id of the index column already exists in the BPlusTree of the index file. If the id already exists in the BPlusTree, the information of the data block (Block) containing that id and its row number within the Block are merged directly with the original value. If the key does not exist in the index file, a new leaf node is inserted and the <key, value> tuple is stored in the new leaf node.
When the index files of the two types of index each reach the maximum size of one index file, a new index file is generated and its location information is updated in the metadata of the global index table. The method of creating an index table described in the application thus yields the index files for the data source. Fig. 7 shows an index file of a hash index table created for the data source in one embodiment of the application: since all of its key values hash to 0, this index file (HashIndexFileSegment) is located on slave 1 of the cluster; its storage structure is a BPlusTree whose leaf nodes store <key, value>. For example, the value for key=3 is fs1:4, indicating that the record with id=3 is located in FileSegment1 at offset 4. Fig. 8 shows an index file of a range index table created for the data source in one embodiment of the application: all of its key values are less than 333, so this index file (RangeIndexFileSegment) is located on slave 1 of the cluster; its storage structure is a BPlusTree whose leaf nodes store <key, value>. For example, the value for key=1 is fs1:1:2, indicating that the records with id=1 are located in FileSegment1 at offsets 1 and 2.
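The location values quoted from Fig. 7 and Fig. 8 (fs1:4 and fs1:1:2) can be read as a file name followed by one or more offsets. The sketch below decodes that textual form; treating the value as a colon-separated string that names a single file is an assumption based only on these two examples, and `parse_location` is a hypothetical helper.

```python
def parse_location(value: str) -> tuple[str, list[int]]:
    """Split a value such as 'fs1:1:2' into its file name and list of offsets."""
    file_name, *offsets = value.split(":")
    return file_name, [int(offset) for offset in offsets]


hash_entry = parse_location("fs1:4")     # record with id=3 in Fig. 7
range_entry = parse_location("fs1:1:2")  # records with id=1 in Fig. 8
```

Here fs1 abbreviates FileSegment1, as in the figures.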
In conclusion optimizing bottom storage organization by the concordance list of herein described creation, can be applied to be distributed The inquiry of data in formula system meets the number of the part column of user demand by Hash class concordance list and the inquiry of range class concordance list According to avoiding reading all data files, greatly reduce the data volume of access, improve the processing speed of on-line analytical processing task Degree.Certainly, the concordance list of herein described creation can also be applied to using scenes such as concordance list distributing system resources, not It is confined to the inquiry applied to data.
Fig. 9 shows a schematic structural diagram of a device for creating an index table, provided according to another aspect of the application. The device is applied to a distributed system and comprises a first creating device 11, a second creating device 12, a data file distribution device 13 and a distribution device 14, as follows.
The first creating device 11 is configured to create the structure of the metadata of the data table corresponding to the acquired data source, where the metadata of the data table includes the location information of all data files in the data table and the data files are stored in columnar form. In one embodiment of the application, to create a distributed global index table, the metadata of the data table corresponding to the data source must first be created; this metadata includes the location, on the hard disk of each machine in the cluster, of every data file contained in the data table. Note that the structure of the data table must be created before its metadata, and the structure of the data table includes the structure of the data files. The data table holds the data of the data source: the data in the data source is stored as a data table, and the data table is stored in the form of data files. The distributed system framework in one embodiment of the application is shown in Fig. 2 and comprises a client, a master node (Master), one or more slave nodes (slave) and zookeeper. The data files of each data table may be stored on solid state disks (SSD). In this embodiment of the application the metadata of the data table is stored in zookeeper, where zookeeper is a high-performance coordination system for distributed applications.
The second creating device 12 is configured to create the data structure of the index table to be created, to determine, according to a user request, the index column of the index table to be created in the data table, the index column being a subset of the columns in the data table, and to create, according to the index column, the structure of the metadata of the index table to be created, where that metadata includes the location information of the index files in the index table to be created. Continuing the above embodiment, the columns for which a global index is needed are determined according to actual requirements, i.e. some of the columns of the data table are selected as index columns, and the selected index columns are then used to create the metadata of the distributed global index table; this metadata includes the location, on the hard disk of each machine in the cluster, of every index file contained in the global index table. Once created, the index table can be used to query data in the distributed system: when a user only needs to query a few data columns, it avoids reading all data files and greatly reduces the amount of data accessed.
Note that, in one embodiment of the application, when the structure of the metadata of the index table to be created is created according to the index columns, one index table may be established for each index column and the structure of the metadata then created for each index table; alternatively, one index table may be established jointly over several of the index columns required by the user.
The data structure of the global index table is created, and the structure of the global index table includes the structure of the index files. Preferably, the structure of the index files in the data structure of the index table to be created includes a BPlusTree structure, whose leaf nodes contain a key and a location value. Here, the index files of the global index table may be organized and stored using the BPlusTree structure, whose leaf nodes contain the tuple <key, location value> (<key, value>). The BPlusTree structure keeps the input index column data sorted efficiently, so the position of the record corresponding to an index column value can be looked up quickly and data query tasks can be answered quickly.
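The benefit of keeping keys sorted can be seen in a simplified stand-in for the leaf level of such a tree. The sketch below is not a BPlusTree (it has no internal nodes or node splitting); it only keeps the keys in sorted order, which is the property that makes point and range lookups fast. The class name `LeafLevel` and its methods are hypothetical.

```python
from bisect import bisect_left, bisect_right, insort


class LeafLevel:
    """Sorted <key, value> store mimicking the leaf level of a B+ tree."""

    def __init__(self) -> None:
        self.keys: list[int] = []          # kept in ascending order
        self.values: dict[int, str] = {}   # key -> location value

    def put(self, key: int, value: str) -> None:
        if key not in self.values:
            insort(self.keys, key)         # insert while preserving order
        self.values[key] = value

    def range(self, low: int, high: int) -> list[tuple[int, str]]:
        """Return all entries with low <= key <= high, in key order."""
        start = bisect_left(self.keys, low)
        end = bisect_right(self.keys, high)
        return [(k, self.values[k]) for k in self.keys[start:end]]


leaf = LeafLevel()
for key, value in [(5, "fs1:5"), (1, "fs1:1:2"), (3, "fs1:4")]:
    leaf.put(key, value)
```

Because the keys stay sorted regardless of insertion order, a range query only touches the contiguous slice of keys it needs.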
In one specific embodiment of the application, the example data source shown in Fig. 3 contains 1000 records with four columns: identifier (id), name (name), age (age) and gender (sex). Fig. 4 shows the SQL query statements that a user issues against this data source. The user needs to filter on the id column, so a global index table must be created on the id column. Table A denotes the data table corresponding to the data source shown in Fig. 3. Sql 1, "Select * from table A where id=1", queries data table A for the data of the rows whose id is 1; the other SQL statements have analogous meanings.
The data file distribution device 13 is configured to distribute the data file generated from the current data rows of the data source to a slave node, and to update the metadata of the data table corresponding to the data file according to the structure of the metadata of the data table and the assigned location information of the data file. Here, in the distributed columnar storage platform shown in Fig. 2, when data source data is stored, the Master distributes the data files evenly over the machines in the cluster according to the load balancing principle; each machine holds several data files (FileSegment) of the data table, and the structure of a FileSegment is shown in Fig. 5. Data files are stored column-wise: when an OLAP user only needs to query a few data columns, columnar storage supplies only the data columns that need to be read, which greatly improves OLAP query efficiency.
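The advantage of column-wise storage can be seen in a small sketch that turns rows into one list per column, so that a query on id reads only the id list. The column names follow the example data source (id, name, age, sex); the sample rows and the function name `to_columns` are made up for illustration and do not reflect the actual FileSegment layout of Fig. 5.

```python
def to_columns(rows: list[dict]) -> dict[str, list]:
    """Regroup row records into one list of values per column."""
    columns: dict[str, list] = {}
    for row in rows:
        for name, cell in row.items():
            columns.setdefault(name, []).append(cell)
    return columns


# Hypothetical sample rows; real data would come from the data source.
rows = [
    {"id": 1, "name": "Ann", "age": 30, "sex": "F"},
    {"id": 2, "name": "Bob", "age": 25, "sex": "M"},
]
segment = to_columns(rows)
ids = segment["id"]  # a filter on id touches this column only
```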
Preferably, the device further includes a data file generating device 13', configured to generate the current data rows as a new data file when the number of rows of the data source reaches the preset data file size threshold, to distribute the newly generated data file to the corresponding slave node of the data table, and to update the metadata of the data table corresponding to the data file.
In one embodiment of the application, whenever the number of rows of data source data held in memory equals the size of one data file, the current data rows are generated as a data file; the master node (Master) of the cluster then assigns it, according to the load balancing principle, to the hard disk of some machine in the cluster for storage and updates the metadata of the data table. At this point, creation of the global index table for that data file also begins.
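The cutting of buffered rows into data files and their placement on slaves can be sketched as follows. The application only says that the Master follows a load balancing principle, so choosing the slave with the fewest stored rows is an assumption, as are the function name `build_file_segments` and the file names fs1, fs2 and so on.

```python
def build_file_segments(rows: list, rows_per_file: int,
                        num_slaves: int) -> tuple[list[tuple[str, int]], list]:
    """Cut rows into full data files and place each on the least-loaded slave.

    Returns the placements as (file name, 1-based slave number) pairs and the
    rows left over in the buffer because they do not yet fill a whole file.
    """
    load = [0] * num_slaves  # rows currently stored on each slave
    placement: list[tuple[str, int]] = []
    full = len(rows) - len(rows) % rows_per_file
    for _ in range(0, full, rows_per_file):
        target = load.index(min(load))  # assumed policy: fewest rows wins
        load[target] += rows_per_file
        placement.append((f"fs{len(placement) + 1}", target + 1))
    return placement, rows[full:]


# 1000 records, 25 rows per data file, 3 slaves, as in the example embodiment.
placement, remaining = build_file_segments(list(range(1, 1001)), 25, 3)
```

With these numbers the sketch produces 40 data files and leaves no rows in the buffer; under the assumed policy the files simply rotate over the three slaves.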
The distribution device 14 is configured to distribute the information of the index file of the index table to be created to the index files of the corresponding slave nodes, and to update the metadata of the index table to be created. Here, after a generated data file has been assigned to storage, the information of its corresponding index file must be allocated to the index files of the corresponding slaves and the metadata of the global index table updated. The information of the index file may include the value of the index column in the created index table, the file name of the data file to which the index column value belongs, the offset in the data file of the row containing the index column value, and so on.
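The index information listed above (index column value, data file name, row offset) can be produced by walking the index column of a freshly written data file. The sketch below assumes 1-based row offsets and uses a hypothetical generator name, `index_entries`; neither detail is stated in the application.

```python
from typing import Iterator


def index_entries(file_name: str,
                  index_column: list[int]) -> Iterator[tuple[int, tuple[str, int]]]:
    """Yield <key, value> pairs for every row of one data file's index column."""
    for offset, key in enumerate(index_column, start=1):  # assumed 1-based offsets
        yield key, (file_name, offset)


# The id column of a hypothetical data file fs1 in which id 1 occurs twice.
entries = list(index_entries("fs1", [1, 1, 2]))
```

Each yielded pair would then be routed to a slave by the hash or range rule and merged into that slave's index file.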
In one embodiment of the application, the index table to be created includes a hash global index table or a range global index table. The two types differ slightly in how their index files are allocated within the cluster: a hash global index table decides which machine in the distributed cluster receives an index file according to the hash value of the index column, whereas a range global index table assigns the index file to the corresponding machine according to the range in which the index column value falls.
Preferably, the second creating device 12 is configured to determine the key of the leaf node according to the value of the index column of the index table to be created, and to determine the location value of the leaf node according to the location information of the data files in the metadata of the data table, the location information of the index files in the metadata of the index table to be created, and the offset of the index column in the index file. In one embodiment of the application, the index files of both the hash global index table and the range global index table are organized and stored using a BPlusTree structure. A leaf node of the BPlusTree contains a tuple <key, value>, where key is the value of the index column, taken from the data file, and value is the information of the data file containing that index column value together with the offset, recorded in the index file, of the records that satisfy the condition. The Master distributes data files evenly across the machines of the cluster according to the load balancing principle. Each machine holds several data files (FileSegment) of the data table; the structure of a FileSegment is shown in Fig. 5. When the distributed columnar storage platform creates the global index table for a data source, the index file of a hash global index table is a HashIndexFileSegment and the index file of a range global index table is a RangeIndexFileSegment.
Preferably, when the index table to be created is a hash global index table, the distribution device 14 is configured to distribute the keys and location values of the index file of the hash global index table to the index files of the corresponding slave nodes according to the hash value determined from the value of the index column. Specifically, the distribution device 14 is configured to determine the hash value of the index column from the value of the index column of the hash global index table and the number of slave nodes, and to distribute the key and location value of the leaf node in the BPlusTree structure whose index column hash value is i to the index file of the (i+1)-th slave node, where i is a natural number. The information of the index files is thereby spread sensibly over the slaves, achieving an even distribution. Note that determining the hash value determines the machine to which the index file information is assigned (for example, machine 1), that is, the machine that receives the <key, value> tuple for that hash value; the leaf node in the corresponding BPlusTree structure is actually created only once the <key, value> tuple has been assigned to the index file of that machine.
In one embodiment of the application, assume that the distributed cluster has 1 Master and n slaves. The hash global index table computes a hash value over the index column. The value of the index column is the key of a leaf node in the BPlusTree structure, and the information of the data file containing that value, together with the offset of the corresponding record in the data file, is the value. The key and location value of a record whose key hashes to i are assigned to the index file of the (i+1)-th slave, where the hash value can be determined from the value of the index column and the number of slaves in the distributed cluster.
Preferably, when the index table to be created is a range global index table, the distribution device 14 is configured to determine the distribution range intervals from the result of sampling the values of the index column, to record each slave node together with its distribution range interval of the index column, and to distribute the information of the index file of the range global index table to the index files of the corresponding slave nodes according to the distribution range intervals. Specifically, the distribution device 14 is configured to compare the value of the index column of the range global index table with the recorded distribution range intervals to determine the interval in which the value falls and, according to that interval, to distribute the key and location value of the leaf node in the BPlusTree structure corresponding to the value of the index column to the index file of the slave node associated with the interval.
In one embodiment of the application, assume that the distributed cluster has 1 Master and n slaves. The range global index table samples the values of the index column in the files of the data table and sets n ranges according to the sampling result, so that the amount of data in each range is as even as possible; each machine and its range interval of the index column are recorded in the Master. When an index file is generated, the values of its index column are compared with the n ranges in the Master and, according to the range to which a value belongs, the corresponding data file information and the offset in the data file, i.e. the <key, value> tuple, are written to the index file of the slave associated with that range.
In one embodiment of the application, the distribution device 14 is configured to merge the new location value with the old location value in the leaf node corresponding to the key if the key already exists in the index file, and, if the key does not exist in the index file, to insert a new leaf node into the BPlusTree structure and store the key and location value in that new leaf node.
Here, when a <key, value> tuple is assigned to the index file of the corresponding slave, if a leaf node for the key already exists in the index file, the new value is merged with the old value; if the key does not exist in the index file, a new leaf node is inserted and the <key, value> tuple is stored in it.
Preferably, the device further includes an index file generating device 14', configured to generate a new index file when the size of an index file of the index table to be created reaches a preset index file size threshold, and to update the location information of the new index file in the metadata of the index table to be created. In one embodiment of the application, when the index files of the hash global index table and of the range global index table each reach the maximum size of one index file, a new index file is generated and its location information is updated in the metadata of the global index table.
In one preferred embodiment of the application, the distributed system has, for example, 1 Master and 3 slaves, and the maximum size of a data file is set to 25 rows. When the number of input data rows equals the maximum of 25, the distributed columnar storage platform takes the current data rows as one data file; the Master, following the load balancing principle, writes it as a data file (FileSegment) to the SSD of some machine in the cluster and updates the metadata of the data table. For the hash global index table, key is the value of id, and value is the information of the data file containing that id together with the offset of the corresponding record in the data file. The key is hashed, and the <key, value> tuples whose hash results are 0, 1 and 2 are assigned to the index files (HashIndexFileSegment) of the hash index table on machines 1, 2 and 3 of the cluster respectively, as shown in Fig. 6. When id=1 the hash result is 1, so its <key, value> tuple is stored on the 2nd slave of the cluster; when id=2 the hash result is 2, so its <key, value> tuple is stored on the 3rd slave; when id=3 the hash result is 0, so its <key, value> tuple is stored on the 1st slave. Here {key | key % 3 = hash value} means that the hash value is the remainder of key divided by the number of slaves (3 here), where key is the value of id.
When the range global index table is created, the range division obtained after sampling the key values of the data source is shown in Fig. 6. The division should make the number of records in each range interval as close as possible; the result comprises the three intervals [1, 333], [334, 666] and [667, 999], and the range assigned to each of the three slaves is stored in the Master. When the value of the index column in a data block falls in some range interval, its index information is stored in the index file (RangeIndexFileSegment) of the range index table on the slave machine associated with that interval, in the form of a BPlusTree leaf node. For example, for id=5 the key falls in the first range interval, so its <key, value> information is stored in the index file of the first machine.
Note that, for both the hash index table and the range index table, when a <key, value> tuple is assigned to the index file of the corresponding slave, it must be determined whether the id of the index column already exists in the BPlusTree of the index file. If the id already exists in the BPlusTree, the information of the data block (Block) containing that id and its row number within the Block are merged directly with the original value. If the key does not exist in the index file, a new leaf node is inserted and the <key, value> tuple is stored in the new leaf node.
When the index files of the two types of index each reach the maximum size of one index file, a new index file is generated and its location information is updated in the metadata of the global index table. The method of creating an index table described in the application thus yields the index files for the data source. Fig. 7 shows an index file of a hash index table created for the data source in one embodiment of the application: since all of its key values hash to 0, this index file (HashIndexFileSegment) is located on slave 1 of the cluster; its storage structure is a BPlusTree whose leaf nodes store <key, value>. For example, the value for key=3 is fs1:4, indicating that the record with id=3 is located in FileSegment1 at offset 4. Fig. 8 shows an index file of a range index table created for the data source in one embodiment of the application: all of its key values are less than 333, so this index file (RangeIndexFileSegment) is located on slave 1 of the cluster; its storage structure is a BPlusTree whose leaf nodes store <key, value>. For example, the value for key=1 is fs1:1:2, indicating that the records with id=1 are located in FileSegment1 at offsets 1 and 2.
In conclusion optimizing bottom storage organization by the concordance list of herein described creation, can be applied to be distributed The inquiry of data in formula system meets the number of the part column of user demand by Hash class concordance list and the inquiry of range class concordance list According to avoiding reading all data files, greatly reduce the data volume of access, improve the processing speed of on-line analytical processing task Degree.Certainly, the concordance list of herein described creation can also be applied to using scenes such as concordance list distributing system resources, not It is confined to the inquiry applied to data.
Obviously, those skilled in the art can make various modifications and variations to the application without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the application and their technical equivalents, the application is intended to include them as well.
It should be noted that the application may be implemented in software and/or a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general purpose computer or any other similar hardware device. In one embodiment, the software program of the application may be executed by a processor to implement the steps or functions described above. Likewise, the software program of the application (including related data structures) may be stored in a computer-readable recording medium, for example RAM memory, a magnetic or optical drive, a floppy disk or a similar device. In addition, some steps or functions of the application may be implemented in hardware, for example as a circuit that cooperates with a processor to perform each step or function.
In addition, a part of the present application can be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present application through the operation of that computer. The program instructions that invoke the method of the present application may be stored in a fixed or removable recording medium, and/or transmitted through a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the present application includes a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the present application.
It is obvious to a person skilled in the art that the present application is not limited to the details of the exemplary embodiments above, and that the present application can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments are to be regarded in all respects as illustrative and not restrictive, and the scope of the present application is defined by the appended claims rather than by the above description; all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be included in the present application. No reference sign in the claims should be construed as limiting the claim concerned. In addition, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or apparatuses recited in a device claim may also be implemented by one unit or apparatus through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.

Claims (20)

1. A method for creating an index table, wherein the method comprises:
creating a structure of metadata of a data table corresponding to an acquired data source, wherein the metadata of the data table includes location information of all data files in the data table, and the data files are stored in columnar form;
creating a data structure of an index table to be created, determining, according to a user request, an index column of the index table to be created in the data table, the index column being a partial column of the data table, and creating, according to the index column, a structure of metadata corresponding to the index table to be created, wherein the metadata corresponding to the index table to be created includes location information of index files in the index table to be created;
distributing a data file generated from current data rows in the data source to a slave node, and updating the metadata of the data table corresponding to the data file according to the structure of the metadata of the data table and the location information to which the data file has been distributed;
distributing information of an index file of the index table to be created into the index file of the corresponding slave node, and updating the metadata corresponding to the index table to be created;
wherein the structure of the index file in the data structure of the index table to be created includes a BPlusTree structure, and a leaf node of the BPlusTree structure includes a key and a location information value;
wherein creating the data structure of the index table to be created comprises:
determining the key of the leaf node according to the value of the index column of the index table to be created;
determining the location information value of the leaf node from the file name of the data file to which the index column belongs and the offset, within that data file, of the row in which the index column is located.
2. The method according to claim 1, wherein the index table to be created includes a hash-type global index table and/or a range-type global index table.
3. The method according to claim 2, wherein, when the index table to be created is a hash-type global index table, distributing the information of the index file corresponding to the index column in the index table to be created into the index file of the slave node corresponding to the index table to be created comprises:
according to a hash value determined from the value of the index column of the hash-type global index table, distributing the key and location information value corresponding to the index file of the hash-type global index table into the index file of the corresponding slave node.
4. The method according to claim 2, wherein distributing, according to the hash value determined from the index column of the hash-type global index table, the key and location information value corresponding to the index file of the hash-type global index table into the index file of the slave node corresponding to the hash-type global index table comprises:
determining the hash value of the index column according to the value of the index column of the hash-type global index table and the number of slave nodes;
distributing the key and location information value of a leaf node in the BPlusTree structure corresponding to an index column whose hash value is i into the index file of the (i+1)-th slave node, wherein i is a natural number.
5. The method according to claim 2, wherein, when the index table to be created is a range-type global index table, distributing the information of the index file corresponding to the index column in the index table to be created into the index file of the slave node corresponding to the index table to be created comprises:
determining distribution range intervals according to a sampling result obtained by sampling the values of the index column, and recording each slave node and the distribution range interval of the index column corresponding to it;
distributing, according to the distribution range intervals, the information of the index file of the range-type global index table into the index file of the corresponding slave node.
6. The method according to claim 5, wherein distributing, according to the distribution range intervals, the information of the index file of the range-type global index table into the index file of the slave node corresponding to the range-type global index table comprises:
comparing the value of the index column of the range-type global index table with the recorded distribution range intervals, and determining the distribution range interval in which the value of the index column lies;
according to the distribution range interval in which the value of the index column lies, distributing the key and location information value of the leaf node in the BPlusTree structure corresponding to the value of the index column into the index file of the slave node corresponding to that distribution range interval.
7. The method according to claim 4 or 6, wherein distributing the information of the index file corresponding to the index column in the index table to be created into the index file of the slave node corresponding to the index table to be created further comprises:
if the key already exists in the index file, merging the new location information value and the old location information value into the leaf node corresponding to the key.
8. The method according to claim 4 or 6, wherein distributing the information of the index file corresponding to the index column in the index table to be created into the index file of the slave node corresponding to the index table to be created further comprises:
if the key does not exist in the index file, inserting a new leaf node into the BPlusTree structure, and storing the key and the location information value in the new leaf node.
9. The method according to claim 1, wherein, before distributing the data file generated from the current data rows in the data source to the slave node corresponding to the data table, the method further comprises:
when the number of rows of the data source reaches a preset data file size threshold, generating a new data file from the current data rows, distributing the newly generated data file to the slave node corresponding to the data table, and updating the metadata of the data table corresponding to the data file.
10. The method according to claim 1, wherein, before distributing the information of the index file corresponding to the index column in the index table to be created into the index file of the slave node corresponding to the index table to be created, the method further comprises:
when the size of an index file of the index table to be created reaches a preset index file size threshold, generating a new index file, and updating the location information of the new index file into the metadata corresponding to the index table to be created.
11. A device for creating an index table, wherein the device comprises:
a first creating apparatus, configured to create a structure of metadata of a data table corresponding to an acquired data source, wherein the metadata of the data table includes location information of all data files in the data table, and the data files are stored in columnar form;
a second creating apparatus, configured to create a data structure of an index table to be created, determine, according to a user request, an index column of the index table to be created in the data table, the index column being a partial column of the data table, and create, according to the index column, a structure of metadata corresponding to the index table to be created, wherein the metadata corresponding to the index table to be created includes location information of index files in the index table to be created;
a data file distributing apparatus, configured to distribute a data file generated from current data rows in the data source to a slave node, and update the metadata of the data table corresponding to the data file according to the structure of the metadata of the data table and the location information to which the data file has been distributed;
a distributing apparatus, configured to distribute information of an index file of the index table to be created into the index file of the corresponding slave node, and update the metadata corresponding to the index table to be created;
wherein the structure of the index file in the data structure of the index table to be created includes a BPlusTree structure, and a leaf node of the BPlusTree structure includes a key and a location information value;
wherein the second creating apparatus is configured to:
determine the key of the leaf node according to the value of the index column of the index table to be created;
determine the location information value of the leaf node from the file name of the data file to which the index column belongs and the offset, within that data file, of the row in which the index column is located.
12. The device according to claim 11, wherein the index table to be created includes a hash-type global index table and/or a range-type global index table.
13. The device according to claim 12, wherein, when the index table to be created is a hash-type global index table, the distributing apparatus is configured to:
according to a hash value determined from the value of the index column of the hash-type global index table, distribute the key and location information value corresponding to the index file of the hash-type global index table into the index file of the corresponding slave node.
14. The device according to claim 13, wherein the distributing apparatus is configured to:
determine the hash value of the index column according to the value of the index column of the hash-type global index table and the number of slave nodes;
distribute the key and location information value of a leaf node in the BPlusTree structure corresponding to an index column whose hash value is i into the index file of the (i+1)-th slave node, wherein i is a natural number.
15. The device according to claim 12, wherein, when the index table to be created is a range-type global index table, the distributing apparatus is configured to:
determine distribution range intervals according to a sampling result obtained by sampling the values of the index column, and record each slave node and the distribution range interval of the index column corresponding to it;
distribute, according to the distribution range intervals, the information of the index file of the range-type global index table into the index file of the corresponding slave node.
16. The device according to claim 15, wherein the distributing apparatus is configured to:
compare the value of the index column of the range-type global index table with the recorded distribution range intervals, and determine the distribution range interval in which the value of the index column lies;
according to the distribution range interval in which the value of the index column lies, distribute the key and location information value of the leaf node in the BPlusTree structure corresponding to the value of the index column into the index file of the slave node corresponding to that distribution range interval.
17. The device according to claim 14 or 16, wherein the distributing apparatus is further configured to:
if the key already exists in the index file, merge the new location information value and the old location information value into the leaf node corresponding to the key.
18. The device according to claim 14 or 16, wherein the distributing apparatus is further configured to:
if the key does not exist in the index file, insert a new leaf node into the BPlusTree structure, and store the key and the location information value in the new leaf node.
19. The device according to claim 11, wherein the device further comprises:
a data file generating apparatus, configured to, when the number of rows of the data source reaches a preset data file size threshold, generate a new data file from the current data rows, distribute the newly generated data file to the slave node corresponding to the data table, and update the metadata of the data table corresponding to the data file.
20. The device according to claim 11, wherein the device further comprises:
an index file generating apparatus, configured to, when the size of an index file of the index table to be created reaches a preset index file size threshold, generate a new index file, and update the location information of the new index file into the metadata corresponding to the index table to be created.
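The distribution rules recited in claims 4 to 6 and 14 to 16 can be sketched in Python as follows. The sketch rests on assumptions the claims do not state: the hash is taken to be the integer key modulo the number of slave nodes, the range boundaries are taken to be evenly spaced quantiles of the sample, and the function names are hypothetical.

```python
import bisect
from typing import List, Sequence


def hash_slave(value: int, num_slaves: int) -> int:
    """Return the 1-based slave number for a key under hash distribution.

    Assumption: the hash value i is `value % num_slaves`; an entry with
    hash value i goes to the (i+1)-th slave node.
    """
    i = value % num_slaves
    return i + 1


def range_boundaries(sample: Sequence[int], num_slaves: int) -> List[int]:
    """Pick num_slaves - 1 split points from a sample of index column values.

    Assumption: split points are evenly spaced positions in the sorted sample.
    """
    ordered = sorted(sample)
    return [ordered[len(ordered) * k // num_slaves] for k in range(1, num_slaves)]


def range_slave(value: int, boundaries: List[int]) -> int:
    """Return the 1-based slave number whose interval contains `value`.

    Values below boundaries[0] go to slave 1, values in
    [boundaries[0], boundaries[1]) go to slave 2, and so on.
    """
    return bisect.bisect_right(boundaries, value) + 1
```

With three slave nodes and boundaries [333, 666], every key below 333 is routed to slave 1, which is consistent with the Fig. 8 example in the description.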
CN201710140132.8A 2017-03-09 2017-03-09 A kind of method and apparatus creating concordance list Active CN106960020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710140132.8A CN106960020B (en) 2017-03-09 2017-03-09 A kind of method and apparatus creating concordance list

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710140132.8A CN106960020B (en) 2017-03-09 2017-03-09 A kind of method and apparatus creating concordance list

Publications (2)

Publication Number Publication Date
CN106960020A CN106960020A (en) 2017-07-18
CN106960020B true CN106960020B (en) 2019-10-22

Family

ID=59470800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710140132.8A Active CN106960020B (en) 2017-03-09 2017-03-09 A kind of method and apparatus creating concordance list

Country Status (1)

Country Link
CN (1) CN106960020B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110769025B (en) * 2019-09-06 2022-04-22 江苏中云科技有限公司 Method for accelerating data index of multi-tenant-oriented cloud storage system
CN111125216B (en) * 2019-12-10 2024-03-12 中盈优创资讯科技有限公司 Method and device for importing data into Phoenix
CN112231318A (en) * 2020-10-14 2021-01-15 北京人大金仓信息技术股份有限公司 Method and device for creating global index
CN113111034B (en) * 2021-04-07 2023-08-04 山东英信计算机技术有限公司 Index pre-allocation method and device
CN114880322B (en) * 2022-04-21 2023-02-28 广州经传多赢投资咨询有限公司 Financial data column type storage method, system, equipment and storage medium
CN115878612B (en) * 2022-11-17 2023-12-15 北京东方京融教育科技股份有限公司 Database structure and retrieval method thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436420A (en) * 2010-10-20 2012-05-02 微软公司 Low RAM space, high-throughput persistent key-value store using secondary memory
CN103412897A (en) * 2013-07-25 2013-11-27 中国科学院软件研究所 Parallel data processing method based on distributed structure
CN106326305A (en) * 2015-06-30 2017-01-11 星环信息科技(上海)有限公司 Storage method and equipment for data file and inquiry method and equipment for data file

Also Published As

Publication number Publication date
CN106960020A (en) 2017-07-18

Similar Documents

Publication Publication Date Title
CN106960020B (en) A kind of method and apparatus creating concordance list
US9424274B2 (en) Management of intermediate data spills during the shuffle phase of a map-reduce job
US8676951B2 (en) Traffic reduction method for distributed key-value store
JP2019194882A (en) Mounting of semi-structure data as first class database element
CN103778148B (en) Life cycle management method and equipment for data file of Hadoop distributed file system
JP6281225B2 (en) Information processing device
US9411867B2 (en) Method and apparatus for processing database data in distributed database system
CN106294352B (en) A kind of document handling method, device and file system
JP5203733B2 (en) Coordinator server, data allocation method and program
JP2016189214A5 (en)
US20160323385A1 (en) Distributed Data Storage Method, Apparatus, and System
CN104298687B (en) A kind of hash partition management method and device
CN109684282A (en) A kind of method and device constructing metadata cache
CN107943952A (en) A kind of implementation method that full-text search is carried out based on Spark frames
CN105608228B (en) A kind of efficient distributed RDF data storage method
CN106326239A (en) Distributed file system and file meta-information management method thereof
US8015195B2 (en) Modifying entry names in directory server
CN107016115B (en) Data export method and device, computer readable storage medium and electronic equipment
CN104834650A (en) Method and system for generating effective query tasks
CN105069074A (en) Strategy configuration file processing method, device and system
CN106940715B (en) A kind of method and apparatus of the inquiry based on concordance list
JP6204753B2 (en) Distributed query processing apparatus, processing method, and processing program
US11847121B2 (en) Compound predicate query statement transformation
US10614055B2 (en) Method and system for tree management of trees under multi-version concurrency control
US20140067840A1 (en) System and method for retrieving information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 200233 11-12 / F, building B, 88 Hongcao Road, Xuhui District, Shanghai

Patentee after: Star link information technology (Shanghai) Co.,Ltd.

Address before: 200233 11-12 / F, building B, 88 Hongcao Road, Xuhui District, Shanghai

Patentee before: TRANSWARP TECHNOLOGY (SHANGHAI) Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Method and Equipment for Creating Index Tables

Effective date of registration: 20230616

Granted publication date: 20191022

Pledgee: Bank of China Limited by Share Ltd. Shanghai Xuhui branch

Pledgor: Star link information technology (Shanghai) Co.,Ltd.

Registration number: Y2023310000252