CN106960020A - Method and apparatus for creating an index table - Google Patents

Method and apparatus for creating an index table

Info

Publication number
CN106960020A
Authority
CN
China
Prior art keywords
index
data
file
created
index table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710140132.8A
Other languages
Chinese (zh)
Other versions
CN106960020B (en)
Inventor
张常淳
吕程
周立
周翠翠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Transwarp Technology Shanghai Co Ltd
Original Assignee
Star Link Information Technology (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Star Link Information Technology (shanghai) Co Ltd
Priority to CN201710140132.8A
Publication of CN106960020A
Application granted
Publication of CN106960020B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2264 Multidimensional index structures
    • G06F16/2272 Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The purpose of the application is to provide a method and apparatus for creating an index table. The application creates the structure of the metadata of the data table corresponding to an acquired data source; then creates the data structure of the index table to be created, determines the index columns of that index table in the data table according to a user request, and creates the structure of the metadata corresponding to the index table according to the index columns; distributes the data files generated from the current data rows of the data source to slave nodes; and distributes the information of the index files of the index table to be created to the index files of the corresponding slave nodes. This optimizes the underlying storage structure: at query time, the index file information makes it possible to quickly locate the data files that satisfy the query conditions, which greatly reduces the amount of data accessed and improves query performance.

Description

Method and apparatus for creating an index table
Technical field
The application relates to the field of computers, and in particular to a method and apparatus for creating an index table.
Background
With the development and application of database technology, the volume of data stored in databases grows day by day, and fast, flexible processing of complex queries over large volumes of data has become a new requirement. OLAP (On-Line Analytical Processing) is dedicated to supporting complex analytical operations, with an emphasis on decision support for decision makers and senior managers. In general, an OLAP user only needs to query a small number of data columns, so row-oriented storage loads many useless data columns and query performance drops. The basic query method of distributed columnar storage first reads the metadata from ZooKeeper, then goes to every machine in the cluster to read all data files, and then reads the records that satisfy the conditions from each data file. This directly leads to an excessive amount of data access and hurts OLAP query performance.
Summary of the invention
The purpose of the application is to provide a method and apparatus for creating an index table that optimize the underlying storage structure and thereby make data queries more convenient.
According to one aspect of the application, a method for creating an index table is provided, the method including:
Creating the structure of the metadata of the data table corresponding to an acquired data source, wherein the metadata of the data table includes the location information of all data files in the data table, and the data files are stored in columnar form;
Creating the data structure of the index table to be created, determining the index columns of the index table to be created in the data table according to a user request, the index columns being some of the columns of the data table, and creating the structure of the metadata corresponding to the index table to be created according to the index columns, wherein that metadata includes the location information of the index files in the index table to be created;
Distributing the data files generated from the current data rows of the data source to slave nodes, and updating the metadata of the data table corresponding to the data files according to the structure of the metadata of the data table and the assigned location information of the data files;
Distributing the information of the index files of the index table to be created to the index files of the corresponding slave nodes, and updating the metadata corresponding to the index table to be created.
Further, the structure of the index files in the data structure of the index table to be created includes a BPlusTree structure, wherein the leaf nodes of the BPlusTree structure include a key and a location value.
Further, creating the data structure of the index table to be created includes:
Determining the key of the leaf node according to the value of the index column of the index table to be created;
Determining the location value of the leaf node according to the file name of the data file to which the index column belongs and the offset, within that data file, of the row in which the index column is located.
Further, the index table to be created includes a hash-type global index table and/or a range-type global index table.
Further, when the index table to be created is a hash-type global index table, distributing the information of the index files corresponding to the index columns of the index table to be created to the index files of the slave nodes corresponding to the index table to be created includes:
Distributing the keys and location values corresponding to the index files of the hash-type global index table to the index files of the corresponding slave nodes according to the hash value determined from the value of the index column of the hash-type global index table.
Further, distributing the keys and location values corresponding to the index files of the hash-type global index table to the index files of the slave nodes corresponding to the hash-type global index table, according to the hash value determined from the value of the index column, includes:
Determining the hash value of the index column according to the value of the index column of the hash-type global index table and the number of slave nodes;
Distributing the key and location value of each leaf node in the BPlusTree structure whose index column hash value is i to the index file of the (i+1)-th slave node, where i is a natural number.
Further, when the index table to be created is a range-type global index table, distributing the information of the index files corresponding to the index columns of the index table to be created to the index files of the slave nodes corresponding to the index table to be created includes:
Determining allocation range intervals from the result of sampling the values of the index column, and recording each slave node together with the allocation range interval of the index column that corresponds to it;
Distributing the information of the index files of the range-type global index table to the index files of the corresponding slave nodes according to the allocation range intervals.
Further, distributing the information of the index files of the range-type global index table to the index files of the slave nodes corresponding to the range-type global index table according to the allocation range intervals includes:
Comparing the value of the index column of the range-type global index table with the recorded allocation range intervals to determine the allocation range interval in which the value of the index column falls;
Distributing, according to the allocation range interval in which the value of the index column falls, the key and location value of the leaf node in the BPlusTree structure corresponding to that value to the index file of the slave node corresponding to that allocation range interval.
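The range-type allocation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say how the intervals are derived from the sample, so the quantile cut in `split_points` is an assumed choice, and the function names `split_points` and `slave_for` are hypothetical.

```python
import bisect
from typing import List, Sequence


def split_points(sample: Sequence[int], n_slaves: int) -> List[int]:
    """Derive n_slaves - 1 interval boundaries from a non-empty sample.

    Assumption: boundaries are taken at evenly spaced positions of the
    sorted sample; the source only says intervals come from sampling.
    """
    ordered = sorted(sample)
    return [ordered[len(ordered) * k // n_slaves] for k in range(1, n_slaves)]


def slave_for(value: int, bounds: List[int]) -> int:
    """Return the 1-based number of the slave whose interval contains value."""
    return bisect.bisect_right(bounds, value) + 1


bounds = split_points(range(1, 101), 4)    # recorded once per index table
target = slave_for(42, bounds)             # slave that receives key 42
```

Because each slave owns a contiguous interval, a later range query only needs to contact the slaves whose intervals overlap the queried range.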
Further, when the information of the index files corresponding to the index columns of the index table to be created is distributed to the index files of the slave nodes corresponding to the index table to be created, the method also includes:
If the key already exists in the index file, merging the new location value with the old location value into the leaf node corresponding to the key.
Further, when the information of the index files corresponding to the index columns of the index table to be created is distributed to the index files of the slave nodes corresponding to the index table to be created, the method also includes:
If the key does not exist in the index file, inserting a new leaf node into the BPlusTree structure and storing the key and the location value in the new leaf node.
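The two cases above (key already present, key absent) can be sketched as follows. This is a simplified stand-in, not a real B+ tree: a sorted key list with a parallel list of location values plays the role of the leaf level, and the class name `IndexFile` and method `put` are hypothetical.

```python
import bisect
from typing import List, Tuple

Location = Tuple[str, int]  # (data file name, row offset), as assumed here


class IndexFile:
    """Sorted leaf level of one index file (stand-in for a BPlusTree)."""

    def __init__(self) -> None:
        self.keys: List[int] = []               # kept in ascending order
        self.values: List[List[Location]] = []  # values[i] belongs to keys[i]

    def put(self, key: int, location: Location) -> None:
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            # Key exists: merge the new location value with the old ones.
            self.values[pos].append(location)
        else:
            # Key absent: insert a new leaf entry at its sorted position.
            self.keys.insert(pos, key)
            self.values.insert(pos, [location])


index = IndexFile()
index.put(5, ("FileSegment-0", 0))
index.put(2, ("FileSegment-0", 1))
index.put(5, ("FileSegment-1", 7))
```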
Further, before the data files generated from the current data rows of the data source are distributed to the slave nodes corresponding to the data table, the method also includes:
When the number of rows of the data source reaches the preset data file size threshold, generating the current data rows as a new data file, distributing the newly generated data file to the slave node corresponding to the data table, and updating the metadata of the data table corresponding to the data file.
Further, before the information of the index files corresponding to the index columns of the index table to be created is distributed to the index files of the slave nodes corresponding to the index table to be created, the method also includes:
When the size of an index file of the index table to be created reaches the preset index file size threshold, generating a new index file and writing the location information of the new index file into the metadata corresponding to the index table to be created.
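A minimal sketch of the index file threshold rule, under assumptions: size is measured as a number of entries (the text does not define the unit), the metadata is a plain list of (node, file name) pairs, and the class name `IndexWriter` and the file naming scheme are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[int, Tuple[str, int]]  # (key, (data file name, row offset))


class IndexWriter:
    """Writes index entries on one slave, rolling to a new file when full."""

    def __init__(self, node: str, max_entries: int) -> None:
        self.node = node
        self.max_entries = max_entries      # assumed unit: entries per file
        self.files: List[List[Entry]] = []  # contents of each index file
        self.meta: List[Tuple[str, str]] = []  # index file locations
        self._new_file()

    def _new_file(self) -> None:
        name = f"index-{len(self.files)}"
        self.files.append([])
        # Record the new file's location in the index table metadata.
        self.meta.append((self.node, name))

    def add(self, key: int, location: Tuple[str, int]) -> None:
        if len(self.files[-1]) >= self.max_entries:
            self._new_file()
        self.files[-1].append((key, location))


writer = IndexWriter("slave1", max_entries=2)
for k in range(5):
    writer.add(k, ("FileSegment-0", k))
```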
According to another aspect of the application, an apparatus for creating an index table is also provided, the apparatus including:
A first creating device, configured to create the structure of the metadata of the data table corresponding to an acquired data source, wherein the metadata of the data table includes the location information of all data files in the data table, and the data files are stored in columnar form;
A second creating device, configured to create the data structure of the index table to be created, determine the index columns of the index table to be created in the data table according to a user request, the index columns being some of the columns of the data table, and create the structure of the metadata corresponding to the index table to be created according to the index columns, wherein that metadata includes the location information of the index files in the index table to be created;
A data file distribution device, configured to distribute the data files generated from the current data rows of the data source to slave nodes, and to update the metadata of the data table corresponding to the data files according to the structure of the metadata of the data table and the assigned location information of the data files;
A distribution device, configured to distribute the information of the index files of the index table to be created to the index files of the corresponding slave nodes, and to update the metadata corresponding to the index table to be created.
Further, the structure of the index files in the data structure of the index table to be created includes a BPlusTree structure, wherein the leaf nodes of the BPlusTree structure include a key and a location value.
Further, the second creating device is configured to:
Determine the key of the leaf node according to the value of the index column of the index table to be created;
Determine the location value of the leaf node according to the file name of the data file to which the index column belongs and the offset, within that data file, of the row in which the index column is located.
Further, the index table to be created includes a hash-type global index table and/or a range-type global index table.
Further, when the index table to be created is a hash-type global index table, the distribution device is configured to:
Distribute the keys and location values corresponding to the index files of the hash-type global index table to the index files of the corresponding slave nodes according to the hash value determined from the value of the index column of the hash-type global index table.
Further, the distribution device is configured to:
Determine the hash value of the index column according to the value of the index column of the hash-type global index table and the number of slave nodes;
Distribute the key and location value of each leaf node in the BPlusTree structure whose index column hash value is i to the index file of the (i+1)-th slave node, where i is a natural number.
Further, when the index table to be created is a range-type global index table, the distribution device is configured to:
Determine allocation range intervals from the result of sampling the values of the index column, and record each slave node together with the allocation range interval of the index column that corresponds to it;
Distribute the information of the index files of the range-type global index table to the index files of the corresponding slave nodes according to the allocation range intervals.
Further, the distribution device is configured to:
Compare the value of the index column of the range-type global index table with the recorded allocation range intervals to determine the allocation range interval in which the value of the index column falls;
Distribute, according to the allocation range interval in which the value of the index column falls, the key and location value of the leaf node in the BPlusTree structure corresponding to that value to the index file of the slave node corresponding to that allocation range interval.
Further, the distribution device is also configured to:
If the key already exists in the index file, merge the new location value with the old location value into the leaf node corresponding to the key.
Further, the distribution device is also configured to:
If the key does not exist in the index file, insert a new leaf node into the BPlusTree structure and store the key and the location value in the new leaf node.
Further, the apparatus also includes:
A data file generating device, configured to, when the number of rows of the data source reaches the preset data file size threshold, generate the current data rows as a new data file, distribute the newly generated data file to the slave node corresponding to the data table, and update the metadata of the data table corresponding to the data file.
Further, the apparatus also includes:
An index file generating device, configured to, when the size of an index file of the index table to be created reaches the preset index file size threshold, generate a new index file and write the location information of the new index file into the metadata corresponding to the index table to be created.
Compared with the prior art, the application creates the structure of the metadata of the data table corresponding to an acquired data source, wherein the metadata of the data table includes the location information of all data files in the data table, and the data files are stored in columnar form. It then creates the data structure of the index table to be created, determines the index columns of the index table to be created in the data table according to a user request, the index columns being some of the columns of the data table, and creates the structure of the metadata corresponding to the index table to be created according to the index columns, wherein that metadata includes the location information of the index files in the index table to be created. Next, it distributes the data files generated from the current data rows of the data source to slave nodes, and updates the metadata of the data table corresponding to the data files according to the structure of the metadata of the data table and the assigned location information of the data files. Finally, it distributes the information of the index files of the index table to be created to the index files of the corresponding slave nodes and updates the metadata corresponding to the index table to be created. This optimizes the underlying storage structure: at query time, the index file information makes it possible to quickly locate the data files that satisfy the query conditions, which greatly reduces the amount of data accessed and improves query performance.
Brief description of the drawings
Other features, objects and advantages of the application will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawings:
Fig. 1 shows a schematic flowchart of a method for creating an index table according to one aspect of the application;
Fig. 2 shows the distributed system framework of an embodiment of the application;
Fig. 3 shows the data source of an embodiment of the application;
Fig. 4 shows the SQL query statements for the data source in an embodiment of the application;
Fig. 5 shows the structure of a data file of the data table in an embodiment of the application;
Fig. 6 shows how the information of the index files is distributed after the index table is created in an embodiment of the application;
Fig. 7 shows the structure of an index file of a hash-type index table in an embodiment of the application;
Fig. 8 shows the structure of an index file of a range-type index table in an embodiment of the application;
Fig. 9 shows the structure of an apparatus for creating an index table according to another aspect of the application.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description
The application is described in further detail below with reference to the drawings.
In a typical configuration of the application, the terminal, the device of the service network and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces and memory.
Memory may include volatile memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
Fig. 1 shows a schematic flowchart of a method for creating an index table according to one aspect of the application. The method is applied in a distributed system and includes steps S11 to S14.
In step S11, the structure of the metadata of the data table corresponding to an acquired data source is created, wherein the metadata of the data table includes the location information of all data files in the data table, and the data files are stored in columnar form. In an embodiment of the application, to create a distributed global index table, the metadata of the data table corresponding to the data source must first be created; this metadata includes the location, on the hard disk of each machine in the cluster, of every data file that the data table contains. Note that the structure of the data table must be created before its metadata; the structure of the data table includes the structure of the data files. The data table holds the data of the data source: the data in the data source is stored as a data table, and the storage form of the data table is the data file. The distributed system framework of an embodiment of the application is shown in Fig. 2 and includes a client, a master node (Master), one or several slave nodes (slave) and ZooKeeper. The data files of each data table can be stored on solid state drives (SSD). In this embodiment, the metadata of the data table is stored in ZooKeeper, which is a coordination system for distributed applications.
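The data table metadata of step S11 can be pictured with the following Python sketch. It is illustrative only: the embodiment keeps this metadata in ZooKeeper, whereas the sketch holds it in memory, and the names `FileLocation`, `TableMeta` and `record`, as well as the node names and paths, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FileLocation:
    node: str  # machine in the cluster that holds the file
    path: str  # path of the file on that machine's disk


@dataclass
class TableMeta:
    """Metadata of one data table: where each of its data files lives."""
    table: str
    files: Dict[str, FileLocation] = field(default_factory=dict)

    def record(self, file_name: str, node: str, path: str) -> None:
        # Called whenever a data file is assigned to a slave node.
        self.files[file_name] = FileLocation(node, path)


meta = TableMeta("A")
meta.record("FileSegment-0", "slave1", "/ssd/A/FileSegment-0")
meta.record("FileSegment-1", "slave2", "/ssd/A/FileSegment-1")
```

The metadata of an index table has the same shape, with index files in place of data files.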
In step S12, the data structure of the index table to be created is created, the index columns of the index table to be created in the data table are determined according to a user request, the index columns being some of the columns of the data table, and the structure of the metadata corresponding to the index table to be created is created according to the index columns, wherein that metadata includes the location information of the index files in the index table to be created. Continuing the embodiment above, the columns that need a global index are determined according to actual requirements; that is, some columns of the data table are selected as index columns, and the selected index columns are then used to create the metadata corresponding to the distributed global index table. This metadata includes the location, on the hard disk of each machine in the cluster, of every index file that the global index table contains. Once the index table has been created successfully, it can be used for data queries in the distributed system, so that a user who only needs to query a small number of data columns does not have to read all data files, which greatly reduces the amount of data accessed.
It should be noted that, in an embodiment of the application, when the structure of the metadata corresponding to the index table to be created is created according to the index columns, one index table may be set up for each index column and the structure of the metadata corresponding to each index table then created; alternatively, when the index table is created, several of the index columns named in the user request may share one index table.
Here, the data structure of the global index table is created, and the structure of the global index table includes the structure of the index files. Preferably, the structure of the index files in the data structure of the index table to be created includes a BPlusTree structure, wherein the leaf nodes of the BPlusTree structure include a key and a location value. The index files of the global index table can thus be organized and stored as a BPlusTree structure whose leaf nodes contain tuples <key, location value> (<key, value>). The BPlusTree structure effectively sorts the incoming index column data, so that the location of the record corresponding to an index column value can be found quickly and data queries can be answered quickly.
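The <key, value> tuples of the leaf level can be sketched as follows. This is a simplified stand-in for the leaf nodes of a B+ tree rather than a full tree: a sorted list with binary search. It assumes the value is a (data file name, row offset) pair, and the helper names `leaf_entries` and `lookup` are hypothetical.

```python
import bisect
from typing import List, Tuple

Location = Tuple[str, int]        # (data file name, row offset in that file)
LeafEntry = Tuple[int, Location]  # the <key, value> tuple of a leaf node


def leaf_entries(file_name: str, column_values: List[int]) -> List[LeafEntry]:
    """Build sorted <key, value> entries for one data file's index column."""
    entries = [(value, (file_name, offset))
               for offset, value in enumerate(column_values)]
    entries.sort(key=lambda entry: entry[0])  # leaves are kept in key order
    return entries


def lookup(entries: List[LeafEntry], key: int) -> List[Location]:
    """Find every location stored under key with two binary searches."""
    keys = [k for k, _ in entries]
    lo = bisect.bisect_left(keys, key)
    hi = bisect.bisect_right(keys, key)
    return [location for _, location in entries[lo:hi]]


entries = leaf_entries("FileSegment-0", [3, 1, 2, 1])
hits = lookup(entries, 1)
```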
In a specific embodiment of the application, the example data source shown in Fig. 3 has 1000 records in total and four columns: identifier (id), name (name), age (age) and sex (sex). Fig. 4 shows the user's SQL query statements against this data source. The user needs to filter on the data of the id column, so a global index table needs to be created on the id column. Here, table A denotes the data table corresponding to the data source shown in Fig. 3, and Sql 1: "Select * from table A where id=1" means that query statement sql 1 retrieves the rows of data table A whose id column equals 1; the other Sql statements are read in the same way.
In step S13, the data files generated from the current data rows of the data source are distributed to slave nodes, and the metadata of the data table corresponding to the data files is updated according to the structure of the metadata of the data table and the assigned location information of the data files. In the distributed columnar storage platform shown in Fig. 2, when the platform stores the data of the data source, the Master distributes the data files evenly across the machines of the cluster according to the load balancing principle, so that every machine holds several data files (FileSegment) of the data table. The structure of a FileSegment is shown in Fig. 5. Data files are stored in columnar form, so when an OLAP user only needs to query a small number of data columns, columnar storage can supply just the columns that need to be read, which greatly improves OLAP query efficiency.
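A small sketch of why columnar storage helps here, assuming the four columns of the example data source. The actual FileSegment layout is the one shown in Fig. 5; the dictionary-of-lists layout, the function name `to_columns` and the sample rows below are made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def to_columns(rows: Sequence[Tuple], names: List[str]) -> Dict[str, list]:
    """Pivot row tuples into one list per column (columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


# Hypothetical rows in the (id, name, age, sex) shape of the example.
rows = [(1, "Ann", 30, "F"), (2, "Bob", 25, "M"), (3, "Cid", 41, "M")]
segment = to_columns(rows, ["id", "name", "age", "sex"])
ids = segment["id"]  # a filter on id reads this column only
```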
Preferably, before step S13 the method also includes step S13': when the number of rows of the data source reaches the preset data file size threshold, the current data rows are generated as a new data file, the newly generated data file is distributed to the slave node corresponding to the data table, and the metadata of the data table corresponding to the data file is updated.
In an embodiment of the application, whenever the number of rows of data-source data held in memory reaches the size of one data file, the current data rows are generated as a data file. The master node (Master) of the cluster then assigns the file, according to the load balancing principle, to the hard disk of one machine in the cluster for storage and updates the metadata corresponding to the data table. At this point, creation of the global index table for that data file also begins.
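Step S13' together with the Master's placement decision can be sketched as follows. The sketch rests on assumptions: "load balancing" is approximated by picking the slave that currently holds the fewest files, a trailing partial chunk is also written out as a file, and the function name `assign_data_files` and the file naming are hypothetical.

```python
from typing import Dict, List, Sequence


def assign_data_files(rows: Sequence, rows_per_file: int,
                      slaves: List[str]) -> Dict[str, str]:
    """Cut rows into data files and place each on the least-loaded slave."""
    load = {slave: 0 for slave in slaves}  # data files held by each slave
    meta: Dict[str, str] = {}              # data file name -> slave node
    for start in range(0, len(rows), rows_per_file):
        name = f"FileSegment-{start // rows_per_file}"
        # Assumed policy: fewest files first; ties go to the earlier slave.
        target = min(slaves, key=lambda slave: load[slave])
        load[target] += 1
        meta[name] = target                # metadata update for the table
    return meta


# 1000 rows as in the example data source, 250 rows per data file.
meta = assign_data_files(range(1000), 250, ["slave1", "slave2"])
```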
In step S14, the information of the index files of the index table to be created is distributed to the index files of the corresponding slave nodes, and the metadata corresponding to the index table to be created is updated. Here, after a generated data file has been distributed and stored, the information of its corresponding index file needs to be distributed to the index file of the corresponding slave, and the corresponding metadata of the global index table needs to be updated. The information of an index file may include the value of the index column in the created index table, the file name of the data file to which the index column belongs, the offset in the data file of the row in which the index column is located, and so on.
In an embodiment of the application, the index table to be created includes a hash-type global index table or a range-type global index table. The two types differ slightly in how their index files are allocated within the cluster: a hash-type global index table uses the hash value of the index column to decide which machine of the distributed cluster receives an index file, whereas a range-type global index table assigns index files to machines according to the range of the index column values.
Preferably, in step S12, the key of the leaf node is determined according to the value of the index column of the index table to be created, and the location value of the leaf node is determined according to the location information of the data file in the metadata of the data table, the location information of the index file in the metadata of the index table to be created, and the offset of the index column in the index file. In an embodiment of the application, the index files of the global index tables (hash-type and range-type alike) are organized and stored as BPlusTree structures. A BPlusTree leaf node contains a tuple <key, value>, where key is the value of the index column, taken from the data file, and value is the information of the data file in which the index column is located together with the offset, in the index file, of the record that satisfies the condition. The Master distributes the data files evenly across the machines of the cluster according to the load balancing principle, and every machine holds several data files (FileSegment) of the data table; the structure of a FileSegment is shown in Fig. 5. When the distributed columnar storage platform creates the global index table corresponding to the data source, the index file of a hash-type global index table is a HashIndexFileSegment, and the index file of a range-type global index table is a RangeIndexFileSegment.
Preferably, when the index table to be created is a hash-type global index table, in step S14 the key and position information value corresponding to the index file of the hash-type global index table are allocated to the index file of the corresponding slave node according to the hash value determined from the value of the index column. Specifically, in step S14, the hash value of the index column is determined from the value of the index column of the hash-type global index table and the number of slave nodes, and the key and position information value of each leaf node in the BPlusTree structure whose index column hash value is i are allocated to the index file of the (i+1)-th slave node, where i is a natural number. In this way the information of the index file is spread sensibly across the slaves and an even distribution is achieved. Note that determining the hash value determines the machine to which the index file information is allocated (for example, machine 1), that is, the machine that receives the <key, value> corresponding to that hash value. Only once the <key, value> has been allocated to the index file of that machine is the leaf node actually created in the corresponding BPlusTree structure.
In one embodiment of the present application, assume that the distributed cluster has 1 Master and n slaves. The hash-type global index table computes a hash value for the index column. The value of the index column is the key of the leaf node in the BPlusTree structure, and the information of the data file containing that index column value, together with the offset of the corresponding record in the data file, is the value. The key and position information value of every record whose key hashes to i are allocated to the index file of the (i+1)-th slave, where the hash value can be determined from the value of the index column and the number of slaves in the distributed cluster.
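The allocation rule above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it assumes integer keys and uses the remainder of the key divided by the number of slaves as the hash, as in the key%3 example given later in this description. The function names `hash_slave` and `route_by_hash` are hypothetical.

```python
def hash_slave(key: int, num_slaves: int) -> int:
    """Return the 1-based slave number for a key: hash value i maps to slave i + 1."""
    if num_slaves <= 0:
        raise ValueError("num_slaves must be positive")
    # Assumption: the hash is simply key modulo the number of slaves.
    return key % num_slaves + 1


def route_by_hash(entries, num_slaves):
    """Group (key, value) pairs by the slave whose index file should receive them."""
    routed = {slave: [] for slave in range(1, num_slaves + 1)}
    for key, value in entries:
        routed[hash_slave(key, num_slaves)].append((key, value))
    return routed
```

With three slaves, `hash_slave(1, 3)`, `hash_slave(2, 3)`, and `hash_slave(3, 3)` select slaves 2, 3, and 1 respectively, which matches the id=1, id=2, id=3 example in this description.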
Preferably, when the index table to be created is a range-type global index table, in step S14 the allocation range intervals are determined from the result of sampling the values of the index column, and each slave node is recorded together with the allocation range interval of the index column assigned to it; the information of the index file of the range-type global index table is then allocated, according to the allocation range intervals, to the index file of the corresponding slave node. Specifically, in step S14, the value of the index column of the range-type global index table is compared with the recorded allocation range intervals to determine the interval in which the value falls; the key and position information value of the leaf node in the BPlusTree structure corresponding to that value are then allocated to the index file of the slave node corresponding to that interval.
In one embodiment of the present application, assume that the distributed cluster has 1 Master and n slaves. The range-type global index table samples the values of the index column in the files of the data table and sets n ranges according to the sampling result, so that the amount of data in each range is as even as possible; the Master records each machine together with the range interval of the index column assigned to it. When an index file is generated, the value of the index column is compared with the n ranges held by the Master, and, according to the range it belongs to, the data file information for that row and its offset in the data file, i.e. <key, value>, are updated into the index file of the slave corresponding to that range.
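The range-based allocation can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the n ranges are derived from the sample, so equal-frequency cut points are assumed here, and keys above the last bound are assumed to go to the last slave. The names `range_upper_bounds` and `range_slave` are hypothetical.

```python
from bisect import bisect_left


def range_upper_bounds(sample, num_slaves):
    """Pick equal-frequency upper bounds from a sample of index column values."""
    ordered = sorted(sample)
    if not ordered or num_slaves <= 0:
        raise ValueError("need a non-empty sample and a positive slave count")
    bounds = []
    for i in range(1, num_slaves + 1):
        # The i-th bound is the sample value at the i/n quantile.
        pos = max(i * len(ordered) // num_slaves - 1, 0)
        bounds.append(ordered[pos])
    return bounds


def range_slave(key, bounds):
    """Return the 1-based slave whose interval contains key (clamped to the last slave)."""
    return min(bisect_left(bounds, key), len(bounds) - 1) + 1
```

Sampling every key from 1 to 999 with three slaves gives the upper bounds 333, 666, and 999, that is, the intervals [1, 333], [334, 666], and [667, 999] used in the example in this description; a key of 5 is then routed to slave 1.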
In one embodiment of the present application, in step S14, if the key already exists in the index file, the new position information value and the old position information value are merged into the leaf node corresponding to the key. If the key does not exist in the index file, a new leaf node is inserted into the BPlusTree structure, and the key and position information value are stored in the new leaf node.
Here, when a <key, value> is allocated to the index file of the corresponding slave, if a leaf node for the key already exists in the index file, the new value is merged with the old value; if the key does not exist in the index file, a new leaf node is inserted and the <key, value> tuple is stored in it.
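The merge-or-insert rule can be sketched with a plain dictionary standing in for the leaf level of the BPlusTree. This is an assumption made for brevity: a real B+ tree would also keep keys sorted and split nodes. The value layout (segment name mapped to a list of offsets), the `;` separator between segments, and the names `upsert` and `render` are hypothetical; only the `fs1:1:2` style itself comes from the description of Fig. 8.

```python
def upsert(leaves: dict, key, segment: str, offset: int) -> None:
    """Merge a location into an existing leaf, or insert a new leaf for the key."""
    locations = leaves.get(key)
    if locations is None:
        # Key not present: insert a new leaf holding the <key, value> tuple.
        leaves[key] = {segment: [offset]}
    else:
        # Key present: merge the new position information into the old value.
        offsets = locations.setdefault(segment, [])
        if offset not in offsets:
            offsets.append(offset)


def render(locations: dict) -> str:
    """Render a value in the 'fs1:1:2' style (multi-segment separator is assumed)."""
    return ";".join(
        ":".join([segment, *map(str, offsets)])
        for segment, offsets in locations.items()
    )
```

Inserting offsets 1 and 2 of segment `fs1` under key 1 produces a single leaf whose rendered value is `fs1:1:2`, rather than two separate leaves.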
Preferably, the method further includes step S14': when the size of an index file of the index table to be created reaches a preset index file size threshold, a new index file is generated, and the position information of the new index file is updated into the metadata of the index table to be created. In one embodiment of the present application, whenever an index file of the hash-type or range-type global index table reaches the maximum size of an index file, a new index file is generated and its position information is updated into the metadata of the global index table.
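A rough sketch of this rollover follows. Several details are assumptions, since the source does not specify them: size is measured as the number of distinct keys, the new file is opened lazily when the next new key arrives, a dictionary stands in for each BPlusTree, and the location strings such as `slave1/index-1` are invented for illustration. The class name `IndexFiles` is hypothetical.

```python
class IndexFiles:
    """Index files held by one slave, plus the metadata listing their locations."""

    def __init__(self, slave: int, max_entries: int):
        self.slave = slave
        self.max_entries = max_entries
        self.files = []     # each file is a dict standing in for a BPlusTree
        self.metadata = []  # position information of every index file
        self._new_file()

    def _new_file(self):
        self.files.append({})
        # Hypothetical location format, recorded in the index table metadata.
        self.metadata.append(f"slave{self.slave}/index-{len(self.files)}")

    def add(self, key, value):
        current = self.files[-1]
        if key not in current and len(current) >= self.max_entries:
            # Threshold reached: start a new index file and record its location.
            self._new_file()
            current = self.files[-1]
        current.setdefault(key, []).append(value)
```

This sketch only merges values for keys in the current file; how keys that already sit in an older index file are handled is not stated in the source and is left out here.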
In a preferred embodiment of the present application, suppose the distributed system has 1 Master and 3 slaves, and the maximum size of a data file is set to 25 rows. When the number of input data rows reaches the maximum of 25, the distributed columnar storage platform takes the current data rows as one data file; the Master, following the load-balancing principle, writes it to the corresponding data file (FileSegment) on the SSD of one machine in the cluster and updates the metadata of the data table. For the hash-type global index table, key is the value of id, and value is the information of the data file containing that id together with the offset of the corresponding record in the data file. The key is hashed, and the <key, value> tuples whose hash results are 0, 1, and 2 are allocated to the index files (HashIndexFileSegment) of the hash-type index table on the 1st, 2nd, and 3rd machines of the cluster, respectively, as shown in Fig. 6. When id=1, the hash result is 1, so its <key, value> is stored on the 2nd slave in the cluster; when id=2, the hash result is 2, so its <key, value> is stored on the 3rd slave; when id=3, the hash result is 0, so its <key, value> is stored on the 1st slave. Here {key | key%3 = hash value} means that the hash value is the remainder of dividing key by the number of slaves (3 in this example), where key is the value of id.
When the range-type global index table is created, the ranges obtained after sampling the key values of the data source are as shown in Fig. 6. The ranges should be divided so that the numbers of records in the range intervals are as close as possible. In this example the division yields three intervals, [1, 333], [334, 666], and [667, 999], and the Master stores the range assigned to each of the three slaves. When the value of the index column in a data block falls within a range interval, its index information is stored, in the form of a BPlusTree leaf node, in the index file (RangeIndexFileSegment) of the range-type index table on the slave machine corresponding to that interval. For example, for id=5 the key falls in the first range interval, so its <key, value> information is stored in the index file of the first machine.
Note that, for both the hash-type and the range-type index table, when a <key, value> is allocated to the index file of the corresponding slave, it is necessary to check whether the id of the index column already exists in the BPlusTree structure of that index file. If the id already exists in the BPlusTree, the information of the data block (Block) containing that id and the row number within the Block are merged directly with the original value. If the key does not exist in the index file, a new leaf node is inserted and the <key, value> is stored in it.
Whenever an index file of either of the two index types above reaches the maximum size of an index file, a new index file is generated and its position information is updated into the metadata of the global index table. The method for creating an index table described in the present application thus yields the index files corresponding to the data source. Fig. 7 shows an index file of the hash-type index table created for the data source in one embodiment of the present application. Because all of its key values hash to 0, this index file (HashIndexFileSegment) is located on slave 1 of the cluster. Its storage structure is a BPlusTree whose leaf nodes store <key, value>; for example, the value for key=3 is fs1:4, which indicates that the record with id=3 is located in FileSegment1 at offset 4. Fig. 8 shows an index file of the range-type index table created for the data source in one embodiment of the present application. All of its key values are less than 333, so this index file (RangeIndexFileSegment) is located on slave 1 of the cluster. Its storage structure is a BPlusTree whose leaf nodes store <key, value>; for example, the value for key=1 is fs1:1:2, which indicates that the records with id=1 are located in FileSegment1 at offsets 1 and 2.
In summary, the index table created as described in the present application optimizes the underlying storage structure and can be applied to data queries in a distributed system. The hash-type and range-type index tables are used to retrieve the data of the subset of columns that satisfies the user's request, which avoids reading all data files, greatly reduces the amount of data accessed, and improves the processing speed of online analytical processing tasks. Of course, the index table created as described in the present application is not limited to data queries; it can also be applied to other scenarios, such as allocating system resources by means of the index table.
Fig. 9 shows a schematic structural diagram of a device for creating an index table, provided according to another aspect of the present application. The device is applied to a distributed system and includes a first creating device 11, a second creating device 12, a data file allocation device 13, and an allocation device 14.
The first creating device 11 is configured to create the structure of the metadata of the data table corresponding to the acquired data source, where the metadata of the data table includes the position information of all data files in the data table, and the data files are stored in columnar form. In one embodiment of the present application, to create a distributed global index table, the metadata of the data table corresponding to the data source must first be created; this metadata includes the position of every data file of the data table on the hard disk of each machine in the cluster. Note that the structure of the data table must be created before the metadata of the data table is created. The structure of the data table includes the structure of the data files; the data table holds the data of the data source, that is, the data in the data source is stored as a data table, and the data table is stored in the form of data files. The framework of the distributed system in one embodiment of the present application is shown in Fig. 2; it includes a client, a master node (Master), one or more slave nodes (slave), and ZooKeeper. The data files of each data table may be stored on solid-state drives (SSD). In the embodiments of the present application, the metadata of the data table is stored in ZooKeeper, which is a high-performance coordination system for distributed applications.
The second creating device 12 is configured to create the data structure of the index table to be created, to determine, according to the user's request, the index columns of the index table to be created in the data table, the index columns being a subset of the columns in the data table, and to create, according to the index columns, the structure of the metadata of the index table to be created, where that metadata includes the position information of the index files in the index table to be created. Continuing the above embodiment, the columns for which a global index is needed are determined according to actual requirements, that is, a subset of the columns of the data table is selected as index columns, and the selected index columns are then used to create the metadata of the distributed global index table. This metadata includes the position of every index file of the global index table on the hard disk of each machine in the cluster. Once created, the index table can be used for data queries in the distributed system: a user who only needs to query a few data columns is served without reading all data files, which greatly reduces the amount of data accessed.
Note that, in one embodiment of the present application, when the structure of the metadata of the index table to be created is created according to the index columns, a separate index table may be set up for each index column and the structure of its metadata then created; alternatively, when the index table is created, several of the index columns corresponding to the user's request may share a single index table.
The data structure of the global index table is created, and the structure of the global index table includes the structure of the index file. Preferably, in the data structure of the index table to be created, the structure of the index file includes a BPlusTree structure, where the leaf nodes of the BPlusTree structure include a key and a position information value. Here, the index files of the global index table may be organized and stored in a BPlusTree structure whose leaf nodes hold the tuple <key, position information value> (<key, value>). The BPlusTree structure keeps the input index column data sorted efficiently, so that the position of the record corresponding to an index column value can be looked up quickly and data query tasks can be answered quickly.
In a specific embodiment of the present application, consider the example data source shown in Fig. 3. It contains 1000 records in total, with four columns of data: identifier (id), name (name), age (age), and sex (sex). Fig. 4 shows the user's SQL query statements against this data source. The user needs to filter on the data of the id column, so a global index table needs to be created on the id column. Here, table A denotes the data table corresponding to the data source shown in Fig. 3, and Sql 1, "Select * from table A where id=1", queries the rows of data table A whose id column equals 1; the other SQL statements have similar meanings.
The data file allocation device 13 is configured to allocate the data file generated from the current data rows of the data source to a slave node, and to update the metadata of the data table corresponding to the data file according to the structure of the metadata of the data table and the position information of the allocated data file. Here, in the distributed columnar storage platform shown in Fig. 2, when the platform stores the data of the data source, the Master distributes the data files evenly across the machines in the cluster according to the load-balancing principle. Each machine holds several data files (FileSegment) of the data table; the structure of a FileSegment is shown in Fig. 5. Data files are stored in columnar form, so when an OLAP user only needs to query a few data columns, columnar storage can supply just the data columns that need to be read, which greatly improves OLAP query efficiency.
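The benefit of columnar storage mentioned above can be illustrated with a toy layout. This is not the FileSegment format of Fig. 5, which the text does not spell out; it only assumes that each column is kept as its own list, so a query can read the columns it needs and skip the rest. The function names `to_columns` and `read_columns` are hypothetical.

```python
def to_columns(rows, column_names):
    """Turn row tuples into one list per column (a toy columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def read_columns(segment, wanted):
    """Return only the requested columns; other columns are never touched."""
    return {name: segment[name] for name in wanted}
```

For a segment built from rows with the columns id, name, age, and sex, `read_columns(segment, ["id", "age"])` touches two of the four column lists.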
Preferably, the device further includes a data file generation device 13', configured to generate a new data file from the current data rows when the number of rows of the data source reaches a preset data file size threshold, to allocate the newly generated data file to the slave node corresponding to the data table, and to update the metadata of the data table corresponding to the data file.
In one embodiment of the present application, whenever the number of rows of data-source data held in memory equals the size of one data file, the current data rows are generated as a data file; the master node (Master) of the cluster then allocates it, according to the load-balancing principle, to the hard disk of a machine in the cluster for storage and updates the metadata of the data table's files. At this point, creation of the global index table for that data file also begins.
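The row-count trigger can be sketched as follows. Round-robin placement is used here purely as a stand-in for the load-balancing principle, which the source does not define, and the file names `fs1`, `fs2`, and so on, as well as the function name `flush_rows`, are hypothetical.

```python
from itertools import cycle


def flush_rows(rows, max_rows, slaves):
    """Cut rows into data files of max_rows rows and place them round-robin."""
    targets = cycle(slaves)
    metadata = []  # (file name, slave, row count), as the table metadata might record
    buffer = []    # rows still held in memory
    for row in rows:
        buffer.append(row)
        if len(buffer) == max_rows:
            # Threshold reached: emit one data file and record where it went.
            metadata.append((f"fs{len(metadata) + 1}", next(targets), len(buffer)))
            buffer = []
    return metadata, buffer
```

With a threshold of 25 rows, feeding in 60 rows produces two data files and leaves 10 rows buffered in memory until the threshold is reached again.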
The allocation device 14 is configured to allocate the information of the index file of the index table to be created to the index file of the corresponding slave node, and to update the metadata of the index table to be created. Here, after a generated data file has been allocated and stored, the information of its corresponding index file must be allocated to the index file of the corresponding slave, and the metadata of the global index table must be updated. The information of the index file may include the values of the index column in the created index table, the file name of the data file to which the index column belongs, the offset in that data file of the row containing the index column value, and so on.
Preferably, the second creating device 12 is configured to determine the key of the leaf node according to the value of the index column of the index table to be created, and to determine the position information value of the leaf node according to the position information of the data file in the metadata of the data table, the position information of the index file in the metadata of the index table to be created, and the offset of the index column in the index file. In one embodiment of the present application, the index files of both the hash-type global index table and the range-type global index table are organized and stored in a BPlusTree structure. Each leaf node of the BPlusTree holds a tuple <key, value>, where key is the value of the index column, taken from the data file, and value records the information of the data file containing that index column value together with the offset of the matching record. The Master distributes data files evenly across the machines in the cluster according to the load-balancing principle. Each machine holds several data files (FileSegment) of the data table; the structure of a FileSegment is shown in Fig. 5. When the distributed columnar storage platform creates the global index table for the data source, the index file of the hash-type global index table is a HashIndexFileSegment, and the index file of the range-type global index table is a RangeIndexFileSegment.
Preferably, when the index table to be created is a hash-type global index table, the allocation device 14 is configured to allocate the key and position information value corresponding to the index file of the hash-type global index table to the index file of the corresponding slave node according to the hash value determined from the value of the index column. Specifically, the allocation device 14 is configured to determine the hash value of the index column from the value of the index column of the hash-type global index table and the number of slave nodes, and to allocate the key and position information value of each leaf node in the BPlusTree structure whose index column hash value is i to the index file of the (i+1)-th slave node, where i is a natural number. In this way the information of the index file is spread sensibly across the slaves and an even distribution is achieved. Note that determining the hash value determines the machine to which the index file information is allocated (for example, machine 1), that is, the machine that receives the <key, value> corresponding to that hash value. Only once the <key, value> has been allocated to the index file of that machine is the leaf node actually created in the corresponding BPlusTree structure.
Preferably, when the index table to be created is a range-type global index table, the allocation device 14 is configured to determine the allocation range intervals from the result of sampling the values of the index column, to record each slave node together with the allocation range interval of the index column assigned to it, and to allocate the information of the index file of the range-type global index table, according to the allocation range intervals, to the index file of the corresponding slave node. Specifically, the allocation device 14 is configured to compare the value of the index column of the range-type global index table with the recorded allocation range intervals to determine the interval in which the value falls, and to allocate the key and position information value of the leaf node in the BPlusTree structure corresponding to that value to the index file of the slave node corresponding to that interval.
In one embodiment of the present application, the allocation device 14 is configured to merge the new position information value and the old position information value into the leaf node corresponding to the key if the key already exists in the index file, and, if the key does not exist in the index file, to insert a new leaf node into the BPlusTree structure and store the key and position information value in the new leaf node.
Preferably, the device further includes an index file generation device 14', configured to generate a new index file when the size of an index file of the index table to be created reaches a preset index file size threshold, and to update the position information of the new index file into the metadata of the index table to be created. In one embodiment of the present application, whenever an index file of the hash-type or range-type global index table reaches the maximum size of an index file, a new index file is generated and its position information is updated into the metadata of the global index table.
Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its spirit and scope. Thus, if such changes and modifications fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example by means of an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software program of the present application (including related data structures) may be stored in a computer-readable recording medium, for example RAM memory, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the present application may be implemented in hardware, for example as a circuit that cooperates with a processor to perform the steps or functions.
In addition, part of this application can be applied as a computer program product, such as computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to this application through the operation of that computer. The program instructions that invoke the method of this application may be stored in a fixed or removable recording medium, and/or transmitted through a data stream in broadcast or other signal-bearing media, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of this application includes an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to run the methods and/or technical solutions based on the foregoing embodiments of this application.
It is obvious to those skilled in the art that this application is not limited to the details of the above exemplary embodiments, and that it can be realized in other specific forms without departing from its spirit or essential characteristics. Therefore, from whichever point of view, the embodiments should be regarded as exemplary and non-restrictive. The scope of this application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are therefore intended to be embraced in this application. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (24)

1. A method for creating an index table, wherein the method comprises:
Creating the structure of the meta-information of the data table corresponding to an acquired data source, wherein the meta-information of the data table includes the location information of all data files in the data table, and the data files are stored in columnar form;
Creating the data structure of an index table to be created, determining the index columns of the index table to be created in the data table according to a user request, the index columns being a subset of the columns in the data table, and creating the structure of the meta-information corresponding to the index table to be created according to the index columns, wherein the meta-information corresponding to the index table to be created includes the location information of the index files in the index table to be created;
Distributing the data file generated from the current data rows in the data source to a slave node, and updating the meta-information of the data table corresponding to the data file according to the structure of the meta-information of the data table and the location information to which the data file has been distributed;
Distributing the information of the index file of the index table to be created to the index file of the corresponding slave node, and updating the meta-information corresponding to the index table to be created.
2. The method according to claim 1, wherein the structure of the index file in the data structure of the index table to be created includes a BPlusTree structure, wherein the leaf nodes of the BPlusTree structure include a key and a location information value.
3. The method according to claim 2, wherein creating the data structure of the index table to be created comprises:
Determining the key of the leaf node according to the value of the index column of the index table to be created;
Determining the location information value of the leaf node according to the file name of the data file to which the index column belongs and the offset, within that data file, of the row in which the index column value is located.
4. The method according to any one of claims 1 to 3, wherein the index table to be created includes a hash-class global index table and/or a range-class global index table.
5. The method according to claim 4, wherein, when the index table to be created is a hash-class global index table, distributing the information of the index file corresponding to the index column in the index table to be created to the index file of the slave node corresponding to the index table to be created comprises:
Distributing, according to the hash value determined from the value of the index column of the hash-class global index table, the keys and location information values corresponding to the index file of the hash-class global index table to the index file of the corresponding slave node.
6. The method according to claim 5, wherein distributing, according to the hash value determined from the index column of the hash-class global index table, the keys and location information values corresponding to the index file of the hash-class global index table to the index file of the slave node corresponding to the hash-class global index table comprises:
Determining the hash value of the index column according to the value of the index column of the hash-class global index table and the number of slave nodes;
Distributing the keys and location information values of the leaf nodes in the BPlusTree structure whose index-column hash value is i to the index file of the (i+1)-th slave node, wherein i is a natural number.
7. The method according to claim 4, wherein, when the index table to be created is a range-class global index table, distributing the information of the index file corresponding to the index column in the index table to be created to the index file of the slave node corresponding to the index table to be created comprises:
Determining distribution range intervals according to the result of sampling the values of the index column, and recording each slave node together with the distribution range interval of its corresponding index column;
Distributing, according to the distribution range intervals, the information of the index file of the range-class global index table to the index file of the corresponding slave node.
8. The method according to claim 7, wherein distributing, according to the distribution range intervals, the information of the index file of the range-class global index table to the index file of the slave node corresponding to the range-class global index table comprises:
Comparing the value of the index column of the range-class global index table with the recorded distribution range intervals to determine the distribution range interval in which the value of the index column falls;
Distributing, according to the distribution range interval in which the value of the index column falls, the keys and location information values of the leaf nodes in the BPlusTree structure corresponding to the value of the index column to the index file of the slave node corresponding to that distribution range interval.
9. The method according to claim 6 or 8, wherein distributing the information of the index file corresponding to the index column in the index table to be created to the index file of the slave node corresponding to the index table to be created further comprises:
If the key already exists in the index file, merging the new location information value with the old location information value in the leaf node corresponding to the key.
10. The method according to claim 6 or 8, wherein distributing the information of the index file corresponding to the index column in the index table to be created to the index file of the slave node corresponding to the index table to be created further comprises:
If the key does not exist in the index file, inserting a new leaf node into the BPlusTree structure and storing the key and the location information value in the new leaf node.
11. The method according to claim 1, wherein, before distributing the data file generated from the current data rows in the data source to the slave node corresponding to the data table, the method further comprises:
When the number of rows of the data source reaches a preset data file size threshold, generating a new data file from the current data rows, distributing the newly generated data file to the slave node corresponding to the data table, and updating the meta-information of the data table corresponding to the data file.
12. The method according to claim 1, wherein, before distributing the information of the index file corresponding to the index column in the index table to be created to the index file of the slave node corresponding to the index table to be created, the method further comprises:
When the size of the index file of the index table to be created reaches a preset index file size threshold, generating a new index file, and updating the location information of the new index file into the meta-information corresponding to the index table to be created.
13. A device for creating an index table, wherein the device comprises:
A first creating means, configured to create the structure of the meta-information of the data table corresponding to an acquired data source, wherein the meta-information of the data table includes the location information of all data files in the data table, and the data files are stored in columnar form;
A second creating means, configured to create the data structure of an index table to be created, determine the index columns of the index table to be created in the data table according to a user request, the index columns being a subset of the columns in the data table, and create the structure of the meta-information corresponding to the index table to be created according to the index columns, wherein the meta-information corresponding to the index table to be created includes the location information of the index files in the index table to be created;
A data file distributing means, configured to distribute the data file generated from the current data rows in the data source to a slave node, and update the meta-information of the data table corresponding to the data file according to the structure of the meta-information of the data table and the location information to which the data file has been distributed;
A distributing means, configured to distribute the information of the index file of the index table to be created to the index file of the corresponding slave node, and update the meta-information corresponding to the index table to be created.
14. The device according to claim 13, wherein the structure of the index file in the data structure of the index table to be created includes a BPlusTree structure, wherein the leaf nodes of the BPlusTree structure include a key and a location information value.
15. The device according to claim 14, wherein the second creating means is configured to:
Determine the key of the leaf node according to the value of the index column of the index table to be created;
Determine the location information value of the leaf node according to the file name of the data file to which the index column belongs and the offset, within that data file, of the row in which the index column value is located.
16. The device according to any one of claims 13 to 15, wherein the index table to be created includes a hash-class global index table and/or a range-class global index table.
17. The device according to claim 16, wherein, when the index table to be created is a hash-class global index table, the distributing means is configured to:
Distribute, according to the hash value determined from the value of the index column of the hash-class global index table, the keys and location information values corresponding to the index file of the hash-class global index table to the index file of the corresponding slave node.
18. The device according to claim 17, wherein the distributing means is configured to:
Determine the hash value of the index column according to the value of the index column of the hash-class global index table and the number of slave nodes;
Distribute the keys and location information values of the leaf nodes in the BPlusTree structure whose index-column hash value is i to the index file of the (i+1)-th slave node, wherein i is a natural number.
19. The device according to claim 16, wherein, when the index table to be created is a range-class global index table, the distributing means is configured to:
Determine distribution range intervals according to the result of sampling the values of the index column, and record each slave node together with the distribution range interval of its corresponding index column;
Distribute, according to the distribution range intervals, the information of the index file of the range-class global index table to the index file of the corresponding slave node.
20. The device according to claim 19, wherein the distributing means is configured to:
Compare the value of the index column of the range-class global index table with the recorded distribution range intervals to determine the distribution range interval in which the value of the index column falls;
Distribute, according to the distribution range interval in which the value of the index column falls, the keys and location information values of the leaf nodes in the BPlusTree structure corresponding to the value of the index column to the index file of the slave node corresponding to that distribution range interval.
21. The device according to claim 18 or 20, wherein the distributing means is further configured to:
If the key already exists in the index file, merge the new location information value with the old location information value in the leaf node corresponding to the key.
22. The device according to claim 18 or 20, wherein the distributing means is further configured to:
If the key does not exist in the index file, insert a new leaf node into the BPlusTree structure and store the key and the location information value in the new leaf node.
23. The device according to claim 13, wherein the device further comprises:
A data file generating means, configured to, when the number of rows of the data source reaches a preset data file size threshold, generate a new data file from the current data rows, distribute the newly generated data file to the slave node corresponding to the data table, and update the meta-information of the data table corresponding to the data file.
24. The device according to claim 13, wherein the device further comprises:
An index file generating means, configured to, when the size of the index file of the index table to be created reaches a preset index file size threshold, generate a new index file, and update the location information of the new index file into the meta-information corresponding to the index table to be created.
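For illustration only, the sampling step of claim 7 and the merge-or-insert behaviour of claims 9 and 10 might be sketched in Python as follows. The quantile rule used to pick split points, the dictionary standing in for a BPlusTree index file, and the names split_points and upsert are assumptions made for this sketch; the claims do not prescribe them.

```python
from typing import Dict, Iterable, List, Tuple

Location = Tuple[str, int]  # (data file name, row offset); layout is an assumption


def split_points(sample: Iterable[int], num_slaves: int) -> List[int]:
    """Derive num_slaves - 1 split points from a sample of index-column values.

    Assumption: split points are taken at equal-count positions of the
    sorted sample, so each slave receives a similar share of keys.
    """
    ordered = sorted(sample)
    return [ordered[len(ordered) * k // num_slaves] for k in range(1, num_slaves)]


def upsert(index_file: Dict[int, List[Location]], key: int, location: Location) -> None:
    """Store a location under a key in one slave's index file.

    If the key already exists, the new location is merged with the old ones
    (claim 9); otherwise a new entry is inserted for the key (claim 10).
    """
    if key in index_file:
        index_file[key].append(location)
    else:
        index_file[key] = [location]


slave_1_index: Dict[int, List[Location]] = {}
upsert(slave_1_index, 1, ("fs1", 1))
upsert(slave_1_index, 1, ("fs1", 2))
upsert(slave_1_index, 3, ("fs1", 4))
```

After the three calls, key 1 holds both locations in a single entry, matching the fs1:1:2 style of merged leaf value shown for Fig. 8 in the description.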
CN201710140132.8A 2017-03-09 2017-03-09 A kind of method and apparatus creating concordance list Active CN106960020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710140132.8A CN106960020B (en) 2017-03-09 2017-03-09 A kind of method and apparatus creating concordance list

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710140132.8A CN106960020B (en) 2017-03-09 2017-03-09 A kind of method and apparatus creating concordance list

Publications (2)

Publication Number Publication Date
CN106960020A true CN106960020A (en) 2017-07-18
CN106960020B CN106960020B (en) 2019-10-22

Family

ID=59470800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710140132.8A Active CN106960020B (en) 2017-03-09 2017-03-09 A kind of method and apparatus creating concordance list

Country Status (1)

Country Link
CN (1) CN106960020B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110769025A (en) * 2019-09-06 2020-02-07 江苏中云科技有限公司 Method for accelerating data index of multi-tenant-oriented cloud storage system
CN111125216A (en) * 2019-12-10 2020-05-08 中盈优创资讯科技有限公司 Method and device for importing data into Phoenix
CN112231318A (en) * 2020-10-14 2021-01-15 北京人大金仓信息技术股份有限公司 Method and device for creating global index
CN113111034A (en) * 2021-04-07 2021-07-13 山东英信计算机技术有限公司 Index pre-allocation method and device
CN114880322A (en) * 2022-04-21 2022-08-09 广州经传多赢投资咨询有限公司 Financial data column type storage method, system, equipment and storage medium
CN115878612A (en) * 2022-11-17 2023-03-31 石家庄纵宇科技有限公司 Database structure and retrieval method thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436420A (en) * 2010-10-20 2012-05-02 微软公司 Low RAM space, high-throughput persistent key-value store using secondary memory
CN103412897A (en) * 2013-07-25 2013-11-27 中国科学院软件研究所 Parallel data processing method based on distributed structure
CN106326305A (en) * 2015-06-30 2017-01-11 星环信息科技(上海)有限公司 Storage method and equipment for data file and inquiry method and equipment for data file

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110769025A (en) * 2019-09-06 2020-02-07 江苏中云科技有限公司 Method for accelerating data index of multi-tenant-oriented cloud storage system
CN110769025B (en) * 2019-09-06 2022-04-22 江苏中云科技有限公司 Method for accelerating data index of multi-tenant-oriented cloud storage system
CN111125216A (en) * 2019-12-10 2020-05-08 中盈优创资讯科技有限公司 Method and device for importing data into Phoenix
CN111125216B (en) * 2019-12-10 2024-03-12 中盈优创资讯科技有限公司 Method and device for importing data into Phoenix
CN112231318A (en) * 2020-10-14 2021-01-15 北京人大金仓信息技术股份有限公司 Method and device for creating global index
CN113111034A (en) * 2021-04-07 2021-07-13 山东英信计算机技术有限公司 Index pre-allocation method and device
CN113111034B (en) * 2021-04-07 2023-08-04 山东英信计算机技术有限公司 Index pre-allocation method and device
CN114880322A (en) * 2022-04-21 2022-08-09 广州经传多赢投资咨询有限公司 Financial data column type storage method, system, equipment and storage medium
CN114880322B (en) * 2022-04-21 2023-02-28 广州经传多赢投资咨询有限公司 Financial data column type storage method, system, equipment and storage medium
CN115878612A (en) * 2022-11-17 2023-03-31 石家庄纵宇科技有限公司 Database structure and retrieval method thereof
CN115878612B (en) * 2022-11-17 2023-12-15 北京东方京融教育科技股份有限公司 Database structure and retrieval method thereof

Also Published As

Publication number Publication date
CN106960020B (en) 2019-10-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 200233 11-12 / F, building B, 88 Hongcao Road, Xuhui District, Shanghai

Patentee after: Star link information technology (Shanghai) Co.,Ltd.

Address before: 200233 11-12 / F, building B, 88 Hongcao Road, Xuhui District, Shanghai

Patentee before: TRANSWARP TECHNOLOGY (SHANGHAI) Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Method and Equipment for Creating Index Tables

Effective date of registration: 20230616

Granted publication date: 20191022

Pledgee: Bank of China Limited by Share Ltd. Shanghai Xuhui branch

Pledgor: Star link information technology (Shanghai) Co.,Ltd.

Registration number: Y2023310000252
