CN110399535A

CN110399535A - Data query method, apparatus, and device

Info

Publication number: CN110399535A
Application number: CN201910142002.7A
Authority: CN
Inventors: 黄浩; 王浙明; 万春晓; 黄东波; 陈戈
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-02-26
Filing date: 2019-02-26
Publication date: 2019-11-01
Anticipated expiration: 2039-02-26
Also published as: CN110399535B

Abstract

The invention discloses a data query method and apparatus. The method is applied to a server that has pre-generated index metadata with a multi-level index structure and has divided that metadata into two parts according to the structure: one part is kept in pre-stored index blocks in memory, and the other part is kept in pre-stored index blocks on disk. The index metadata stored on disk forms the lower levels of the index metadata in memory. After a query instruction is received, the index metadata in memory is searched first; once the relevant index metadata is found, the index metadata on disk is searched to obtain the first target index metadata, and the first result set corresponding to the first target index metadata is then looked up in a pre-stored first bitmap set. This improves data query efficiency and, in turn, the data scale that the system can support.

Description

Data query method, apparatus, and device

Technical field

The present invention relates to the field of data processing, and in particular to a data query method, apparatus, and device.

Background

Currently, real-time query technology for massive data in the industry generally uses a distributed computing architecture: a user query request is split into multiple subqueries, which are distributed to compute nodes for parallel computation, and the partial results are finally merged. Indexing is used at the same time to speed up data location and improve query performance.

However, the indexing techniques provided in the prior art yield low data query efficiency and cannot meet user needs.

Summary of the invention

In view of this, the embodiments of the present invention disclose a data query method and apparatus that address the low data query efficiency of the prior art.

An embodiment of the invention discloses a data query method applied to a server.

The server has pre-generated index metadata with a multi-level index structure and has divided the index metadata into two parts according to that structure: one part is kept in pre-stored index blocks in memory, and the other part is kept in pre-stored index blocks on disk. The index metadata stored on disk forms the lower levels of the index metadata in memory.

The data query method includes:

processing the index information in a received data query instruction to generate an index key that conforms to the index format;

searching the pre-stored index blocks in memory for index metadata corresponding to the index key;

if index metadata corresponding to the index key is found, obtaining a preliminary search result and determining, in the pre-stored index blocks on disk, the index metadata that contains the preliminary search result;

searching the index metadata that contains the preliminary search result for index metadata corresponding to the index key, to obtain first target index metadata; and

looking up, in a pre-stored first bitmap set, the first result set corresponding to the first target index metadata, where the first bitmap set includes inverted index data built by inverted indexing.

Optionally, searching the pre-stored index blocks in memory for index metadata corresponding to the index key includes:

generating the multi-level index entries corresponding to the index key according to the generation rule of the multi-level index structure of the index metadata;

determining the data structure of the index metadata kept in memory and the number of levels it contains;

determining a first target index entry from the multi-level index entries according to the data structure and the number of levels of the index metadata in memory; and

searching the pre-stored index blocks in memory for index metadata corresponding to the first target index entry.

Optionally, searching the index metadata that contains the preliminary search result for index metadata corresponding to the index key, to obtain the first target index metadata, includes:

determining the data structure of the index metadata kept on disk and the number of levels it contains;

determining a second target index entry from the multi-level index entries according to the data structure and the number of levels of the index metadata on disk; and

searching the pre-stored index blocks on disk for index metadata corresponding to the second target index entry.

Optionally, forward index data built by forward indexing is stored in the server in advance.

Optionally, the method further includes:

judging whether the relationship between the data volume of the first result set and the cardinality of a preset query column meets a preset condition;

if the preset condition is met, querying the data of a specified column in the first result set according to the inverted index data and aggregating the data of the specified column; and

if the preset condition is not met, querying the data of the specified column in the first result set according to the forward index data and aggregating the data of the specified column.

Optionally, the inverted index data is built with the Roaring bitmap algorithm.

Optionally, the process of building the forward index and the inverted index includes:

dividing the data to be processed into multiple units according to a preset hash algorithm, so that the difference between the data volumes of the units is less than a preset threshold;

executing the forward index and inverted index build tasks for each unit in Spark;

re-encoding the data while the build tasks are executing, so as to compress the data; and

while the build tasks are executing, if a task involves transmission across machines over the network, merging the stages of that task that run on different machines into one stage, so that the stage runs on a single machine.

Optionally, processing the index information in the received data query instruction to generate an index key that conforms to the index format includes:

processing the received index information according to the MD5 message-digest algorithm to generate an MD5 value.

Optionally, the method further includes:

detecting the access frequency of each column of data in the first result set; and

if the access frequency of any column is greater than a preset access frequency, storing the data of that column and the corresponding index metadata in memory.

Optionally, the method further includes:

searching, among the index metadata stored in memory for data whose access frequency exceeds the preset access frequency, for index metadata corresponding to the index key;

if index metadata corresponding to the index key is found, obtaining second target index metadata; and

looking up, in a pre-stored second bitmap set, the second result set corresponding to the second target index metadata.

An embodiment of the invention also discloses a data query apparatus applied to a server. The server has pre-generated index metadata with a multi-level index structure and has divided the index metadata into two parts according to that structure: one part is kept in pre-stored index blocks in memory, and the other part is kept in pre-stored index blocks on disk. The index metadata stored on disk forms the lower levels of the index metadata in memory.

The apparatus includes:

a query instruction processing unit, configured to process the index information in a received data query instruction to generate an index key that conforms to the index format;

a first retrieval unit, configured to search the pre-stored index blocks in memory for index metadata corresponding to the index key;

a range determination unit, configured to, if index metadata corresponding to the index key is found, obtain a preliminary search result and determine, in the pre-stored index blocks on disk, the index metadata that contains the preliminary search result;

a second retrieval unit, configured to search the index metadata that contains the preliminary search result for index metadata corresponding to the index key, to obtain first target index metadata; and

a third retrieval unit, configured to look up, in a pre-stored first bitmap set, the first result set corresponding to the first target index metadata.

An embodiment of the invention also discloses a computer device, including:

a processor and a memory;

where the processor is configured to execute the program stored in the memory;

and the memory is configured to store the program, the program being at least used for:

pre-generating index metadata with a multi-level index structure and dividing the index metadata into two parts according to that structure, where one part is kept in pre-stored index blocks in memory and the other part is kept in pre-stored index blocks on disk, and the index metadata stored on disk forms the lower levels of the index metadata in memory;

when a data query instruction is received, processing the index information in the received data query instruction to generate an index key that conforms to the index format;

The embodiments of the invention disclose a data query method and apparatus. The method is applied to a server in which index metadata with a multi-level index structure has been pre-generated; one part of the index metadata is kept in memory and the other part on disk, and the index metadata kept on disk forms the lower levels of the index metadata in memory. After a query instruction is received, retrieval is performed in memory first; if it succeeds, retrieval continues on disk; and if that succeeds, the specific index data is looked up in the pre-stored first bitmap set. Because part of the index metadata is stored in memory and the rest on disk, the problem of insufficient memory space is solved and the data scale that the system can support increases, while the number of disk accesses decreases, which improves data query efficiency. In addition, the bitmap set is stored in advance, so no bitmap has to be built at query time; that is, no data conversion is needed, which further improves data query efficiency.

Brief description of the drawings

To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are only embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 shows a flow diagram of a data query method provided by an embodiment of the invention;

Fig. 2 shows a structural diagram of multi-level index data;

Fig. 3 shows a diagram of the storage form of multi-level index data;

Fig. 4 shows a diagram of the process of building the forward index and the inverted index provided by an embodiment of the invention;

Fig. 5 shows another flow diagram of the data query method provided by an embodiment of the invention;

Fig. 6 shows a structural diagram of a data query apparatus provided by an embodiment of the invention;

Fig. 7 shows a structural diagram of a computer device provided by an embodiment of the invention;

Fig. 8 shows a structural diagram of a data query system;

Fig. 9 shows an architecture diagram of the Query Manager;

Fig. 10 shows an architecture diagram of the Query Worker;

Fig. 11 shows an architecture diagram of the Coordinator.

Detailed description of the embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.

The applicant has found through research that, in the prior art, the index data required by indexing (index metadata and the specific index data) is stored entirely on disk. Querying data then requires reading from disk, which leads to repeated disk accesses and is very time-consuming. The prior art also includes another approach in which index metadata is stored in memory and the specific index data on disk. This improves query efficiency, but as the stored data volume grows, the index metadata grows with it, and memory can no longer hold such a large volume. Moreover, bitmaps have to be built in real time during a query, and retrieval runs against the temporarily built bitmap. Every query therefore requires data conversion, which is also time-consuming.

Based on the above problems found by the inventors, the embodiments of the invention disclose a data query method applied to a server. The server has pre-generated index metadata with a multi-level index structure and has divided the index metadata into two parts according to that structure: one part is kept in pre-stored index blocks in memory, and the other part is kept in pre-stored index blocks on disk. The index metadata stored on disk forms the lower levels of the index metadata in memory. After the server receives a data query instruction, it processes the index information in the instruction to generate an index key that conforms to the index format, and searches the pre-stored index blocks in memory for index metadata corresponding to the index key. If such index metadata is found, a preliminary search result is obtained, and the index metadata that contains the preliminary search result is determined in the pre-stored index blocks on disk. That index metadata is then searched for index metadata corresponding to the index key, to obtain the first target index metadata, and the first result set corresponding to the first target index metadata is looked up in the pre-stored bitmap set.

It follows that, in the embodiments of the invention, part of the index metadata is stored in memory and the rest on disk. This solves the problem of insufficient memory space, increases the data scale that the system can support, reduces the number of disk accesses, and improves data query efficiency. In addition, the bitmap set is stored in advance, so no bitmap has to be built at query time; that is, no data conversion is needed, which further improves data query efficiency.

Fig. 1 shows a flow diagram of a data query method provided by an embodiment of the invention. The method is applied to a server.

Before any data query is performed, the index data (index metadata and the specific index data) is stored in the server in advance. To improve query efficiency and to accommodate data volumes in the hundreds of millions, the index metadata is divided into levels to obtain a multi-level index structure, and is then divided into two parts according to that structure: one part is kept in pre-stored index blocks in memory, and the other part is kept in pre-stored index blocks on disk. The index metadata stored on disk forms the lower levels of the index metadata in memory.

For example, as shown in Fig. 2, suppose that hashing a certain tag yields the index metadata 01e76a51f001de4r2ton9df34f7045c2. This index metadata is divided into levels: 01 is the first level (1st level), 01e is the second level (2nd level), 01e7 is the third level (3rd level), 01e76a51 is the fourth level (4th level), and so on. The last level (leaf level) contains all the digits and corresponds to bitmap data.

The index metadata is then divided into two parts: the first N levels can be stored in memory and the levels after the N-th can be stored on disk. In both memory and disk, the data is stored according to the level division described above.

When index metadata is stored in memory or on disk, the storage can also be understood as layered: index metadata of the same level is stored in different index blocks of the same layer. As shown in Fig. 3, the index blocks of the first layer store first-level data and the index blocks of the second layer store second-level data. Each layer includes at least one data block, and each data block includes at least one piece of index metadata belonging to that layer.
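The level division and the memory/disk split described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the prefix lengths beyond the fourth level and the helper names `split_levels` and `partition_levels` are assumptions introduced here, since the text only fixes the first four levels (2, 3, 4, and 8 characters) and the leaf level.

```python
# Assumed prefix length of each level. Only 2, 3, 4, 8 and the full-length
# leaf come from the example in the text; 16 is a hypothetical middle level.
LEVEL_PREFIX_LENGTHS = [2, 3, 4, 8, 16, 32]


def split_levels(token, prefix_lengths=LEVEL_PREFIX_LENGTHS):
    """Return the index entry of every level, from the first level to the leaf."""
    return [token[:n] for n in prefix_lengths]


def partition_levels(entries, memory_levels):
    """Split the entries into the upper levels (memory) and lower levels (disk)."""
    return entries[:memory_levels], entries[memory_levels:]


# The index metadata string used in the Fig. 2 example.
token = "01e76a51f001de4r2ton9df34f7045c2"
entries = split_levels(token)
in_memory, on_disk = partition_levels(entries, memory_levels=3)

print(in_memory)  # upper levels kept in memory
print(on_disk)    # lower levels kept on disk, ending with the leaf
```

Here N is 3, so the first three levels go to memory and the remaining levels, including the leaf that points to bitmap data, go to disk.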

When a data query instruction is received, the data query method includes:

S101: process the index information in the received data query instruction to generate an index key that conforms to the index format;

In this embodiment, the index key is generated in the same way as the index metadata. Several generation methods are possible and are not enumerated here.

For example, the MD5 (Message-Digest Algorithm) algorithm can be used to compute over the index information. Specifically, S101 includes:

processing the received index information according to the MD5 algorithm to generate an MD5 value.
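As a minimal sketch of S101 under this MD5 option, the index key can be derived with Python's standard `hashlib` module. The function name `make_index_key`, the sample tag string, and the choice of UTF-8 encoding are assumptions for illustration; the text does not specify how the index information is serialized before hashing.

```python
import hashlib


def make_index_key(index_info: str) -> str:
    """Hash the index information into a 32-character hexadecimal MD5 value."""
    # UTF-8 is an assumed encoding; the source does not name one.
    return hashlib.md5(index_info.encode("utf-8")).hexdigest()


key = make_index_key("city=shenzhen")  # hypothetical tag string
print(len(key))  # 32 hexadecimal characters
```

Because the same function is applied when the index metadata is built and when a query arrives, equal index information always maps to the same key.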

S102: search the pre-stored index blocks in memory for index metadata corresponding to the index key;

In this embodiment, the index key is divided into levels in the same way as the index metadata. The level division of index metadata has been described above, and the level division of the index key is consistent with it, so it is not repeated here. After the index key is divided into levels, it can also be divided into two parts: one part is used for retrieval in memory and the other for retrieval on disk.

The retrieval process in memory may include:

determining the index structure of the index metadata kept in memory and the number of levels it contains;

determining a first target index entry from the multi-level index entries according to the data structure and the number of levels of the index metadata in memory;

For example, if the first 3 levels of the index metadata are kept in memory, the first target index entry contains the first 3 levels of the index key. Suppose the index key is f66d091c69df34f7045c21f700e308bf: f6 can be the first-level index entry, f66 the second-level index entry, f66d the third-level index entry, f66d091c the fourth-level index entry, and so on. This yields an index key with a multi-level structure, and the index entries of the first three levels serve as the first target index entry, which is searched for in memory.

Because index metadata is stored in memory according to its level structure, retrieval in memory matches the index key against the index metadata level by level. If the first-level index entry exists, the index metadata containing the first-level index entry is searched for index metadata containing the second-level index entry, and so on.

S103: if index metadata corresponding to the index key is found, obtain a preliminary search result and determine, in the pre-stored index blocks on disk, the index metadata that contains the preliminary search result;

In this embodiment, after the step of searching memory for index metadata corresponding to the first target index entry has been executed, there are two possible outcomes: either no index metadata corresponding to the first target index entry is found, or such index metadata is found.

If no index metadata corresponding to the first target index entry is found, retrieval ends.

If index metadata corresponding to the first target index entry is found, a preliminary search result is obtained. The preliminary search result is the index metadata that matches the first target index entry, and the index metadata that contains the preliminary search result is then determined on disk.

S104: search the index metadata that contains the preliminary search result for index metadata corresponding to the index key, to obtain the first target index metadata;

As mentioned above, the index key is divided into two parts: one is used for retrieval in memory and the other for retrieval on disk.

On disk, the specific retrieval process includes:

searching the pre-stored index blocks on disk for index metadata corresponding to the second target index entry.

In this embodiment, the structure and number of levels of the second target index entry are consistent with those of the index metadata stored on disk. For example, if the index metadata is divided into 6 levels and the first 3 levels are stored in memory, then the last 3 levels are stored on disk. The index key is likewise divided into 6 levels: the first target index entry contains the first 3 levels and the second target index entry contains the last 3 levels. If index metadata corresponding to the index key is found on disk, the first target index metadata is obtained.

Preferably, binary search can be used when retrieving in memory or on disk.
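The two-stage retrieval of S102 to S104 can be sketched as below. This is a simplified illustration under several assumptions: each level is modeled as one sorted list of prefixes, the on-disk blocks are simulated by an in-process dictionary keyed by the preliminary search result, and the names `lookup`, `memory_levels`, and `disk_blocks_by_parent` are hypothetical. Binary search uses the standard `bisect` module.

```python
from bisect import bisect_left


def block_contains(sorted_block, entry):
    """Binary search for an index entry inside one sorted index block."""
    i = bisect_left(sorted_block, entry)
    return i < len(sorted_block) and sorted_block[i] == entry


def lookup(token, prefix_lengths, memory_levels, disk_blocks_by_parent):
    """Return the leaf entry if every level matches, otherwise None."""
    entries = [token[:n] for n in prefix_lengths]
    k = len(memory_levels)
    # Stage 1: match the first target index entry level by level in memory.
    for block, entry in zip(memory_levels, entries[:k]):
        if not block_contains(block, entry):
            return None  # not found in memory, so retrieval ends
    preliminary = entries[k - 1]  # preliminary search result
    # Stage 2: only the lower-level blocks under the preliminary result are read.
    lower_levels = disk_blocks_by_parent.get(preliminary)
    if lower_levels is None:
        return None
    for block, entry in zip(lower_levels, entries[k:]):
        if not block_contains(block, entry):
            return None
    return entries[-1]  # stands in for the first target index metadata


PREFIX_LENGTHS = [2, 3, 4, 8, 16, 32]  # assumed level widths (6 levels)
key = "f66d091c69df34f7045c21f700e308bf"
memory_levels = [["01", "f6"], ["01e", "f66"], ["01e7", "f66d"]]
disk_blocks_by_parent = {"f66d": [[key[:8]], [key[:16]], [key]]}

print(lookup(key, PREFIX_LENGTHS, memory_levels, disk_blocks_by_parent))
```

A miss at any in-memory level returns before any simulated disk block is touched, which is the property the text relies on to reduce disk accesses.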

S105: look up, in the pre-stored first bitmap set, the first result set corresponding to the first target index metadata;

In this embodiment, the last level of the index metadata stored on disk can point to the corresponding position in the corresponding bitmap set, so that the specific index data corresponding to the index key is obtained.

The bitmap set includes inverted index data built by inverted indexing.
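To illustrate S105, the following sketch models the pre-stored bitmap set as a list of Python integers used as bit vectors, with the leaf-level index metadata holding a position in that list. The names `bitmap_set`, `leaf_pointers`, and `fetch_result_set`, and the integer encoding itself, are assumptions for illustration; the source does not describe the physical layout of the bitmap set.

```python
def build_bitmap(row_ids):
    """Set one bit per matching row ID (this happens at index build time)."""
    bitmap = 0
    for row_id in row_ids:
        bitmap |= 1 << row_id
    return bitmap


def rows_from_bitmap(bitmap):
    """Decode a bitmap back into the sorted list of row IDs it marks."""
    rows, position = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(position)
        bitmap >>= 1
        position += 1
    return rows


# Pre-stored at build time, so nothing is constructed when a query arrives.
bitmap_set = [build_bitmap([1, 4, 7]), build_bitmap([0, 2])]
# Hypothetical leaf-level metadata: leaf entry -> position in the bitmap set.
leaf_pointers = {"f66d091c69df34f7045c21f700e308bf": 0}


def fetch_result_set(leaf_entry):
    """Follow the leaf pointer and decode the stored bitmap."""
    return rows_from_bitmap(bitmap_set[leaf_pointers[leaf_entry]])


print(fetch_result_set("f66d091c69df34f7045c21f700e308bf"))
```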

In this embodiment, part of the index metadata is stored in memory and the rest on disk. This solves the problem of insufficient memory space, reduces the number of disk accesses, and improves data query efficiency. In addition, the bitmap set is stored in advance, so no bitmap has to be built at query time; that is, no data conversion is needed, which further improves data query efficiency.

Fig. 4 shows a diagram of the process of building the forward index and the inverted index provided by an embodiment of the invention. In this embodiment, the process includes:

S401: divide the data to be processed into multiple units according to a preset hash algorithm, so that the difference between the data volumes of the units is less than a preset threshold;

In this embodiment, the applicant has found through research that the smaller the difference between the data volumes of the units after division, the higher the parallelism of the build tasks, and the more efficiently and quickly the forward index and inverted index build tasks execute.

The preset hash algorithm can be md5, sha256, crc32, or murmurhash3_128; preferably, the hash algorithm is murmurhash3_128.
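A minimal sketch of S401 follows. It uses crc32, one of the algorithms listed above, because it ships with the Python standard library as `zlib.crc32`; the preferred murmurhash3_128 would need a third-party package. The function names, the sample keys, and the number of units are assumptions for illustration.

```python
import zlib
from collections import Counter


def assign_unit(key: str, num_units: int) -> int:
    """Map a record key to a unit with crc32 (stand-in for murmurhash3_128)."""
    return zlib.crc32(key.encode("utf-8")) % num_units


def unit_sizes(keys, num_units):
    """Count how many records land in each unit."""
    counts = Counter(assign_unit(k, num_units) for k in keys)
    return [counts.get(u, 0) for u in range(num_units)]


def is_balanced(sizes, threshold):
    """Check that the largest and smallest units differ by less than the threshold."""
    return max(sizes) - min(sizes) < threshold


keys = [f"user-{i}" for i in range(1000)]  # hypothetical record keys
sizes = unit_sizes(keys, num_units=4)
print(sizes, is_balanced(sizes, threshold=100))
```

If the balance check fails for a given data set, a different hash algorithm or number of units can be tried; the text only requires that the final difference stay below the preset threshold.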

S402: execute the forward index and inverted index build tasks for each unit in Spark;

In this embodiment, executing the forward index and inverted index build tasks in Spark improves the execution efficiency of the build tasks.

S403: re-encode the data while the forward index and inverted index build tasks are executing, so as to compress the data;

In this embodiment, while the forward index and the inverted index are being built, the tag data involved and other related data are re-encoded, which compresses the data. This reduces the storage space of the bitmap data, so that the bitmap data can be saved in the server in advance.
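The text does not say which re-encoding scheme is used in S403. One common choice, shown here purely as an assumed example, is dictionary encoding, which replaces each distinct tag value with a small integer code. The function names and sample values are hypothetical.

```python
def dictionary_encode(values):
    """Replace each distinct value with a compact integer code."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)  # next unused code
        encoded.append(codes[value])
    return codes, encoded


def dictionary_decode(codes, encoded):
    """Recover the original values from the codes."""
    reverse = {code: value for value, code in codes.items()}
    return [reverse[c] for c in encoded]


codes, encoded = dictionary_encode(["beijing", "shenzhen", "beijing"])
print(codes, encoded)
```

Long or repeated strings are stored once in the dictionary, and the column itself shrinks to a list of small integers, which also suits bitmap-based inverted indexes keyed by code.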

S404: while the forward index and inverted index build tasks are executing, if a task involves transmission across machines over the network, merge the stages of that task that run on different machines into one stage, so that the stage runs on a single machine.

The applicant has found through research that a task transmitted across machines over the network, for example a groupby-type operation, increases network transmission and slows task execution.

In this embodiment, tasks transmitted across machines over the network are optimized: two stages of the task that run on different machines are merged into one stage, and that stage is executed on one machine, which reduces network transmission.

Preferably, the inverted index data can be built with the Roaring bitmap algorithm.
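The core idea of a Roaring bitmap can be sketched as follows: 32-bit row IDs are grouped by their upper 16 bits, and each group stores the lower 16 bits either as a sorted array (when it holds at most 4096 values) or as a bitmap. This is a simplified teaching sketch with hypothetical function names, not the Roaring library and not necessarily how the patent's system stores its data; run-length containers and set operations are omitted.

```python
from bisect import bisect_left

ARRAY_LIMIT = 4096  # above this many values, a chunk switches to a bitmap


def build_containers(row_ids):
    """Group row IDs by their high 16 bits and pick a container per chunk."""
    chunks = {}
    for row_id in sorted(set(row_ids)):
        chunks.setdefault(row_id >> 16, []).append(row_id & 0xFFFF)
    containers = {}
    for high, lows in chunks.items():
        if len(lows) <= ARRAY_LIMIT:
            containers[high] = ("array", lows)  # sparse chunk: sorted array
        else:
            bits = 0
            for low in lows:
                bits |= 1 << low
            containers[high] = ("bitmap", bits)  # dense chunk: bit vector
    return containers


def roaring_contains(containers, row_id):
    """Test membership by locating the chunk, then searching its container."""
    entry = containers.get(row_id >> 16)
    if entry is None:
        return False
    kind, payload = entry
    low = row_id & 0xFFFF
    if kind == "array":
        i = bisect_left(payload, low)
        return i < len(payload) and payload[i] == low
    return (payload >> low) & 1 == 1


containers = build_containers([1, 70000] + list(range(131072, 131072 + 5000)))
print({high: kind for high, (kind, _) in containers.items()})
```

Sparse chunks stay small as arrays while dense chunks cost a fixed 8 KB as bitmaps, which is why this layout tends to keep pre-stored inverted index data compact.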

The method of this embodiment improves the efficiency of the forward index and inverted index build tasks and reduces the storage space of the forward index data and the inverted index data. The bitmap data can therefore be stored in the server in advance, so that no bitmap has to be built when data is queried; that is, no data conversion is needed, which further improves data query efficiency.

In this embodiment, the data query method is applied in a big data platform and can be used for functions such as tag retrieval, crowd profiling, and computation-type services. For functions such as crowd profiling and computation-type services, besides querying the relevant data, further processing must be performed on the queried data to obtain the target data, and in some cases querying with the forward index is more efficient than querying with the inverted index. For this reason, in the embodiments of the invention, forward index data built by forward indexing is stored in the server in advance.

Fig. 5 shows another flow diagram of the data query method provided by an embodiment of the invention. In this embodiment, the method includes:

S501: judge whether the relationship between the data volume of the first result set and the cardinality of a preset query column meets a preset condition;

S502: if the preset condition is met, query the data of a specified column in the first result set according to the inverted index data, and aggregate the data of the specified column;

S503: if the preset condition is not met, query the data of the specified column in the first result set according to the forward index data, and aggregate the data of the specified column;

In this embodiment, the choice between forward index data and inverted index data for querying the specified column can follow these principles:

when the data volume of the first result set is large and the cardinality of the query column is small, the specified column is queried with the inverted index data;

when the data volume of the first result set is small and the cardinality of the query column is large, the specified column is queried with the forward index data.

For example, based on the above principles, this embodiment provides the following judgment condition:

n > a × m;

where n is the data volume of the first result set, m is the cardinality of the query column, and a is a constant; preferably, a = 400.

If the condition is met, the data volume of the first result set is large relative to the cardinality of the query column, and the data of the specified column in the first result set is queried according to the inverted index data;

if the condition is not met, the data volume of the first result set is small relative to the cardinality of the query column, and the data of the specified column in the first result set is queried according to the forward index data.
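The judgment condition can be written as a small helper. The function name `choose_index` and the returned labels are hypothetical; the comparison n > a × m and the preferred constant a = 400 come from the text.

```python
def choose_index(result_rows: int, column_cardinality: int, a: int = 400) -> str:
    """Pick the index type for aggregating a specified column.

    result_rows is n, column_cardinality is m; the condition is n > a * m.
    """
    if result_rows > a * column_cardinality:
        return "inverted"  # condition met: many rows, few distinct values
    return "forward"       # condition not met


print(choose_index(1_000_000, 10))  # n far above 400 * m
print(choose_index(1_000, 10))      # n below 400 * m
```

One way to read the rule, offered as an interpretation rather than something the text states: aggregating through the inverted index costs roughly one bitmap intersection per distinct column value, while the forward index costs roughly one lookup per result row, so the inverted index pays off only when n is much larger than m.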

In some cases, querying data with the forward index is faster than querying with the inverted index. In this embodiment, selecting the suitable index type for each query greatly improves data query efficiency.

In this embodiment, to further improve query efficiency, a column of data with a high access frequency can be kept in memory. Specifically, this includes:

detecting the access frequency of each column of data in the first result set; and, if the access frequency of any column is greater than a preset access frequency, storing the data of that column and the corresponding index metadata in memory.

The preset access frequency can be set empirically or obtained by calculation.

In memory, index data is saved column by column.

For the frequently accessed data kept in memory, the corresponding index metadata is also kept in memory, so no disk access is needed, which greatly improves data query efficiency. Specifically, the method further includes: searching, among the index metadata stored in memory for data whose access frequency exceeds the preset access frequency, for index metadata corresponding to the index key; if such index metadata is found, obtaining second target index metadata; and looking up, in a pre-stored second bitmap set, the second result set corresponding to the second target index metadata.

In this embodiment, the second target index metadata is the index metadata that matches the index key.

In this embodiment, the second bitmap set is the bitmap set corresponding to the data stored in memory whose access frequency exceeds the preset access frequency.

In this embodiment, storing frequently accessed hot data in memory significantly reduces query latency. Caching at column granularity also improves memory utilization.
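Column-granularity caching of hot data can be sketched as below. The class name `HotColumnCache`, the hit counter, and the `load_column` callback that stands in for a disk read are all assumptions for illustration; the text does not specify how access frequency is measured or when cached columns are evicted.

```python
from collections import Counter


class HotColumnCache:
    """Keep a column in memory once its access count exceeds a threshold."""

    def __init__(self, threshold, load_column):
        self.threshold = threshold      # preset access frequency
        self.load_column = load_column  # hypothetical disk reader
        self.hits = Counter()
        self.cached = {}

    def read(self, column):
        self.hits[column] += 1
        if column in self.cached:
            return self.cached[column]  # served from memory, no disk access
        data = self.load_column(column)
        if self.hits[column] > self.threshold:
            self.cached[column] = data  # the column has become hot
        return data


disk_reads = []


def load_from_disk(column):
    """Stand-in for reading a column and its index metadata from disk."""
    disk_reads.append(column)
    return [1, 2, 3]


cache = HotColumnCache(threshold=2, load_column=load_from_disk)
for _ in range(4):
    cache.read("age")
print(len(disk_reads))  # number of reads that reached the simulated disk
```

Only the columns that are actually hot occupy memory, rather than whole rows or whole tables, which matches the memory utilization benefit described above.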

With reference to Fig. 6, a kind of structural schematic diagram of data query device provided in an embodiment of the present invention is shown, in this implementation In example, the data query device is applied to server, and the server has pre-generated the index with multilevel index structure Metadata, and the index metadata is divided into two parts according to the multilevel index structure of the index metadata, wherein one In memory in pre-stored index block, a part of index metadata is stored in disk in advance partial index meta-data preservation In the index block of storage；The structure of the index metadata stored in the disk is the junior of index metadata in memory；

The device includes:

A query instruction processing unit 601, configured to process the index information in a received data query instruction and generate an index that conforms to the index format;

A first retrieval unit 602, configured to search the index blocks pre-stored in memory for index metadata corresponding to the index;

A range determination unit 603, configured to, when index metadata corresponding to the index is found, obtain a preliminary search result and determine, in the index blocks pre-stored on the disk, the index metadata that contains the preliminary search result;

A second retrieval unit 604, configured to search the index metadata containing the preliminary search result for index metadata corresponding to the index, obtaining first target index metadata;

A third retrieval unit 605, configured to look up, in a pre-stored first bitmap set, a first result set corresponding to the target index metadata; the first bitmap set includes inverted index data constructed by inverted indexing.

Optionally, the first retrieval unit is configured to:

Optionally, the second retrieval unit is configured to:

Optionally, further includes:

An index mode judging unit, configured to judge whether the relationship between the data volume of the first result set and the cardinality of a preset query column meets a preset condition;

An inverted index processing unit, configured to, if the relationship between the data volume of the first result set and the cardinality of the preset query column meets the preset condition, query the data of a specified column in the first result set according to the inverted index data and perform aggregation on the data of the specified column;

A forward index processing unit, configured to, if the relationship between the data volume of the first result set and the cardinality of the preset query column does not meet the preset condition, query the data of the specified column in the result set according to the forward index data and perform aggregation on the data of the specified column.
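The choice between the two index modes can be illustrated with a small sketch. The embodiment does not state the preset condition, so the rule used here (use the inverted index when the result set is at least `ratio` times the column cardinality) is an assumption, as are the function names and the use of Python sets in place of bitmaps.

```python
from collections import Counter


def choose_index_mode(result_size, column_cardinality, ratio=1.0):
    """Assumed preset condition: a large result set relative to cardinality favors inverted."""
    return "inverted" if result_size >= ratio * column_cardinality else "forward"


def aggregate_inverted(result_rows, inverted_index):
    """Count rows per column value with one set intersection per distinct value."""
    counts = {}
    for value, rows in inverted_index.items():
        hits = len(rows & result_rows)
        if hits:
            counts[value] = hits
    return counts


def aggregate_forward(result_rows, forward_index):
    """Count rows per column value with one forward lookup per row in the result set."""
    return dict(Counter(forward_index[row] for row in result_rows))


# Hypothetical "city" column over five rows, indexed both ways.
forward_index = {0: "SZ", 1: "BJ", 2: "SZ", 3: "SH", 4: "SZ"}
inverted_index = {"SZ": {0, 2, 4}, "BJ": {1}, "SH": {3}}
result_rows = {0, 1, 2, 4}

mode = choose_index_mode(len(result_rows), len(inverted_index))
if mode == "inverted":
    counts = aggregate_inverted(result_rows, inverted_index)
else:
    counts = aggregate_forward(result_rows, forward_index)
```

Both aggregation paths return the same counts; they differ only in cost, which grows with the column cardinality on the inverted path and with the result-set size on the forward path.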

Optionally, the device further includes an index construction unit, configured to:

Optionally, the query instruction processing unit is configured to:

Optionally, further includes:

A frequency detection unit, configured to detect the access frequency of each column of data in the first result set;

A hot data storage unit, configured to, if the access frequency of the data of any column is greater than the preset access frequency, store the data of that column and the corresponding index metadata in memory.

Optionally, further includes:

A hot data retrieval unit, configured to:

With the device of this embodiment, part of the index metadata is stored in memory and the other part is stored on disk, which solves the problem of insufficient memory space while also reducing the number of disk accesses, improving data query efficiency. In addition, the bitmap set is stored in advance, so bitmaps need not be built at query time, that is, no further data conversion is needed, which also improves data query efficiency.
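The cooperation of the retrieval units can be sketched as a two-level lookup. The sketch below is illustrative only: the class name `TwoLevelIndex`, the `tag/value` index format, the block identifiers, and the use of Python sets in place of pre-stored bitmaps are all assumptions made for the example.

```python
class TwoLevelIndex:
    """Upper index level in memory, lower index level in blocks on disk."""

    def __init__(self, memory_blocks, disk_blocks, bitmap_set):
        self.memory_blocks = memory_blocks  # upper level: index prefix -> disk block id
        self.disk_blocks = disk_blocks      # lower level: block id -> {index: bitmap id}
        self.bitmap_set = bitmap_set        # pre-stored bitmaps: bitmap id -> row ids
        self.disk_reads = 0                 # counts simulated disk block reads

    def query(self, index):
        prefix = index.split("/", 1)[0]
        block_id = self.memory_blocks.get(prefix)  # first retrieval: memory only
        if block_id is None:
            return set()                           # miss decided without touching disk
        self.disk_reads += 1
        block = self.disk_blocks[block_id]         # second retrieval: one disk block
        bitmap_id = block.get(index)
        if bitmap_id is None:
            return set()
        return self.bitmap_set[bitmap_id]          # third retrieval: pre-stored bitmap


index = TwoLevelIndex(
    memory_blocks={"gender": 0, "city": 1},
    disk_blocks={
        0: {"gender/male": "b0", "gender/female": "b1"},
        1: {"city/SZ": "b2"},
    },
    bitmap_set={"b0": {1, 3}, "b1": {2}, "b2": {1, 2}},
)
hit = index.query("gender/male")
miss = index.query("age/30")
```

In this sketch a query whose upper-level entry is absent from memory returns without any disk read, and a hit reads exactly one disk block before fetching the pre-stored bitmap.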

Referring to Fig. 7, a schematic structural diagram of a computer device provided by an embodiment of the present invention is shown. In this embodiment, the computer device includes:

A processor 701 and a memory 702;

wherein the processor 701 is configured to execute the program stored in the memory 702;

the memory is configured to store a program, and the program is at least used for:

Optionally, further includes:

Detecting the access frequency of each column of data in the first result set;

Optionally, further includes:

Referring to Fig. 8, a schematic structural diagram of a data query system is disclosed. In this embodiment, the system includes:

Query Manager, Query Worker, Coordinator, and Indexing;

The Query Manager is the external query interface. It receives data query instructions, parses them, distributes them, and merges the query results.

The architecture of the Query Manager is shown in Fig. 9.

The Query Manager supports parallelism in the data configuration (crowd-package partition), data distribution, and result merging phases, to reduce operation time.

The segment routing information used during request distribution is obtained and updated through ZooKeeper.

In addition, the Query Manager supports multi-instance deployment, which improves overall system availability and concurrency.
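The distribute-and-merge role of the Query Manager can be sketched as a scatter-gather loop. This is a simplified illustration under stated assumptions: the function names, the in-process thread pool standing in for remote Query Worker calls, and the row-count aggregation are all hypothetical and are not taken from the embodiment.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subquery(segment_rows, predicate):
    """Work done for one segment: count the rows that match the predicate."""
    return sum(1 for row in segment_rows if predicate(row))


def scatter_gather(segments, predicate, max_workers=4):
    """Distribute one subquery per segment in parallel, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda rows: run_subquery(rows, predicate), segments))
    return sum(partials)


# Three hypothetical segments of user records.
segments = [
    [{"age": 20}, {"age": 31}],
    [{"age": 45}],
    [{"age": 18}, {"age": 52}],
]
total = scatter_gather(segments, lambda row: row["age"] > 30)
```

Because each segment is processed independently, the merge step only has to combine small partial results, which is what allows the distribution and merging phases to run in parallel.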

The Query Worker is configured to execute the above data query method after receiving a query instruction, and also to execute the index data distribution tasks issued by the Coordinator.

When the Query Worker service starts, it registers with the Coordinator and reports its liveness through heartbeats. Service registration and heartbeats are implemented on top of ZooKeeper.
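The registration and heartbeat behavior can be illustrated without a ZooKeeper deployment. The sketch below is an in-memory stand-in, not the ZooKeeper-based mechanism of the embodiment: the class `WorkerRegistry`, the explicit `now` timestamps, and the timeout value are assumptions introduced for the example.

```python
class WorkerRegistry:
    """In-memory stand-in for the registration and heartbeat state kept in ZooKeeper."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds  # hypothetical liveness timeout
        self.last_heartbeat = {}                # worker name -> time of last heartbeat

    def register(self, worker, now):
        """Called once when a worker service starts."""
        self.last_heartbeat[worker] = now

    def heartbeat(self, worker, now):
        """Called periodically by a registered worker to report that it is alive."""
        if worker not in self.last_heartbeat:
            raise KeyError(f"{worker} is not registered")
        self.last_heartbeat[worker] = now

    def alive_workers(self, now):
        """Workers whose last heartbeat is within the timeout window."""
        return sorted(
            worker
            for worker, seen in self.last_heartbeat.items()
            if now - seen <= self.timeout_seconds
        )


registry = WorkerRegistry(timeout_seconds=10)
registry.register("worker-1", now=0)
registry.register("worker-2", now=0)
registry.heartbeat("worker-1", now=8)
alive = registry.alive_workers(now=15)
```

Here `worker-2` stops sending heartbeats after registration and drops out of the live set, which is the signal a coordinator would use to trigger failover handling.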

Fig. 10 shows the architecture of the Query Worker, which specifically includes a Query Engine and a Data Manager.

The Query Engine consists of a gRPC network framework and a group of worker threads. The gRPC network framework handles network I/O asynchronously, while the computation tasks inside the worker threads run synchronously. DMP tag query computation is CPU-intensive and the query logic contains no time-consuming I/O operations, so asynchronous execution or coroutines bring almost no benefit on a single core. The worker therefore uses a thread model with a fixed number of worker threads, uses all CPU cores of a single machine as far as possible to achieve concurrency across cores, and at the same time reduces thread context-switching overhead so that the CPU is used as fully as possible. The engine's query flow and the underlying index operations are all designed to be thread-safe, supporting concurrent execution of multiple tasks on the same index shard.

The Data Manager, as a component of the Query Worker node, is mainly responsible for managing index data, such as downloading index data, loading index data, deleting index data, and switching index data between memory and disk. The design of the present invention supports querying multiple data versions, and switching versions does not affect normal queries. In the system of the present invention, each segment has two copies, a primary and a backup. Under normal circumstances a request is routed to the primary shard; when a query on the primary shard fails, fast failover is performed and the request is routed to the backup shard, ensuring high availability of the system as a whole.
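The primary and backup routing described above can be sketched as follows. The function `query_with_failover`, the exception type, and the routing table layout are assumptions made for illustration; the embodiment does not specify how a failed query on the primary shard is detected.

```python
class SegmentUnavailable(Exception):
    """Raised by the (hypothetical) execution layer when a worker cannot serve a segment."""


def query_with_failover(segment, routing, execute):
    """Try the primary copy first; if it fails, retry once on the backup copy."""
    primary, backup = routing[segment]
    try:
        return execute(primary, segment), primary
    except SegmentUnavailable:
        return execute(backup, segment), backup


routing = {"seg-0": ("worker-1", "worker-2")}  # segment -> (primary, backup)
down = {"worker-1"}                            # simulate a failed primary


def execute(worker, segment):
    if worker in down:
        raise SegmentUnavailable(worker)
    return f"rows of {segment}"


result, served_by = query_with_failover("seg-0", routing, execute)
```

In this sketch the caller receives the same result whether the primary or the backup served the request, so a failed primary stays invisible to the query issuer.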

The Coordinator uses a one-primary, multiple-standby mode and elects the leader through ZooKeeper (zk). Its main functions include:

1) Status management of all worker nodes in the Poseidon cluster, responsible for service discovery.

2) Index allocation scheduling: responsible for the management of index metadata, load balancing of indexes, and failover handling, and for building the global index placement view.

3) Index routing: builds routing information based on the index distribution and provides it to the Query Manager for queries.

The specific architecture is shown in Fig. 11.

The Coordinator interacts with the other modules of the system through ZooKeeper to obtain all the segments contained in a version and the service status of the worker nodes, and builds core data structures such as <worker, segment_infos> and <segment, worker_info>.
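The two core mappings named above can be sketched as a pair of dictionaries built from a list of segment assignments. The tuple layout, the `primary`/`backup` role labels, and the function name `build_routing` are assumptions for illustration; the embodiment does not describe the fields of `segment_infos` or `worker_info`.

```python
from collections import defaultdict


def build_routing(assignments):
    """Build both views from an iterable of (worker, segment, role) tuples."""
    worker_to_segments = defaultdict(list)  # <worker, segment_infos>
    segment_to_workers = defaultdict(dict)  # <segment, worker_info>
    for worker, segment, role in assignments:
        worker_to_segments[worker].append((segment, role))
        segment_to_workers[segment][role] = worker
    return dict(worker_to_segments), dict(segment_to_workers)


assignments = [
    ("worker-1", "seg-0", "primary"),
    ("worker-2", "seg-0", "backup"),
    ("worker-2", "seg-1", "primary"),
    ("worker-1", "seg-1", "backup"),
]
worker_view, segment_view = build_routing(assignments)
```

The worker-keyed view answers which segments a node must load, while the segment-keyed view is the one a query router would consult to find the primary and backup copies of a segment.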

Indexing is used to perform index construction.

In addition, the system uses an MPP architecture.

It should be noted that the embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.

The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A data query method, characterized in that the data query method is applied to a server,

the server has pre-generated index metadata with a multi-level index structure and has divided the index metadata into two parts according to the multi-level index structure of the index metadata, wherein one part of the index metadata is saved in index blocks pre-stored in memory and the other part is saved in index blocks pre-stored on a disk; in the structure, the index metadata stored on the disk is the lower level of the index metadata in memory;

the data query method includes:

processing the index information in a received data query instruction to generate an index that conforms to the index format;

when index metadata corresponding to the index is found, obtaining a preliminary search result, and determining, in the index blocks pre-stored on the disk, the index metadata that contains the preliminary search result;

searching the index metadata containing the preliminary search result for index metadata corresponding to the index, to obtain first target index metadata;

looking up, in a pre-stored first bitmap set, a first result set corresponding to the target index metadata; the first bitmap set includes inverted index data constructed by inverted indexing.

2. The method according to claim 1, characterized in that searching the index blocks pre-stored in memory for the index data corresponding to the index comprises:

generating multiple index items corresponding to the index according to the generation rule of the multi-level index structure of the index metadata;

determining a first target index item from the multiple index items according to the data structure of the index metadata in memory and the number of levels of that index metadata;

searching the index blocks pre-stored in memory for index metadata corresponding to the first target index item.

3. The method according to claim 2, characterized in that searching the index data containing the preliminary search result for index metadata corresponding to the index, to obtain first target index metadata, comprises:

determining a second target index item from the multiple index items according to the data structure of the index metadata on the disk and the levels it contains;

searching the index blocks pre-stored on the disk for index metadata corresponding to the second target index item.

4. The method according to claim 1, characterized in that forward index data constructed by forward indexing is pre-stored in the server.

5. The method according to claim 4, characterized by further comprising:

judging whether the relationship between the data volume of the first result set and the cardinality of a preset query column meets a preset condition;

if the relationship between the data volume of the first result set and the cardinality of the preset query column meets the preset condition, querying the data of a specified column in the first result set according to the inverted index data, and performing aggregation on the data of the specified column;

if the relationship between the data volume of the first result set and the cardinality of the preset query column does not meet the preset condition, querying the data of the specified column in the result set according to the forward index data, and performing aggregation on the data of the specified column.

6. The method according to claim 4, characterized in that the inverted index data is constructed using the roaring bitmap algorithm.

7. The method according to claim 4, characterized in that the construction process of the forward index and the inverted index comprises:

dividing the data to be processed into multiple units according to a preset hash algorithm, so that the difference between the data volumes contained in the units is less than a preset quantity threshold;

during execution of the forward index and inverted index construction tasks, re-encoding the data so as to compress it;

during execution of the forward index and inverted index construction tasks, if a task involves transmission across the machine network, merging the different stages of the same task that execute on different machines into one stage, so that the stage executes on the same machine.

8. The method according to claim 1, characterized in that processing the index information in the received data query instruction to generate an index that conforms to the index format comprises:

9. The method according to claim 1, characterized by further comprising:

detecting the access frequency of each column of data in the first result set;

if the access frequency of the data of any column is greater than a preset access frequency, storing the data of that column and the corresponding index metadata in memory.

10. The method according to claim 9, characterized by further comprising:

searching, in the index metadata that is stored in memory and corresponds to the data whose access frequency is greater than the preset access frequency, for index metadata corresponding to the index;

11. A data query device, characterized in that the data query device is applied to a server, the server has pre-generated index metadata with a multi-level index structure and has divided the index metadata into two parts according to the multi-level index structure of the index metadata, wherein one part of the index metadata is saved in index blocks pre-stored in memory and the other part is saved in index blocks pre-stored on a disk; in the structure, the index metadata stored on the disk is the lower level of the index metadata in memory;

the device includes:

a query instruction processing unit, configured to process the index information in a received data query instruction and generate an index that conforms to the index format;

a first retrieval unit, configured to search the index blocks pre-stored in memory for index metadata corresponding to the index;

a range determination unit, configured to, when index metadata corresponding to the index is found, obtain a preliminary search result and determine, in the index blocks pre-stored on the disk, the index metadata that contains the preliminary search result;

a second retrieval unit, configured to search the index metadata containing the preliminary search result for index metadata corresponding to the index, obtaining first target index metadata;

a third retrieval unit, configured to look up, in a pre-stored first bitmap set, a first result set corresponding to the target index metadata; the first bitmap set includes inverted index data constructed by inverted indexing.

12. A computer device, characterized by comprising:

a processor and a memory;

wherein the processor is configured to execute the program stored in the memory;

the memory is configured to store a program, and the program is at least used for:

pre-generating index metadata with a multi-level index structure, and dividing the index metadata into two parts according to the multi-level index structure of the index metadata, wherein one part of the index metadata is saved in index blocks pre-stored in memory and the other part is saved in index blocks pre-stored on a disk; in the structure, the index metadata stored on the disk is the lower level of the index metadata in memory;

when a data query instruction is received, processing the index information in the received data query instruction to generate an index that conforms to the index format;