CA3140571A1

CA3140571A1 - Method and apparatus for clickhouse-based crowd selection

Info

Publication number: CA3140571A1
Application number: CA3140571A
Authority: CA
Inventors: Qiao XIE; Zhiwei Wang; Zhaogui Yang; Weijian Yu
Original assignee: 10353744 Canada Ltd
Current assignee: 10353744 Canada Ltd
Priority date: 2020-11-26
Filing date: 2021-11-26
Publication date: 2022-05-26
Also published as: CN112396462B; CN112396462A

Abstract

A method and apparatus for Clickhouse-based crowd selection relate to big data technologies, and satisfy requirements of the business side for fast and efficient computation of crowd packages.
The method involves: configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items; based on feature types of tag values, partitioning the CH
table into aggregation tables of different types; acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses; and by means of multithreading executing the Clickhouse query clauses parallelly, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results. The apparatus implements the described method.

Description

METHOD AND APPARATUS FOR CLICKHOUSE-BASED CROWD SELECTION
BACKGROUND OF THE INVENTION
Technical Field [0001] The present invention relates to the technical field of big data, and more particularly to a method and apparatus for Clickhouse-based crowd selection.
Description of Related Art

[0002] Currently, precise selection of crowd from massive internet data is critical to successful placement of advertising information. Usually, this can be achieved by tagging user characteristic behaviors, using the obtained tags as criteria to screen the target crowd, and placing real-time advertisement. Since advertising campaigns are usually audience specific, criteria used for screening different crowds are highly diverse.

[0003] Conventionally, off-line compute engines such as Elasticsearch and spark are used to determine this selection. However, as the business side has increasingly demanding requirements for computation performance in terms of crowd packages, these existing compute engines are becoming incompetent to satisfy users when it comes to quasi real-time performance. Besides, the data volume of selection criteria has now reached a scale of hundreds of millions, making data pulling extremely time-consuming, and this is adverse to business needs.
SUMMARY OF THE INVENTION

[0004] The objective of the present invention is to provide a method and an apparatus for Clickhouse-based crowd selection, which provide the business side with fast and efficient computation of crowd packages.

Date recue / Date received 202 1-1 1-26

[0005] To achieve the foregoing objective, in a first aspect, the present invention provides a method for Clickhouse-based crowd selection, which comprises:

[0006] configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items;

[0007] partitioning the CH table into aggregation tables of different types on the basis of feature types of tag values;

[0008] acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses; and

[0009] executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results.

[0010] Preferably, the step of configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items comprises:

[0011] configurating the tag information through a visual interface of a business system, wherein the tag information includes three kinds of configuration information items and the corresponding tag original-table data, wherein the three kinds of configuration information items are tag codes, tag names, and tag values; and

[0012] automatically creating spark tasks to process the tag original-table data, and introducing the tag original-table data into the CH table in the Clickhouse engine in formats of tag codes, tag names, and tag values.

[0013] More preferably, the step of based on feature types of tag values, partitioning the CH
table into aggregation tables of different types comprises:

[0014] dividing the feature types of the tag values into an int type, a double type, a string type, and a date type; and

[0015] according to the feature types of the tag values in the tag original-table data, partitioning Date recue / Date received 202 1-1 1-26 the CH table correspondingly into an it-type tag-value aggregation table, a double-type tag-value aggregation table, a string-type tag-value aggregation table, and a date-type tag-value aggregation table.

[0016] Further, after the step of partitioning the CH table into aggregation tables of different types on the basis of feature types of tag values, the method further comprises:

[0017] distributing the aggregation tables across nodes of the Clickhouse engine, for performing distributed query.

[0018] Preferably, the step of acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses comprises:

[0019] selecting the query tags through the visual interface of the business system to form the sql query criteria, and sending the sql query criteria to kafka; and

[0020] consuming sql query criteria data, receiving drop tables through SparkStreaming tasks, and converting the sql query criteria into the Clickhouse query clauses.

[0021] Preferably, the step of executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results comprises:

[0022] using the Spark tasks to execute the Clickhouse query clauses parallelly by means of multithreading, while acquiring detail data from the corresponding aggregation tables and storing the detail data into the Hive table, so as to obtain the crowd selection query results through computation.

[0023] Preferably, after the step of executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results, the method further comprises:

Date recue / Date received 202 1-1 1-26

[0024] storing the detail data of the Hive table into redis, for the business system to use according to business needs.

[0025] As compared to the prior art, the method for Clickhouse-based crowd selection of the present invention has the following beneficial effects:

[0026] With the disclosed method for Clickhouse-based crowd selection, a user is enabled to configure tag information through a back end of a business system and create spark tasks.
Then the user can introduce original-table data of tags into a CH table of a Clickhouse engine according to configured information items. Afterward, according to feature types of tag values, a CH table in a Clickhouse engine can be partitioned into different types of aggregation table. Subsequently, the user can provide query tags to generate sql query criteria.
The criteria are then converted into Clickhouse query clauses. By means of multithreading, the Clickhouse query clauses are executed parallelly. Query detail data are acquired from aggregation tables of the corresponding types, and stored into a Hive table, so that crowd selection query results can be obtained.

[0027] It is thus clear that the present invention uses a concept of Clickhouse + Spark computation to introduce the computation process for criteria of tag combinations into a Clickhouse engine, and use the combination of the Spark compute engine and the Clickhouse compute engine to generate the crowd package, thereby shortening the time required by computation of detail data satisfying the criteria of tag combinations from minutes to seconds.

[0028] In a second aspect, the present invention provides an apparatus for Clickhouse-based crowd selection, which is applicable to the method for Clickhouse-based crowd selection of the foregoing technical scheme. The apparatus comprises:

[0029] a configuring unit, for configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items;

[0030] a table-partitioning unit, for partitioning the CH table into aggregation tables of different Date recue / Date received 202 1-1 1-26 types on the basis of feature types of tag values;

[0031] a query unit, for acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses; and

[0032] a processing unit, for executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results.

[0033] Preferably, the table-partitioning unit is for dividing the feature types of the tag values into an int type, a double type, a string type, and a date type, according to the feature types of the tag values in the tag original-table data, partitioning the CH table correspondingly into an it-type tag-value aggregation table, a double-type tag-value aggregation table, a string-type tag-value aggregation table, and a date-type tag-value aggregation table.

[0034] As compared to the prior art, the disclosed apparatus for Clickhouse-based crowd selection provides beneficial effects that are similar to those provided by the disclosed method for Clickhouse-based crowd selection as enumerated above, and thus no repetitions are made herein.

[0035] In a third aspect, the present invention provides a computer-readable storage medium, storing thereon a computer program. When the computer program is executed by a processor, it implements the steps of the method for Clickhouse-based crowd selection as described previously.

[0036] As compared to the prior art, the disclosed computer-readable storage medium provides beneficial effects that are similar to those provided by the disclosed method for Clickhouse-based crowd selection as enumerated above, and thus no repetitions are made herein.
BRIEF DESCRIPTION OF THE DRAWINGS

[0037] The accompanying drawings are provided herein for better understanding of the present invention and form a part of this disclosure. The illustrative embodiments and their Date recue / Date received 202 1-1 1-26 descriptions are for explaining the present invention and by no means form any improper limitation to the present invention, wherein:

[0038] FIG. 1 is a flowchart of the method for Clickhouse-based crowd selection of one embodiment of the present invention;

[0039] FIG. 2 is a schematic drawing illustrating that a CH table is partitioned into four types of aggregation tables according to an embodiment of the present invention; and

[0040] FIG. 3 is another flowchart of the method for Clickhouse-based crowd selection of the embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION

[0041] To make the foregoing objectives, features, and advantages of the present invention clearer and more understandable, the following description will be directed to some embodiments as depicted in the accompanying drawings to detail the technical schemes disclosed in these embodiments. It is, however, to be understood that the embodiments referred herein are only a part of all possible embodiments and thus not exhaustive. Based on the embodiments of the present invention, all the other embodiments can be conceived without creative labor by people of ordinary skill in the art, and all these and other embodiments shall be embraced in the scope of the present invention.
Embodiment 1

[0042] Referring to FIG. 1 through FIG. 3, the present embodiment provides a method for Clickhouse-based crowd selection that comprises:

[0043] configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items;
partitioning the CH table into aggregation tables of different types on the basis of feature types of tag values; acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses; and by means of multithreading executing the Clickhouse query clauses parallelly, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive Date recue / Date received 202 1-1 1-26 table, and obtaining crowd selection query results.

[0044] With the disclosed method for Clickhouse-based crowd selection, a user is enabled to configure tag information through a back end of a business system and create spark tasks.
Then the user can introduce original-table data of tags into a CH table of a Clickhouse engine according to configured information items. Afterward, according to feature types of tag values, a CH table in a Clickhouse engine can be partitioned into different types of aggregation table. Subsequently, the user can provide query tags to generate sql query criteria.
The criteria are then converted into Clickhouse query clauses. By means of multithreading, the Clickhouse query clauses are executed parallelly. Query detail data are acquired from aggregation tables of the corresponding types, and stored into a Hive table, so that crowd selection query results can be obtained.

[0045] It is thus clear that the present invention uses a concept of Clickhouse + Spark computation to introduce the computation process for criteria of tag combinations into a Clickhouse engine, and use the combination of the Spark compute engine and the Clickhouse compute engine to generate the crowd package, thereby shortening the time required by computation of detail data satisfying the criteria of tag combinations from minutes to seconds.

[0046] In the foregoing embodiment, the step of configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items comprises:

[0047] configurating the tag information through a visual interface of a business system, wherein the tag information includes three kinds of configuration information items and the corresponding tag original-table data, wherein the three kinds of configuration information items are tag codes, tag names, and tag values; and automatically creating spark tasks to process the tag original-table data, and introducing the tag original-table data into the CH
table in the Clickhouse engine in formats of tag codes, tag names, and tag values.

Date recue / Date received 202 1-1 1-26

[0048] In a particular implementation, the information items may include three types, namely tag codes, tag names, and tag values. For instance, if it is desired to know preferences of male members about some merchandise item, the tag code, tag name, and tag value of the information items are set as labelname, labelvalue, and userid, respectively.
Then the tag original-table data are introduced into the CH table in the formats of the foregoing information items. For an example where one of the entries of the CH table is labelname =A, then labelvalue =male(1), userid=[1,2,3,4].

[0049] As a further example, in order to query a crowd package of 18-year-old male members, a user may, through the back end of the business system, visually configure a tag A=gender (labelname=A, labelvalue=1 representing male), and a tag B= age (labelname=B, labelvalue=18), so as to obtain the sql query criteria of "age=18 && gender =male." Then, data of male members and data of 18-year-old members are acquired from the corresponding aggregation tables. These data are assembled and computed before fed back to the user as the query results for crowd selection.

[0050] In the foregoing embodiment, the step of based on feature types of tag values, partitioning the CH table into aggregation tables of different types comprises:

[0051] dividing the feature types of the tag values into an int type, a double type, a string type, and a date type; and according to the feature types of the tag values in the tag original-table data, partitioning the CH table correspondingly into an it-type tag-value aggregation table, a double-type tag-value aggregation table, a string-type tag-value aggregation table, and a date-type tag-value aggregation table.

[0052] As shown in FIG. 2, in a particular implementation, through the back end of the business system, the commonly used tag values are divided into four types, namely the int type, the double type, the string type, and the date type, so as to satisfy diverse business needs about query criteria. Then according to the feature types of the tag values in the tag original-table data, the CH table can correspondingly be partitioned into the it-type tag-value aggregation Date recue / Date received 202 1-1 1-26 table, the double-type tag-value aggregation table, the string-type tag-value aggregation table, and the date-type tag-value aggregation table. This is to say that all the tag values in the int-type tag-value aggregation table are of the int type, and all the tag values in the double-type tag-value aggregation table are of the double type, and so on.

[0053] For example, for checking the access count, the relevant tag values are obviously of the int type, and the conditional operators for crowd package data are count days>= '15' and count days<'30'. In other words, it is to query for the access count generated between the date of 15 and the date of 30.

[0054] For example, for checking the coordinate locations of users, the relevant tag values are obviously of the double type. This type is suitable for queries of coordinate locations, and helps select a member crowd in a certain latitude-longitude coordinate range.

[0055] For example, for checking member levels of users, the relevant tag values are obviously of the string type. This type is favorable to precise query matching.

[0056] For example, for checking access dates of users, the relevant tag values are obviously of the date type. This type is suitable for comparison of dates, such as criteria for crowd packages like a first-for-purchase time later than a certain date.

[0057] To satisfy business needs, in the present embodiment, four types of CH
aggregation tables are introduced, with the following table structures:
Field Type Description labelname String Tag Code labelvalue String/Int64/double/Date Tag Value uv AggregateFunction(groupBitmap, Bitmap generated from unique UInt64) aggregation function numbers of members Date recue / Date received 202 1-1 1-26

[0058] As can be seen, all labelname fields are of the String type. This is usually used to express a tag code. The labelvalue fields may be of the String type, the Int type, the double type, or the Date type, usually for expressing a tag value. uv is usually used to express a tag name.

[0059] With the four types of aggregation tables, data can be acquired from corresponding types of CH aggregation tables according to business needs automatically for subsequent computation, thereby significantly improving computation performance of the system.

[0060] For example, for tags related to the number of days users sleep for, the criteria are described as users sleeping for more than a certain number of days or fewer than a certain number of days, i.e., count days>= '15' and count days<'30'. The aggregation tables of the Int type are correspondingly selected for data acquiring and to support data operators.

[0061] In the foregoing embodiment, after the step of partitioning the CH
table into aggregation tables of different types on the basis of feature types of tag values, the method further comprises: distributing the aggregation tables across nodes of the Clickhouse engine, for performing distributed query.

[0062] In a particular implementation, since the aggregation tables are local tables distributed across nodes of the Clickhouse engine, a distributed table directing to the aggregation tables at the nodes may be used to realize distributed query computation. As demonstrated in tests, when tag data were introduced into a Clickhouse engine from an original table (such as HDFS), the standalone performance was 1 million/sec, showing linear expansion in terms of performance. It was also found through actual measurement that 50 billion data could be introduced into the CH table in one hour. The measurement was performed in a Clickhouse cluster environment, with the following machine configuration: 32 cores, 256G, 4T drive, using a cluster of 6 machines.
Date recue / Date received 202 1-1 1-26

[0063] In the foregoing embodiment, the step of acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses comprises:

[0064] selecting the query tags through the visual interface of the business system to form the sql query criteria, and sending the sql query criteria to kafka; and consuming sql query criteria data, receiving drop tables through SparkStreaming tasks, and converting the sql query criteria into the Clickhouse query clauses.

[0065] In the foregoing embodiment, the step of executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results comprises:

[0066] using the Spark tasks to execute the Clickhouse query clauses parallelly by means of multithreading, while acquiring detail data from the corresponding aggregation tables and storing the detail data into the Hive table, so as to obtain the crowd selection query results through computation.

[0067] In the foregoing embodiment, after the step of executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results, the method further comprises:

[0068] storing the detail data of the Hive table into redis, for the business system to use according to business needs.

[0069] Referring to FIG. 3, for easy understanding, an example is used to explain the scheme as discussed in the foregoing embodiment:

[0070] A user may select query tags through the page of the business system to generate sql query criteria and send them to kafka. SparkStreaming consumes the corresponding kafka data and converts sql into Clickhouse query clauses that are then stored into Mysql. Spark Date recue / Date received 202 1-1 1-26 tasks read Clickhouse query clauses from Mysql. Afterward, by means of multithreading, the Clickhouse query clauses are executed parallelly to query detail data from the CH aggregation tables corresponding to the distributed nodes. The detail data are then stored into a Hive table for the business system to query and use, thereby outputting crowd selection query results.
In addition, the detail data in the hive table may be transferred to redis according to business needs, so as to provide the business system with other query services.

[0071] To sum up, the present embodiment has the following innovations:
1. The present invention uses a concept of Clickhouse + Spark computation to introduce the computation process for criteria of tag combinations into a Clickhouse engine, and use the combination of the Spark compute engine and the Clickhouse compute engine to generate the crowd package, thereby shortening the time required by computation of detail data satisfying the criteria of tag combinations from minutes to seconds.
2. Addition or removal of tags can be visually configured through the back end of the business system, thereby eliminating hassles about complicated operations for adding or removing tags.
3. The present invention leverages the benefits of Clickhouse to solve issues about aggregation performance related to large volume of data, thereby providing better user experience.
4. The present invention saves hardware resources. According to early-stage assessment, Clickhouse only needs half of the hardware computation resources required by elasticsearch. As to storage capacity, for identical data, one Clickhouse machine is equivalent to 27 elasticsearch machines.
Embodiment 2

[0072] The present embodiment provides a Clickhouse-based crowd selection apparatus, which comprises:

[0073] a configuring unit, for configurating tag information to create spark tasks, and Date recue / Date received 202 1-1 1-26 introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items;

[0074] a table-partitioning unit, for based on feature types of tag values, partitioning the CH
table into aggregation tables of different types;

[0075] a query unit, for acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses; and

[0076] a processing unit, for executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results.

[0077] Preferably, the table-partitioning unit is for dividing the feature types of the tag values into an int type, a double type, a string type, and a date type, according to the feature types of the tag values in the tag original-table data, partitioning the CH table correspondingly into an it-type tag-value aggregation table, a double-type tag-value aggregation table, a string-type tag-value aggregation table, and a date-type tag-value aggregation table.

[0078] As compared to the prior art, the disclosed apparatus for Clickhouse-based crowd selection provides beneficial effects that are similar to those provided by the method for Clickhouse-based crowd selection as enumerated above, and thus no repetitions are made herein.
Embodiment 3

[0079] The present embodiment provides a computer-readable storage medium, storing thereon a computer program. When the computer program is executed by a processor, it implements the steps of the method for Clickhouse-based crowd selection as described previously.

[0080] As compared to the prior art, the disclosed computer-readable storage medium provides beneficial effects that are similar to those provided by the disclose method as enumerated above, and thus no repetitions are made herein.

Date recue / Date received 202 1-1 1-26

[0081] As will be appreciated by people of ordinary skill in the art, implementation of all or a part of the steps of the method of the present invention as described previously may be realized by having a program instruct related hardware components. The program may be stored in a computer-readable storage medium, and the program is about performing the individual steps of the methods described in the foregoing embodiments. The storage medium may be a ROM/RAM, a magnetic disk, an optical disk, a memory card or the like.

[0082] The present invention has been described with reference to the preferred embodiments and it is understood that the embodiments are not intended to limit the scope of the present invention. Moreover, as the contents disclosed herein should be readily understood and can be implemented by a person skilled in the art, all equivalent changes or modifications which do not depart from the concept of the present invention should be encompassed by the appended claims. Hence, the scope of the present invention shall only be defined by the appended claims.

Date recue / Date received 202 1-1 1-26

Claims

What is claimed is:

1. A method for Clickhouse-based crowd selection, comprising:
configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items;
partitioning the CH table into aggregation tables of different types on the basis of feature types of tag values;
acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses; and executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results.

2. The method of claim 1, wherein the step of configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items comprises:
configurating the tag information through a visual interface of a business system, wherein the tag information includes three kinds of configuration information items and the corresponding tag original-table data, wherein the three kinds of configuration information items are tag codes, tag names, and tag values; and automatically creating spark tasks to process the tag original-table data, and introducing the tag Date recue / Date received 2021-11-26 original-table data into the CH table in the Clickhouse engine in formats of tag codes, tag names, and tag values.

3. The method of claim 2, wherein the step of partitioning the CH table into aggregation tables of different types on the basis of feature types of tag values comprises:
dividing the feature types of the tag values into an int type, a double type, a string type, and a date type; and partitioning the CH table correspondingly into an int-type tag-value aggregation table, a double-type tag-value aggregation table, a string-type tag-value aggregation table, and a date-type tag-value aggregation table according to the feature types of the tag values in the tag original-table data.

4. The method of claim 3, wherein after the step of partitioning the CH table into aggregation tables of different types on the basis of feature types of tag values, the method further comprises:
distributing the aggregation tables across nodes of the Clickhouse engine, for performing distributed query.

5. The method of claim 4, wherein the step of acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses comprises:
selecting the query tags through the visual interface of the business system to form the sql query criteria, and sending the sql query criteria to kafka; and consuming sql query criteria data, receiving drop tables through SparkStreaming tasks, and converting the sql query criteria into the Clickhouse query clauses.

Date recue / Date received 2021-11-26

6. The method of any of claims 1 through 5, wherein the step of executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results comprises:
using the Spark tasks to execute the Clickhouse query clauses parallelly by means of multithreading, while acquiring detail data from the corresponding aggregation tables and storing the detail data into the Hive table, so as to obtain the crowd selection query results through computation.

7. The method of claim 1, wherein after the step of executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results, the method further comprises:
storing the detail data of the Hive table into redis, for the business system to use according to business needs.

8. An apparatus for Clickhouse-based crowd selection, comprising:
a configuring unit, for configurating tag information to create spark tasks, and introducing original-table data of tags into a CH table of a Clickhouse engine according to configured information items;
a table-partitioning unit, for partitioning the CH table into aggregation tables of different types on the basis of feature types of tag values;
a query unit, for acquiring query tags from a user to generate sql query criteria, and converting the sql query criteria into Clickhouse query clauses; and Date recue / Date received 2021-11-26 a processing unit, for executing the Clickhouse query clauses parallelly by means of multithreading, acquiring query detail data from the aggregation tables of the corresponding types, storing the detail data into a Hive table, and obtaining crowd selection query results.

9. The apparatus of claim 8, wherein the table-partitioning unit is for dividing the feature types of the tag values into an int type, a double type, a string type, and a date type, and partitioning the CH table correspondingly into an int-type tag-value aggregation table, a double-type tag-value aggregation table, a string-type tag-value aggregation table, and a date-type tag-value aggregation table according to the feature types of the tag values in the tag original-table data.

10. A computer-readable storage medium storing therein a computer program, wherein the computer program when executed by a processor performing a method as described in any of claims 1 through 7.

Date recue / Date received 2021-11-26